Rambling on FSAs in Computer Science

(Note: the stuff here is just for your entertainment, at no additional charge. You will not be responsible for it on any test. This gets more boring the more you read; feel free to stop anytime.)

Finite State Automata (automata is the plural of automaton) are useful in several areas of computer science. For instance, they can theoretically be used to describe the operation of a computer system like runner. Think of every possible configuration of the memory (variables) in your program as being a different state. Then your program is simply the labeled arrows telling the computer where to go on a particular input. However, this is not a very practical way of thinking of your program. Say you have 5 int variables; these take up 20 bytes, or 160 bits of storage (not to mention the memory runner uses to keep track of controlling your program). Thus, there are at least 2 to the 320 power different states the memory can be in. This is 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976 different states; listing them all would take longer than we have until the heat death of the universe.

A more practical example is the lexical analyzer in the C compiler. This is program that reads in your C program and figures out which symbols in it are identifiers, keywords, operators, preprocessor directives, and so forth. An FSA like that would have states that you could write yourself. For instance, in the initial state, receiving a '#' takes us to the "preprocessor directive" set of states. Receiving a space from the initial state should leave us in the initial state since a leading space is ignored in C, and so forth.

The particular kind of FSA you will be writing for this program is called a Deterministic FSA (DFA), because there is only one possible place to go at each step on a given input. Another kind of FSA is the Nondeterministic FSA (NFA). This is where there may be more than one arrow with the same label leading out of a state. The NFA is thought of as following both (or all) possible arrows out of a state for a given input. If, at the end of input, any of the computations has led to the final state, then the string is accepted; otherwise it is rejected. This is a more powerful device in the sense that it can recognize more sets of strings using fewer states than a DFA. However, for any NFA, there exists an equivalent DFA (with possibly exponentially more states) that accepts the same set of strings. Imagine writing a C program to implement an NFA; how would you do that?

The sets of strings that devices like FSAs can accept are called formal languages. For instance, the set of strings you make by repeating "ab" any number of times is one formal language, and it can be recognized (i.e., all the strings in it accepted, and all the strings not in it rejected) by an FSA. Can you draw the FSA that recognizes this language? Some formal languages can't be recognizes by FSAs. For example, consider the language with n a's followed by n b's, where n is a non-negative integer, e.g., "ab", "aabb", "aaabbb", and so forth. Try to design an FSA to recognize this language; you can't do it. The languages that FSAs can recognize are called regular languages; there are other kinds of languages recognized by other kinds of devices you may learn about later. The C language itself can be recognized by a device called a context-free grammar; there is a particular context-free grammar that will accept only strings that are syntactically correct C programs and reject all others. C programs are capable of recognizing a set of languages called the Turing-decidable languages, named after the famous mathematician Alan Turing. Computer scientists have speculated that Turing-decidable languages represent all the languages we could possibly want to do anything with, from the written and spoken natural languages of the world to the artificial language "the set of all strings that represent encodings of legal moves in the game of Chess." This is called the Church-Turing thesis, named after Turing and the mathematician Alonzo Church. The jury is still out on whether this is actually the case.

Recall the relationship of the DFA to the NFA. An NFA can always be reduced to a DFA with possibly exponentially more states. Is the same true of C programs? Instead of talking about the number of states in a C program, which is for all intents infinite, let's talk about the time it takes to run a C program. Now consider a different computer language called Super C. It, like the NFA, can choose to go both places after it has made a decision. For instance, the Super C statement:

        if (input == 'a') either
		try statement 1
	or
		try statement 2
would split the program off into two different worlds, one where the result of the decision is to try statement 1 (whatever that may mean), the other where the result is to try statement 2. If either world ended up the the "final state," the language is accepted. Is there a way to convert a Super C program into an ordinary C program? Yes, you just try both alternatives one at a time. However, the number of alternatives grows exponentially as we encounter these kind of either/or statements, as does the time it takes to execute these alternatives. Is there a way to convert a Super C program to ordinary C without using exponentially more time? Five points extra credit on the next test if you can answer that one :-) (The tentative answer is "nobody knows." If such a method were found, we could increase the efficiency of many programs enormously.)

If you're really interested in this stuff, e-mail the instructor, Daniel Jimenez for more information. If you're still interested, have your head examined :-)