Strings

A string is a sequence of characters stored in the computer's memory. A string can hold textual information, like words, names and sentences. In C, strings are represented with arrays of characters. The last character in the string is the special null character, whose value is 0 (ASCII 0). Thus, C represents strings using null-terminated arrays of char. Note that there is an important conceptual difference between a string and an array of char: the array might contain 100 elements, but the string might have only 10 characters plus one terminating null. An array must be allocated large enough to hold the longest string the program expects to process, plus the terminating null, but a string doesn't have to use the whole space.
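
To make the difference between the array and the string concrete, here is a minimal sketch. The helper unused_space is a hypothetical name invented for this illustration, not a library function; it assumes the array really does contain a null-terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only: given an array of `capacity`
   chars holding a null-terminated string, return how many elements the
   string leaves unused.  The terminating null counts as used. */
size_t unused_space (const char s[], size_t capacity) {
	return capacity - (strlen (s) + 1);
}
```

For an array declared as char name[100] = "Alice"; the array has 100 elements, the string has 5 characters plus the null, and unused_space (name, sizeof name) is 94.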

There are standard string manipulation functions in string.h, one of those header files like stdio.h. Before we look at those, let's try writing some string manipulation functions ourselves to get a feel for how strings work. A string constant in C, like the ones we have been using in printf, is treated by the compiler as a null-terminated string in an array of characters, so if we pass a string constant as an argument to a function, the function should accept it as an array of characters. Here is a simple function to print out the contents of a string:

#include <stdio.h>	/* for putchar and getchar */
#include <stdlib.h>	/* for exit */

void print_string (char s[]) {
	int	i;

	for (i=0; s[i] != 0; i++)
		putchar (s[i]);
}
We can use this function to print a string, just like we use printf:
int main () {
	print_string ("Hello, World!\n");
	exit (0);
}
The string constant is passed as an array of char to print_string where it is printed character by character until the 0 character is encountered. Notice that, even though we don't know in advance how long the string is, the for loop still stops when it gets to the end because of the s[i] != 0 condition. Note also that, since 0 is the same thing as "false" in C, we can shorten the loop to this:
	for (i=0; s[i]; i++)
and it will still terminate when it gets to the 0 character in the string.
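
The same loop pattern can be used for other string functions we might write ourselves. As a sketch, here is one that counts the characters in a string. The name string_length is made up for this example; the standard library's strlen does the same job.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; strlen in string.h does the same job. */
size_t string_length (const char s[]) {
	size_t	i;

	/* advance until the null terminator, which tests as false */
	for (i = 0; s[i]; i++)
		;
	return i;
}
```

The loop body is empty because all the work is done by the condition and the increment. The terminating null is not counted, so the length of an empty string is 0.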

Now let's look at a function to read in a string from the standard input. It will read into an array of characters using getchar() (just like the FSA program) until it finds a newline character or reaches the end of the input; it will then place the null terminator into the array:

void read_string (char s[]) {
	int	i;
	int	c;	/* int, not char, so EOF can be told apart from real characters */

	i = 0;
	c = getchar ();
	while (c != '\n' && c != EOF) {
		s[i++] = c;
		c = getchar ();
	}
	s[i] = 0;
}
We can use this function and print_string in a program to read in and print out someone's name:
int main () {
	char	t[100];

	print_string ("Enter your name: ");
	read_string (t);
	print_string ("Hello, ");
	print_string (t);
	print_string ("!\n");
	exit (0);
}
Note that print_string and read_string are only functions we have written for this lecture; you should really use printf and other library functions to do the same thing, but these are good examples of how to write functions that use strings. Also, to be safe, functions like read_string should take an extra parameter giving the maximum length of the string, i.e. the size of the array, so that they never write past the end of the array. Writing past the end can silently corrupt other data or cause a segmentation fault.
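
As a sketch of what such a safer function might look like, here is a bounded variant of read_string. The name read_string_n and its return value are choices made for this example, not part of any library.

```c
#include <stdio.h>

/* Hypothetical bounded variant of read_string.  `size` is the number of
   elements in s, including room for the terminating null.  Returns the
   number of characters stored, not counting the null. */
int read_string_n (char s[], int size) {
	int	i, c;

	if (size <= 0) return 0;	/* no room even for the null */
	i = 0;
	while ((c = getchar ()) != '\n' && c != EOF) {
		/* store only while there is room, leaving one slot for the null */
		if (i < size - 1)
			s[i++] = (char) c;
	}
	s[i] = 0;
	return i;
}
```

With char t[100]; the call would be read_string_n (t, sizeof t). Characters that do not fit are read and thrown away, so the next call starts on a fresh line. That is one possible design; another would be to leave the extra characters in the input for the next call.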

The include files string.h, stdlib.h and stdio.h contain a number of useful string handling functions. Among them are strlen, strcpy and strcmp from string.h and fgets from stdio.h.
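
As a small illustration of a few functions from string.h, the sketch below uses strlen, strcpy, strcat and strcmp. The helpers make_greeting and same_string are hypothetical names written only for this example.

```c
#include <string.h>

/* Hypothetical helper: builds "Hello, <name>!" in dest, an array of `size`
   chars.  Returns 1 on success, 0 if the result would not fit. */
int make_greeting (char dest[], size_t size, const char name[]) {
	/* strlen counts the characters before the null, so add 1 for the null */
	if (strlen ("Hello, ") + strlen (name) + strlen ("!") + 1 > size)
		return 0;
	strcpy (dest, "Hello, ");	/* copy the string, including its null */
	strcat (dest, name);		/* append to the end of dest */
	strcat (dest, "!");
	return 1;
}

/* strcmp returns 0 when two strings have identical contents */
int same_string (const char a[], const char b[]) {
	return strcmp (a, b) == 0;
}
```

Note that strcpy and strcat do not check the size of the destination array themselves, which is why make_greeting checks the lengths first. Also note that strings must be compared with strcmp; comparing two arrays with == compares their addresses, not their contents.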

Let's put some of these functions together to write a simple program to communicate with the user. The program can respond to a few English words and sentences, and can tell if the user has entered the same string twice in a row:

#include <stdio.h>
#include <stdlib.h>	/* for exit */
#include <string.h>

int main () {
	char	input[100], last[100];

	strcpy (last, "");
	printf ("Hello.  Type in some stuff, 'quit' to exit.\n");
	for (;;) {
		printf ("> ");
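		/* fgets returns NULL at end of input or on error, so stop then */
		if (fgets (input, sizeof input, stdin) == NULL) exit (0);
		/* strip the trailing newline, if there is one */
		input[strcspn (input, "\n")] = 0;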
		if (strcmp (input, "quit") == 0) exit (0);
		if (strcmp (input, "hello") == 0) {
			printf ("Hi!\n");
		} else if (strcmp (input, "how are you?") == 0) {
			printf ("I'm fine, how are you?\n");
		} else if (strcmp (input, "fine") == 0) {
			printf ("That's good.\n");
		} else if (strcmp (input, last) == 0) {
			printf ("You just said that...\n");
		} else printf ("Why do you say '%s'?\n", input);
		strcpy (last, input);
	}
}
Here is a sample session with this program:
Hello.  Type in some stuff, 'quit' to exit.
> hello
Hi!
> how are you?
I'm fine, how are you?
> fine
That's good.
> i am going to repeat myself
Why do you say 'i am going to repeat myself'?
> i am going to repeat myself
You just said that...
> quit
More elaborate versions of this program can be written to analyze the text the user types in and make more meaningful responses. One such program is the "doctor" mode in Emacs. If you type "M-x doctor" in Emacs (i.e., press ALT-x, then type "doctor"), you will be put into the "psychotherapy" mode of the editor, where you can type English sentences and the computer will respond with what seems like English conversation. In reality, it is just using a simple string handling algorithm that remembers things you said and turns them around, repeating them back to you.