More Strings

Let's look at some more sample programs using strings.

Some string-processing programs take other programs as input. One such program is the C compiler, cc, that you use to translate your C programs to machine language. The compiler accepts as input a file full of strings, and produces machine language output. Let's look at a less ambitious program that does one of the steps of the C compiler: a program to remove comments from a C program. This program will read in a C program from the standard input and print out an equivalent C program on the standard output with all the comments removed:

#include <stdio.h>
#include <stdlib.h>

int main () {
	char	s[1000];
	int	i, commenting;

	/* currently, "commenting" is false, i.e., we're not in a comment */

	commenting = 0;
	for (;;) {
	
		/* get a string from the file; fgets returns NULL at
		 * end of file, so that's when we get out of the loop
		 */

		if (fgets (s, 1000, stdin) == NULL) break;

		/* for each character in the string... */

		for (i=0; s[i]; i++) {

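			/* if we aren't yet commenting and see "/*", 
			 * then we know we're starting a comment; skip
			 * past both characters so the * can't also be
			 * taken as the start of the closing pair
			 */
			if (!commenting && s[i] == '/' && s[i+1] == '*') {
				commenting = 1;
				i++;
				continue;
			}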

			/* if we're not commenting, print the current char */

			if (!commenting) putchar (s[i]);

			/* if we are currently commenting and see 
			 * "(you know)", end commenting and increment 
			 * i past the / 
			 */
			if (commenting && s[i] == '*' && s[i+1] == '/') {
				commenting = 0;
				i++;
			}
		}
	}

	/* if we're still in a comment, warn the user */

	if (commenting) fprintf (stderr, "unfinished comment!\n");

	exit (0);
}
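
The same idea can be packaged as a function that works on one line at a time, which makes it easier to try out on small strings. This is only a sketch: the name strip_comments and its parameters are made up for illustration, the caller is assumed to supply a dst array at least as big as src, and, like the program above, it does not know about string literals that happen to contain comment markers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the original program: copy src
 * to dst with the comments left out.  in_comment points to the
 * "commenting" flag, so the state carries over from one line to
 * the next.  dst must be at least as big as src.
 */
void strip_comments (const char src[], char dst[], int *in_comment) {
	int	i, j;

	j = 0;
	for (i=0; src[i]; i++) {
		if (!*in_comment && src[i] == '/' && src[i+1] == '*') {

			/* a comment opens here; skip both characters */

			*in_comment = 1;
			i++;
		} else if (*in_comment && src[i] == '*' && src[i+1] == '/') {

			/* the comment closes here; skip both characters */

			*in_comment = 0;
			i++;
		} else if (!*in_comment) {

			/* ordinary character outside a comment: keep it */

			dst[j++] = src[i];
		}
	}
	dst[j] = '\0';
}
```

A main function would call this once per line read by fgets, starting with the flag set to 0, and print dst each time. If the flag is still 1 after the last line, the file ended inside a comment.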

Extra for Experts

Note that this program is capable of processing itself. The C language is expressive enough that a C program can answer meaningful questions about other C programs, including its own source. For more on this, see an extended version of this Extra for Experts section.

More Strings cont.

Let's look at another string-processing program. This program accepts a single command line parameter string, then searches for this string in the standard input, printing the line number and the line wherever the string is found. It can be used to search text files for every occurrence of a certain word, or to search your C files to see where a certain name occurs. It uses a function called substring_position that returns the index of the first occurrence of a certain substring in a given string, or -1 if the substring isn't there. A substring is just a little string that is part of another string.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 
 * This function returns the position of a substring in a string
 */

int substring_position (char haystack[], char needle[]) {
	int	i, len;

	/* find the length of the thing we're looking for */

	len = strlen (needle);

	/* search at each position in the string */

	for (i=0; haystack[i]; i++) {

		/* if we find it, return the position */

		if (strncmp (&haystack[i], needle, len) == 0) return i;
	}

	/* didn't find it?  return -1. */
	return -1;
}
	
int main (int argc, char *argv[]) {
	char	s[1000];
	int	line_number;

	/* the command line argument is the string to search for */

	if (argc != 2) {
		fprintf (stderr, "Usage: %s string\n", argv[0]);
		exit (1);
	}

	/* no lines read yet; the first line read becomes line 1 */

	line_number = 0;

	/* while not at end of file, get some strings */

	for (;;) {
		/* read in a string; fgets returns NULL at end of file */

		if (fgets (s, 1000, stdin) == NULL) break;

		/* one more line... */

		line_number++;

		/* if we find the substring... */

		if (substring_position (s, argv[1]) != -1) {

			/* print the line number and string */

			printf ("%d: %s", line_number, s);
		}
	}
	exit (0);
}

This program uses a trick we haven't discussed much in class yet. Look at the line that says:
	if (strncmp (&haystack[i], needle, len) == 0) ...
The &haystack[i] part essentially says "treat the i'th position of haystack as the first position of an array, and pass that array to strncmp." Really, we are passing the address of haystack[i] to strncmp, which will treat that address as a pointer to the first element of an array; the address after that is the second element, and so forth. We can't get very far with strings without talking about pointers, so that's next...
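
To see the trick in isolation, here is a small sketch. The function names matches_at and tail_from are made up for illustration and do not appear in the program above; the caller is assumed to pass an i that is no bigger than the length of the string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if needle appears in haystack
 * starting exactly at index i, and 0 otherwise.  Assumes i is no
 * bigger than strlen (haystack).
 */
int matches_at (const char haystack[], const char needle[], int i) {

	/* &haystack[i] is the address of the i'th character;
	 * strncmp treats it as the start of a (shorter) string
	 */
	return strncmp (&haystack[i], needle, strlen (needle)) == 0;
}

/* Hypothetical helper: returns the part of s that starts at
 * index i.  Nothing is copied; the result points into s itself.
 */
const char *tail_from (const char s[], int i) {
	return &s[i];
}
```

For example, tail_from ("hello world", 6) is the string "world", and matches_at ("hello world", "world", 6) is 1 because the comparison starts at the w.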