Some string-processing programs take other programs as input. One such program is the C compiler, cc, which you use to translate your C programs to machine language. The compiler accepts as input a file full of strings and produces machine language output. Let's look at a less ambitious program that does one of the steps of the C compiler: a program to remove comments from a C program. This program reads a C program from the standard input and prints an equivalent C program, with all the comments removed, on the standard output:
#include <stdio.h>
#include <stdlib.h>

int main () {
    char s[1000];
    int i, commenting;

    /* currently, "commenting" is false, i.e., we're not in a comment */
    commenting = 0;
    for (;;) {
        /* get a string from the file; fgets returns NULL at the
         * end of file, and then we get out of the loop
         */
        if (fgets (s, 1000, stdin) == NULL) break;

        /* for each character in the string... */
        for (i=0; s[i]; i++) {
            if (!commenting && s[i] == '/' && s[i+1] == '*') {
                /* if we aren't yet commenting and see "/*",
                 * then we know we're starting a comment;
                 * increment i past the *
                 */
                commenting = 1;
                i++;
            } else if (commenting && s[i] == '*' && s[i+1] == '/') {
                /* if we are currently commenting and see
                 * "(you know)", end commenting and increment
                 * i past the /
                 */
                commenting = 0;
                i++;
            } else if (!commenting) {
                /* if we're not commenting, print the current char */
                putchar (s[i]);
            }
        }
    }
    /* if we're still in a comment, warn the user */
    if (commenting) fprintf (stderr, "unfinished comment!\n");
    exit (0);
}
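To try the scanning logic without reading from a file, here is a minimal sketch that applies the same flag-based scan to a string in memory. The function name strip_comments and its parameters are hypothetical (they are not part of the program above), and the caller is assumed to supply an output array at least as large as the input.

```c
#include <stddef.h>

/* Hypothetical helper: copy "in" to "out" with C comments removed.
 * "out" must have room for at least strlen (in) + 1 characters.
 * "commenting" carries the in-comment state from one call to the
 * next, so a comment may span several lines.
 */
void strip_comments (const char in[], char out[], int *commenting) {
    size_t i, j = 0;

    for (i = 0; in[i]; i++) {
        if (!*commenting && in[i] == '/' && in[i+1] == '*') {
            /* start of a comment: skip both characters */
            *commenting = 1;
            i++;
        } else if (*commenting && in[i] == '*' && in[i+1] == '/') {
            /* end of a comment: skip both characters */
            *commenting = 0;
            i++;
        } else if (!*commenting) {
            /* ordinary character outside a comment: keep it */
            out[j++] = in[i];
        }
    }
    out[j] = '\0';
}
```

Calling it once per line, with the same commenting variable each time, mirrors the way the main loop carries its flag from one fgets call to the next.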
Here is another string-processing program. It reads lines from the standard input and prints, along with its line number, each line that contains the string given on the command line:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * This function returns the position of a substring in a string,
 * or -1 if the substring isn't there
 */
int substring_position (char haystack[], char needle[]) {
    int i, len;

    /* find the length of the thing we're looking for */
    len = strlen (needle);
    /* search at each position in the string */
    for (i=0; haystack[i]; i++) {
        /* if we find it, return the position */
        if (strncmp (&haystack[i], needle, len) == 0) return i;
    }
    /* didn't find it? return -1. */
    return -1;
}

int main (int argc, char *argv[]) {
    char s[1000];
    int line_number;

    /* the command line argument is the string to search for */
    if (argc != 2) {
        fprintf (stderr, "Usage: %s <string>\n", argv[0]);
        exit (1);
    }
    /* start out at line number 1 */
    line_number = 1;
    /* while not at end of file, get some strings */
    for (;;) {
        /* read in a string; fgets returns NULL at end of file */
        if (fgets (s, 1000, stdin) == NULL) break;
        /* if we find the substring... */
        if (substring_position (s, argv[1]) != -1) {
            /* print the line number and string */
            printf ("%d: %s", line_number, s);
        }
        /* one more line... */
        line_number++;
    }
    exit (0);
}
This program uses a trick we haven't discussed much in class yet. Look at the line that says:
    if (strncmp (&haystack[i], needle, len) == 0) ...
The &haystack[i] part essentially says "treat the i'th position of haystack as the first position of an array, and pass that array to strncmp." Really, we are passing the address of haystack[i] to strncmp, which will treat that address as a pointer to the first element of an array; the address after that is the second element, and so forth. We can't get very far with strings without talking about pointers, so that's next...
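The &haystack[i] idea can be seen in isolation in the short sketch below. The helper names matches_at and matches_at_ptr are hypothetical, made up for illustration only. The second version uses the pointer-arithmetic spelling haystack + i, which names the same address as &haystack[i].

```c
#include <string.h>

/* Hypothetical helper: does "needle" occur in "haystack" starting
 * exactly at position i?  The caller must make sure i is no larger
 * than strlen (haystack).
 */
int matches_at (const char haystack[], const char needle[], int i) {
    /* &haystack[i] is the address of the i'th element; strncmp
     * treats it as the start of a (shorter) string
     */
    return strncmp (&haystack[i], needle, strlen (needle)) == 0;
}

/* Same test, written with pointer arithmetic: haystack + i is the
 * same address as &haystack[i]
 */
int matches_at_ptr (const char haystack[], const char needle[], int i) {
    return strncmp (haystack + i, needle, strlen (needle)) == 0;
}
```

For example, with haystack holding "hello world", matches_at (haystack, "world", 6) is true because &haystack[6] points at the 'w', while matches_at (haystack, "world", 0) is false.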