CS 312 Extended Exercise - Parameters, Strings, and Readability

NOTE: This is being posted for practice purposes only.

Not to be turned in.

If the help hours line is long we will not address questions about this exercise.

Description: The purposes of this exercise are:

  1. Write methods that use parameters and return values.
  2. Work with Strings.
  3. Work with user input.

In this exercise you will write a program that analyzes text typed in by the user for readability.

There are three sample files: sample1, sample2, and sample3.  

Here are the inputs in a file you can download.

Note, some students have had trouble with the BlueJ IDE in the past when entering large amounts of text.

On this exercise you may only use the features discussed in chapters 1 through 4 of the textbook. (In particular, you may not use arrays or methods from the String class not covered in chapter 3.) To get the most out of the exercise, do not use the split method from the String class and do not use a Scanner other than the one connected to System.in.


Introduction: Implement a program that determines the Flesch Readability Index for a piece of text. This method of calculating the readability of text was devised by Rudolf Flesch, author of Why Johnny Can't Read and The Art of Readable Writing. When you check the spelling and grammar in a Word document you can have the readability statistics displayed, including the Flesch index. We will modify the algorithm used by Flesch to make it a little easier, so our results will be slightly different from what you might see from another implementation of the algorithm.

 The Flesch Readability Index is a number, generally between 0 and 100, that indicates how easy a piece of text should be to read. The lower the number, the harder the text is to read. A general breakdown of reading levels based on Flesch Index is:

Flesch Score        Approximate grade level

90 to 100           5th grade
80 to 90            6th grade
70 to 80            7th grade
60 to 70            8th to 9th grade
50 to 60            10th to 12th grade (high school)
30 to 50            13th to 16th grade (college level)
0 to 30             college graduate

The index is calculated by a fixed set of rules for counting the number of sentences, words, and syllables in a piece of text. This can be automated via a computer program. Here is an example. Consider the following sentence:

It was an extraordinarily windy day, and thus the riders were faced with several arduous climbs up the mountain, with the wind trying to push them back down the road.

The Readability Index for that sentence is 60.8 using our modified algorithm. The following conveys almost the same idea,

It was a very windy day. The riders had many hard climbs up mountains. The wind kept pushing them back down the road.

but has a Readability Index of 103.4. This method of determining the readability of  a piece of text does not do any sort of linguistic analysis so the results can be misleading, but the method usually produces a reasonable answer.

  1. The readability index itself is calculated by the following formula:

    Index = 206.835 - (1.015 * total words / total sentences) - (84.6 * total syllables / total words)

    The index is rounded to the nearest tenth.

    Note, these rules are a heuristic. Heuristics do not always achieve the desired outcome, but they are valuable because they simplify the problem-solving process and usually give a good answer, if not always a perfect one.
  2. The program must count the number of words, number of syllables, and number of sentences. Certain assumptions are made about what is a word, syllable, and sentence in order to make it easier to write a program to do the analysis.

  3. Sentences are the easiest to count. Each occurrence of a period, colon, semicolon, question mark, or exclamation mark counts as a sentence. Thus the String "Gack!!!" has 1 word with 1 syllable, but 3 sentences. (Again, this set of rules is a heuristic: a set of rules that often gives a good answer, but occasionally gives bad or nonsensical answers. It is possible per these rules to have a sentence with no words.) If a text has no sentence characters, assume it has 1 sentence.

  4. A word is a sequence of one or more characters delimited by white space or by a sentence terminator as listed in rule 3, whether or not it is an actual English word. White space is defined as a space, a tab ('\t'), a new line character ('\n'), or the end of the String itself. Again, this gives some results that may not make sense. For example, the text "I_don't_like_to_use_SPACES-EVER!" has a single word: I_don't_like_to_use_SPACES-EVER

  5. To count the total number of syllables use the following rules. These rules will sometimes give you the wrong answer for the number of syllables in a word, but they usually give the right answer and are much easier to implement than storing ALL the words that might be encountered and their syllable counts.
    1. Each group of adjacent vowels counts as one syllable. Vowels consist of upper and lower case a, e, i, o, and u. For example, the "ea" in "real" contributes one syllable, but the "e" and the "a" in "regal" count as two syllables. "Happy" comes out with 1 syllable, due to a flaw in our heuristic.
    2. It is possible for a word to have no syllables. So for example hymn and gym have 0 syllables per our heuristic.
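
Rules 1 and 3 above can be sketched in Java as follows. This is only a minimal sketch, not the suggested solution: the class name FleschIndexSketch and the method names countSentences and fleschIndex are made up for illustration and do not come from the provided shell. It rounds with Math.round, which is one option among several, and it assumes the word and sentence counts are greater than zero.

```java
// Hypothetical sketch: the class and method names are illustrative assumptions,
// not part of the provided Flesch.java shell.
class FleschIndexSketch {

    // Rule 3: every '.', ':', ';', '?', or '!' counts as one sentence.
    // A text with none of these characters is assumed to have 1 sentence.
    static int countSentences(String text) {
        String terminators = ".:;?!";
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (terminators.indexOf(text.charAt(i)) != -1) {
                count++;
            }
        }
        if (count == 0) {
            count = 1;
        }
        return count;
    }

    // Rule 1: the index formula, rounded to the nearest tenth.
    // The casts force floating-point division; int / int would truncate.
    // Assumes words > 0 and sentences > 0.
    static double fleschIndex(int words, int sentences, int syllables) {
        double index = 206.835
                - 1.015 * ((double) words / sentences)
                - 84.6 * ((double) syllables / words);
        return Math.round(index * 10) / 10.0;
    }

    public static void main(String[] args) {
        System.out.println(countSentences("Gack!!!")); // prints 3
        System.out.println(fleschIndex(7, 2, 9));      // prints 94.5
    }
}
```

The sketch uses only length, charAt, and indexOf from the String class, so it stays within the restrictions listed at the top of this page.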

Examples:

Test sentence 1: This is a sentence. So is this!

Number of sentences: 2
Number of words: 7
Number of syllables: 9
Flesch readability index: 94.5
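
As a check, the counts for test sentence 1 (7 words, 2 sentences, 9 syllables) can be plugged into the index formula by hand. The intermediate values are rounded to three decimal places for display only:

```latex
\begin{align*}
\text{Index} &= 206.835 - 1.015 \cdot \frac{7}{2} - 84.6 \cdot \frac{9}{7} \\
             &\approx 206.835 - 3.553 - 108.771 \\
             &\approx 94.511
\end{align*}
```

Rounded to the nearest tenth, this gives the 94.5 shown above.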

Test sentence 2: The following index was invented by Flesch as a simple tool to estimate the legibility of a document without linguistic analysis.

Number of sentences: 1
Number of words: 21
Number of syllables: 39
Flesch readability index: 28.4

Test sentence 3:  Wette. It 'reven hem, or was revenrage. With hey kince kin himply to justron' wer", "stere what willi?

Number of sentences: 3
Number of words: 18
Number of syllables: 27
Flesch readability index: 73.8

This example is merely to show the algorithm works regardless of whether the input is standard English or not. You could even run the algorithm on source code, although the answer would not be very helpful or meaningful.

Hints:

  1. Implement a multi-pass algorithm. This means you run through the text three times: once to count the sentences, once to count the words, and once to count the syllables. Create a different method for each of these passes.
  2. The String methods I used are: int length(), char charAt(int index), String toLowerCase(), int indexOf(char ch), and String concatenation. You should not need to use any other String methods. Recall you will learn the most if you do not use the split method from the String class or create any Scanner objects besides the one connected to System.in.
  3. Counting words and counting syllables are quite similar. To count the number of words, determine where words (or vowel clusters) start. To do this, loop through the characters of the String. If a character is NOT a word delimiter (\t\n space .;:!?) and the preceding character is a word delimiter, then that is the start of a word. To make this easier, add a space to the start of the text the user types in.
  4. Counting syllables is similar. Count the number of vowels that are not preceded by a vowel. Getting a lowercase version of the text can make this a little easier.
  5. Use printf to print out the Flesch score.
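
Hints 3 through 5 can be sketched in Java as follows. This is one possible reading of the hints, not the suggested solution: the class name FleschCountSketch, the constants, and the helper methods are all hypothetical names chosen for illustration. The sentence count in main is hard-coded from test sentence 1, because counting sentences is a separate pass.

```java
// Hypothetical sketch of hints 3 to 5: every name here is an illustrative
// assumption, not part of the provided Flesch.java shell.
class FleschCountSketch {

    // Word delimiters: space, tab, newline, and the sentence terminators.
    static final String DELIMITERS = " \t\n.:;?!";
    static final String VOWELS = "aeiou";

    static boolean isDelimiter(char ch) {
        return DELIMITERS.indexOf(ch) != -1;
    }

    // Expects a lowercase character.
    static boolean isVowel(char ch) {
        return VOWELS.indexOf(ch) != -1;
    }

    // Hint 3: a word starts at a non-delimiter whose preceding character
    // is a delimiter. The leading space lets the first word follow this rule.
    static int countWords(String text) {
        String padded = " " + text;
        int count = 0;
        for (int i = 1; i < padded.length(); i++) {
            if (!isDelimiter(padded.charAt(i)) && isDelimiter(padded.charAt(i - 1))) {
                count++;
            }
        }
        return count;
    }

    // Hint 4: a vowel group starts at a vowel that is not preceded by a vowel.
    static int countSyllables(String text) {
        String padded = " " + text.toLowerCase();
        int count = 0;
        for (int i = 1; i < padded.length(); i++) {
            if (isVowel(padded.charAt(i)) && !isVowel(padded.charAt(i - 1))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "This is a sentence. So is this!";
        int sentences = 2; // hard-coded from test sentence 1; counted in a separate pass
        int words = countWords(text);
        int syllables = countSyllables(text);
        double index = 206.835 - 1.015 * words / sentences - 84.6 * syllables / words;
        System.out.println("Number of words: " + words);         // prints 7
        System.out.println("Number of syllables: " + syllables); // prints 9
        // Hint 5: printf with %.1f rounds to one digit after the decimal point.
        System.out.printf("Flesch readability index: %.1f%n", index); // prints 94.5
    }
}
```

Padding the text with a leading space means the loop can always look at the preceding character without a special case for index 0. Both counting methods then share the same shape: look for a position where the current character belongs to a group and the previous one does not.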

By way of comparison, the suggested solution consists of 160 lines  and 9 methods including main. Most of the 160 lines are comments, blank lines, or lines with a single }. The number of lines with actual code is about 75. Note, some methods are used to provide structure to the program even though they do not remove any redundancy.

When finished turn in your Flesch.java file via Canvas.

Provided File                                         Responsibility

Sample output files: sample1, sample2, and sample3    Provided by me
Flesch.java (Provided shell)                          You and me. (Okay, mostly you.)

 
