Text Generation Using Markov Chains

A Markov chain assigns a probability to each potential next state based solely on the current state. For text generation, the current state is the previous n words of the sentence, and the next state is the next word to add to the sentence. n can be any positive integer, and changing n affects how closely the generated text resembles the sample text: a larger n makes the output follow the sample text more closely, while a smaller n makes it more random.
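To make the idea of a state concrete, here is a minimal Python sketch that lists every (state, next word) pair in a tiny sample. The function name `transitions`, the toy sentence, and the choice of n = 2 are assumptions made for illustration; they are not taken from the project's code.

```python
def transitions(words, n):
    """Yield (state, next_word) pairs; a state is a tuple of n consecutive words."""
    for i in range(len(words) - n):
        yield tuple(words[i:i + n]), words[i + n]


# Hypothetical toy sample, split on whitespace only.
sample = "the cat sat on the mat and the cat ran".split()
pairs = list(transitions(sample, 2))

for state, next_word in pairs:
    print(state, "->", next_word)
```

In this sample the state ("the", "cat") appears twice, followed once by "sat" and once by "ran", so a chain with n = 2 would pick either word with equal probability after that state.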

To assign a probability to each potential next state, a probability table needs to be built. Each row of this table corresponds to a unique n-word sequence in the sample text, and each column corresponds to a unique word. Each entry is the probability that the column's word follows the row's sequence, computed from how many times the sequence occurs in the sample text and how many of those times the word follows it. For large sample texts, this probability table is enormous and very sparse. To save time and space, I implemented my own version of a sparse matrix.
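The sketch below shows one way such a sparse table could look in Python. It is not the sparse matrix implemented in this project: it swaps in a dictionary of `collections.Counter` objects, which stores only the sequence and word combinations that actually occur. The names `build_table`, `next_word_probabilities`, and `generate`, and the toy sample, are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_table(words, n):
    """Map each n-word state to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        table[state][words[i + n]] += 1
    return table


def next_word_probabilities(table, state):
    """Turn the follow-up counts for one state into probabilities."""
    counts = table.get(state, Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def generate(table, n, length, rng=random):
    """Start from a random state and sample one word at a time."""
    output = list(rng.choice(list(table)))
    while len(output) < length:
        counts = table.get(tuple(output[-n:]))
        if not counts:
            break  # this state was never followed by anything in the sample
        words, weights = zip(*counts.items())
        output.append(rng.choices(words, weights=weights)[0])
    return " ".join(output)


# Hypothetical toy sample, split on whitespace only.
sample = "the cat sat on the mat and the cat ran".split()
table = build_table(sample, 2)
probabilities = next_word_probabilities(table, ("the", "cat"))
text = generate(table, 2, 6, random.Random(0))
print(probabilities)
print(text)
```

Storing raw counts and normalizing only when a state is looked up keeps the table compact, and the counts can be passed directly as weights to `random.choices` without building a dense row.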

Here are some examples of the final result of this project:

  • Sample text taken from Herman Melville's Bartleby, The Scrivener.

    forbade the supposition that he would prefer so to do. No more then. Since he will not budge. Bribes he leaves under your own paper weight on your table; in short, say now that in a day or two you will begin to be a little reasonable, '' was sure to come; and then, good morning, sir. '' Several days passed, and I think that if he would but have named a single relative or friend, I would not like it at all; though, as I said before, I am waiting. '' I closed the doors, and again advanced towards Bartleby. I had imagined you of such a gentlemanly organization, that in such curiosity I fully share, but am wholly unable to gratify it.

  • Sample text taken from Donald Trump's last 5000 tweets (excluding retweets).

    . New Stock Market RECORD. Congratulations, and thank you! Love our Boaters, Love our Country! https: //t. co/390duZCB6P @ajshaw1003 Great! Biden will bring the Regulation back, but bigger and worse. Wow! Congratulations to the Kansas City Chiefs on a great game, and a great next year! NASDAQ at new record high, the rest to follow. Sit back & watch this happen to a citizen of the United States. Thank you! Also, New York Post The Trump Campaign was not treated fairly by the Commission. Did I show good instincts in being the first to know? Winning Big. Next year will be one of the greatest and fastest medical miracles in modern day history. I don't blame them! SAVE THE POST OFFICE!

    NOTE: This does not reflect any of my political views. I used Trump's tweets due to his recognizable style.