Yes, I Speak Computer Fluently

You’re a data scientist.

https://media.giphy.com/media/8jexbcUv7kqje/giphy.gif

For CPSC 420, Modeling & Simulation, I was given the task of analyzing a huge amount of text, then simulating the author of that text writing some more.  It involved counting how often words appeared next to one another, and the output was crazy enough that it initially freaked me out.
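
To give a rough idea of the technique, here's a minimal Python sketch of a bigram model: count which words follow which, then generate new text by repeatedly picking a next word in proportion to how often it followed the current one. This is not the script from this post; the function names `build_bigram_counts` and `generate` and the sample sentence are made up for illustration, and splitting on whitespace is a simplifying assumption (no punctuation handling).

```python
import random
from collections import Counter, defaultdict


def build_bigram_counts(text):
    """Count how often each word is immediately followed by each other word."""
    words = text.split()  # assumption: naive whitespace tokenization
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length, rng=None):
    """Walk the bigram table, sampling each next word by its observed frequency."""
    rng = rng or random.Random()
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = counts.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        choices = list(followers.keys())
        weights = list(followers.values())
        word = rng.choices(choices, weights=weights, k=1)[0]
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    sample = "the cat sat on the mat and the cat ate the fish"
    table = build_bigram_counts(sample)
    print(generate(table, start="the", length=10))
```

With a short sample like this the output is mostly nonsense; the longer the source text, the more the generated text tends to pick up the author's habits.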

I wrote the program from scratch; I wish I hadn’t.  There are quite a few libraries that are useful for analyzing text based on unigrams, bigrams, trigrams, or even n-grams. Those libraries would certainly have been easier to use than what I built.  Regardless, it was a fun experience figuring out how to make it work properly!
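
For anyone unfamiliar with the terms: a unigram is a single word, a bigram is a pair of adjacent words, a trigram is three in a row, and an n-gram is the general case. As a small sketch using only Python's standard library (the helper name `ngram_counts` is hypothetical, not from any particular library or from my script), counting them could look like this:

```python
from collections import Counter


def ngram_counts(words, n):
    """Return a Counter of every run of n consecutive words, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the word list.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


words = "to be or not to be".split()
unigrams = ngram_counts(words, 1)
bigrams = ngram_counts(words, 2)
trigrams = ngram_counts(words, 3)

print(bigrams.most_common(1))  # [(('to', 'be'), 2)]
```

Larger values of n capture more context, but they need far more text before the counts become meaningful.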

If you’d like to run it yourself, here’s what to do (for Unix or Mac users; sorry, Windows):

  1. Go to https://pastebin.com/cvTXnnm8
  2. Click the Download button.
  3. In your Terminal, cd to your downloads directory.
  4. In your Terminal, type ‘mv cvTXnnm8.txt ngram.py’
  5. In your Terminal, type ‘python ngram.py’
  6. ???
  7. Profit!

Make sure you have a lengthy text file (.txt) on hand and pass it in when you run the script in step 5.  Otherwise, it won’t do anything.

 

 
