Thursday, 5 January 2012

Caesar cypher solver in Python

An optional exercise in the ai class was to solve a Caesar cypher. The example given was pretty simple with only one shift, so the easiest way to solve it would be by inspection after printing out all 26 options.

But that isn't in the spirit of things, we want to solve it in an a.i. manner using probabilities, here's

the original cypher :

"Esp qtcde nzyqpcpynp zy esp ezatn zq Lcetqtntlw Tyepwwtrpynp hld spwo le Olcexzfes Nzwwprp ty estd jplc."

The first thing I noticed is that it has structure, if we assume that the punctuation has been left alone then we have a set of words. I can look up the probabilities of words in a table -I used the one from the British National Corpus:  ftp://ftp.itri.bton.ac.uk/bnc/all.num.o5

I created a Python class with a dictionary of words and probabilities (word_freqs) and a class method to return the probability of a word :

  @classmethod
  def word_prob(self,word):
    if word in self.word_freqs:
      return math.log(self.word_freqs[word] / self.corpus_freq)
    return math.log(1 / self.corpus_freq)

I take the log because, even with the correct, english, sentence, the probabilities soon get too small for simple arithmetic as they are multiplied together for each word in the sentence to get a 'sentence probability'

Storing and sorting the probabilities stumped me for a bit, but I eventually decided upon a list of tuples :

prob_map.append((prob, string2))


Which I sort, using the operator module, with :
 sorted(prob_map, key=operator.itemgetter(0), reverse=False):


This seems clunky compared to PHP, Perl or even C, but I'm still feeling my way around Python so I'm hoping that there's a better way.

To give me the answer :
(-7308.628573183837, 'the first conference on the topic of artificial intelligence was held at dartmouth college in this year.')

1956 BTW

There we go, a first bit of Natural Language Programming -which I have signed up for this term.

No comments:

Post a Comment

linkedin