SimpleDiff in Python

A while ago I posted a PHP implementation of a Diff algorithm I came up with[1]. Since it was well received, and it’s a useful little algorithm to have, I created a Python version as well.

There are a few performance improvements as well. The PHP version creates an array in memory proportional to the square of the size of the input, while the Python version’s array is directly proportional to the size of the input. I also sped up how the algorithm finds the indexes of the “new” elements in the “old” array.

Download simplediff.txt (re-name the file to simplediff.py to use)

[1] It is probably the same algorithm that others use, but I haven’t gotten around to getting an ACM membership to access the related papers

Posted on Feb 13th, 2008 in Python

Tail recursion in Python

After spending a lot of time in Scheme, it’s hard not to think in recursion. So recently when I started to improve my Python skills, I missed having Scheme optimize my tail recursive calls.

For example, consider the mutually recursive functions even and odd. You know a number, n, is even if it is 0, or if n - 1 is odd. Similarly, you know a number is not odd if it is 0, and that it is odd if n - 1 is even. This translates to the python code:

def even(x):
  if x == 0:
    return True
  else:
    return odd(x - 1)

def odd(x):
  if x == 0:
    return False
  else:
    return even(x - 1)

This code works, but only for x < 1000, because Python limits the recursion depth to 1000. As it turns out, it is easy to get around this limitation. Included below is a generic tail_rec function that could be used for most cases where you need tail recursion, and an example of it used for the odd/even problem.

def tail_rec(fun):
   def tail(fun):
      a = fun
      while type(a) == type(tail):
         a = a()
      return a
   return (lambda x: tail(fun(x)))

def tail_even(x):
  if x == 0:
    return True
  else:
    return (lambda: tail_odd(x - 1))

def tail_odd(x):
  if x == 0:
    return False
  else:
    return (lambda: tail_even(x - 1))

even = tail_rec(tail_even)
odd = tail_rec(tail_odd)

It’s not as pretty as the Scheme version, but it does the trick. Of course, the odd/even functions are just for the sake of a simple example and have no real-world use, but the tail_rec function could be used in practice.

Posted on Dec 13th, 2007 in Python

Garden Path Sentences

I came across a post on the Powerset Blog recently about garden path sentences, or sentences that lead you down the wrong path through a string of words with multiple meanings. For example,

The complex houses married and single students and their families

In this case, most readers would probably think complex was an adjective that modified the plural noun houses. The post ended with a challenge - how easy would it be to create a program to automatically generate these sentences. Since school is out and I have some free time, I tried it myself. I found a decent free xml dictionary, and wrote a Ruby script to parse the important bits (the type of word and alternate forms) into an SQL database. I cross-checked all the words against a word frequency table to make sure there were no obscure words. I then wrote a Python script to put the words together into a (hopefully meaningful, but not often) sentence. I put the Python script onto my server so you can play with it here.

gardenpath.png

As you can see, the sentences that it comes up with are far from meaningful. However, in most cases you can at least see how a reader could be taken down the wrong path (at least in the cases where there is a right path). In the above example, concrete could be an adjective or a noun, and spheres could be a noun or a verb (to form a sphere). Foster could be an adjective or a noun depending on the context, but I couldn’t see the reader seeing it as an adjective here. Certainly the sentence generator leaves a lot to be desired (especially considering that this was one of the better sentences), but I got about as far with it as I expected to. I think it could be improved further with a few modifications:

  • Words in the database are already cross-checked to make sure they aren’t obscure, but often a word will be common as a noun and uncommon as a verb, or vice versa. I didn’t have a dataset that allowed me to determine if this was the case for a particular word.
  • The valency of verbs is ignored. All verbs are assumed to be transitive, even though this information is given in the database.
  • I underestimated the difficulty of having a computer generate a meaningful sentence. It is difficult to determine what verbs are compatible with what nouns, I guess you would need to parse a large amount of English text (perhaps some of Project Gutenberg - I think Wikipedia would not be varied enough but I could be wrong).

I noticed later that Ero Carrera had taken a similar approach to what I did, but with his linguistics experience he better anticipated the problems I ran into. He has some good ideas, and his post is an interesting read.

Posted on Jun 27th, 2007 in Ruby, Python