Garden Path Sentences

I came across a post on the Powerset Blog recently about garden path sentences: sentences that lead the reader down the wrong path because a string of their words has more than one possible reading. For example,

The complex houses married and single students and their families

In this case, most readers would probably take complex as an adjective modifying the plural noun houses, only to stall at married; in the intended reading, complex is a noun (a housing complex) and houses is a verb. The post ended with a challenge: how easy would it be to write a program that generates these sentences automatically? Since school is out and I have some free time, I tried it myself. I found a decent free XML dictionary and wrote a Ruby script to parse the important bits (each word's part of speech and alternate forms) into an SQL database. I cross-checked all the words against a word frequency table to weed out obscure ones. I then wrote a Python script to put the words together into a sentence (hopefully a meaningful one, though it often isn't). I put the Python script onto my server so you can play with it here.
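The pipeline can be sketched roughly as follows. This is a minimal, hypothetical sketch rather than the scripts from the post: it uses Python's built-in sqlite3 in place of the Ruby parser and the original database, and the sample entries, frequency counts, `min_freq` threshold, five-word template and function names are all invented for illustration. Inflected forms such as houses are stored directly, standing in for the alternate forms the parser extracts. Like the original, it assumes every verb is transitive.

```python
import random
import sqlite3

# Hypothetical sample data standing in for the parsed XML dictionary:
# (word, part of speech) pairs, with inflected forms stored directly.
ENTRIES = [
    ("complex", "adjective"), ("complex", "noun"),
    ("concrete", "adjective"), ("concrete", "noun"),
    ("houses", "noun"), ("houses", "verb"),
    ("spheres", "noun"), ("spheres", "verb"),
    ("single", "adjective"),
    ("students", "noun"), ("families", "noun"),
    ("sesquipedalian", "adjective"),
]
# Hypothetical counts standing in for the word frequency table.
FREQUENCIES = {"complex": 5200, "concrete": 2500, "houses": 3100,
               "spheres": 400, "single": 4800, "students": 6000,
               "families": 3900, "sesquipedalian": 2}


def build_lexicon(entries, frequencies, min_freq=100):
    """Load entries into an in-memory SQLite table, dropping obscure words."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT, pos TEXT)")
    conn.executemany(
        "INSERT INTO words VALUES (?, ?)",
        [(w, pos) for w, pos in entries if frequencies.get(w, 0) >= min_freq],
    )
    return conn


def words_with(conn, *tags):
    """Words listed under every one of the given (distinct) parts of speech."""
    placeholders = ", ".join("?" for _ in tags)
    query = (f"SELECT word FROM words WHERE pos IN ({placeholders}) "
             "GROUP BY word HAVING COUNT(DISTINCT pos) = ? ORDER BY word")
    return [row[0] for row in conn.execute(query, (*tags, len(tags)))]


def garden_path(conn, rng):
    """Fill the template 'The A B C D.'

    A reads as adjective or noun and B as noun or verb, so a reader tends to
    parse 'The A B' as one noun phrase, while the intended parse is
    subject 'The A', verb 'B', object 'C D'.
    """
    a = rng.choice(words_with(conn, "adjective", "noun"))
    b = rng.choice(words_with(conn, "noun", "verb"))
    c = rng.choice([w for w in words_with(conn, "adjective") if w != a])
    d = rng.choice([w for w in words_with(conn, "noun") if w not in (a, b, c)])
    return f"The {a} {b} {c} {d}."


conn = build_lexicon(ENTRIES, FREQUENCIES)
print(garden_path(conn, random.Random(0)))
```

The template only guarantees the two ambiguous words that set up the wrong path; nothing checks whether the result means anything, which is exactly where the real generator struggled too.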

[Screenshot: a sentence produced by the generator (gardenpath.png)]

As you can see, the sentences it comes up with are far from meaningful. However, in most cases you can at least see how a reader could be led down the wrong path (in the cases where there is a right path at all). In the example above, concrete could be an adjective or a noun, and spheres could be a noun or a verb (to form a sphere). Foster could be an adjective or a noun depending on the context, but I can't imagine a reader taking it as an adjective here. The sentence generator certainly leaves a lot to be desired (especially considering that this was one of the better sentences), but I got about as far with it as I expected to. I think it could be improved with a few modifications:

  • Words in the database are already cross-checked to make sure they aren’t obscure, but often a word will be common as a noun and uncommon as a verb, or vice versa. I didn’t have a dataset that allowed me to determine if this was the case for a particular word.
  • The valency of verbs is ignored: all verbs are assumed to be transitive, even though the database records each verb's valency.
  • I underestimated how hard it is for a computer to generate a meaningful sentence. It is difficult to determine which verbs are compatible with which nouns; I guess you would need to parse a large amount of English text (perhaps some of Project Gutenberg; I think Wikipedia would not be varied enough, but I could be wrong).
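The first two improvements could look something like this. It is a hypothetical sketch: the `Sense` record, the per-sense frequency counts and the threshold are invented for illustration, since the data I had did not include per-sense frequencies. A word is only kept as a garden path candidate if every one of its readings is common, and an object is only attached to a verb sense that is transitive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One reading of a word, with hypothetical per-sense data."""
    word: str
    pos: str                  # "noun", "verb", "adjective", ...
    freq: int                 # how common the word is *in this sense*
    transitive: bool = False  # only meaningful for verbs


# Invented counts: "man" is common as a noun but rare as a verb.
SENSES = [
    Sense("houses", "noun", 900),
    Sense("houses", "verb", 150, transitive=True),
    Sense("man", "noun", 5000),
    Sense("man", "verb", 3, transitive=True),
    Sense("sleeps", "verb", 400),
]


def usable_ambiguous(senses, min_freq):
    """Words with several parts of speech, every one of them common enough."""
    by_word = {}
    for s in senses:
        by_word.setdefault(s.word, []).append(s)
    return sorted(
        w for w, ss in by_word.items()
        if len({s.pos for s in ss}) > 1 and all(s.freq >= min_freq for s in ss)
    )


def predicate(verb, obj):
    """Attach the object phrase only if this verb sense takes one."""
    if verb.pos != "verb":
        raise ValueError(f"{verb.word!r} is not used as a verb here")
    return f"{verb.word} {obj}" if verb.transitive else verb.word


print(usable_ambiguous(SENSES, min_freq=100))    # ['houses']
print(predicate(SENSES[1], "married students"))  # houses married students
print(predicate(SENSES[4], "married students"))  # sleeps
```

With a threshold of 100, man is dropped because its verb reading is too rare for most readers to recover, which is the noun-common, verb-uncommon problem from the first point above.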

I noticed later that Ero Carrera had taken a similar approach to what I did, but with his linguistics experience he better anticipated the problems I ran into. He has some good ideas, and his post is an interesting read.

Posted on Jun 27th, 2007 in Ruby, Python

Endless Google Search

I felt like coding today, so I put together a little hack from an idea I have had for a while. What I came up with is a web search (powered by Google) that loads new search results as you scroll down the page. Try it; it’s actually pretty cool.

Here is how it works: there is a large div element at the bottom of the page just to take up space. When it scrolls onto the screen, an Ajax request is made to the server to get the next 10 results from Google. The requests go through Google’s SOAP API, which is no longer available, but I had an old API key so I was able to get it to work. I had all the client-side code working within an hour, but Google’s API took a while to figure out: SOAP is powerful but hard to code for compared to a simple GET API. It took me a couple of hours to get the server side working, and it is still a hack, so don’t be surprised if you get an error or some unexpected behaviour.
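The scrolling logic can be sketched in a few lines. This is a simplified, hypothetical model written in Python rather than the page's actual JavaScript and PHP: scroll positions are passed in as plain numbers, and `fetch` stands in for the server call that queried Google's SOAP API. The spacer div triggers a request once its top edge enters the viewport, and the result offset advances by however many results came back.

```python
RESULTS_PER_PAGE = 10  # the page requests 10 results at a time


def sentinel_visible(scroll_top, viewport_height, sentinel_top):
    """True once the top of the spacer div has scrolled into the viewport."""
    return sentinel_top < scroll_top + viewport_height


class EndlessSearch:
    """Tracks the result offset so each request asks for the next page."""

    def __init__(self, query, fetch):
        self.query = query
        self.fetch = fetch  # hypothetical backend: fetch(query, start, count) -> list
        self.start = 0
        self.results = []
        self.exhausted = False

    def on_scroll(self, scroll_top, viewport_height, sentinel_top):
        """Fetch the next page if the spacer is on screen; True if a request was made."""
        if self.exhausted or not sentinel_visible(scroll_top, viewport_height, sentinel_top):
            return False
        page = self.fetch(self.query, self.start, RESULTS_PER_PAGE)
        if not page:
            self.exhausted = True  # backend has nothing more to return
        self.results.extend(page)
        self.start += len(page)
        return True


def fake_fetch(query, start, count, total=25):
    """Stand-in for the search backend: pretends there are `total` results."""
    return [f"{query} #{i}" for i in range(start, min(start + count, total))]


search = EndlessSearch("garden path", fake_fetch)
search.on_scroll(scroll_top=1300, viewport_height=800, sentinel_top=2000)
print(search.start)  # 10
```

In a browser the request is asynchronous, so a real client would also need a flag to avoid firing a second request for the same page while the first one is still loading.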

It was designed for Firefox/Mozilla browsers. The only other browser I have tried is IE, and it does not work there, so if you are using Internet Explorer you won’t see anything interesting.

Try it here

Posted on Jun 06th, 2007 in Web Apps, PHP