A mistake while programming in Python.

I’m working on clustering Web People-Search results. So I have the results and snippets for one query in an XML file, and my program reads that and writes the clusters into another XML file.

I have thirty such XML files to process, so the main program calls the clustering function multiple times, once per file.

What I noticed was that the output file sizes and processing times were growing linearly: 4 KB, 8 KB, 12 KB… and I wondered what had gone wrong with my code. On opening the output files, I saw an insane number of clusters; not at all what was expected. Oh, and the number of clusters was growing linearly too.

I wondered what the matter was… and went through my code multiple times.

Then it hit me.

I had a global dictionary and a global set of vectors that the clustering function worked on. Bad programming practice, I know… I swear to god I’ll never do it again.

Since I was calling the clustering function multiple times in one run, the global data was never reset between calls and just kept accumulating. So input1 was processed fine, but input2 processed the data of both input1 and input2, input3 processed input1, input2 and input3… oh hell!
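Here is a minimal sketch of the failure mode. It is not my actual clustering code: `vocabulary`, `vectors` and `cluster_file` are hypothetical names, the inputs are made-up snippets, and the "clustering" simply puts every vector in its own cluster so the growth is easy to see.

```python
# Hypothetical stand-ins for the global dictionary and the global vectors.
vocabulary = {}
vectors = []


def cluster_file(snippets):
    """Pretend to cluster one input file and return the number of clusters."""
    for snippet in snippets:
        words = snippet.split()
        for word in words:
            # Assign each new word the next free index.
            vocabulary.setdefault(word, len(vocabulary))
        vectors.append([vocabulary[w] for w in words])
    # Stand-in for real clustering: one cluster per vector.
    # The bug: `vectors` still holds everything from the earlier calls.
    return len(vectors)


# Three made-up inputs with two snippets each.
inputs = [["a b", "c d"], ["e f", "g h"], ["i j", "k l"]]
cluster_counts = [cluster_file(snippets) for snippets in inputs]
print(cluster_counts)  # [2, 4, 6]
```

Each input has only two snippets, yet the counts climb by two on every call, which is the same linear growth I saw in my output files.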

And why this post? So that I’ll never repeat this error again, for one thing. And so that other noobs like me who read this will also keep this in mind. With Python, you can easily switch between OOP and scripting… and it’s too easy to screw up on that.

If you have global data and want to run the code multiple times, don’t be too lazy to write a shell script that calls the program once per input, so every run starts with a clean slate.
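Another way out, which is not what I did at the time, is to stay inside one Python process and create the state fresh on every call. A rough sketch, assuming a hypothetical `cluster_file_fresh` function and made-up inputs, with a stand-in "clustering" that puts every vector in its own cluster:

```python
def cluster_file_fresh(snippets):
    """Pretend to cluster one input file using only per-call state."""
    # Created anew on every call, so nothing leaks between inputs.
    vocabulary = {}
    vectors = []
    for snippet in snippets:
        words = snippet.split()
        for word in words:
            vocabulary.setdefault(word, len(vocabulary))
        vectors.append([vocabulary[w] for w in words])
    # Stand-in for real clustering: one cluster per vector.
    return len(vectors)


# Three made-up inputs with two snippets each.
inputs = [["a b", "c d"], ["e f", "g h"], ["i j", "k l"]]
fresh_counts = [cluster_file_fresh(snippets) for snippets in inputs]
print(fresh_counts)  # [2, 2, 2]
```

With the dictionary and vectors local to the function, each input is processed on its own no matter how many times the function is called.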


About wanderlust

just your average books-and-music person who wants to change the world.

Posted on January 28, 2009, in coding errors, python.
