Blog Archives

List of things to blog about

I come across a lot of interesting stuff. Every day. And I haven’t blogged about ANY of it!

The Distinguished Speaker Series from the Center for Machine Learning at UCI has been really interesting so far. And so have the weekly seminars by the same Center.

I quite enjoyed Prof. Peter Stone’s “Learning and Multiagent Reasoning for Autonomous Agents”. Material related to the talk is available here. I haven’t given robotics serious thought, though if I get more seriously into Machine Learning, I’d want to try out games, strategies, and all that.

Then there was Prof. Doug Oard, who talked about extracting identities from a set of emails. So if you refer to a Judy in your email, or even to a “cutie pie”, the software should be able to pick out who you’re talking about, just by being given a set of previous correspondence. I was QUITE interested. It didn’t seem that challenging in terms of learning or applying concepts, but I guess the challenge is figuring out how to extract features. And it turns out simple methods are the ones that work best.

There was also Judy Olsen from PARC, who talked about identity resolution on the Net, and how that could be used to detect biases in Amazon.com reviews. So say X reviews a book by Y, and you find that X and Y are strongly related. You can take that into account when weighting reviews. Here, they didn’t use anything more than plain keyword matching on Google search: just googling for X and Y and measuring how much the two sets of results overlapped. And the results turned out surprisingly good. I’m wondering about also using sentiment analysis here to determine whether the bias would be positive or negative.
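I don’t know exactly how the overlap was measured in the talk, so here is only a minimal sketch of one plausible way to score it: the Jaccard overlap between two sets of result URLs. The function name and the example URLs are made up for illustration, and a real run would fetch the result lists from a search engine.

```python
def result_overlap(results_x, results_y):
    """Jaccard overlap between two collections of search-result URLs."""
    set_x, set_y = set(results_x), set(results_y)
    if not set_x and not set_y:
        return 0.0  # nothing to compare, so treat as no evidence of a link
    return len(set_x & set_y) / len(set_x | set_y)


# Hypothetical result lists for a reviewer X and an author Y.
results_for_x = ["a.example/1", "b.example/2", "c.example/3"]
results_for_y = ["b.example/2", "c.example/3", "d.example/4"]

# Two shared URLs out of four distinct ones.
score = result_overlap(results_for_x, results_for_y)
```

A higher score would suggest X and Y show up together on the Web more often, which is the kind of signal that could be used to down-weight a review.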

The most recent one was Prof. Padhraic Smyth, one of the judges of the Netflix contest, talking about the nature of the contest, how it went, the details in the data, some assumptions… I came in half an hour late, but still managed to enjoy the rest of the talk.

And the reason I missed half of it? I’m currently a Graduate Student Researcher with Prof. Bill Tomlinson. I’m working on visualizations using Google Earth. My most recent working code will always be posted here. I should aggregate all my knowledge about Google Earth’s API and KML here. Sometime Soon. It’s rather fun.

I’m also taking an AI course this quarter, and my class project has to do with cryptogram solving. The approach I will be using is a word-based genetic algorithm. You can check the paper out here [pdf]. I will be uploading the code to Google App Engine once I’m done with it, or at least have gotten started.
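To give a rough idea of what “word-based” could mean here, below is a simplified sketch, not the method from the paper. It assumes a simple substitution cipher over lowercase letters, scores a candidate key by the fraction of decoded words found in a word list, and evolves keys by mutation only (no crossover). All function names and parameters are my own placeholders.

```python
import random
import string


def decode(ciphertext, key):
    """Apply a substitution key (dict: cipher letter -> plain letter)."""
    return "".join(key.get(ch, ch) for ch in ciphertext)


def fitness(ciphertext, key, dictionary):
    """Fraction of decoded words that appear in the dictionary."""
    words = decode(ciphertext, key).split()
    if not words:
        return 0.0
    return sum(word in dictionary for word in words) / len(words)


def mutate(key, rng):
    """Return a copy of the key with the mappings of two letters swapped."""
    a, b = rng.sample(sorted(key), 2)
    child = dict(key)
    child[a], child[b] = child[b], child[a]
    return child


def evolve(ciphertext, dictionary, generations=200, population_size=30, seed=0):
    """Mutation-only evolutionary loop; returns the best key found."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)

    # Start from random permutations of the alphabet.
    population = []
    for _ in range(population_size):
        shuffled = letters[:]
        rng.shuffle(shuffled)
        population.append(dict(zip(letters, shuffled)))

    def score(key):
        return fitness(ciphertext, key, dictionary)

    for _ in range(generations):
        # Keep the fitter half, refill with mutated copies of survivors.
        population.sort(key=score, reverse=True)
        survivors = population[: population_size // 2]
        population = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]

    return max(population, key=score)
```

A loop this bare is unlikely to crack a real cryptogram on its own; it is only meant to show where the word list enters the picture, namely in the fitness function.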

And then there’s Probability Models, where I’m supposed to read up on how Markov Chains and the like are used in solving problems. I want to check out how it’s done in text mining. More details forthcoming.
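As a starting point for the text mining angle, here is a minimal sketch of a first-order Markov chain over words: it estimates the probability of each next word given the current one from raw counts. The function name and the toy sentence are just for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(tokens):
    """Estimate P(next word | current word) from adjacent token pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


tokens = "the cat sat on the mat".split()
probs = transition_probabilities(tokens)
# In this toy sentence, "the" is followed once by "cat" and once by "mat".
```

The same table, built from a large corpus, is the basis for things like next-word prediction or scoring how likely a sequence of words is.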

That’s it for now. More coming up, hopefully. I should make it a regular habit to blog here every single day so that I don’t lose what I learn.

Blog Gender Analysis

Recently, on my main blog, I found people commenting to say that they were debating my gender going solely by my writing. That brought back an old set of ideas I had.

There’s no dearth of web apps that determine the gender of the writer given a sample piece of writing. But most of these were error-prone when they started off – Jane Austen was classified as a male writer by one of these, I remember.

Now, however, GenderAnalyzer seems to have improved. Guess it’s due to learning, a bigger sample space, etc. Not at all… they have just gone from randomly tagging things as Male to tagging things as Female.

I thought this was strictly for entertainment purposes, until I saw this as one of the possible tasks on the TREC Blog Track. That set me thinking.

The first application of such a technology that came to mind was spawned by Agatha Christie’s novels – determining whether the writer of threatening notes was a man or a woman. It helps narrow down the suspects, look out for possible accomplices… yeah, it can be put to various uses.

So over the next couple of days, I should try reading more on this, and try analyzing the rationale (if any) behind this task. I’m skeptical, as I feel something as inherently biological as gender does not map perfectly to socially and culturally influenced things like writing style, and hence any such task is an exercise in futility.

But let’s see.

Watch this space.
