Monthly Archives: June 2009
I didn’t do as much literature survey on this as I’d’ve wanted, but I came across this paper [pdf]. Word frequencies are different among men and women, apparently. That’s the basis of disambiguation. Women use more pronouns than men do, and the frequency compares with that of fiction, while that of men compares with nonfiction.
So I guess it should work like this: identify genre of the piece, and then identify gender.
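To get a feel for the pronoun signal, here is a minimal sketch that measures what fraction of the words in a text are pronouns. The pronoun list and the `pronoun_rate` name are my own assumptions for illustration, not the paper's actual feature set, and a rate on its own says nothing about gender until it is compared against a genre baseline.

```python
import re

# Assumed, deliberately small pronoun list (not taken from the paper).
PRONOUNS = {
    "i", "me", "my", "you", "your", "he", "him", "his", "she", "her",
    "we", "us", "our", "they", "them", "their", "it", "its",
}

def pronoun_rate(text):
    """Return the fraction of words in `text` that are in PRONOUNS."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in PRONOUNS)
    return hits / len(words)
```

Running this over a fiction sample and a nonfiction sample first would show how far apart the two genres sit before gender even comes into it.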
When I was looking up stuff for Blog Gender Analysis, I came across uClassify.com. Great site. I guess it can be used for rapid prototyping and things. Just to see if a particular approach might work or not. Or something like that.
What is it used for, basically? Please do tell me… I’d like to know.
Recently, on my main blog, I found people commenting to say that they were debating my gender going solely by my writing. That brought back an old set of ideas I had.
There’s no dearth of web apps that determine the gender of the writer given a sample piece of writing. But most of these were error-prone when they started off: Jane Austen was classified as a male writer by one of them, I remember.
Now, however, GenderAnalyzer seems to have improved. Guess it’s due to learning, a bigger sample space, etc. Not at all… they have just gone from randomly tagging things as Male to tagging things as Female.
I thought this was strictly for entertainment purposes, until I saw this as one of the possible tasks on the TREC Blog Track. That set me thinking.
The first application of such a technology that came to mind was spawned by Agatha Christie’s novels – determining whether the writer of threatening notes was a man or a woman. It helps narrow down the suspects, look out for possible accomplices… yeah, it can be put to various uses.
So over the next couple of days, I should try reading more on this, and try analyzing the rationale (if any) behind this task. I’m skeptical, as I feel something as inherently biological as gender does not map perfectly to socially and culturally influenced things like writing style, and hence any such task is an exercise in futility.
But let’s see.
Watch this space.
I don’t really agree with the ‘kill’ bit… that’s the way the Net evolves. We need to evolve with it. I also don’t see why Twitter shortening URLs is a problem for the Net at large, apart from indirectly… Twitter is a walled garden anyway.
I however like the suggestion of major search engines offering their own URL shortening services.
I’m not too well-versed with how machine learning works.
I’m quite interested in how Hunch.com works. So what happens? You have a topic and a list of questions, and the problem is about ordering them in the form of a tree so as to reach a solution in the most appropriate way. You also have to keep other things in mind, such as not asking someone how much they enjoy steaks if in a previous question they’ve mentioned they are vegetarian.
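The steak example can be sketched as a list of questions where each one carries a condition on the answers given so far. This is only my guess at the shape of the problem, not how Hunch actually does it; the `QUESTIONS` data, the `applies` field and the `next_question` function are all made up for illustration.

```python
def next_question(questions, answers):
    """Return the first unanswered question whose condition holds, or None."""
    for q in questions:
        if q["id"] in answers:
            continue  # already answered
        if q["applies"](answers):
            return q
    return None

# Hypothetical questions; "applies" decides whether a question makes
# sense given the answers collected so far.
QUESTIONS = [
    {"id": "vegetarian", "text": "Are you vegetarian?",
     "applies": lambda a: True},
    {"id": "steak", "text": "How much do you enjoy steak?",
     "applies": lambda a: a.get("vegetarian") == "no"},
    {"id": "spicy", "text": "Do you like spicy food?",
     "applies": lambda a: True},
]
```

A fixed list like this only handles skipping. Picking the order that reaches a decision fastest is the harder part, and this sketch does not attempt it.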
What are the issues involved here? Ordering of the questions strikes me as one. Another would be keeping tabs on the generation of new topics (it seems to be totally user-driven). How do you try eliminating duplicates?
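On the duplicates question, one naive approach would be to compare the word sets of topic titles and flag anything that overlaps heavily with an existing topic. This is a sketch under my own assumptions (Jaccard similarity on lowercased words, an arbitrary 0.6 threshold, made-up function names). I have no idea whether Hunch does anything like it.

```python
import re

def tokens(title):
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def jaccard(a, b):
    """Jaccard similarity between the word sets of two titles."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def likely_duplicates(new_title, existing, threshold=0.6):
    """Return existing titles whose similarity to new_title meets the threshold."""
    return [t for t in existing if jaccard(new_title, t) >= threshold]
```

It would miss reworded duplicates ("What laptop is right for me?") entirely, which is probably where the real difficulty lies.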
It seems to be a more dynamic and personalized Wikipedia model, so any argument about the unreliability of user-contributed information would be void.
I’d sure like to know more about the challenges involved in creating a site like this. Apart from, of course, the computational power needed.
The topics now look like WikiHow or BlogThings topics. Sure hope it gets better with time. As I’m sure it will.
A friend of mine wondered if the day would come when the President of the USA would use hunch.com to decide whether to bomb Iraq 😉
It’s something called a ‘Decision Engine’. I didn’t know what that was until I saw it. So you type in a query, something you want to make a decision about. It then asks you questions, and on the basis of your answers gives you the decision. And also a list of pros and cons.
I’m not too impressed… but this is a great beginning.
I’m curious about how it works… still perusing this page. It seems to gloss over too much.
I’ll get home and post more on this.