Blog Archives

[Webapp Idea] Twitter Link Browser

I use Twitter quite a bit, and a lot of the people I follow share plenty of links. When I browse Twitter on my phone in the morning, I can’t check out all of them, so I usually ‘Favorite’ the tweets whose links seem interesting and browse them later. I’d prefer a better interface for this, one that also lets me tag these links privately so that I can find them again later.

I found one such webapp whose name I now forget. The problem with it was that it had a sucky interface and didn’t let me preview the links properly. There’s also Tweetree, which offers previews of shared links. I also like the Google Reader/Gmail style of interface, which keeps track of new and already-read links. And when multiple people share the same link, I’d like to see it collapsed into a single entry with “X, Y and Z shared this” next to it. Or something.

So this is one thing I’d like to build using Google App Engine.

The steps to do so would be as follows:

  1. Find a good Twitter API library for Python, preferably one that can be integrated with Google App Engine.
  2. Write code to get tweets from your Twitter timeline.
    2(a) Learn how to use Twitter OAuth.
  3. Detect tweets that contain links, and for each link, extract the unshortened URL.
  4. By now, you have a set of links, and can choose to display them as you wish.
  5. Use the App Engine datastore to store previously viewed links. Attributes stored with each link could include the users who shared it, the timestamps of the tweets that shared it, a viewed-or-not flag (set to ‘No’ when the link is first dropped into the database after extraction), and the title of the linked page. Also store the time of last login.
  6. Workflow: On login, extract links from the timeline and drop them into the database, stopping once the timestamp of the tweet being read is earlier than the time of last login. Then display the links whose ‘viewed-or-not’ value is ‘No’ as ‘Unread items’ and the rest as ‘read’ items. Mark each link as read when it is clicked, and also provide checkboxes to mark several links as read at once.
  7. Basic interface: something like Gmail’s basic HTML view. Previews and stuff can come later.
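The storage and workflow in steps 5 and 6 can be sketched in plain Python. This is a minimal sketch, not App Engine code: an in-memory dictionary stands in for the datastore, and `Tweet`, `LinkRecord`, `LinkStore` and `shared_by` are hypothetical names. It assumes the timeline arrives newest first and that each tweet’s link has already been extracted and unshortened. Repeat shares of one URL are collapsed into a single record, which is what produces the “X, Y and Z shared this” line.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Tweet:
    # Hypothetical minimal tweet shape; a real Twitter library returns richer objects.
    author: str
    created_at: datetime
    url: str  # assumed already extracted and unshortened (step 3)


@dataclass
class LinkRecord:
    url: str
    sharers: list = field(default_factory=list)      # users who shared this link
    tweet_times: list = field(default_factory=list)  # timestamps of those tweets
    viewed: bool = False                             # 'No' when first stored
    title: str = ""                                  # filled in later by a title-getter


class LinkStore:
    """In-memory stand-in for the datastore, keyed by unshortened URL."""

    def __init__(self):
        self.links = {}

    def ingest(self, timeline, last_login):
        # Assumes the timeline is ordered newest first.
        for tweet in timeline:
            if tweet.created_at < last_login:
                break  # older tweets were handled on a previous login
            record = self.links.setdefault(tweet.url, LinkRecord(tweet.url))
            if tweet.author not in record.sharers:
                record.sharers.append(tweet.author)
            record.tweet_times.append(tweet.created_at)

    def unread(self):
        return [r for r in self.links.values() if not r.viewed]

    def read(self):
        return [r for r in self.links.values() if r.viewed]

    def mark_read(self, *urls):
        # One URL for a click, several for the mass-mark-as-read checkboxes.
        for url in urls:
            self.links[url].viewed = True


def shared_by(record):
    """Collapse multiple shares into 'X, Y and Z shared this'."""
    names = record.sharers
    if len(names) == 1:
        return f"{names[0]} shared this"
    return f"{', '.join(names[:-1])} and {names[-1]} shared this"
```

Keying records by the unshortened URL is what lets two different short links to the same page collapse into one entry.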
Components to build a basic version:
  • OAuth
  • Tweet-getter
  • Link-extract-and-drop-in-database. This in turn includes a link extractor, an unshortener, a title-getter, and a database interface.
  • Database queries to view links and mark them as read/unread.
  • User interface.
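For the link-extract component, here is a rough sketch using only Python’s standard library. The regular expression is a deliberate simplification (any `http://` or `https://` run up to the next space counts as a link), and `urllib.request.urlopen` stands in for the HTTP fetch; on App Engine the request would likely have to go through its own URL fetch service instead. `urlopen` follows redirects by itself, so the final URL of the response is the unshortened link, and the same response body can feed the title-getter. All function names here are hypothetical.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Simplified pattern: an http(s) URL up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")
TRAILING_PUNCT = ".,;:!?)\"'"


def extract_links(tweet_text):
    """Pull http(s) links out of a tweet, dropping trailing punctuation."""
    return [m.rstrip(TRAILING_PUNCT) for m in URL_PATTERN.findall(tweet_text)]


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_of(html_text):
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title.strip()


def unshorten_and_title(short_url, timeout=10):
    """Follow redirects and return (final_url, page_title).
    Makes a real network request; reads at most 64 KB of the page."""
    with urllib.request.urlopen(short_url, timeout=timeout) as resp:
        final_url = resp.geturl()
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read(65536).decode(charset, errors="replace")
    return final_url, title_of(body)
```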
Anything missing so far? Loose ends? Anything that can be done better? Are you working on this? Any advice on getting started, or on any of the individual components?

I_Am_Back post… and a webapp idea

I’ve had an extremely nice two weeks and have come back fully rejuvenated. Not once during those two weeks did I think about what’s coming up this quarter. Now I guess I can get back to all that.

Now I have an idea for a webapp, something that should be fairly easy to implement on Google App Engine, I think…

Let’s call it Don’tLiftMyContent!

It’s primarily meant to be a service that checks whether your blog’s or website’s content is being plagiarized elsewhere. You enter your blog’s URL, and it gives you a list of pages that use your images and your text. For this, it can use existing services like Google/Yahoo/Bing for text and TinEye for images. While the web search engines are reasonably good for text, TinEye doesn’t yet have as comprehensive a database of images, and this would probably be the limiting factor of the webapp.

I guess timestamps can be compared to filter out the sources that your blog itself plagiarized/borrowed from 🙂
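One way to sanity-check a search hit and apply the timestamp idea, sketched in Python. The word-shingle overlap below is my own stand-in for “does this page really reuse my text”, not something the search engines provide, and `Candidate.first_seen` assumes you can get some publication or crawl date for each page. The threshold of 0.3 and the shingle size of 5 are arbitrary, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


def shingles(text, k=5):
    """Set of k-word shingles (overlapping word windows), lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(original, candidate, k=5):
    """Fraction of the original's shingles that also appear in the candidate."""
    a, b = shingles(original, k), shingles(candidate, k)
    return len(a & b) / len(a) if a else 0.0


@dataclass
class Candidate:
    url: str
    text: str
    first_seen: datetime  # assumed publication or crawl date of the page


def likely_copies(post_text, post_time, candidates, threshold=0.3, k=5):
    """Pages that share enough text AND appeared after your post.
    Older matching pages are more likely sources you borrowed from."""
    hits = []
    for c in candidates:
        score = containment(post_text, c.text, k)
        if score >= threshold and c.first_seen > post_time:
            hits.append((c.url, round(score, 2)))
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

Containment (rather than a symmetric similarity) is used so that a copied post buried inside a much longer page still scores high.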

This idea occurred to me just a few minutes ago, and so far all the existing work I could find consists of websites that let teachers check whether their students are plagiarizing. I haven’t yet found a website that does this for blogs, and would be very glad to know if there is one.

What do you think of the idea? Interested? We can code this together, if you want.

Food for thought: how will I know if someone gets the idea from this post of mine, goes on to create this webapp, and doesn’t give me any credit at all? 🙂

List of things to blog about

I come across a lot of interesting stuff. Every day. And I haven’t blogged about ANY of it!

The Distinguished Speaker Series from the Center for Machine Learning at UCI has been really interesting so far, and so have the weekly seminars run by the same Center.

I quite enjoyed Prof. Peter Stone’s “Learning and Multiagent Reasoning for Autonomous Agents”. Material related to the talk is available here. I haven’t given robotics serious thought, though if I get further into machine learning, I’d like to try out games, strategies, and all that.

Then there was Prof. Doug Oard, who talked about extracting identities from a set of emails. So if you refer to a Judy in your email, or even to a “cutie pie”, the software should be able to pick out who you’re talking about, given just a set of previous correspondence. I was QUITE interested. It didn’t seem that challenging in terms of the learning methods or concepts involved, but I guess the challenge is figuring out how to extract features. And it turns out simple methods are the ones that work best.

There was also Judy Olsen from PARC, who talked about identity resolution on the Net and how it could be used to detect biases in reviews. Say X reviews a book by Y, and you find that X and Y are strongly related; you can take that into account when weighting reviews. Here, they didn’t use anything more than plain keyword matching on Google search: just googling for X and Y and measuring how much the results overlapped. And the results turned out to be surprisingly good. I’m wondering about also using sentiment analysis here to determine whether the bias would be positive or negative.
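The search-overlap idea can be sketched as a Jaccard similarity between the sets of result URLs returned for the two names. The talk didn’t spell out this exact measure: Jaccard, the down-weighting rule and the `search` callable (query in, list of result URLs out) are all assumptions for illustration, and no real search API is called.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness(reviewer, author, search):
    """Overlap of the result URLs returned for each quoted name.
    `search` is a hypothetical callable: query string -> list of result URLs."""
    return jaccard(set(search(f'"{reviewer}"')), set(search(f'"{author}"')))


def review_weight(reviewer, author, search, penalty=1.0):
    """Down-weight a review in proportion to reviewer-author relatedness."""
    return max(0.0, 1.0 - penalty * relatedness(reviewer, author, search))
```

Passing `search` in as a function keeps the sketch testable with canned results and leaves the choice of search backend open.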

The most recent talk was by Prof. Padhraic Smyth, one of the judges of the Netflix contest, on the nature of the contest, how it progressed, the details of the data, and some of the assumptions involved. I came in half an hour late, but still managed to enjoy the rest of the talk.

And the reason I missed half of it? I’m currently a Graduate Student Researcher with Prof. Bill Tomlinson, working on visualizations using Google Earth. My most recent working code will always be posted here. I should collect everything I know about Google Earth’s API and KML here. Sometime soon. It’s rather fun.
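As a starting point for those KML notes, here is a minimal sketch that writes a list of named points as KML placemarks using Python’s `xml.etree.ElementTree`. It assumes the KML 2.2 namespace; note that KML puts longitude before latitude inside `<coordinates>`. The function name and the sample point are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemarks_to_kml(points):
    """points: iterable of (name, latitude, longitude). Returns a KML string.
    KML coordinates are written as longitude,latitude[,altitude]."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")
```

Saving the returned string to a `.kml` file gives something Google Earth can open directly.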

I’m also taking an AI course this quarter, and my class project is about solving cryptograms. The approach I’ll be using is a word-based genetic algorithm. You can check the paper out here [pdf]. I’ll be uploading the code to Google App Engine once I’m done with it, or at least once I’ve gotten started.
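For a feel of the idea, here is a bare-bones genetic algorithm for a simple substitution cryptogram with a word-based fitness (how many decrypted words appear in a dictionary). This is a generic sketch, not the method from the paper: it uses only mutation (swapping two letters of the key) and truncation selection, with no crossover, and the dictionary, population size and generation count are placeholders.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def decrypt(ciphertext, key):
    """key[i] is the plaintext letter for cipher letter ALPHABET[i]."""
    return ciphertext.lower().translate(str.maketrans(ALPHABET, key))


def fitness(ciphertext, key, dictionary):
    """Word-based score: how many decrypted words are in the dictionary."""
    return sum(word in dictionary for word in decrypt(ciphertext, key).split())


def mutate(key, rng):
    """Swap two letters of the key, so it stays a valid permutation."""
    k = list(key)
    i, j = rng.sample(range(26), 2)
    k[i], k[j] = k[j], k[i]
    return "".join(k)


def solve(ciphertext, dictionary, pop_size=50, generations=300, seed=0):
    rng = random.Random(seed)
    # Start from random keys (random permutations of the alphabet).
    population = ["".join(rng.sample(ALPHABET, 26)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda k: fitness(ciphertext, k, dictionary), reverse=True)
        survivors = population[: pop_size // 2]  # keep the fitter half
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=lambda k: fitness(ciphertext, k, dictionary))
    return best, decrypt(ciphertext, best)
```

On very short ciphertexts many keys tie on this score, so a longer text and a larger dictionary give the search much more to work with.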

And then there’s Probability Models, where I’m supposed to read up on how Markov chains and the like are used to solve problems. I want to check out how they’re used in text mining. More details forthcoming.

That’s it for now. More coming up, hopefully. I should make it a regular habit to blog here every single day so that I don’t lose what I learn.
