<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Total Recall</title>
	<atom:link href="http://irjejune.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://irjejune.wordpress.com</link>
	<description>Where I regurgitate all the new CS knowledge I acquire</description>
	<lastBuildDate>Sat, 24 Dec 2011 08:27:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='irjejune.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Total Recall</title>
		<link>http://irjejune.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://irjejune.wordpress.com/osd.xml" title="Total Recall" />
	<atom:link rel='hub' href='http://irjejune.wordpress.com/?pushpress=hub'/>
		<item>
		<title>[Webapp Idea] Twitter Link Browser</title>
		<link>http://irjejune.wordpress.com/2011/11/09/webapp-idea-twitter-link-browser/</link>
		<comments>http://irjejune.wordpress.com/2011/11/09/webapp-idea-twitter-link-browser/#comments</comments>
		<pubDate>Wed, 09 Nov 2011 20:10:19 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[softwares and tools]]></category>
		<category><![CDATA[gmail]]></category>
		<category><![CDATA[google appengine]]></category>
		<category><![CDATA[OAuth]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[webapp]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=125</guid>
		<description><![CDATA[I use Twitter quite a lot. A lot of the people I follow share quite a lot of links. When I browse Twitter on my mobile in the morning, I can&#8217;t check out all the links. I usually &#8216;Favorite&#8217; the links that seem interesting and then browse them later. I&#8217;d actually prefer a better interface to [...]]]></description>
			<content:encoded><![CDATA[<p>I use Twitter quite a lot. A lot of the people I follow share quite a lot of links. When I browse Twitter on my mobile in the morning, I can&#8217;t check out all the links. I usually &#8216;Favorite&#8217; the links that seem interesting and then browse them later. I&#8217;d actually prefer a better interface for this, one that lets me tag these links privately so that I can look for them later as well.</p>
<p>I found one such webapp whose name I now forget. The problem with it was it had a sucky interface and didn&#8217;t let me preview all the links properly. Then there&#8217;s also Tweetree which offers previews of shared links. I also like the Google Reader/Gmail sort of interface which keeps track of new links and already read links. And also, when multiple people share the same link, I&#8217;d like to see it all collapsed as one with &#8220;X, Y and Z shared this&#8221; next to it. Or something.</p>
<p>So this is one thing I&#8217;d like to build using Google App Engine.</p>
<p>The steps to do so would be as follows:</p>
<ol>
<li>Find a nice Twitter API interface for Python which can preferably be integrated with Google App Engine.</li>
<li>Write code to get tweets from your Twitter timeline.<br />
2(a) Learn how to use Twitter OAuth.</li>
<li>Detect tweets that contain links. When a tweet has one, extract the unshortened link.</li>
<li>By now, you have a set of links, and can choose to display them as you wish.</li>
<li>Use the App Engine datastore to store previously viewed links. Possible attributes to be stored along with link can include users who shared this link, timestamps of tweets which shared these links, viewed-or-not (when dropping into database after extraction, this attribute should have the value &#8216;No&#8217;), title of linked page. Also store time of last login.</li>
<li>Workflow: On login, extract links from the timeline and drop them into the database until the timestamp of the tweet you&#8217;re reading is earlier than the time of last login. Then display the links whose &#8216;viewed-or-not&#8217; value is &#8216;No&#8217; as &#8216;Unread items&#8217; and the rest as &#8216;read&#8217; items. On clicking a link, mark it as read. Also provide checkboxes to mass-markAsRead.</li>
<li>Basic interface: something along the lines of Gmail&#8217;s HTML view. Previews and stuff can come later.</li>
</ol>
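<p>Steps 3 to 6 can be sketched roughly as below. This is only a minimal sketch under stated assumptions: the tweets are plain dictionaries with hypothetical <code>user</code>, <code>text</code> and <code>timestamp</code> keys (not the shape of any real Twitter client library), an in-memory dictionary stands in for the App Engine datastore, and unshortening and title-fetching are left out because they need network access.</p>

```python
import re
from dataclasses import dataclass, field

# Deliberately simple URL pattern (an assumption); real tweets may need stricter matching.
URL_PATTERN = re.compile(r"https?://\S+")


@dataclass
class LinkRecord:
    """One stored link, with some of the attributes listed in step 5."""
    url: str
    shared_by: list = field(default_factory=list)
    timestamps: list = field(default_factory=list)
    viewed: bool = False  # links start out unread


def collect_links(tweets, last_login, store):
    """Add links from tweets newer than last_login to store, a dict keyed by URL.

    Assumes tweets are ordered newest first, as on a timeline.
    """
    for tweet in tweets:
        if last_login >= tweet["timestamp"]:
            break  # everything from here on was seen at the previous login
        for url in URL_PATTERN.findall(tweet["text"]):
            url = url.rstrip(".,;:!?)")  # drop trailing punctuation
            record = store.setdefault(url, LinkRecord(url))
            if tweet["user"] not in record.shared_by:
                record.shared_by.append(tweet["user"])
            record.timestamps.append(tweet["timestamp"])
    return store


def unread(store):
    """Return the records that have not been viewed yet."""
    return [record for record in store.values() if not record.viewed]


def mark_read(store, urls):
    """Mark the given URLs as viewed (the mass-markAsRead case)."""
    for url in urls:
        if url in store:
            store[url].viewed = True


# Made-up timeline, newest first; timestamps are plain integers for brevity.
tweets = [
    {"user": "X", "text": "Nice read: http://example.com/a", "timestamp": 300},
    {"user": "Y", "text": "http://example.com/a is worth a look.", "timestamp": 200},
    {"user": "Z", "text": "no link here", "timestamp": 150},
    {"user": "Z", "text": "old http://example.com/b", "timestamp": 50},
]
store = collect_links(tweets, last_login=100, store={})
for record in unread(store):
    print(record.url, "shared by", ", ".join(record.shared_by))
```

<p>Keying the store by URL is what gives the &#8220;X, Y and Z shared this&#8221; collapsing for free: a second share of the same link only appends to the existing record.</p>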
<div>Components to build a basic version:</div>
<div>
<ul>
<li>OAuth</li>
<li>Tweet-getter</li>
<li>Link-extract-and-drop-in-database. This in turn includes Link extractor, unshortener, title-getter, database interface.</li>
<li>Database queries to view links and mark them as read/unread.</li>
<li>User interface.</li>
</ul>
<div>Anything missing so far? Loose ends? Anything that can be done better? Are you working on this? Any advice on getting started or on any of the individual components?</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2011/11/09/webapp-idea-twitter-link-browser/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>RIP, Reader</title>
		<link>http://irjejune.wordpress.com/2011/11/01/rip-reader/</link>
		<comments>http://irjejune.wordpress.com/2011/11/01/rip-reader/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 21:02:38 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[APIs]]></category>
		<category><![CDATA[collaborative filtering]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[websites and tips]]></category>
		<category><![CDATA[google buzz]]></category>
		<category><![CDATA[google plus]]></category>
		<category><![CDATA[google reader]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=109</guid>
		<description><![CDATA[Yeah, this is yet another one of the funeral dirges for Google Reader. And I post it here instead of on my personal blog because I need to get into the habit of writing about technology here. Google Reader is hardly &#8216;technology&#8217; as I intend it to be&#8230; I want to use this place for [...]]]></description>
			<content:encoded><![CDATA[<p>Yeah, this is yet another one of the funeral dirges for Google Reader. And I post it here instead of on my personal blog because I need to get into the habit of writing about technology here. Google Reader is hardly &#8216;technology&#8217; as I intend it to be&#8230; I want to use this place for research updates and paper summaries.  But the anxiety about &#8216;not being good enough&#8217; when it comes to all that is so much that I don&#8217;t want to write anything even remotely geeky. I need to snap out of that. And it&#8217;s NaNoWriMo, it&#8217;s about quantity more than quality. So here we go <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>So basically there are two main arguments against Google Reader&#8217;s integration with Google Plus. First is about how the user interface is sucky. And the second is about how the removal of sharing has killed the whole spirit of Reader. A third, if I may add, is that the platform/API is so bad, and everything is so messed up at first look that I can&#8217;t seem to wrap my mind around how to write a wrapper that makes things better.  Oh wait, there&#8217;s a fourth as well &#8211; the &#8216;stream&#8217; format, as opposed to the folders-and-tags format, is the very antithesis of what Reader is supposed to be.</p>
<p>Let&#8217;s start with the appearance. Yes, white space is good. It makes things look &#8216;clean&#8217;. But that&#8217;s only when you have very specific things you want your user to see on your page. It works great for the Google.com homepage, for instance&#8230; all you want is a search bar. But when it&#8217;s a feed reader, it doesn&#8217;t work at all. When I log in, I don&#8217;t want to see half my screen space taken up by needless headers and whatnot. The bar with &#8216;Refresh&#8217;, &#8216;Mark as read&#8217; and &#8216;Feed settings&#8217; is needlessly large and prominent when it could be smaller and take up far less space. Those options aren&#8217;t used often enough to justify their large font size. The focus here shouldn&#8217;t be on the options, but on the thing I&#8217;m reading. Fail.</p>
<p>Then everything&#8217;s gray, including links. If something&#8217;s not blue or purple, my mind doesn&#8217;t consider it a link. Sorry, but those are unwritten conventions on the Web. There&#8217;s no reason to change that now, and gray is a horrible color to show that something&#8217;s different from the rest of the black text. And the only spots of color on the page are a tiny dab of red to show the feed you&#8217;re currently reading, and a large button on the top left that says &#8216;Subscribe&#8217;. Dab of red, seriously? I much preferred having the entire line showing the current feed highlighted instead of that little red bar. And I don&#8217;t add new feeds every day, so I don&#8217;t need a large &#8216;Subscribe&#8217; button. And when I do add feeds, I don&#8217;t add them using google.com/reader&#8230; I&#8217;m usually on the website I want to add, and I add feeds by clicking on the RSS icon, and then adding to Reader.</p>
<p>Then the UI for sharing. It&#8217;s a lot more clicks to share something now. And yeah, the gripe is that whatever I share will be shared only on G+, but we&#8217;ll get to that in a moment. My problem with having to pick what circles I share with each time I share a feed is that it&#8217;s too much decision making too often. At least give me a set of check boxes of my circles so that all I need to do is two clicks instead of having to start typing my circle names.</p>
<p>It turned out, if you wanted to share something without publicly +1-ing something, you&#8217;d have to go to the top-right corner and click on &#8216;Share&#8217;. Well, how is that intuitive? And why would anyone design it that way, especially when the previous way to do that was by clicking on &#8216;share&#8217; right below the post? Surely, it could have just had the Circles thing appear when you clicked the &#8216;share&#8217; button, and +1-ing it could be a different button? And keep the top-right Share button if you like?</p>
<p>Now about sharing. I can share something with folks from Google Reader, yes, but they can only read it from Google Plus. Someone said that&#8217;s like retweeting something on Twitter from your client, like say Tweetdeck, but those who follow you can see your RTs using only twitter.com. How absurd is that? I want a one-stop shop where I can do all my reading instead of having it spread over a zillion other places.</p>
<p>Which is why one of the things I wanted to do was build a wrapper website that integrated links shared on your G+ stream with your Reader feeds. I can&#8217;t seem to wrap my mind around how exactly it would work, but that&#8217;s one thing I certainly want to do.</p>
<p>The &#8216;stream&#8217; format sucks for reading shared links. I have this problem with Twitter too, but on Twitter, you can &#8216;Favorite&#8217; tweets which contain links and then read them one by one later. In fact, I was wondering about a platform that takes links on your Twitter timeline and puts them together for easy reading, feed reader style. Google Plus however has no such feature which you can use to tuck away stuff for later. If you&#8217;re too busy, you skip over a shared link and it&#8217;s lost. I much preferred the model where your feeds would all accumulate and if it got too much to handle, you could always mark all as read. Even better when your feeds would be properly organized.</p>
<p>And then Google Plus does a bad job of displaying shared links. It shows a small preview, but that&#8217;s more often than not insufficient. Buzz was better in this respect&#8230; at least your images could be expanded, and posts could be expanded so that you could read them right there. Ha, one positive of this would however be that people would get a lot more hits on their websites. And it is not immediately apparent how inconvenient this sort of visual format is, because people don&#8217;t share so much on Google Plus yet, and they don&#8217;t yet use it as a primary reader, or so extensively that it gets on their nerves.</p>
<p>And finally about the thing that has had the largest impact. Sharing.</p>
<p>Previously, in 2007, when Reader didn&#8217;t yet have sharing, we&#8217;d all come across nice links we&#8217;d want to share with our friends, and then either ping them on IM with it, or mail them the link. Needless to say, it was irksome. For both us and our friends. But somehow, when you shared it on Reader, the intrusiveness of sending links went away. It was just there, and if you liked it, you said so on the comments or by resharing it or referencing it in conversation. It stopped feeling like you were shoving it down someone&#8217;s throat, or someone shoved it down yours.</p>
<p>Sharing was also a nice way to filter content. For example, I loved reading Mental Floss&#8217;s feeds, but couldn&#8217;t stand the feed-puke that were feeds like TechCrunch and Reddit, whereas it was the other way for some of my friends. So we just followed each other, and I read the TechCrunch and Reddit content they deemed good enough to share, while I shared the interesting tidbits from Mental Floss.</p>
<p>Google Reader, I remember feeling, was a nice incubator for observing social network dynamics and introducing social features. It was my first first-hand exposure to recommender systems, before I moved to the USA and could actually shop on Amazon or watch movies on Netflix. It was interesting seeing how the recommendations incorporated stuff from your GTalk chats, your searches, stuff you &#8216;liked&#8217;&#8230; I remember freaking out about how after chatting often with a friend in LA my recommended feeds included a lot of LA-related blogs. And there was a search engine based treasure hunt at my undergrad college, and a friend and I remember saying &#8220;Oh man, googling stuff for this contest is <em>so </em>going to affect our Reader recommendations&#8221;.</p>
<p>It was also where I was recommended tons of blogs on ML and NLP and IR, due to which I went to grad school where I did, and did my thesis in what I did.</p>
<p>Also fun was the &#8216;Share as a note in reader&#8217; bookmarklet. That way, I could share stuff from anywhere on the Internet with people who I knew would appreciate it.</p>
<p>Now it seems as if the Plus team wants to go and prove right that ex-Amazon Googler who said Google can&#8217;t do platforms well. Instead of providing services which can be used in a variety of ways to provide &#8216;just right&#8217; experiences for a variety of people, Plus is trying to do it right all by itself. And failing miserably at that. The reason for Twitter&#8217;s success is the sheer variety of ways you can tweet &#8211; from your browser, from your smartphone, from your not-so-smart phone using Snaptu, from your dumb phone via text, your tablet, your desktop&#8230;. and I just don&#8217;t see that happening with Plus yet.</p>
<p>Maybe I wouldn&#8217;t be so mad if all the folks I share with on Reader were on Plus, but actually, hardly anyone is. And I don&#8217;t check my Plus feed on a regular basis either. I wouldn&#8217;t mind going on Plus to just read what everyone&#8217;s sharing, but the user experience is so bad I wouldn&#8217;t want that.</p>
<p>Google should have learnt from when it integrated Reader with Buzz that the Reader format doesn&#8217;t go well with the stream format: a lot of people found that integration irksome and simply silenced others&#8217; Reader shares from their Buzz feed.</p>
<p>There&#8217;s so much quite obviously broken with the product that you wonder if the folks who design and code this up actually use it as extensively as you do. Dogfooding is super-important in products like Google&#8217;s where there are a wide variety of users and user surveys can&#8217;t capture every single aspect.</p>
<p>But given that doing this to Google Reader seems just like when they cancelled <em>Arrested Development</em>, you begin to think they are probably aware of everything, and just don&#8217;t care about you the user and your needs anymore.</p>
<p>PS: Can anyone help me get the Google Plus Python API up and running on Google App Engine? I want to play with it, see what it does, and am not able to get it up and running.</p>
<p>PPS: Does a Greasemonkey script to make G+ more presentable sound like a good idea?</p>
<p>PPPS: Check out the folks at <a href="http://hivemined.org">HiveMined</a>. They are building a replacement for Google Reader <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2011/11/01/rip-reader/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>Recommender Systems Wiki</title>
		<link>http://irjejune.wordpress.com/2011/02/16/recommender-systems-wiki/</link>
		<comments>http://irjejune.wordpress.com/2011/02/16/recommender-systems-wiki/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 22:27:21 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[collaborative filtering]]></category>
		<category><![CDATA[for novices]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[websites and tips]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=105</guid>
		<description><![CDATA[Use and contribute and link to: http://www.recsyswiki.com/wiki/Main_Page Now if only there were one such wiki for ML methods in NLP, my life would be great.]]></description>
			<content:encoded><![CDATA[<p>Use and contribute and link to: <a href="http://www.recsyswiki.com/wiki/Main_Page">http://www.recsyswiki.com/wiki/Main_Page</a></p>
<p>Now if only there were one such wiki for ML methods in NLP, my life would be great.</p>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2011/02/16/recommender-systems-wiki/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>Convex Optimization.</title>
		<link>http://irjejune.wordpress.com/2011/01/05/convex-optimization/</link>
		<comments>http://irjejune.wordpress.com/2011/01/05/convex-optimization/#comments</comments>
		<pubDate>Wed, 05 Jan 2011 02:02:20 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=97</guid>
		<description><![CDATA[Course I&#8217;m taking. Need to brush up on basics before diving in. And I&#8217;ve got less than a day to do that. Anyone know a good crash course in linear algebra? Will be grateful. Thanks.]]></description>
			<content:encoded><![CDATA[<p><a href="https://eee.uci.edu/wiki/index.php/CS_295_Convex_Optimization_(Winter_2011)">Course I&#8217;m taking</a>. Need to brush up on basics before diving in. And I&#8217;ve got less than a day to do that.</p>
<p>Anyone know a good crash course in linear algebra? Will be grateful. Thanks.</p>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2011/01/05/convex-optimization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>How to read a research paper</title>
		<link>http://irjejune.wordpress.com/2011/01/04/how-to-read-a-research-paper/</link>
		<comments>http://irjejune.wordpress.com/2011/01/04/how-to-read-a-research-paper/#comments</comments>
		<pubDate>Tue, 04 Jan 2011 00:00:48 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[for novices]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[reading]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[research paper]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=93</guid>
		<description><![CDATA[If I&#8217;m completely in the groove, with a firm topic in mind, I find it relatively easier to read papers. However when I&#8217;m attempting to get started on something, or am reading a paper which, say, I have to summarize for a course, I lose my footing. I procrastinate, I become reluctant to start. I [...]]]></description>
			<content:encoded><![CDATA[<p>If I&#8217;m completely in the groove, with a firm topic in mind, I find it relatively easier to read papers. However when I&#8217;m attempting to get started on something, or am reading a paper which, say, I have to summarize for a course, I lose my footing. I procrastinate, I become reluctant to start.</p>
<p>I decided I wanted out of this shite, and hence googled for &#8216;How To Read A Paper&#8217;. I found <a title="How to Read a Paper" href="http://docs.google.com/viewer?a=v&amp;q=cache:eOJomyS9xb8J:www.sigcomm.org/ccr/drupal/files/p83-keshavA.pdf+how+to+read+paper&amp;hl=en&amp;gl=us&amp;pid=bl&amp;srcid=ADGEESg3kBzrj6O63eTa1Q_IIowY4bXDnxKL0Es6GWj0P-3KXRoOtQ6MGM28IgtDJkoUUOJa9ZKE7RJgcR4v0looKJfPQeLjU8bERLOf1ptoEldtkoNjr7-1utms5PhHX5BU1nYSIHaR&amp;sig=AHIEtbRU0jj2T61LShekFUZvF-assDXkew&amp;pli=1">this paper</a> by someone from the University of Waterloo, and I suspect this will help out greatly.</p>
<p>Let me summarize it for you.</p>
<p>Essentially, given a research paper, you go over it in three passes.</p>
<p><strong>First Pass (5-10 minutes):</strong></p>
<ul>
<li>Read the Title, Abstract and Introduction.</li>
<li>Read the section/subsection headings and ignore all else</li>
<li>Read the conclusions</li>
<li>Glance over the references and tick off those you&#8217;ve already read.</li>
<li>By the end of this pass, you should be able to answer 5 C&#8217;s about the paper:
<ul>
<li>Category</li>
<li>Context (What papers are related? What bases are used to analyze the problem?)</li>
<li>Correctness (Are the assumptions valid?)</li>
<li>Contributions of the paper</li>
<li>Clarity (Is the paper well-written?)</li>
</ul>
</li>
</ul>
<p><strong>Second Pass (1 hour): </strong></p>
<ul>
<li>Read the paper more carefully, while ignoring details like proofs</li>
<li>Jot down points, make comments in the margins</li>
<li>Look carefully at all figures, especially graphs</li>
<li>Mark unread references for further reading (for background information).</li>
<li>Summarize main themes of the paper to someone else.</li>
<li>You mightn&#8217;t understand the paper completely. Jot down the points you don&#8217;t understand, and why.</li>
<li>Now, either
<ul>
<li>Decide not to read the paper</li>
<li>Return later to the paper after reading background material</li>
<li>Or persevere on to the third pass</li>
</ul>
</li>
</ul>
<p><strong>Third Pass (4-5 hours):</strong></p>
<ul>
<li>You need to virtually reimplement the paper: recreate the work and its reasoning yourself.</li>
<li>Compare your recreation with the original</li>
<li>Think of how you would present the ideas, and compare with how the ideas are presented.</li>
<li>Here, you also jot down your ideas for future work</li>
<li>Reconstruct the entire structure of the paper from memory.</li>
<li>Now you should be able to identify the strong and weak points of the paper, the implicit assumptions and the issues there might be with experimental or analytical techniques, as well as any missing citations.</li>
</ul>
<p>That&#8217;s all.</p>
<p>Additionally, I think as a form of accountability (which I so need at the moment), I will blog every single paper I read, in accordance with the above structure.</p>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2011/01/04/how-to-read-a-research-paper/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>Transfer Learning etc</title>
		<link>http://irjejune.wordpress.com/2010/07/14/transfer-learning-etc/</link>
		<comments>http://irjejune.wordpress.com/2010/07/14/transfer-learning-etc/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 19:36:17 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=88</guid>
		<description><![CDATA[I think this&#8217;d work best if I just updated my daily progress here than try giving comprehensive views of what I&#8217;m doing. So you have data coming in that needs to be classified. Apparently the accuracy of most classifiers is abysmally low. We need to build a better classifier. I took a month&#8217;s worth of [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irjejune.wordpress.com&amp;blog=6066267&amp;post=88&amp;subd=irjejune&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I think this&#8217;d work best if I just posted my daily progress here rather than trying to give a comprehensive view of what I&#8217;m doing.</p>
<p>So you have data coming in that needs to be classified. Apparently the accuracy of most classifiers is abysmally low. We need to build a better classifier.</p>
<p>I took a month&#8217;s worth of data, ran every classifier I could on it, and cross-validated. Accuracy was roughly in the 85-90% range. While that&#8217;s not excellent, it&#8217;s not bad, given the small amount of training data.</p>
<p>So what&#8217;s this low accuracy everyone&#8217;s talking of?</p>
<p>Turns out, the new data coming in is very different from the data you train on. You&#8217;ll train on June&#8217;s data, but July&#8217;s data is going to be quite different. The same words will end up appearing under different class labels. Hence the low accuracy.</p>
<p>Also, you have not just one classifier, but many. It turns out that many classifiers, each trained on a subset of the data, perform better than one classifier trained on the entire data.</p>
<p>Learning will have to keep evolving. I first thought of Active Learning in this context, where you&#8217;d ask the user to label the items you are not sure about. But then, what if you confidently assign labels that are patently wrong? Those would never be sent to the user.</p>
<p>The many classifiers bit of the problem helps us visualize the training data in a different way &#8211; Each category &#8211; class label &#8211; has many sub-categories. Now each classifier is trained on a month&#8217;s worth of data. It turns out that each month can be likened to a sub-category. You train on one sub-category, and test on another sub-category, and expect it to return the same class label. That&#8217;s like training a classifier on data that contains hockey-related terms for the Sports label, and then expecting it to recognize data that contains cricket-related terms as Sports too.</p>
<p>Sounds familiar?</p>
<p>This would be transfer learning/domain adaptation &#8211; you learn on one distribution, and test on a different distribution. The class labels and features, however, remain the same.</p>
<p>This would more specifically be Transductive Transfer Learning &#8211; you have a training set, from distribution D1, and a test set, from distribution D2. You have this unlabelled test data available during training, and you somehow use this to tweak the model you&#8217;ll learn from the training data.</p>
<p>Many ways exist to do this. You can apply Expectation-Maximization to a Naive Bayes classifier trained on the training data, to maximize the likelihood of the test data, while still doing well on the training data. Or you can train an SVM, assign pseudo-labels to the test data, add those to the next iteration of training, and repeat until you get reasonable confidence measures on the test data, while still doing well on the training data.</p>
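<p>As a rough illustration of the first option, here is a minimal sketch of EM on a multinomial Naive Bayes classifier in plain Python. The toy documents, the function names (<code>train_nb</code>, <code>posterior</code>, <code>em_adapt</code>) and the smoothing constant are all assumptions I made up for the example; it is not the exact procedure from any particular paper.</p>

```python
import math
from collections import defaultdict


def train_nb(docs, weights, vocab, n_classes, alpha=1.0):
    """Fit multinomial Naive Bayes from (possibly soft) per-class weights."""
    prior = [alpha] * n_classes
    counts = [defaultdict(float) for _ in range(n_classes)]
    totals = [0.0] * n_classes
    for tokens, w in zip(docs, weights):
        for c in range(n_classes):
            prior[c] += w[c]
            for tok in tokens:
                counts[c][tok] += w[c]
                totals[c] += w[c]
    z = sum(prior)
    log_prior = [math.log(p / z) for p in prior]
    # add-alpha smoothing over the shared vocabulary
    log_like = [
        {t: math.log((counts[c][t] + alpha) / (totals[c] + alpha * len(vocab)))
         for t in vocab}
        for c in range(n_classes)
    ]
    return log_prior, log_like


def posterior(tokens, model, n_classes):
    """Return P(class | tokens); words outside the vocabulary are ignored."""
    log_prior, log_like = model
    scores = [log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
              for c in range(n_classes)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def em_adapt(train_docs, train_labels, test_docs, n_classes, iterations=5):
    """Start from the labelled training data, then refit with soft test labels."""
    vocab = {t for d in train_docs + test_docs for t in d}
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in train_labels]
    model = train_nb(train_docs, hard, vocab, n_classes)
    for _ in range(iterations):
        # E-step: soft-label the unlabelled test documents
        soft = [posterior(d, model, n_classes) for d in test_docs]
        # M-step: refit on training (hard labels) plus test (soft labels)
        model = train_nb(train_docs + test_docs, hard + soft, vocab, n_classes)
    return model


# Toy data (made up): June is hockey vs. films, July is cricket vs. films.
train_docs = [["hockey", "stick", "goal"], ["hockey", "goal", "puck"],
              ["film", "actor", "scene"], ["film", "scene", "script"]]
train_labels = [0, 0, 1, 1]  # 0 = Sport, 1 = Film
test_docs = [["cricket", "goal", "bat"], ["cricket", "bat", "wicket"],
             ["director", "scene", "cinema"], ["director", "cinema", "actor"]]

model = em_adapt(train_docs, train_labels, test_docs, n_classes=2)
for doc in test_docs:
    print(doc, [round(p, 2) for p in posterior(doc, model, 2)])
```

<p>With only the June data, a July document made up entirely of cricket terms contains no known words, so the classifier sits at 50/50. In this toy setup, the one shared word (&quot;goal&quot;) is what lets EM pull the cricket vocabulary towards the Sport label.</p>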
<p>All these approaches to Transductive Transfer Learning are fine. They assume that you have test data available during training time.</p>
<p>We have a slight twist on that. It might be too expensive to store the training data. Or you might have privacy concerns and hide your data, but just expose a classifier you&#8217;ve built on top of it. So, essentially, all you have is a classifier, and you need to tweak that when test data becomes available.</p>
<p>Let&#8217;s complicate it further. You have a set of classifiers. You can pick and choose classifiers you want to combine based on some criteria on the test data, create a superclassifier, and then try tweaking that based on the test data.</p>
<p>For starters, check out <a title="Transferring Naive Bayes Classifiers for Text Classification" href="http://academic.research.microsoft.com/Paper/4229810.aspx">this paper by Dai</a>. Here, you have access to the training data. What if you don&#8217;t? Can you then tweak the classifier without knowing the data underlying it?</p>
<p>Let&#8217;s assume it&#8217;s possible.</p>
<p>Then, based on some criteria you pick, you choose a few classifiers from the many that you have. You merge them. And then tweak that superclassifier. For example, your test data contains data related to hockey and Indian films [Labels are Sport and Film]. You have one classifier on cricket and Indian films, one on hockey and Persian films, another on football and Spanish films. So C1 and C2 are the classifiers closest to your data. You combine C1 and C2 such that you get a classifier that&#8217;d be equivalent to one that&#8217;s trained on hockey, Persian films, cricket and Indian films. Optimal classifier. And then tweak it.</p>
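<p>To make the pick-and-merge step concrete, here is a small sketch under some loud assumptions: each classifier is represented only by per-label word counts (as a Naive Bayes classifier could expose), closeness to the test data is measured by cosine similarity of word counts, and merging means summing the count tables. The selection criterion is still an open question, so cosine similarity is just a stand-in, and the counts below are invented.</p>

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def vocabulary(classifier):
    """Pool a classifier's per-label word counts into one Counter."""
    total = Counter()
    for counts in classifier.values():
        total.update(counts)
    return total


def pick_and_merge(classifiers, test_docs, k=2):
    """Pick the k classifiers closest to the test data and sum their counts."""
    test_counts = Counter(w for doc in test_docs for w in doc)
    ranked = sorted(classifiers,
                    key=lambda name: cosine(vocabulary(classifiers[name]), test_counts),
                    reverse=True)
    merged = {}
    for name in ranked[:k]:
        for label, counts in classifiers[name].items():
            merged.setdefault(label, Counter()).update(counts)
    return ranked[:k], merged


# Hypothetical count tables for the three classifiers in the example.
classifiers = {
    "C1": {"Sport": Counter(cricket=5, bat=3), "Film": Counter(bollywood=4, actor=2)},
    "C2": {"Sport": Counter(hockey=5, stick=3), "Film": Counter(persian=4, director=2)},
    "C3": {"Sport": Counter(football=5, goal=3), "Film": Counter(spanish=4, subtitle=2)},
}
# Unlabelled test data about hockey and Indian films.
test_docs = [["hockey", "stick", "goal"], ["bollywood", "actor", "song"]]

chosen, merged = pick_and_merge(classifiers, test_docs)
print(chosen)
print(merged["Sport"])
```

<p>For Naive Bayes, summing the count tables gives the same parameters as training on the pooled data, which is why I used it here. For other kinds of classifiers, merging would need a different mechanism.</p>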
<p>That&#8217;s the architecture.</p>
<p>The questions we seek to answer are how to choose which classifiers to merge, how to tweak the merged classifier given test data, and whether we&#8217;d need any extra data.</p>
<p>And&#8230; a more fundamental question: given that the test data&#8217;s going to be from a distribution none of your ensemble has seen before, is it worthwhile to merge classifiers? We could vary the KL-divergence between the training and test distributions and see how much having an ensemble helps.</p>
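<p>For reference, a minimal sketch of measuring that KL-divergence between smoothed word distributions. The documents are made up, the add-one smoothing is my choice, and I have arbitrarily taken the direction KL(test || train); none of that is settled yet.</p>

```python
import math
from collections import Counter


def word_distribution(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must be defined over the same words."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


# Toy months (made up): one July looks like June, the other has drifted.
june = [["hockey", "stick", "goal"], ["film", "actor", "scene"]]
july_similar = [["hockey", "goal", "goal"], ["film", "scene", "actor"]]
july_shifted = [["cricket", "bat", "wicket"], ["film", "director", "cinema"]]

vocab = sorted({w for docs in (june, july_similar, july_shifted)
                for doc in docs for w in doc})
p_june = word_distribution(june, vocab)
kl_similar = kl_divergence(word_distribution(july_similar, vocab), p_june)
kl_shifted = kl_divergence(word_distribution(july_shifted, vocab), p_june)
print(kl_similar, kl_shifted)
```

<p>Smoothing matters here: without it, any word that appears in one month but not the other makes the divergence infinite or undefined.</p>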
<p>Nascent ideas so far.</p>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2010/07/14/transfer-learning-etc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>Comeback. Again.</title>
		<link>http://irjejune.wordpress.com/2010/07/11/comeback-again/</link>
		<comments>http://irjejune.wordpress.com/2010/07/11/comeback-again/#comments</comments>
		<pubDate>Sun, 11 Jul 2010 16:57:32 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=85</guid>
		<description><![CDATA[So this post will slightly deviate from the general tone of this blog. It is a tad more personal. I&#8217;ve just come out of a phase of unstructured time where I really really wanted to fix short-term goals for myself, and failed miserably. At the end of it all, I watched Julie And Julia, where [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irjejune.wordpress.com&amp;blog=6066267&amp;post=85&amp;subd=irjejune&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>So this post will slightly deviate from the general tone of this blog. It is a tad more personal.</p>
<p>I&#8217;ve just come out of a phase of unstructured time where I really, really wanted to fix short-term goals for myself, and failed miserably. At the end of it all, I watched <em>Julie And Julia</em>, where the protagonist uses her blog to set short-term goals for herself, while also using it to check off each goal as she achieves it. I want to emulate that.</p>
<p>I am now interning at a research lab in the industry. I am deciding on a problem statement that will mostly involve some form of transductive transfer learning. I have a great work environment, and an awesome mentor who helps with the short-term goal setting.</p>
<p>In such a setting, I feel I should probably maintain a daily log of how things are progressing, so that I can refer back to these notes later when I want to know how to set goals and progress with research. I have a controlled environment now, and it&#8217;d be interesting as well as helpful to document my time here such that I can replicate it elsewhere.</p>
<p>Most of my work will involve previous work and data that&#8217;s in the public domain, so I don&#8217;t think it&#8217;ll be a breach of any contract or NDA to talk about them on a public forum. I might, though, password-protect the posts so that the text isn&#8217;t searchable, while keeping the password public. Not many know of the existence of this blog, so this, I guess, would be a sane decision. I&#8217;ll have to check with my superiors anyway #TODO.</p>
<p>Alrighty. Next post possibly coming in another hour.</p>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2010/07/11/comeback-again/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>Reading PDFs &#8211; Superpain.</title>
		<link>http://irjejune.wordpress.com/2010/04/12/reading-pdfs-superpain/</link>
		<comments>http://irjejune.wordpress.com/2010/04/12/reading-pdfs-superpain/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 17:01:58 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[softwares and tools]]></category>
		<category><![CDATA[adobe]]></category>
		<category><![CDATA[doc]]></category>
		<category><![CDATA[foxit]]></category>
		<category><![CDATA[LaTeX]]></category>
		<category><![CDATA[pdf]]></category>
		<category><![CDATA[reading]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=80</guid>
		<description><![CDATA[I face this huge problem of late. I have a lot of documents to be read which are in the .pdf format. Fine, it&#8217;s a universal format, yada yada. It however is not without its pitfalls. Firstly, .PDF is more for the visualization than for the content. By this, I mean once you write something [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irjejune.wordpress.com&amp;blog=6066267&amp;post=80&amp;subd=irjejune&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I face this huge problem of late.</p>
<p>I have a lot of documents to be read which are in the .pdf format. Fine, it&#8217;s a universal format, yada yada.</p>
<p>It however is not without its pitfalls.</p>
<p>Firstly, PDF is more about presentation than content. By this, I mean once you write something to a .pdf file, it&#8217;s like you&#8217;ve written it to paper. It&#8217;s a format for printing out stuff more than anything else. So while there is plenty of software to <em>create</em> .pdf files, there is very little that can actually edit them.</p>
<p>So why do I want to edit .pdf files? Well, most academic papers are written with 11pt or 12pt font which doesn&#8217;t make for easy reading. And I abhor the large spaces wasted in the margins. I tried Foxit PDF editor, but it turns out that it lets you correct typos, add images, delete images, and add and delete pages, but that&#8217;s about it. It won&#8217;t let you modify a file in the true sense of the word.</p>
<p>So convert it to Word format. I found an online PDF to DOC converter, and did so.</p>
<p>Job done, right?</p>
<p>No.</p>
<p>You change the font size, it messes up the equations and tables and general layout. You decrease the size of the margins, same issue.</p>
<p>Oh dear god, I really wish I had the original LaTeX source, so that I could just modify small bits of it, or fit the same text and images into a layout better suited for reading. But trying to do that from a .pdf is like trying to get a live cow from roast beef.</p>
<p>It&#8217;s about time proceedings of NIPS and CHI got published in Kindle format, don&#8217;t you think? Papers are written to be read, right?</p>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2010/04/12/reading-pdfs-superpain/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>PyBrain</title>
		<link>http://irjejune.wordpress.com/2010/03/07/pybrain/</link>
		<comments>http://irjejune.wordpress.com/2010/03/07/pybrain/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 15:46:17 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[machine learning]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[softwares and tools]]></category>
		<category><![CDATA[pybrain]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=78</guid>
		<description><![CDATA[Machine Learning Library for Python. Yay. Here.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irjejune.wordpress.com&amp;blog=6066267&amp;post=78&amp;subd=irjejune&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Machine Learning Library for Python. Yay. <a href="http://pybrain.org/pages/home">Here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2010/03/07/pybrain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
		<item>
		<title>Learning to Link With Wikipedia &#8211; II</title>
		<link>http://irjejune.wordpress.com/2010/02/16/learning-to-link-with-wikipedia-ii/</link>
		<comments>http://irjejune.wordpress.com/2010/02/16/learning-to-link-with-wikipedia-ii/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 10:04:50 +0000</pubDate>
		<dc:creator>wanderlust</dc:creator>
				<category><![CDATA[machine learning]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[preprocessing]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://irjejune.wordpress.com/?p=76</guid>
		<description><![CDATA[I&#8217;m done with most of pre-processing. Feel free to tell me how crappy my code is. Just be polite, otherwise I&#8217;ll probably cry. This takes ages to write to disk. That&#8217;s the bottleneck. It&#8217;s a sort of hackjob, though I must say I used to write worse code. And you can use this code if [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irjejune.wordpress.com&amp;blog=6066267&amp;post=76&amp;subd=irjejune&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m done with most of pre-processing. Feel free to tell me how crappy  my code is. Just be polite, otherwise I&#8217;ll probably cry. This takes ages to write to disk. That&#8217;s the bottleneck. It&#8217;s a sort of hackjob, though I must say I used to write worse code.</p>
<p>And you can use this code if you like.</p>
<p><pre class="brush: python;">
import re
import xml.dom.minidom

# headings of wiki sections where looking for links is meaningless
SKIP_SECTIONS = [&quot;References&quot;, &quot;See Also&quot;, &quot;External links&quot;, &quot;Sources&quot;,
                 &quot;Notes&quot;, &quot;Notes and references&quot;, &quot;Gallery&quot;]


class xmlMine:
    def __init__(self):
        self.stopWordDict = {'': 1}  # dictionary of stopwords
        self.titleArticleDict = {}  # hashmap of titles mapped to articles.

    def getStopwords(self, stopWordFile):
        # loads stopwords from file to memory, one stopword per line
        with open(stopWordFile) as stopWordObj:
            for stopWord in stopWordObj:
                self.stopWordDict[stopWord.replace(&quot;\n&quot;, &quot;&quot;)] = 1

    def cleanTitle(self, title):
        # removes non-ascii characters from title
        return &quot;&quot;.join([x for x in title if ord(x) &lt; 128])

    def extractLinksFromText(self, textContent):
        textContent = &quot;]] &quot; + textContent
        textContent = textContent.replace(&quot;\n&quot;, &quot; &quot;)  # remove linebreaks
        textContent = textContent.replace(&quot;'&quot;, &quot;&quot;)  # remove quotes. they mess up the regexes.

        # remove regions in wiki pages where looking for links is meaningless:
        # everything from one of these headings to the end of the page.
        # case-insensitive, so that &quot;See also&quot; matches too.
        for section in SKIP_SECTIONS:
            refs = re.compile(r&quot;==\s*&quot; + section + r&quot;\s*==.+&quot;, re.IGNORECASE)
            textContent = refs.sub(&quot; &quot;, textContent)

        # remove wikitables
        tables = re.compile(r'\{\|\s*class=&quot;wikitable&quot;.+?\|\}')
        textContent = tables.sub(&quot; &quot;, textContent)

        textContent = textContent + &quot;[[&quot;

        # remove stuff that's not enclosed in [[]]
        brackets = re.compile(r&quot;\]\].*?\[\[&quot;)
        textContent = brackets.sub(&quot;]] [[&quot;, textContent)
        # and store only the list of words sans the brackets
        wordList = textContent.split(&quot;]] [[&quot;)

        altText = re.compile(r&quot;.*?\|&quot;)  # part before |
        numbr = re.compile(r&quot;\d&quot;)  # number
        # punctuation (ASCII-only \W, so non-ascii characters are replaced too)
        punct = re.compile(r&quot;\W&quot;, re.ASCII)
        trailingS = re.compile(r&quot;^(.*[bcdfghjklmnpqrtvwxyz])(s)$&quot;)

        newWordList = []
        for word in wordList:
            word = word.lower()  # convert to lowercase
            word = altText.sub(&quot;&quot;, word)  # remove part before |
            # replace number, punctuation by space
            word = numbr.sub(&quot; &quot;, word)
            word = punct.sub(&quot; &quot;, word)

            # if space added, split by space. replace by two/more words
            for newWord in word.split(&quot; &quot;):
                # remove trailing s after consonant
                if trailingS.match(newWord) is not None:
                    newWord = newWord[:-1]
                # remove stopwords; no point of too-short words.
                if newWord not in self.stopWordDict and len(newWord) &gt; 2:
                    newWordList.append(newWord)
        return newWordList

    def extractTextFromXml(self, xmlFileName):
        # extracts the &lt;title&gt; and &lt;text&gt; fields from the xml files
        # processes both.
        xmlFile = xml.dom.minidom.parse(xmlFileName)
        for mediaWiki in xmlFile.getElementsByTagName(&quot;mediawiki&quot;):
            for page in mediaWiki.getElementsByTagName(&quot;page&quot;):
                titleWords = &quot;&quot;
                text = []
                for textNode in page.getElementsByTagName(&quot;text&quot;):
                    # skip empty &lt;text&gt; elements, which have no child nodes
                    if textNode.childNodes and textNode.childNodes[0].nodeType == textNode.TEXT_NODE:
                        text = self.extractLinksFromText(textNode.childNodes[0].data)
                for titleNode in page.getElementsByTagName(&quot;title&quot;):
                    if titleNode.childNodes and titleNode.childNodes[0].nodeType == titleNode.TEXT_NODE:
                        titleWords = self.cleanTitle(titleNode.childNodes[0].data)
                self.titleArticleDict[titleWords] = text


def main():
    a = xmlMine()
    a.getStopwords(&quot;stopwords.txt&quot;)
    a.extractTextFromXml(&quot;Wikipedia-20090505185206.xml&quot;)
    # one &quot;title:link,link,...&quot; line per article
    with open(&quot;links.txt&quot;, &quot;w&quot;, encoding=&quot;utf-8&quot;) as opFile:
        for article, linkList in a.titleArticleDict.items():
            opFile.write(article + &quot;:&quot; + &quot;,&quot;.join(linkList) + &quot;\n&quot;)


if __name__ == &quot;__main__&quot;:
    main()
</pre></p>
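<p>If you only want to see what the link extraction boils down to, here is a much smaller sketch of that one step using <code>re.findall</code> instead of the replace-and-split trick. The function name <code>extract_links</code> and the sample string are made up for illustration, and it skips the stopword and plural handling entirely.</p>

```python
import re

# non-greedy match of whatever sits between [[ and the next closing pair
WIKI_LINK = re.compile(r"\[\[(.*?)\]\]")


def extract_links(text):
    """Return the visible text of every [[...]] link, lowercased."""
    links = []
    for inner in WIKI_LINK.findall(text.replace("\n", " ")):
        # keep only the part after the last |, as the code above does
        links.append(inner.split("|")[-1].strip().lower())
    return links


sample = "[[link0]] blah [[Target page|link1]] nolink [[link2]]"
print(extract_links(sample))
```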
]]></content:encoded>
			<wfw:commentRss>http://irjejune.wordpress.com/2010/02/16/learning-to-link-with-wikipedia-ii/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">wanderlust</media:title>
		</media:content>
	</item>
	</channel>
</rss>
