Someone who works for the Govt of India told me about the Indian Gazette, which publishes a summary of all the activities of the government in English and Hindi. There are state gazettes as well, which I assumed did the same. I found that the central government does put out the gazette with the same content in both English and Hindi. As perfect a sentence alignment as you can expect.
Unfortunately, it doesn’t seem like the Karnataka government does that. They publish everything only in Kannada. The Kerala government publishes only in English. And the Tamil Nadu government publishes some bullet points in English and some in Tamil.
I hadn’t checked on this earlier, unfortunately. Now I’m back to square one, looking for a dataset for Kannada machine translation. Know of any?