Hackett and Bankwell Issue 2 is available now

"Hackett and Bankwell is an educational comic/cartoon manual designed to teach the finer points of the GNU Linux platform using Ubuntu."

Issue 2 of Hackett and Bankwell, the Linux comic, is now available from
http://www.intarwebz.com/wp-content/uploads/2009/03/hackett_and_bankwell_issue_2.pdf
This issue introduces the CLI (command-line interface).
If you missed Issue 1, you can get it from
http://www.intarwebz.com/hackett-and-bankwell-1-free-pdf-ebook-version-11/

Happy Hacking !!!

Happy Onam


On English to Indian Language MT - II

Yesterday I contacted Golam, the developer of the "Anubadok" English to Bengali MT system. My previous post was about getting started with the system and generating a dictionary for it. Golam suggested that I look into some more modules:
"Apart from bdict.db, you should also have a look at "lib/Anubadok/BnTable.pm"
and "lib/Anubadok/BnSonshi.pm" modules. These are Bengali (Bn) specific files where you need to change stuffs."

Yes, we really do have to look into those modules. I have just opened the two files: BnTable.pm handles grammar and BnSonshi.pm handles sandhi. Tuning them for another Indian language requires a good knowledge of that language's grammar. I am studying the code now and will post more details soon.

Happy Hacking !!!!!!!!


On development of an Open Source Machine Translation System for English to Indian Languages.

This post continues the discussions held on the fsug-tvm and smc-discuss mailing lists. At the end of those discussions I promised to prepare a note on how to start working on English to Indian language Machine Translation (MT). I am not going into the history or theory of Machine Translation in this post.

There is a Free and Open Source Machine Translation system for English to Bengali translation called "Anubadok". It is available for download as well as for online use. The system is developed by Golam Mortuza Hossain. It is a small system with a handful of rules, a dictionary and other resources. I cannot read Bengali, so I am not able to judge the capability of the system. Anyhow, it can be adapted for developing Machine Translation systems from English to other Indian languages.

The system is written in the Perl programming language. Its developer is a Post Doctoral Fellow at the University of New Brunswick, Canada. Good documentation is available for the system. The project is hosted on SourceForge.net as well as on the Bengalinux site.

Let's see how to adapt this system for other Indian languages.

First go through the documentation of the system.
The current version can be downloaded from svn with the following command.
    svn checkout https://anubadok.svn.sourceforge.net/svnroot/anubadok/trunk anubadok

Then go to the /anubadok/data/ directory. There you will find the bdict.db file. Create a backup copy of the file. Now you can replace the Bengali words with words from your language and run the system, to get a feel for how it works for your language. While writing this post I replaced a couple of words and tested it. It works. Once the dictionary work is over we can run our system. (Don't think that it will give exact results, because we are replacing meanings only. We have to add rules for generating sentences in our language. I will write on this topic later.) For the initial testing we can use the sentences available in the /anubadok/TestSuites/ directory of the system. First let's work through this much and see what the output is. Based on the result, I think we can proceed.
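The backup step can be sketched in shell as below. This is only a minimal sketch: `backup_file` is a hypothetical helper name (not part of Anubadok), the `.orig` suffix is an arbitrary choice, and the relative path assumes the command is run from the directory where the svn checkout above was made.

```shell
# Back up a file next to itself before editing it.
# backup_file is a hypothetical helper, not something Anubadok ships.
backup_file() {
    src="$1"
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # Never overwrite an earlier backup: it may be the only pristine copy.
    if [ -e "$src.orig" ]; then
        echo "backup already exists, leaving it alone: $src.orig" >&2
        return 0
    fi
    # -p keeps the original timestamps and permissions.
    cp -p "$src" "$src.orig"
}

# Assumed location of the dictionary, relative to the checkout directory.
DICT="anubadok/data/bdict.db"
if [ -f "$DICT" ]; then
    backup_file "$DICT" && echo "backup kept at $DICT.orig"
else
    echo "run this from the directory that contains the anubadok checkout" >&2
fi
```

If an experiment with the dictionary goes wrong, copying `bdict.db.orig` back over `bdict.db` restores the Bengali version.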

The performance status of the system, as given in a blog, is shown below.
Status Table (Version: Anubadok-0.2.0)

                    Declar.   Imper.   Interro.   Exclam.
Simple              W         W        W          M
Compound            M         M        M          M
Complex             N         N        N          N
Compound-Complex    N         N        N          N

W: Well implemented
M: Moderately implemented
N: Not/Not-well implemented


Happy Hacking !!!!!!!!!

I am exploring the system further and will post on the topic soon.


banner, figlet and toilet in GNU/Linux

Would you like to show your name as a colourful banner when you start the terminal?

If so, read this:

http://www.cyberciti.biz/faq/create-large-colorful-text-banner-on-screen/

and this

http://www.cyberciti.biz/faq/banner-print-large-banner-on-printer/
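As a rough sketch of the idea, something like the following could go at the end of ~/.bashrc. It assumes that figlet or toilet may be installed and falls back to plain text when neither is. `show_banner` is a hypothetical helper name used only for this illustration.

```shell
# Print a greeting as a large banner when a terminal starts.
# show_banner is a hypothetical helper name for this example.
show_banner() {
    text="$1"
    if command -v toilet >/dev/null 2>&1; then
        # --gay applies toilet's rainbow colour filter.
        toilet --gay "$text"
    elif command -v figlet >/dev/null 2>&1; then
        figlet "$text"
    else
        # Neither tool is installed: plain text is better than an error.
        echo "$text"
    fi
}

# Use the login name, or a fixed word if USER is not set.
show_banner "${USER:-hello}"
```

Checking with `command -v` first means the terminal still starts cleanly on a machine where the tools are missing.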
       

 Permalink

Lesser Known Linux Commands

I found descriptions of some lesser-known Linux commands on Raman's blog.

Read them on his blog:

uniq

http://ramanchennai.wordpress.com/2009/08/13/lesser-known-linux-commands-uniq/

od

http://ramanchennai.wordpress.com/2009/08/09/lesser-known-linux-commands-od/


strings

http://ramanchennai.wordpress.com/2009/08/06/lesser-known-linux-commands-strings/

file

http://ramanchennai.wordpress.com/2009/08/01/lesser-known-linux-commands-file/

type

http://ramanchennai.wordpress.com/2009/07/28/lesser-known-linux-commands-type/
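For a quick feel of the five commands before reading the articles, here is a small sketch. The sample data is made up for illustration. `strings` and `file` are packaged separately on some systems, so those two calls are guarded.

```shell
# Scratch file for the strings and file examples.
tmp=$(mktemp)

# uniq: collapse adjacent duplicate lines; -c prefixes each with its count.
printf 'apple\napple\nbanana\n' | uniq -c

# od: dump raw bytes; -c shows them as characters and escapes.
printf 'hi\n' | od -c

# type: show how the shell resolves a command name.
type cd

# A few non-printable bytes around some readable text.
printf '\000\001hello world\002\003' > "$tmp"

# strings: pull printable runs out of binary data.
if command -v strings >/dev/null 2>&1; then
    strings "$tmp"
fi

# file: guess what kind of data a file holds.
if command -v file >/dev/null 2>&1; then
    file "$tmp"
fi

rm -f "$tmp"
```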
