
Yet another Language death

I saw shocking news in today's edition of The Hindu: "Andamanese tribes, languages die".

One more language has vanished: one of the Great Andamanese languages.

According to the UNESCO Atlas of the World's Languages in Danger, the language had 5 native speakers in 2007. But The Hindu reports that the last speaker has also passed away, and the language with that speaker. Fortunately, the report says, some documentation exists.


FOFFMEET@NITC


Again Python programming in Malayalam

Today I tried object-oriented programming in Python (Python 3) with Malayalam class names and variable names.

See the code below. It works very well with the Python 3 interpreter.

    class പക്ഷി:

        def __init__(self):
            """
            ക്ലാസ് ഇനിഷ്യലൈസേഷന്‍
            """
            self.വിവിധ = ['കാക്ക','പ്രാവ്','കുരുവി','തത്ത','മൈന','പരുന്ത്','മൂങ്ങ']

        def പറക്കുക(self, ഇനം):
            """
            പറക്കുമോന്ന് നോക്കാല്ലോ!!!!!!!!!!!
            """
            if ഇനം in self.വിവിധ:
                print("%s പറക്കുന്ന പക്ഷിയാണ്" % ഇനം)
            else:
                print("എനിക്കറിയാമ്മേലേ!!!!!!")

    if __name__ == "__main__":
        സൂചകം = പക്ഷി()
        പറവ = "കാക്ക"
        മൃഗം = "ആന"
        സൂചകം.പറക്കുക(പറവ)
        സൂചകം.പറക്കുക(മൃഗം)


Use the Python 3 interpreter to run the code !!!!
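If you want to check in advance whether a Malayalam word can be used as a name, something like the following small sketch may help. It only uses the built-in str.isidentifier() and the standard unicodedata module; the helper name 'describe' is my own and is not part of the original program.

```python
import unicodedata

# Names copied from the example program above.
names = ["പക്ഷി", "വിവിധ", "പറക്കുക", "ഇനം", "സൂചകം", "പറവ", "മൃഗം"]


def describe(name):
    """Return (is_identifier, unicode_categories) for a candidate name."""
    categories = [unicodedata.category(ch) for ch in name]
    return name.isidentifier(), categories


for name in names:
    ok, categories = describe(name)
    # Prints the name, whether Python 3 accepts it, and the category of each character.
    print(name, ok, categories)
```

Printing the Unicode category of each character is handy when a word is rejected, because it shows which character the interpreter does not like.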


Happy Hacking !!!!!!!


SIGAI Workshop on Emerging Research Trends in AI

The Special Interest Group in Artificial Intelligence (SIGAI) of the Computer Society of India (CSI) is conducting a one-day workshop on AI at C-DAC Mumbai.

For more details, visit http://sigai.cdacmumbai.in/

Happy hacking !!!!!!!!!


Playing with TIMIT corpus in NLTK - Part - I

The corpus collection in NLTK contains a TIMIT corpus sample. I just explored the TIMIT corpus and the TimitCorpusReader class. If you are interested in knowing more about TIMIT, please check http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1.

The description available in the NLTK code is given below.
"This corpus contains selected portion of the TIMIT corpus.

 - 16 speakers from 8 dialect regions
 - 1 male and 1 female from each dialect region
 - total 130 sentences (10 sentences per speaker.  Note that some
   sentences are shared among other speakers, especially sa1 and sa2
   are spoken by all speakers.)
 - total 160 recording of sentences (10 recordings per speaker)
 - audio format: NIST Sphere, single channel, 16kHz sampling,
  16 bit sample, PCM encoding"

If you explore the TIMIT directory in the NLTK data you will find the following items in it:

allfilelist.txt   allsenlist.txt  dr2-faem0  dr4-falr0  dr6-fapb0  dr8-fbcg1     prompts.txt  sentences.pos   spkrsent.txt  wrdalign.timit
allphonedur.txt   allsentime.txt  dr2-marc0  dr4-maeb0  dr6-mbma1  dr8-mbcg0     README       sentences.ppp   testset.doc
allphonelist.txt  dr1-fvmh0       dr3-falk0  dr5-ftlg0  dr7-fblv0  l             readme.doc   sentences.tags  timitdic.doc
allphonetime.txt  dr1-mcpm0       dr3-madc0  dr5-mbgt0  dr7-madd0  phoncode.doc  sentences    spkrinfo.txt    timitdic.txt

The items with 'dr' in the name are directories which contain the spoken corpus files. Each directory holds some .wav files and the associated transcriptions. We can access and explore these files with the corpus API of NLTK.

Let's see how to explore the data.

Importing the timit corpus from NLTK

    >>> from nltk.corpus import timit
Now the corpus is imported.

To view the file ids in the corpus:

    >>> timit.fileids()

It will print a long list like
    ......................
    'dr8-mbcg0/sx417.wav',
     'dr8-mbcg0/sx417.wrd',
     'dr8-mbcg0/sx57.phn',
     'dr8-mbcg0/sx57.txt',
     'dr8-mbcg0/sx57.wav',
     'dr8-mbcg0/sx57.wrd',
     'spkrinfo.txt',
     'timitdic.txt']

You might have noticed that there are .wav, .wrd, .phn and .txt files. The .wav file contains the utterance audio, the .wrd file contains the time-marked words, the .phn file contains the time-marked phone list, and the .txt file contains the text of the sentence. For each utterance there will be a .wav, .phn, .wrd and .txt file.
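To see this grouping without scrolling through the whole list, the file ids can be collected per utterance. This is only a small sketch in plain Python: the sample list is copied from the output above, and the helper name 'group_by_utterance' is my own, not something NLTK provides. With NLTK installed you could pass timit.fileids() instead of the sample list.

```python
import os
from collections import defaultdict

# A few file ids copied from the listing above.
sample_fileids = [
    'dr8-mbcg0/sx417.wav',
    'dr8-mbcg0/sx417.wrd',
    'dr8-mbcg0/sx57.phn',
    'dr8-mbcg0/sx57.txt',
    'dr8-mbcg0/sx57.wav',
    'dr8-mbcg0/sx57.wrd',
    'spkrinfo.txt',
    'timitdic.txt',
]


def group_by_utterance(fileids):
    """Map 'speaker/sentence' to the sorted list of extensions found for it."""
    groups = defaultdict(list)
    for fileid in fileids:
        if '/' not in fileid:
            # Corpus-level files such as spkrinfo.txt belong to no utterance.
            continue
        stem, ext = os.path.splitext(fileid)
        groups[stem].append(ext)
    return {stem: sorted(exts) for stem, exts in groups.items()}


print(group_by_utterance(sample_fileids))
```

In this sample 'dr8-mbcg0/sx417' shows only two extensions because the listing above is cut off just before its other files.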

From the file ids we can get the list of utterances in the NLTK TIMIT corpus.

    >>> timit.utteranceids()

It will print a long list like
    .....................
     'dr8-mbcg0/sa1',
     'dr8-mbcg0/sa2',
     'dr8-mbcg0/si2217',
     'dr8-mbcg0/si486',
     'dr8-mbcg0/si957',
     'dr8-mbcg0/sx147',
     'dr8-mbcg0/sx237',
     'dr8-mbcg0/sx327',
     'dr8-mbcg0/sx417',
     'dr8-mbcg0/sx57']

We can store the utterance ids to a variable for future use.

    >>> utid = timit.utteranceids()

Now the list 'utid' contains all the utterance ids in the TIMIT corpus. The utterance ids follow a specific pattern. This pattern helps the corpus reader module to identify information about the utterances. Let us see how this works.

To get the speaker id from an utterance id, let us take the first utterance id from the 'utid' list.

    >>> timit.spkrid(utid[0])
    'dr1-fvmh0'
For future use let's store the id in a variable.

    >>> sid = timit.spkrid(utid[0])

For the selected utterance we can get the sentence id also.

    >>> timit.sentid(utid[0])
    'sa1'
'sa1' is the sentence id for the given utterance. Let us store the sentence id in a variable for future use.

    >>> sentid = timit.sentid(utid[0])

From the above examples you might have got an idea about the file name pattern of the TIMIT corpus in NLTK: an utterance id is a speaker id and a sentence id joined by '/'.
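The pattern can also be taken apart by hand. The sketch below is plain Python and does not need NLTK. The layout it assumes ('dr' + dialect region, '-', one letter for sex, then the speaker code, '/', then the sentence id) is my reading of the ids shown in this post, and the function name 'parse_utterance_id' is my own.

```python
def parse_utterance_id(utterance_id):
    """Split an id such as 'dr1-fvmh0/sa1' into its parts (assumed layout)."""
    speaker_id, sentence_id = utterance_id.split('/')
    region, speaker = speaker_id.split('-')
    return {
        'speaker_id': speaker_id,
        'sentence_id': sentence_id,
        'sentence_type': sentence_id[:2],  # 'sa', 'si' or 'sx'
        'dialect_region': region[2:],      # 'dr1' -> '1'
        'sex': speaker[0].upper(),         # 'f' -> 'F'
        'speaker': speaker[1:].upper(),    # 'vmh0' -> 'VMH0'
    }


print(parse_utterance_id('dr1-fvmh0/sa1'))
```

For real work the corpus reader methods such as spkrid() and sentid() are the better choice; this is only to make the naming pattern visible.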

Now we can build an utterance id from a speaker id and a sentence id. Note that utterance() expects a single speaker id; if you pass the whole 'utid' list, you get back one long string containing the printed list instead of an utterance id.

    >>> timit.utterance(sid, sentid)
    'dr1-fvmh0/sa1'

Now I am going to store the utterance id in a variable.

    >>> utt = timit.utterance(sid, sentid)

We can get the utterance ids of a speaker from the speaker id.

    >>> sputid = timit.spkrutteranceids(sid)

This gives all the utterance ids associated with the speaker id 'sid'.
Those ids are
    ['dr1-fvmh0/sa1',
     'dr1-fvmh0/sa2',
     'dr1-fvmh0/si1466',
     'dr1-fvmh0/si2096',
     'dr1-fvmh0/si836',
     'dr1-fvmh0/sx116',
     'dr1-fvmh0/sx206',
     'dr1-fvmh0/sx26',
     'dr1-fvmh0/sx296',
     'dr1-fvmh0/sx386']

I stored all the ids in the list 'sputid'.

Let's try to get details about the speaker with the id 'dr1-fvmh0'. This id was stored in the variable 'sid' earlier.

    >>> timit.spkrinfo(sid)
    SpeakerInfo(id='VMH0', sex='F', dr='1', use='TRN', recdate='03/11/86', birthdate='01/08/60', ht='5\'05"', race='WHT', edu='BS', comments='BEST NEW ENGLAND ACCENT SO FAR')

It gives details about the speaker, such as sex, birth date, race, etc.

Continued ..............

Happy Hacking !!!!!!!!!


Installing PyLucene 3.x in GNU\Linux

Lucene
=======
"Lucene is a high performance, scalable Information Retrieval (IR) library. It lets you add indexing and searching capabilities to your applications. Lucene is a mature, free, open-source project implemented in Java; it’s a member of the popular Apache Jakarta family of projects, licensed under the liberal Apache Software License."

PyLucene
========
"PyLucene is a Python extension for accessing Java Lucene. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest version of Java Lucene, version 2.9.0 as of October 13th, 2009."


I downloaded PyLucene version 3.x and tried to install it. Initially I got some errors. I was trying to follow the instructions given at http://lucene.apache.org/pylucene/documentation/install.html. One statement there confused me: "<edit setup.py to match your environment>" :-(. Where to match ?? I scratched my head for 2 days. Finally I identified the place to edit. Instead of discussing the errors, I will try to explain how to install PyLucene in GNU\Linux.

Download PyLucene 3.0.0-1 source from http://www.apache.org/dyn/closer.cgi/lucene/pylucene/ .

Extract the source. You will now have a 'pylucene-3.0.0-1' directory. Change to that directory, then execute the command "pushd jcc". You will now be in the jcc directory inside the PyLucene directory. Open the 'setup.py' file. You have to make some changes in this file for the installation to work. In setup.py there will be a portion like this:

    JDK = {
            'darwin': '/System/Library/Frameworks/JavaVM.framework/Versions/Current',
            'ipod': '/usr/include/gcc',
            'linux2': '/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0',
            'sunos5': '/usr/jdk/instances/jdk1.6.0',
            'win32': 'o:/Java/jdk1.6.0_02',
    }

If you are using GNU\Linux, you have to change the line "'linux2': '/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0',". Here you have to specify the path to your Java installation. In the example, '/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0' is the Java installation path on my system. If you are using Windows, you have to edit the line "'win32': 'o:/Java/jdk1.6.0_02',".
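If you are not sure which path to put there, a small script can list likely candidates. This is only a sketch under my own assumptions: it looks at the JAVA_HOME environment variable and at the directories under '/usr/lib/jvm' (a common location on GNU/Linux, but your distribution may differ), and it keeps only those that contain 'include/jni.h', since JCC needs the JDK headers and not just a JRE. The function name 'candidate_jdk_paths' is hypothetical and is not part of JCC.

```python
import glob
import os


def candidate_jdk_paths(jvm_root='/usr/lib/jvm', environ=None):
    """List directories that look like JDK installs (they contain include/jni.h)."""
    environ = os.environ if environ is None else environ
    candidates = []
    java_home = environ.get('JAVA_HOME')
    if java_home:
        candidates.append(java_home)
    candidates.extend(sorted(glob.glob(os.path.join(jvm_root, '*'))))

    seen = set()
    result = []
    for path in candidates:
        real = os.path.realpath(path)  # resolve symlinks such as default-java
        if real in seen:
            continue
        seen.add(real)
        if os.path.isfile(os.path.join(real, 'include', 'jni.h')):
            result.append(real)
    return result


# Prints an empty list if no JDK is found in the assumed locations.
print(candidate_jdk_paths())
```

Any path it prints is a candidate for the 'linux2' entry; the choice among them is still yours.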

Now it is time to build and install JCC.
Become superuser and execute the command "python setup.py install".
If everything is OK, JCC will be installed.

Then execute the command 'popd'. Now you are in the PyLucene source directory again.

In this directory you have to do some editing. Open the 'Makefile'. There will be a portion like this in the file:

    #Linux     (Ubuntu 8.10 64-bit, Python 2.5.2, OpenJDK 1.6, setuptools 0.6c9)
    #PREFIX_PYTHON=/usr
    #ANT=ant
    #PYTHON=$(PREFIX_PYTHON)/bin/python
    #JCC=$(PYTHON) -m jcc --shared
    #NUM_FILES=2


Uncomment the five lines that follow "#Linux     (Ubuntu 8.10 64-bit, Python 2.5.2, OpenJDK 1.6, setuptools 0.6c9)".
Now the portion should look like this:

    #Linux     (Ubuntu 8.10 64-bit, Python 2.5.2, OpenJDK 1.6, setuptools 0.6c9)
    PREFIX_PYTHON=/usr
    ANT=ant
    PYTHON=$(PREFIX_PYTHON)/bin/python
    JCC=$(PYTHON) -m jcc --shared
    NUM_FILES=2

If you are using Python 2.6, some more editing has to be done. Change the line "JCC=$(PYTHON) -m jcc --shared" to "JCC=$(PYTHON) -m jcc.__main__ --shared".
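If you prefer not to edit the Makefile by hand, the same change can be sketched in Python. This example works on a string copied from the portion quoted in this post and does not touch any real file. The helper 'uncomment_block' is my own and is not part of PyLucene.

```python
def uncomment_block(text, header_marker, count=5):
    """Strip the leading '#' from `count` lines after the header line."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if header_marker in line:
            for j in range(i + 1, min(i + 1 + count, len(lines))):
                stripped = lines[j].lstrip()
                if stripped.startswith('#'):
                    lines[j] = stripped[1:]
            break  # only the first matching header is handled
    return '\n'.join(lines)


# Sample text copied from the Makefile portion quoted in this post.
sample = """#Linux     (Ubuntu 8.10 64-bit, Python 2.5.2, OpenJDK 1.6, setuptools 0.6c9)
#PREFIX_PYTHON=/usr
#ANT=ant
#PYTHON=$(PREFIX_PYTHON)/bin/python
#JCC=$(PYTHON) -m jcc --shared
#NUM_FILES=2
"""

edited = uncomment_block(sample, 'Ubuntu 8.10 64-bit')
# The extra change described for Python 2.6.
edited_py26 = edited.replace('-m jcc --shared', '-m jcc.__main__ --shared')
print(edited_py26)
```

To apply it for real you would read the Makefile into a string, run the helper, and write the result back, after keeping a backup copy.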

Run the 'make' and 'make install' commands. If every setting is correct, PyLucene will be installed on your system. To check your installation, run the 'make test' command.

Now enjoy !!!

Happy Hacking!!!!!!!!!!!


Redmine installation is easy now !!!

Installing Redmine, the open source project management tool, is easy now. A Bitnami stack is now available for Redmine. You can download the stack from the Bitnami site and install it within 10 minutes. It is great. Earlier it took me around two days to install and configure it. Today I did it in 10 minutes.

Happy hacking !!!!!!!!!


Google launches multilingual dictionary service

Google has launched a new language service, Google Dictionary. It is a multilingual collection of bilingual dictionaries. It provides translation between English and 30 other world languages. The major Indian languages are covered in the dictionary. You can search from English to any language (L) and from language (L) to English. If somebody searches for a word, it shows the meaning in the target language, related phrases and web definitions. A sample search result for English <> Tamil is given below. I searched for 'book'. Result:

Found in dictionary: English > Tamil.

book: புத்தகம்

Related phrases
 - indent book: தேவைக் கோரிக்கைப் பதிவேடு
 - lung book: நுரையீரல் ஏடு
 - book-review: நூல் மதிப்புரை
 - book adjustment: புத்தக வழிச் சரி செய்தல்
 - book of accounts: கணக்கேடுகள்
 - book debt: புத்தகக் கடன்
 - book-keeping: வரவு செலவுக் கணக்கியல்
 - book trade: புத்தகத் தொழில்
 - account book: கணக்குப் புத்தகம்
 - book post: நூல் அஞ்சல்

Web definitions
 - a written work or composition that has been published (printed on pages bound together); "I am reading a good book on economics" (wordnetweb.princeton.edu/perl/webwn)
 - physical objects consisting of a number of pages bound together; "he used a large book as a doorstop" (wordnetweb.princeton.edu/perl/webwn)
 - record: a compilation of the known facts regarding something or someone; "Al Smith used to say, `Let's look at the record'"; "his name is in all the record books" (wordnetweb.princeton.edu/perl/webwn)

The googlization continues 

New book in 'Head First' series with python

A new book has come out in the 'Head First' series with Python language examples.

Head First Programming

A learner's guide to programming using the Python language

By: David Griffiths, Paul Barry
Publisher: O'Reilly Media
Released: November 2009
Pages: 448 (est.)


Waiting for the book to be launched in the Indian market.


Fedora-12 (Constantine) released

Fedora 12 was released today.

See the release documentation here http://docs.fedoraproject.org/
Installation guide http://docs.fedoraproject.org/install-guide/f12/en-US/pdf/Fedora_12_Installation_Guide.pdf

To download go to http://fedoraproject.org/en/get-fedora-all

Fedora 12 comes with
Kernel 2.6.31
KDE 4.3
GNOME 2.28

It has Moblin support too.


Happy hacking !!
