
Again Python programming in Malayalam

Today I tried object-oriented programming in Python (Python3) with Malayalam class names and variable names.

See the code. It works very well with the Python3 interpreter.

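    # പക്ഷി = "bird"
    class പക്ഷി:

        def __init__(self):
            """
            ക്ലാസ് ഇനിഷ്യലൈസേഷന്‍
            """
            # Class initialization. വിവിധ = "various": crow, pigeon, sparrow,
            # parrot, myna, kite and owl
            self.വിവിധ = ['കാക്ക','പ്രാവ്','കുരുവി','തത്ത','മൈന','പരുന്ത്','മൂങ്ങ']

        def പറക്കുക(self, ഇനം):
            """
            പറക്കുമോന്ന് നോക്കാല്ലോ!!!!!!!!!!!
            """
            # പറക്കുക = "fly", ഇനം = "kind"; the docstring says "Let's see if it flies!"
            if ഇനം in self.വിവിധ:
                print("%s പറക്കുന്ന പക്ഷിയാണ്" % ഇനം)   # "<kind> is a bird that flies"
            else:
                print("എനിക്കറിയാമ്മേലേ!!!!!!")          # "I don't know!"

    if __name__ == "__main__":
        സൂചകം = പക്ഷി()      # സൂചകം = "indicator", the bird object
        പറവ = "കാക്ക"        # a bird: crow
        മൃഗം = "ആന"          # an animal: elephant
        സൂചകം.പറക്കുക(പറവ)
        സൂചകം.പറക്കുക(മൃഗം)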


Use the Python3 interpreter to run the code!!!!


Happy Hacking !!!!!!!


Playing with TIMIT corpus in NLTK - Part - I

The corpus collection in NLTK contains a sample of the TIMIT corpus. I explored the TIMIT corpus and the TimitCorpusReader class. If you are interested to know more about TIMIT, please check http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1.

Here is the description given in the NLTK code:
"This corpus contains selected portion of the TIMIT corpus.

 - 16 speakers from 8 dialect regions
 - 1 male and 1 female from each dialect region
 - total 130 sentences (10 sentences per speaker.  Note that some
   sentences are shared among other speakers, especially sa1 and sa2
   are spoken by all speakers.)
 - total 160 recording of sentences (10 recordings per speaker)
 - audio format: NIST Sphere, single channel, 16kHz sampling,
  16 bit sample, PCM encoding"

If you explore the timit directory in nltk_data, you will find the following items in it:

allfilelist.txt   allsenlist.txt  dr2-faem0  dr4-falr0  dr6-fapb0  dr8-fbcg1     prompts.txt  sentences.pos   spkrsent.txt  wrdalign.timit
allphonedur.txt   allsentime.txt  dr2-marc0  dr4-maeb0  dr6-mbma1  dr8-mbcg0     README       sentences.ppp   testset.doc
allphonelist.txt  dr1-fvmh0       dr3-falk0  dr5-ftlg0  dr7-fblv0  l             readme.doc   sentences.tags  timitdic.doc
allphonetime.txt  dr1-mcpm0       dr3-madc0  dr5-mbgt0  dr7-madd0  phoncode.doc  sentences    spkrinfo.txt    timitdic.txt

The items with 'dr' in their names are directories containing the spoken corpus files. Each directory holds some .wav files and their associated transcriptions. We can access and explore these files with the corpus API of NLTK.

Let's see how to explore the data.

Importing the timit corpus from NLTK

    >>> from nltk.corpus import timit
Now the corpus is imported

To view the file ids in the corpus

    >>> timit.fileids()

It will print a long list like
    ......................
    'dr8-mbcg0/sx417.wav',
     'dr8-mbcg0/sx417.wrd',
     'dr8-mbcg0/sx57.phn',
     'dr8-mbcg0/sx57.txt',
     'dr8-mbcg0/sx57.wav',
     'dr8-mbcg0/sx57.wrd',
     'spkrinfo.txt',
     'timitdic.txt']

You might have noticed that there are .wav, .wrd, .phn and .txt files. The .wav file contains the utterance, the .wrd file contains the time-aligned word list, the .phn file contains the time-aligned phone list, and so on. For each utterance .wav file there is a .phn, a .wrd and a .txt file.
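The pairing of files per utterance can be checked without touching the audio. Below is a minimal plain-Python sketch (no NLTK needed) that groups file ids by utterance; the function name group_by_utterance is hypothetical, and the sample list is only a slice of the fileids() listing above, so sx417 shows just the two files that appear in that slice.

```python
import os
from collections import defaultdict

# A few file ids copied from the timit.fileids() listing above
SAMPLE_FILEIDS = [
    "dr8-mbcg0/sx417.wav",
    "dr8-mbcg0/sx417.wrd",
    "dr8-mbcg0/sx57.phn",
    "dr8-mbcg0/sx57.txt",
    "dr8-mbcg0/sx57.wav",
    "dr8-mbcg0/sx57.wrd",
    "spkrinfo.txt",
    "timitdic.txt",
]


def group_by_utterance(fileids):
    """Map each utterance id ('dr8-mbcg0/sx57') to the set of extensions it has."""
    groups = defaultdict(set)
    for fileid in fileids:
        if "/" not in fileid:      # skip top-level files such as spkrinfo.txt
            continue
        stem, ext = os.path.splitext(fileid)
        groups[stem].add(ext)
    return dict(groups)


if __name__ == "__main__":
    for utt, exts in sorted(group_by_utterance(SAMPLE_FILEIDS).items()):
        print(utt, sorted(exts))
```

With NLTK installed you could pass timit.fileids() instead of the sample list and look for utterances that are missing one of the four files.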

From the file ids we can also get the list of utterances in the NLTK TIMIT corpus.

    >>> timit.utteranceids()

It will print a long list like
    .....................
     'dr8-mbcg0/sa1',
     'dr8-mbcg0/sa2',
     'dr8-mbcg0/si2217',
     'dr8-mbcg0/si486',
     'dr8-mbcg0/si957',
     'dr8-mbcg0/sx147',
     'dr8-mbcg0/sx237',
     'dr8-mbcg0/sx327',
     'dr8-mbcg0/sx417',
     'dr8-mbcg0/sx57']

We can store the utterance ids to a variable for future use.

    >>> utid = timit.utteranceids()

Now the list 'utid' contains all the utterance ids in the TIMIT corpus. Each utterance id follows a fixed pattern, which helps the corpus reader module to extract information about the utterance. Let us see how this works.

To get the speaker id from an utterance id, let us take the first utterance id from the 'utid' list.

    >>> timit.spkrid(utid[0])
    'dr1-fvmh0'
For future use, let's store the id in a variable.

    >>> sid = timit.spkrid(utid[0])

For the selected utterance we can get the sentence id also.

    >>> timit.sentid(utid[0])
    'sa1'
'sa1' is the sentence id for the given utterance. Let's store it in a variable for future use.

    >>> sentid = timit.sentid(utid[0])


From the above example you might have got an idea of the file name pattern of the TIMIT corpus in NLTK.

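Now we can build an utterance id from a speaker id and a sentence id. The first argument of timit.utterance() is a speaker id, so pass 'sid' rather than the whole 'utid' list.

    >>> timit.utterance(sid, sentid)
    'dr1-fvmh0/sa1'

If you pass the list 'utid' instead, you do not get a list of utterances: the method just joins its two arguments with '/', so the result is one long string that starts like this
"['dr1-fvmh0/sa1', 'dr1-fvmh0/sa2', 'dr1-fvmh0/si1466', 'dr1-fvmh0/si2096', 'dr1-fvmh0/si836', 'dr1-fvmh0/sx116', 'dr1-fvmh0/sx206', 'dr1-fvmh0/sx26', 'dr1-fvmh0/sx296',
...................

Let's store the utterance id in a variable.

    >>> utt = timit.utterance(sid, sentid)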

We can also access the utterances based on the speaker id.

    >>> sputid = timit.spkrutteranceids(sid)

It will give all the utterances associated with the speaker id stored in 'sid'.
Those ids are
    ['dr1-fvmh0/sa1',
     'dr1-fvmh0/sa2',
     'dr1-fvmh0/si1466',
     'dr1-fvmh0/si2096',
     'dr1-fvmh0/si836',
     'dr1-fvmh0/sx116',
     'dr1-fvmh0/sx206',
     'dr1-fvmh0/sx26',
     'dr1-fvmh0/sx296',
     'dr1-fvmh0/sx386']

I stored all the ids in the list 'sputid'.

Let's try to get details about the speaker with the id 'dr1-fvmh0', which we stored earlier in the variable 'sid'.

    >>> timit.spkrinfo(sid)
    SpeakerInfo(id='VMH0', sex='F', dr='1', use='TRN', recdate='03/11/86', birthdate='01/08/60', ht='5\'05"', race='WHT', edu='BS', comments='BEST NEW ENGLAND ACCENT SO FAR')

Now it gives details about the speaker, such as sex, dialect region, birth date, height, race and education.
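The utterance ids and the SpeakerInfo record above follow one naming pattern: 'dr1-fvmh0' combines the dialect region (dr='1'), the sex (F) and the speaker id (VMH0). Here is a minimal sketch that takes such an id apart, assuming every id has the form drN-<sex><speaker>/<type><number>; parse_utterance_id and spkrid_from_info are hypothetical helper names for this illustration, not part of NLTK.

```python
from collections import namedtuple

# Fields packed into an NLTK TIMIT utterance id such as 'dr1-fvmh0/sa1'
UtteranceId = namedtuple("UtteranceId", "spkrid sentid dialect sex speaker sent_type")


def parse_utterance_id(utterance):
    """Split an utterance id, assuming the pattern drN-<sex><speaker>/<type><number>."""
    spkrid, sentid = utterance.split("/")        # speaker part and sentence part
    region, code = spkrid.split("-")             # 'dr1', 'fvmh0'
    dialect = region[len("dr"):]                 # '1'
    sex, speaker = code[0], code[1:]             # 'f', 'vmh0'
    sent_type = sentid.rstrip("0123456789")      # prefix such as 'sa', 'si' or 'sx'
    return UtteranceId(spkrid, sentid, dialect, sex, speaker, sent_type)


def spkrid_from_info(info_id, sex, dr):
    """Rebuild a speaker id from SpeakerInfo fields, e.g. ('VMH0', 'F', '1')."""
    return "dr%s-%s%s" % (dr, sex.lower(), info_id.lower())


if __name__ == "__main__":
    print(parse_utterance_id("dr1-fvmh0/sa1"))
    print(spkrid_from_info("VMH0", "F", "1"))
```

The second helper goes the other way, from the fields shown by timit.spkrinfo(sid) back to the speaker id 'dr1-fvmh0'.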

To be continued...

Happy Hacking !!!!!!!!


Installing PyLucene 3.x on GNU/Linux

Lucene
=======
"Lucene is a high performance, scalable Information Retrieval (IR) library. It lets you add indexing and searching capabilities to your applications. Lucene is a
mature, free, open-source project implemented in Java; it’s a member of the popular Apache Jakarta family of projects, licensed under the liberal Apache Software License.
"

PyLucene
========
" PyLucene is a Python extension for accessing Java Lucene. Its goal is to allow you to use Lucene's text indexing and searching capabilities from Python. It is API compatible with the latest version of Java Lucene, version 2.9.0 as of October 13th, 2009."


I downloaded PyLucene 3.x and tried to install it. Initially I got some errors. I was trying to follow the instructions given at http://lucene.apache.org/pylucene/documentation/install.html. One statement confused me: "<edit setup.py to match your environment>" :-(. Match what, and where?? I scratched my head for 2 days. Finally I identified the place to edit. Instead of discussing the errors, I will try to explain how to install PyLucene on GNU/Linux.

Download PyLucene 3.0.0-1 source from http://www.apache.org/dyn/closer.cgi/lucene/pylucene/ .

Extract the source. You will now have a 'pylucene-3.0.0-1' directory; change into it. Then execute the command "pushd jcc", which takes you into the jcc directory inside the PyLucene directory. Open the 'setup.py' file: you have to make some changes in it to install JCC properly. In setup.py there is a section like this:

    JDK = {
            'darwin': '/System/Library/Frameworks/JavaVM.framework/Versions/Current',
            'ipod': '/usr/include/gcc',
            'linux2': '/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0',
            'sunos5': '/usr/jdk/instances/jdk1.6.0',
            'win32': 'o:/Java/jdk1.6.0_02',
    }
If you are using GNU/Linux, you have to change the line "'linux2': '/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0'," so that it points to your Java installation. In the example, '/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0' is the Java installation path on my system. If you are using MS Windows, edit the line "'win32': 'o:/Java/jdk1.6.0_02'," instead.
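If you are not sure which directory to put there, a small script can make a guess. This is a sketch under assumptions: javac must be on your PATH, and it must resolve (through symlinks such as /usr/bin/javac) to <JDK>/bin/javac. The function names are hypothetical, the script is Python 3 and is run separately from the JCC build, and you should still check the printed path by hand before editing setup.py.

```python
import os
import shutil


def jdk_home_from_javac(javac_path):
    """Given the path of a javac binary, return the directory above its bin/."""
    real = os.path.realpath(javac_path)   # follow symlinks, e.g. /usr/bin/javac -> .../bin/javac
    bin_dir = os.path.dirname(real)       # .../bin
    return os.path.dirname(bin_dir)       # candidate value for the 'linux2' entry


def find_jdk_home():
    """Return a guess for the JDK directory, or None if javac is not on PATH."""
    javac = shutil.which("javac")
    if javac is None:
        return None
    return jdk_home_from_javac(javac)


if __name__ == "__main__":
    print(find_jdk_home())
```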

Now it is time to build and install JCC.
Become superuser and execute the command "python setup.py install".
If everything is OK, JCC will be installed.

Then execute the command 'popd'. Now you are back in the PyLucene source directory.

This directory also needs some editing. Open the 'Makefile'. There is a section like this in the file:

    #Linux     (Ubuntu 8.10 64-bit, Python 2.5.2, OpenJDK 1.6, setuptools 0.6c9)
    #PREFIX_PYTHON=/usr
    #ANT=ant
    #PYTHON=$(PREFIX_PYTHON)/bin/python
    #JCC=$(PYTHON) -m jcc --shared
    #NUM_FILES=2

Uncomment the five lines that follow "#Linux     (Ubuntu 8.10 64-bit, Python 2.5.2, OpenJDK 1.6, setuptools 0.6c9)".
The section should now look like this:

    #Linux     (Ubuntu 8.10 64-bit, Python 2.5.2, OpenJDK 1.6, setuptools 0.6c9)
    PREFIX_PYTHON=/usr
    ANT=ant
    PYTHON=$(PREFIX_PYTHON)/bin/python
    JCC=$(PYTHON) -m jcc --shared
    NUM_FILES=2
If you are using Python 2.6, one more edit is needed: change the line "JCC=$(PYTHON) -m jcc --shared" to "JCC=$(PYTHON) -m jcc.__main__ --shared".

Run the 'make' and 'make install' commands (the latter as superuser). If every setting is correct, PyLucene will be installed on your system. To check your installation, run the 'make test' command.

Now enjoy !!!

Happy Hacking!!!!!!!!!!!


NLTK and Indian Language corpus processing Part-III

I hope you enjoyed Part-I and Part-II of this tutorial. If you have any comments, suggestions or criticism, please write to me. In Part-III let's try to do some more work with Indian Language corpora in NLTK.

Generating word and POS bigram and trigram

To generate word/POS bigrams and trigrams, I selected the 'hindi.pos' file.
Here is the code to do that.

    =========== Code Begin ===========
    from nltk.corpus import indian
    from nltk import bigrams
    from nltk import trigrams

    # Stores the POS tagged sentences from 'hindi.pos'
    hpos = indian.tagged_sents('hindi.pos')

    # Stores word and POS as a single unit ("word TAG") in a list called 'wpos'
    wpos = []
    for sent in hpos:
        for tagged in sent:
            wpos.append(" ".join(tagged))

    # Generating and printing the word and POS bigrams
    wpos_bigram = bigrams(wpos)
    for wpb in wpos_bigram:
        print(" ".join(wpb))

    # Generating and printing the word and POS trigrams
    wpos_trigram = trigrams(wpos)
    for wpt in wpos_trigram:
        print(" ".join(wpt))

    =========== Code End ===========

       

To generate word/POS n-grams for another Indian language corpus, just replace 'hindi.pos' with the appropriate file id.
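bigrams() and trigrams() simply pair up neighbouring items in a sequence. To see that without loading the corpus, here is a plain-Python sketch using zip; the ngrams helper is a hypothetical name for this illustration, and the four word/POS tokens are copied from the Part-II output.

```python
def ngrams(items, n):
    """Return consecutive n-tuples, like bigrams (n=2) and trigrams (n=3)."""
    # zip the list against copies of itself shifted by 1..n-1 positions
    return list(zip(*(items[i:] for i in range(n))))


# Word/POS pairs from the Part-II output (दो QFNUM, विकेट NN, ...)
tagged = [("दो", "QFNUM"), ("विकेट", "NN"), ("लिये", "VFM"), ("।", "PUNC")]
wpos = [" ".join(pair) for pair in tagged]   # same "word TAG" units as 'wpos' above

if __name__ == "__main__":
    for gram in ngrams(wpos, 2):
        print(" ".join(gram))
    for gram in ngrams(wpos, 3):
        print(" ".join(gram))
```

Four tokens give three bigrams and two trigrams; a list of N items always gives N - n + 1 n-grams.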

Collocations and Concordance from Indian Language Corpora

Now let's try to find collocations in the Hindi corpus (hindi.pos).

    >>> import nltk
    >>> from nltk.text import Text
    >>> hw = nltk.corpus.indian.words('hindi.pos')
    >>> th = Text(hw)
    >>> th.collocations()
    Building collocations list
    है ।; के लिए; कहा कि; हैं ।;
    पारी खेली; है कि; रनों की;
    न्यू जीलैंड; युद्ध विराम;
    ने कहा; के हाथों; करते हुए;
    डेविस कप; की पारी; रहे हैं;
    खेली ।; रन पर; रन बनाये;
    हाथों लपकवाया; किए गए

Concordance from the Hindi corpus in NLTK


    >>> th.concordance('न्यू')
    Building index...
    Displaying 13 of 13 matches:
    वसीय मैच में न्यू जीलैंड को जी
    न से बाहर कर न्यू जीलैंड की टी
    न सकती हैं । न्यू जीलैंड ने पा
    ती डुनेडिन । न्यू जीलैंड ने पा
    -२ से जीत ली । न्यू जीलैंड ने पा
    त किया गया । न्यू जीलैंड की पा
    लपकवा दिया । न्यू जीलैंड की तर
    ीसरे मैच में न्यू जीलैंड को २८
    े हरा दिया । न्यू जीलैंड को जी
    त किया गया । न्यू जीलैंड की शु
     पारी खेली । न्यू जीलैंड के १५
    ी कर पाये और न्यू जीलैंड की पा
    ोगदान दिया । न्यू जीलैंड की तर
    >>>
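A concordance lists each occurrence of a word with the text around it. Here is a rough sketch of the idea, not NLTK's implementation: it takes whole tokens on each side instead of a fixed number of characters, so it never cuts a word in half. The helper name concordance_lines is hypothetical, and the token list is the word sequence from the trigram sample in Part-II.

```python
def concordance_lines(tokens, target, window=3):
    """Return (left context, target, right context) for every occurrence of target."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Word sequence from the trigram sample in Part-II
TOKENS = "दो-दो तथा फ्रैंक्लीन और हैरिस ने एक-एक विकेट लिये ।".split()

if __name__ == "__main__":
    for left, word, right in concordance_lines(TOKENS, "ने", window=2):
        # rjust counts code points, so Devanagari alignment is only approximate
        print(left.rjust(30), word, right)
```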

Here is an example that computes the frequency distribution of some Hindi words in the 'hindi.pos' file.

    ========== Code begin =====================
    # -*- coding: utf-8 -*-
    from nltk.corpus import indian
    from nltk import FreqDist
    hindi_text = indian.words('hindi.pos')
    # Strip surrounding whitespace from each word before counting
    freq_dist = FreqDist([w.strip() for w in hindi_text])
    modals = ['की','है','हो','तो']

    for modal in modals:
        print(modal + " : ", freq_dist[modal])

    ====== Code End =================

The result is given below.

    की :  236
    है :  189
    हो :  28
    तो :  10
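FreqDist behaves much like Python's collections.Counter (in NLTK 3 it is a subclass of Counter): looking up a word that never occurred gives 0 instead of an error. A minimal sketch with Counter alone; the token list is a toy example made up for illustration, not counts from hindi.pos.

```python
from collections import Counter

# Toy token list for illustration only; note the stray spaces around some words
tokens = ["की", "है ", "की", "हो", " की", "है"]

# Strip whitespace first, as the FreqDist example above does
freq = Counter(w.strip() for w in tokens)

if __name__ == "__main__":
    for word in ["की", "है", "हो", "तो"]:
        # a word that never occurs ('तो' here) simply gets a count of 0
        print(word + " : ", freq[word])
    print(freq.most_common(1))   # the most frequent word and its count
```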


Happy Dipavali !!!
Happy Hacking



NLTK and Indian Language corpus processing - Part-II

In Part-I we saw how to access Indian Language corpora in NLTK and how to play with it. Now let's see some more examples.

First, let's see how to access each word with its POS tag. (Before proceeding, don't forget to do the imports from Part-I.)

    >>> for sent in hpos:
    ...     for word_tag in sent:
    ...         print(" ".join(word_tag))

It will print word with POS like

    दो QFNUM
    विकेट NN
    लिये VFM
    । PUNC
    अनवर NNP
    को PREP
    विंसेट NNP
    ने PREP
    रन NNC
    आउट NN
    किया VFM
    । PUNC
    >>>

Let us see how to do (chunk) parsing with the Indian Language POS tagged corpus. For this I am using the RegexpParser available in NLTK.


    >>> sentence = hpos[2]

I am taking the third sentence in the hindi.pos file for parsing.

    >>> grammar = "NP: {<DT>?<JJ>*<NN>}"

This defines the grammar for NLTK's RegexpParser: an NP chunk is an optional determiner (DT), any number of adjectives (JJ) and a noun (NN).

    >>> import nltk
    >>> cp = nltk.RegexpParser(grammar) # Creating the parser object and passing the grammar to it
    >>> result = cp.parse(sentence) # Do the parsing and store the result to 'result'
    >>> print(result) # Printing the result

It will produce the parse structure like

    (S
      इराक/NNP
      के/PREP
      विदेश/NNC
      (NP मंत्री/NN)
      ने/PREP
      अमरीका/NNP
      के/PREP
      उस/PRP
      (NP प्रस्ताव/NN)
      का/PREP
      मजाक/NVB
      उड़ाया/VFM
      है/VAUX
      ,/PUNC
      जिसमें/PRP
      अमरीका/NNP
      ने/PREP
      संयुक्त/NNC
      (NP राष्ट्र/NN)
      के/PREP
      (NP प्रतिबंधों/NN)
      को/PREP
      (NP इराकी/JJ नागरिकों/NN)
      के/PREP
      लिए/PREP
      कम/INTF
      हानिकारक/JJ
      बनाने/VNN
      के/PREP
      लिए/PREP
      कहा/VFM
      है/VAUX
      ।/PUNC)

If you would like to visualise the parse structure, just do this:

    >>> result.draw()

It will show a big parse tree. It is too big, so I am not attaching a screenshot.
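To see what the grammar NP: {<DT>?<JJ>*<NN>} does, here is a simplified re-implementation of the idea with Python's re module: write the tags as one string of <TAG> tokens and find the runs that match the pattern. It is only a sketch for a single chunk rule, not the full RegexpParser; chunk_np is a hypothetical name, and the sample is a stretch of the sentence parsed above.

```python
import re

# Pattern for "NP: {<DT>?<JJ>*<NN>}"; the literal "<NN>" cannot match "<NNP>" or "<NNC>"
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")


def chunk_np(tagged):
    """Group (word, tag) runs matching NP_PATTERN into ('NP', [...]) entries."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)

    # Map the character offset where each "<TAG>" starts back to its token index
    index_of, pos = {}, 0
    for i, (_, tag) in enumerate(tagged):
        index_of[pos] = i
        pos += len(tag) + 2
    index_of[pos] = len(tagged)   # end of string = one past the last token

    result, i = [], 0
    for m in NP_PATTERN.finditer(tag_string):
        start, end = index_of[m.start()], index_of[m.end()]
        result.extend(tagged[i:start])             # tokens outside any chunk
        result.append(("NP", tagged[start:end]))   # the chunk itself
        i = end
    result.extend(tagged[i:])
    return result


SAMPLE = [("इराकी", "JJ"), ("नागरिकों", "NN"), ("के", "PREP"),
          ("लिए", "PREP"), ("कम", "INTF"), ("हानिकारक", "JJ"), ("बनाने", "VNN")]

if __name__ == "__main__":
    for item in chunk_np(SAMPLE):
        print(item)
```

As in the NLTK output, इराकी/JJ नागरिकों/NN becomes one NP, while हानिकारक/JJ stays outside because no NN follows it.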


Then what about generating bigrams from Indian Language corpora?

Here comes the code for that.

    >>> hinw = indian.words('hindi.pos') # Stores the words in 'hindi.pos' to hinw

    >>> hinbi = nltk.bigrams(hinw) # Generate the bigrams and store it in to hinbi

To print the bigrams

    >>> for i in hinbi:
    ...     print(" ".join(i))

Here you can see some sample bigram

    चुके थे
    थे तथा
    तथा ३
    ३ बार
    बार विधानसभा
    विधानसभा के
    के सदस्य


   
Fine, then what about trigrams?
Hmmm, it is easy!!

First store words in the corpus to a list

    >>> hinw = indian.words('hindi.pos')

Then generate trigrams with nltk.trigrams() function

    >>> hintr = nltk.trigrams(hinw)

To print the trigrams
   
    >>> for j in hintr:
    ...     print(" ".join(j))

Here is the sample output

    दो-दो तथा फ्रैंक्लीन
    तथा फ्रैंक्लीन और
    फ्रैंक्लीन और हैरिस
    और हैरिस ने
    हैरिस ने एक-एक
    ने एक-एक विकेट
    एक-एक विकेट लिये
    विकेट लिये ।


Finding the count of a particular word in an Indian Language corpus

Store the words in a variable and use the count() method.

    >>> txt2 = indian.words('hindi.pos')
   
    >>> txt2.count('भारत')
    23
    >>>
Here I stored all the words in 'hindi.pos' and looked up the count of the word भारत.

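Finding the percentage of text taken up by a particular word

To find out the percentage of the text taken up by the word की

    >>> 100 * txt2.count('की') /len(txt2)
    2
    >>>

The result is 2 because Python 2 uses integer division here; with Python 3 the same expression gives a fractional value between 2 and 3.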
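Under Python 3 the division is exact, so a small helper in the spirit of the percentage() function from the NLTK book gives the fractional value directly. A minimal sketch; word_share is a hypothetical name, and the token list is the ten-word trigram sample sequence shown earlier, not the whole corpus.

```python
def percentage(count, total):
    """Return count as a percentage of total (true division in Python 3)."""
    return 100 * count / total


def word_share(tokens, word):
    """Percentage of tokens equal to word, like 100 * txt2.count(word) / len(txt2)."""
    return percentage(tokens.count(word), len(tokens))


# The ten-token sequence from the trigram sample
TOKENS = "दो-दो तथा फ्रैंक्लीन और हैरिस ने एक-एक विकेट लिये ।".split()

if __name__ == "__main__":
    print(round(word_share(TOKENS, "ने"), 2))
```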

Producing lexical dispersion plot from Indian Language corpus

For that we have to play a small trick.

This is the command for plotting the lexical dispersion of भारत and की in the Hindi corpus:

    >>> from nltk.text import Text
    >>> Text(txt2).dispersion_plot(['भारत','की'])

The Text() constructor converts the word list into an NLTK Text object, which makes the plotting job easy. In the plot you can't read the words, because the Unicode labels are displayed as boxes.
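Since the labels turn into boxes, it can help to look at the numbers behind the plot: a lexical dispersion plot marks the position (offset) of every occurrence of each word in the text. A small sketch that computes those offsets in plain Python; the function names are hypothetical and the sample tokens are the trigram sequence shown earlier.

```python
def offsets(tokens, word):
    """Positions in the token list where word occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def dispersion(tokens, words):
    """Map each word to its offsets: the points a dispersion plot would draw."""
    return {w: offsets(tokens, w) for w in words}


TOKENS = "दो-दो तथा फ्रैंक्लीन और हैरिस ने एक-एक विकेट लिये ।".split()

if __name__ == "__main__":
    for word, positions in dispersion(TOKENS, ["ने", "।", "भारत"]).items():
        print(word, positions)
```

With the real corpus you would pass list(txt2) as the tokens.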


Selecting words from the Hindi corpus based on conditions

For this I am taking the example given in the NLTK book.


    {w|w is a member of V and P(w)}

    [w for w in V if p(w)]

    >>> from nltk import FreqDist
    >>> V = set(txt2)
    >>> my_word = [w for w in V if len(w) > 25]
    >>> sorted(my_word)
    >>> fd = FreqDist(txt2)
    >>> sorted([w for w in set(txt2) if len(w) > 5 and fd[w] > 25])

It will give the following list of words as output

        इस
        एक
        और
        कर
        कहा
        का
        कि
        किया
        ......
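The output contains short words such as इस and एक, although the filter asks for len(w) > 5. My guess (an assumption, not verified on that setup) is that under Python 2 the corpus words were UTF-8 byte strings, so len() counted bytes: each Devanagari character takes 3 bytes in UTF-8, so a two-character word already has length 6. In Python 3, len() counts characters. A sketch showing the difference and a character-based version of the same filter; select_words is a hypothetical helper and Counter stands in for FreqDist.

```python
from collections import Counter

word = "इस"
print(len(word))                   # counts characters (code points)
print(len(word.encode("utf-8")))   # counts bytes in UTF-8


def select_words(tokens, min_chars, min_count):
    """Sorted distinct words longer than min_chars characters seen more than min_count times."""
    fd = Counter(tokens)  # plays the role of FreqDist(txt2)
    return sorted(w for w in set(tokens) if len(w) > min_chars and fd[w] > min_count)
```

With Python 3 strings, select_words(txt2, 1, 25) is roughly what len(w) > 5 selected under byte counting.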



Conditional frequency distribution for Indian Language corpora

Here is an example

Step 1: generate bigrams

    >>> hinbi = nltk.bigrams(hinw)

Step 2: generate the conditional frequency distribution from those bigrams

    >>> gd = nltk.ConditionalFreqDist(hinbi)

You can plot the CFD, but it takes some time to generate the plot, and you can see some animation effect.
   
    >>> gd.plot()


You can also print the CFD as a table.

    >>> gd.tabulate()
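A conditional frequency distribution built from bigrams is essentially a table of counters: for each word (the condition), how often each following word occurs. A plain-Python sketch of that idea with defaultdict and Counter; conditional_freq is a hypothetical name, and the tokens are the word sequence from the bigram sample shown earlier.

```python
from collections import Counter, defaultdict


def conditional_freq(pairs):
    """For each condition (first item), count how often each sample (second item) follows."""
    cfd = defaultdict(Counter)
    for condition, sample in pairs:
        cfd[condition][sample] += 1
    return cfd


# Word sequence from the bigram sample
TOKENS = "चुके थे तथा ३ बार विधानसभा के सदस्य".split()
cfd = conditional_freq(zip(TOKENS, TOKENS[1:]))  # zip pairs each token with the next one

if __name__ == "__main__":
    for condition in TOKENS[:-1]:
        print(condition, dict(cfd[condition]))
```

The last word has no follower, so it never becomes a condition, just as in a bigram-based CFD.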


More coming soon

Happy hacking !!!!!!!!



NLTK and Indian Language corpus processing Part - I

During my presentation at the Indian Python Conference, somebody asked about Indian Language corpus processing in NLTK. Somehow I skipped the answer: I knew that an Indian Language corpus was there in NLTK, but I had never tried to play with it. After the conference I did some experiments on it too, and I am posting them with the results here. If something can be done in a better way, please tell me so that I can improve.

The Natural Language Toolkit contains some Indian language corpora. They are POS tagged and available for Bangla, Hindi, Marathi and Telugu.

    Language    Words    Sentences
    Bangla      10281      899
    Hindi        9408      541
    Marathi     19066     1197
    Telugu       9999      994

Let's see how to access Indian Language corpora in NLTK and how to play with it.

    >>> from nltk.corpus import indian

This imports the Indian Language corpus from NLTK data.

    >>> indian.fileids() # Shows files in Indian Language corpus collection in NLTK
    ['bangla.pos', 'hindi.pos', 'marathi.pos', 'telugu.pos']
    >>>


To find the number of characters in each language corpus

    >>> for f in indian.fileids():
    ...     print(f)
    ...     print(len(indian.raw(f)))

It will produce the following output

    bangla.pos
    209525
    hindi.pos
    175045
    marathi.pos
    429234
    telugu.pos
    251391

To find the number of words in each language corpus

    >>> for f in indian.fileids():
    ...     print(f)
    ...     print(len(indian.words(f)))
    ...
It will produce the following output

    bangla.pos
    10281
    hindi.pos
    9408
    marathi.pos
    19066
    telugu.pos
    9999

To find the number of sentences

    >>> for f in indian.fileids():
    ...     print(f)
    ...     print(len(indian.sents(f)))
    ...

It will produce the following output

    bangla.pos
    899
    hindi.pos
    541
    marathi.pos
    1197
    telugu.pos
    994


We can extract sentences from these corpora too.

To access the sentences of the Hindi corpus

    >>> hindi_sent = indian.sents('hindi.pos')

To print individual sentences

    >>> for hsen in hindi_sent:
    ...     print(hsen)

It will print each sentence as a list of words. Let's see how to store the sentences in a file.

    >>> with open("jhi.txt", "w", encoding="utf-8") as it:
    ...     for hsen in hindi_sent:
    ...         print(" ".join(hsen), file=it)
    ...

The above piece of code joins each list of words back into a sentence and writes it, one sentence per line, to the specified file.


To access words in a corpus

    >>> hin_word = indian.words('hindi.pos')

This piece of code stores all the words in the hindi.pos file in hin_word.


As we stored the sentences in a file, we can write the words to a file too.

    >>> with open("hcwo.txt", "w", encoding="utf-8") as hwo: # Open a file to store the words
    ...     for hw in hin_word:
    ...         print(hw, file=hwo) # Write each word on its own line
    ...

To access the POS tagged sentences of a corpus

    >>> hpos = indian.tagged_sents('hindi.pos')

    >>> for sent in hpos:
    ...     print(sent)

It will print each POS tagged sentence as a list of (word, tag) tuples.

More coming soon !!!!!

Happy hacking


Python3 ZWJ and Malayalam; some doubts

I tried something else in Python3, but it gave some strange results.
See the code below.

    ==== Code Begin =========

    def ജഗന്‍ ():
        print("എന്റെ പേര് ജഗന്‍ എന്നാണ്")
        ജഗന്‍  = "ഞാന്‍"
        print(ജഗന്‍  )

    ജഗന്‍ ()

    ==== Code End ===========

When I tried to execute it, it threw an error.

    ~/pypract$ python3 tes2.py
      File "tes2.py", line 2
        def ജഗന്‍ ():
                          ^
    SyntaxError: invalid character in identifier

    ~/pypract$

I thought it might be due to the 'ZWJ' (zero width joiner) character in the function and variable names I used. So I decided to rewrite the same code without the 'ZWJ' character. The code is given below

    ==== Code Begin =====
    def ജഗൻ():
        print("എന്റെ പേര് ജഗന്‍ എന്നാണ്")
        ജഗൻ = "ഞാന്‍"
        print(ജഗൻ)

    ജഗൻ()
    ==== Code End =======

This code executed without any error. What I did was replace ന്‍ (NA + VIRAMA + ZWJ) in the names with its Unicode 5.1 equivalent, the atomic chillu ൻ (U+0D7B).
The output is

    ~/pypract$ python3 tester.py
    എന്റെ പേര് ജഗന്‍ എന്നാണ്
    ഞാന്‍

I can't understand what is happening. Is it a logical mistake in my program?
Or is it a problem with ZWJ in Python?
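One way to investigate is to look at the characters themselves. Python 3 checks identifiers against Unicode character properties (PEP 3131). The old chillu spelling ends in U+200D ZERO WIDTH JOINER, a format character (category Cf), while the Unicode 5.1 chillu ൻ (U+0D7B) is an ordinary letter (category Lo). The sketch below prints the code points of both spellings, and str.isidentifier() reports what your own interpreter accepts. Whether the ZWJ spelling is accepted may depend on the Unicode version your Python is built with, so treat that result as version-specific; describe is a hypothetical helper name.

```python
import unicodedata

# ജഗന്‍ spelled the old way: NA + VIRAMA + ZERO WIDTH JOINER (U+200D)
old_style = "\u0d1c\u0d17\u0d28\u0d4d\u200d"
# ജഗൻ spelled with the Unicode 5.1 atomic chillu N (U+0D7B)
new_style = "\u0d1c\u0d17\u0d7b"


def describe(name):
    """Return (code point, general category, Unicode name) for every character."""
    return [("U+%04X" % ord(ch), unicodedata.category(ch), unicodedata.name(ch, "?"))
            for ch in name]


if __name__ == "__main__":
    for label, name in (("old", old_style), ("new", new_style)):
        # isidentifier() applies this interpreter's own identifier rules
        print(label, "isidentifier:", name.isidentifier())
        for row in describe(name):
            print("   ", *row)
```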


Python3 is wonderful


See the Python code below. What do you think: will it execute without throwing errors or not?

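    #### Code begin #####################
    import sys

    # പണിയെടുക്കൂ = "get to work", പാഠം = "text": print every line of the text
    def പണിയെടുക്കൂ(പാഠം):
        for വരി in പാഠം:          # വരി = "line"
            print(വരി, end='')    # each line already ends with a newline

    # വരവ് = "input": the file name given on the command line
    വരവ് = sys.argv[1]

    # മൊത്തം = "total": all the lines of the file
    with open(വരവ്, encoding='utf-8') as f:
        മൊത്തം = f.readlines()

    പണിയെടുക്കൂ(മൊത്തം)

    ######## Code End ######################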

Don't scratch your head: it will, if you run it with Python3.

Save the code as test.py, install Python3, and run the program as python3 test.py <your file>

I had just seen some new Python3 documentation with similar examples (thanks to Santhosh Thottingal of SMC for pointing me to the link), so I decided to experiment with it.

Wow, great! In Python3 you can write variable names and function names in your local language. The Python reserved words, however, are not available in your language. I think this is a great facility.
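You can ask Python itself which names it accepts. A small sketch using str.isidentifier() and the keyword module (str.isascii() needs Python 3.7 or later): the Malayalam names from the program are valid identifiers and none of them is reserved, while the reserved words in keyword.kwlist are all plain ASCII English words.

```python
import keyword

# The Malayalam names used in the program above
NAMES = ["പണിയെടുക്കൂ", "പാഠം", "വരി", "വരവ്", "മൊത്തം"]

if __name__ == "__main__":
    for name in NAMES:
        # ascii() shows escapes, so the output is readable on any terminal
        print(ascii(name), name.isidentifier(), keyword.iskeyword(name))
    # The reserved words themselves are all ASCII
    print(all(kw.isascii() for kw in keyword.kwlist))
```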



Pycon India 2009 a report


Our team landed in Bangalore on the 25th at 9.15 p.m. for the first Indian Python Conference. Our colleague Mr. Sudharshan arranged accommodation for us in a men's hostel near MEI @ Bangalore (thanks to Sudharshan and Subhash). The weather was so cool. We felt as if we had come from the Sahara to Antarctica.

On 26th morning we reached IISC around 9.45 a.m. Too late! We missed the talk "My adventures with Python" by Prabhu Ramachandran. The hall was full.

My friend Godfrey and I attended the talks in Hall L4 in the morning session. The session started with a talk by Anand B Pillai on "Python tools for Network Security". It was a really nice one: he demonstrated how Python tools can be used for network security. The next talk was by Senthil Kumaran on "Algorithms in Python". He gave a brief intro to Python3 and his contributions to Python3. He demonstrated how different algorithms can be implemented in Python, along with an evaluation of those algorithms. Students asked many questions about the implementation. His presentation style really rocked!

After these two talks, two people from Dell gave lightning talks on how Python is used at Dell for hardware testing. But the code is not open!!!!!!!

During lunch I chatted with Anivar Aravind, Santhosh Thottingal and some other SMC members. Just chit-chat. After lunch I rushed to Hall L5.

The first talk in Hall L5 after lunch was by Vinay Modi from Voice Pitara on "Semantic Web and Python". He explained the concepts of the Semantic Web and Python's RDFLib in a nice way. He also mentioned his group's semantic web expeditions. I picked up some threads to begin work on the semantic web. There seemed to be no session manager (I am not sure), but he finished the talk in 45 minutes and answered the audience's queries. The next talk was by Anand Janakiraman from Strand on "An analysis of the use of Python/Jython at Strand". He introduced the services provided by Strand and the role of Python/Jython in their product. He demonstrated an analysis of the delegate registrations for the Indian Python Conference with Avardis(TM), with some funny findings in the registrations. How many spellings for Bangalore!! The law of Choice. His presentation style was incredible and everybody enjoyed it. Questions like "why Jython, why not Groovy?" came from the audience. Hmmm, he answered them all with a smile, like a saint.

Tea break .

After tea I rushed to Hall L4. The first talk after the tea break was by Ramakrishna Readdy on "Building Python Applications for the Linux Enterprise". I am not literate enough to give the gist of his talk. The next talk was on "National Mission on Education" by Prabhu Ramachandran and Asokan Pichai. Dr. Prabhu introduced the National Mission on Education project and his dream of Python in education. After Prabhu, Asokan Pichai explained the plans of the project. After the talk they distributed a live DVD for experimenting with Python tools, and T-shirts too. But some people missed out!!

That was the end of the day one.

On the second day, the 27th, I reached the venue by 9.45, a little bit late. I rushed to Hall L5. Keerthi Shankar was delivering his talk on "Python and .NET" in an attractive way. He gave some real-world examples of IronPython and .NET and their interoperability. The next talk was by Vikrant Patil from Strand on "Can your UI change colors like a chameleon". He demonstrated how the Avardis(TM) scripting framework works, with a live demo of real-world examples. It is wonderful to see how Python/Jython helps to meet customer requirements within minutes. It is a good model, a really good business model. The next talk was given by me, on "Natural Language Toolkit": a talk with a text file opened in gedit and a Python interpreter, with some plots and mistakes. Thanks to Gopalasivam for providing a laptop for the presentation.

Lunch break.

After a nice lunch I left the venue with my friends, because the cool Bangalore weather had made my health worse.


For me, that was the end of the first Indian Python Conference.

It was a really great event, well organised. All the sessions were really thought-provoking.


But

There is a bug in the T-Shirt "I am not reappy a wizard I am using Python".
"really" became "reappy" really unsolvable bug!!!!!!!!!

Thanks to my team mates Godfrey, Gopalasivam and Sudharsan for travelling with me to attend the conference.


Chalo PyConf 2009

Dear All Pythonists
Wake up! Only one more day for Indian Python Conference. Chalo Bangalore.
Let us make it a great event .
