
Named Entity Recognition in Perl

Named Entity Recognition is another non-trivial problem in Natural Language Processing. For details about named entities, refer to the Wikipedia article on Named entity recognition.

Usually a named entity is the name of a person, a company and so on. It may be a group of words, optionally joined by "of", "the" and "and" (in English). Automatically identifying these groups is a hard problem.
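To make the "group of words" idea concrete, here is a rough sketch in plain Perl that picks out runs of capitalized words, optionally linked by "of", "the" and "and". This is only a naive heuristic of my own for illustration, not how any real NER tool works; the sub name candidate_entities is made up for this example. It will happily report an ordinary capitalized word at the start of a sentence as an entity, which is exactly why the real problem is hard.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return candidate named entities: runs of capitalized words, optionally
# linked by the lowercase joiners "of", "the" and "and".
# (Hypothetical helper, a naive heuristic for illustration only.)
sub candidate_entities {
    my ($text) = @_;
    my $word   = qr/[A-Z][A-Za-z]*/;    # one capitalized word
    my $joiner = qr/(?:of|the|and)/;    # lowercase connecting words
    my @found;
    while ($text =~ /\b($word(?:\s+(?:$joiner\s+)*$word)*)/g) {
        push @found, $1;
    }
    return @found;
}

# Print each candidate on its own line.
print "$_\n" for candidate_entities(
    'Simon Cozens wrote a book while the Bank of England met in London.'
);
```

For the sample sentence this prints "Simon Cozens", "Bank of England" and "London", one per line.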

Wikipedia lists a handful of tools for this purpose. I think NLTK (Natural Language Toolkit) also has a module for it. Here I am going to show how to do it with Perl.

There is a Perl module called Lingua::EN::NamedEntity, written by Simon Cozens, author of "Advanced Perl Programming". It is written for Named Entity Recognition.

To install the module on a GNU/Linux system, follow the steps given below.
Open a terminal. Type 'cpan' as the root user. At the cpan prompt, type 'install Lingua::EN::NamedEntity' and follow the instructions.

Here is the code to extract named entities with the module.

====code begin =========================
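#!/usr/bin/env perl
use strict;
use warnings;

use Lingua::EN::NamedEntity;

# Slurp all input (files named on the command line, or STDIN) into one string.
my $str = do { local $/; <> };
$str = '' unless defined $str;

# extract_entities() returns a list of hash references, one per entity found.
my @entities = extract_entities($str);

# Print the text of each entity on its own line.
foreach my $entity (@entities) {
    print $entity->{entity}, "\n";
}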
===code end ============================
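As far as I remember from the module's documentation, each hash returned by extract_entities also carries a 'class' key (something like person, place or organisation) next to 'entity'. Treat that key name as an assumption and check the docs for your installed version. Under that assumption, here is a small sketch that groups the entities by class; group_by_class is a helper name I made up, and anything without a class goes under 'unknown'.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Group entity hashes by their 'class' field.
# ASSUMPTION: the module sets a 'class' key; entries without one go to 'unknown'.
sub group_by_class {
    my @entities = @_;
    my %by_class;
    for my $e (@entities) {
        my $class = defined $e->{class} ? $e->{class} : 'unknown';
        push @{ $by_class{$class} }, $e->{entity};
    }
    return \%by_class;
}

# Only touch the module when at least one file name is given on the command line.
if (@ARGV) {
    require Lingua::EN::NamedEntity;
    my $text = do { local $/; <> };    # slurp the named file(s)
    $text = '' unless defined $text;
    my $groups = group_by_class(Lingua::EN::NamedEntity::extract_entities($text));
    for my $class (sort keys %$groups) {
        print "$class: ", join(', ', @{ $groups->{$class} }), "\n";
    }
}
```

Save it under any name you like and pass a text file as the argument; it prints one line per class with the entities found for it.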

I think the code is self-explanatory. (If not, please leave a comment.)
You may find the code slow at times. I don't know why!

My sincere thanks go to Simon Cozens and his book, and to my old PGDL students. I think I got the gyan (know-how) from Simon's book and cpan.org.

Somebody really should look at how this can be implemented for Indian languages.
Happy Hacking!!!!!!
