PyLucene in Action - Part I
PyLucene is a Python wrapper around Java Lucene. The goal of this tool is to use Lucene's text indexing and searching capabilities from Python. It is compatible with the latest version of Java Lucene. PyLucene embeds a Java VM with Lucene into a Python process. More details on PyLucene can be found at http://lucene.apache.org/pylucene/.
In this blog post I am going to demonstrate how to build and query a search index with PyLucene. You can see the installation instructions for PyLucene in my previous blog post ....
1) Creating an index with PyLucene
I am using the code given below to create an index with PyLucene:
===Code Python BEGIN ===========================================
#!/usr/bin/env python
"""
Example of Indexing with PyLucene 3.0
"""
import os, glob
import lucene
from lucene import SimpleFSDirectory, File, Document, Field, \
    StandardAnalyzer, IndexWriter, Version


def luceneIndexer(docdir, indir):
    """
    Index documents from a directory
    """
    lucene.initVM()
    indexdir = SimpleFSDirectory(File(indir))
    analyzer = StandardAnalyzer(Version.LUCENE_30)
    # True creates a new index, overwriting any existing one;
    # at most the first 512 terms of each field are indexed.
    index_writer = IndexWriter(indexdir, analyzer, True,
                               IndexWriter.MaxFieldLength(512))
    for tfile in glob.glob(os.path.join(docdir, '*.txt')):
        print("Indexing: %s" % tfile)
        document = Document()
        # Read the whole file and close the handle promptly
        with open(tfile, 'r') as infile:
            content = infile.read()
        document.add(Field("text", content, Field.Store.YES,
                           Field.Index.ANALYZED))
        index_writer.addDocument(document)
        print("Done: %s" % tfile)
    index_writer.optimize()
    print(index_writer.numDocs())
    index_writer.close()
==== Code Python END ============================================
You have to supply two parameters to luceneIndexer():
a) The path to the directory where the documents to be indexed are stored (only files ending in .txt are picked up)
b) The path to the directory where the index will be saved
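Before running the indexer, it can help to check which files it would pick up. The sketch below mirrors the glob pattern used in luceneIndexer(), which matches only *.txt files directly inside the document directory. The helper name list_indexable_files is hypothetical; it is not part of PyLucene or of the original source, and it does not touch Lucene at all.

```python
import glob
import os


def list_indexable_files(docdir):
    """Return the sorted *.txt paths found directly inside docdir.

    Hypothetical helper: it only mirrors the glob pattern that
    luceneIndexer() uses, so sub-directories are not searched.
    """
    if not os.path.isdir(docdir):
        raise ValueError("Not a directory: %s" % docdir)
    return sorted(glob.glob(os.path.join(docdir, "*.txt")))
```

Calling list_indexable_files() on the same path you plan to pass as docdir gives a quick preview of the documents that would end up in the index.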
2) Querying an index with PyLucene
The code given below queries an index with PyLucene:
======= Code Begin Python =======================================
#!/usr/bin/env python
"""
PyLucene retriever simple example
"""
import lucene
from lucene import SimpleFSDirectory, File, \
    StandardAnalyzer, IndexSearcher, Version, QueryParser

INDEXDIR = "./MyIndex"


def luceneRetriver(query):
    lucene.initVM()
    indir = SimpleFSDirectory(File(INDEXDIR))
    lucene_analyzer = StandardAnalyzer(Version.LUCENE_30)
    lucene_searcher = IndexSearcher(indir)
    # Parse the query against the "text" field filled by the indexer
    my_query = QueryParser(Version.LUCENE_30, "text",
                           lucene_analyzer).parse(query)
    MAX = 1000  # maximum number of hits to return
    total_hits = lucene_searcher.search(my_query, MAX)
    print("Hits: %s" % total_hits.totalHits)
    for hit in total_hits.scoreDocs:
        print("Hit Score: %s Hit Doc: %s Hit String: %s"
              % (hit.score, hit.doc, hit.toString()))
        doc = lucene_searcher.doc(hit.doc)
        print(doc.get("text").encode("utf-8"))


luceneRetriver("really cool restaurant")
===============================================================
In the code I have hard-coded the index directory as INDEXDIR = "./MyIndex". Alternatively, the index directory can be received as a command-line parameter (sys.argv).
The function luceneRetriver() takes the query string as its parameter.
The source code is available on Bitbucket: https://bitbucket.org/jaganadhg/pyluceneia
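As a rough sketch of the command-line variant, the index directory and the query could be read with the standard argparse module instead of indexing into sys.argv by hand. The function name parse_search_args and the --index-dir option are assumptions made for illustration; neither appears in the original source.

```python
import argparse


def parse_search_args(argv):
    """Parse the index directory and the query from a list of arguments.

    Hypothetical helper; the option names are illustrative only.
    """
    parser = argparse.ArgumentParser(description="Query a PyLucene index")
    parser.add_argument("--index-dir", default="./MyIndex",
                        help="directory that holds the index")
    parser.add_argument("query", nargs="+", help="search terms")
    args = parser.parse_args(argv)
    # Join the terms so an unquoted multi-word query also works.
    return args.index_dir, " ".join(args.query)
```

In a script you would call parse_search_args(sys.argv[1:]), assign the first value to INDEXDIR and pass the second one to luceneRetriver().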
Happy Hacking !!!!!!!!!!