Quick MySQL to CouchDB migration with Python

I used to play a lot with text databases. Today I was thinking of migrating some of my data collections to CouchDB. I used the following script to convert one of my DB tables (almost all fields are TEXT) to a CouchDB database.

#!/usr/bin/env python
import couchdb
import MySQLdb as mdb

# Connect to the local CouchDB server and create the target database
couch = couchdb.Server()
db = couch.create('YOUR_COLLECTION_NAME')

# DictCursor returns each row as a {column: value} dict
con = mdb.connect(host='HOST_NAME',user='YOU',passwd='YOUR_PASS',db='YOUR_DB')
cur = con.cursor(mdb.cursors.DictCursor)
cur.execute("SELECT * FROM YOUR_DB_TABLE")
results = cur.fetchall()

# Every row becomes one CouchDB document
for result in results:
    db.save(result)

The DictCursor in the Python MySQLdb API was a great help in creating the fields and values of the CouchDB documents. As my table contained only text data, the operation was smooth and I was able to migrate about 1 GB of data to CouchDB. But life is not easy! If your text data has encoding issues or junk values that can't be converted to Unicode, you are in trouble. Don't worry, here comes the solution: replace the last two lines of the code with the code given below.

for result in results:
    # repr() turns every value into an escaped string, so rows with
    # broken encodings can still be saved (all values become strings)
    d = dict((k, repr(v)) for k, v in result.items())
    db.save(d)
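
If the escaped output of repr() is harder to read than you would like, another option is to decode the byte strings yourself and replace whatever cannot be decoded. The following is only a sketch for Python 3: `to_text` and `clean_row` are made-up helper names, and UTF-8 is an assumed encoding that may not match your table.

```python
def to_text(value, encoding="utf-8"):
    """Return value unchanged unless it is bytes; decode bytes leniently."""
    if isinstance(value, bytes):
        # errors="replace" swaps each undecodable byte for U+FFFD instead of raising
        return value.decode(encoding, errors="replace")
    return value


def clean_row(row):
    """Apply to_text to every value of a row dict (hypothetical helper)."""
    return {key: to_text(value) for key, value in row.items()}


if __name__ == "__main__":
    # Made-up sample row: one clean value and one with a junk byte
    row = {"NAME": b"caf\xc3\xa9", "NOTE": b"bad \xff byte"}
    print(clean_row(row))
```

The cleaned dict could then be passed to db.save() in place of the repr() version; non-bytes values such as integers pass through untouched.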

Hmm, so far so good. But then I tried the same code with a different table whose structure is:

+-------+--------------+------+-----+---------+----------------+
| Field | Type         | Null | Key | Default | Extra          |
+-------+--------------+------+-----+---------+----------------+
| ID    | int(11)      | NO   | PRI | NULL    | auto_increment |
| NAME  | varchar(30)  | NO   |     |         |                |
| PRICE | decimal(5,2) | NO   |     | 0.00    |                |
+-------+--------------+------+-----+---------+----------------+

Now the code threw a long list of errors. Life is not easy! I have to find a good solution for this... Happy hacking!
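
The post does not show the traceback, so the cause suggested here is an assumption: MySQLdb hands DECIMAL columns back as decimal.Decimal objects, and Python's standard JSON encoder refuses to serialize those. A minimal Python 3 sketch of one possible workaround, with `jsonable` and `jsonable_row` as hypothetical helper names:

```python
import json
from decimal import Decimal


def jsonable(value):
    """Convert values the json module rejects into plain strings."""
    if isinstance(value, Decimal):
        # str() keeps the exact digits; float() would also work but may round
        return str(value)
    return value


def jsonable_row(row):
    """Return a copy of a row dict that json.dumps can handle."""
    return {key: jsonable(value) for key, value in row.items()}


if __name__ == "__main__":
    # Made-up row shaped like the ID / NAME / PRICE table above
    row = {"ID": 1, "NAME": "pen", "PRICE": Decimal("12.50")}
    print(json.dumps(jsonable_row(row)))
```

Storing the price as a string keeps it exact; whether that or a float is the better choice depends on how the data will be queried later.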


Experiments with NoSQL databases: CouchDB

I have been reading about NoSQL databases for a long time. Occasionally I used NoSQL databases such as Apache CouchDB and Apache Cassandra with Python for analytics in some minor projects. This time I thought, why not try something with Java + NoSQL? I created a small project to play with. The idea of the project is to store Twitter search results in CouchDB. I used the following operating system, programming language and libraries in this project.

        Operating System      : Fedora 16 (Verne)
        Programming Language  : Java (JDK 1.6.0_29)
        IDE                   : Eclipse 3.7.1
        Apache CouchDB        : 1.0.
        External Libraries    : Couchdb4J
                                Twitter4J
                                Apache Commons httpclient, logging, codec, commons, collections, beanutils
                                Jsonlib, ezmorph

Installing CouchDB
To install CouchDB, open a terminal and type the command
    $ su -c 'yum -y install couchdb'

After successful installation, start the CouchDB server by issuing this command in the terminal
    $ su -c '/etc/init.d/couchdb start'

Now your CouchDB instance will be up and running. You can check this by opening CouchDB Futon in the browser at http://localhost:5984/_utils/. If everything is fine you will see the Futon interface.
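
The same check can be done without a browser, because the CouchDB root URL answers with a small JSON welcome message. A sketch in Python 3, assuming the default address; `couchdb_version` is a made-up name, and the `opener` parameter exists only so the function can be tried without a live server:

```python
import json
from urllib.request import urlopen


def couchdb_version(base_url="http://localhost:5984/", opener=urlopen):
    """Return the version string reported by the CouchDB root URL."""
    with opener(base_url) as response:
        info = json.loads(response.read().decode("utf-8"))
    # A running server answers with a "couchdb": "Welcome" entry plus its version
    if info.get("couchdb") != "Welcome":
        raise RuntimeError("Unexpected response from %s" % base_url)
    return info["version"]
```

Calling `couchdb_version()` with no arguments queries the local server and raises an error if nothing is listening on port 5984.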

Let's start our project.
First create a function that connects to the CouchDB instance, then creates and returns a database with the given name. If the database already exists, it simply returns that database.

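    /**
     * Connects to the local CouchDB instance and returns the named database,
     * creating it first if it does not exist.
     *
     * @param strDBName name of the database to open or create
     * @return dbCouchDB the database handle
     */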

    public static Database connectCouchDB(String strDBName) {
        Database dbCouchDB = null;
        Session dbCouchDBSession = new Session("localhost", 5984);
        List<String> databases = dbCouchDBSession.getDatabaseNames();
        if (databases.contains(strDBName)) {
            dbCouchDB = dbCouchDBSession.getDatabase(strDBName);
        } else {
            dbCouchDBSession.createDatabase(strDBName);
            dbCouchDB = dbCouchDBSession.getDatabase(strDBName);
        }

        return dbCouchDB;

    }
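
For comparison, the same get-or-create idea looks like this with the couchdb-python package used in the MySQL post. This is a sketch, not part of the Java project: `get_or_create` is a made-up name, and it accepts any object that supports `in`, indexing and `create()` the way `couchdb.Server` does.

```python
def get_or_create(server, name):
    """Return the database called name, creating it first if needed."""
    if name in server:
        return server[name]
    return server.create(name)


# Typical use against a local server (needs the couchdb package):
#   import couchdb
#   db = get_or_create(couchdb.Server("http://localhost:5984/"), "javatweets")
```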

   

Now we can create a function that queries Twitter Search and returns the tweets.

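    /**
     * Runs a Twitter search for the given query string.
     *
     * @param strQuery the search query
     * @throws TwitterException if the search request fails
     * @return queryResult the search result holding the tweets
     */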

    public static QueryResult getTweets(String strQuery)
            throws TwitterException {
        Twitter twitter = new TwitterFactory().getInstance();
        Query query = new Query(strQuery);
        QueryResult queryResult = twitter.search(query);
        return queryResult;

    }


To insert the tweets into the CouchDB database, each one has to be converted to a document. Let's create a function to convert an individual tweet to a CouchDB document.

    /**
     * Converts a single tweet into a CouchDB document keyed by the tweet id.
     *
     * @param tweet the tweet to convert
     * @return couchDocument the resulting document
     */

    @SuppressWarnings("deprecation") // Date.toGMTString() is deprecated
    public static Document tweetToCouchDocument(Tweet tweet) {

        Document couchDocument = new Document();

        // The tweet id doubles as the CouchDB document id
        couchDocument.setId(String.valueOf(tweet.getId()));
        couchDocument.put("Tweet", tweet.getText());
        couchDocument.put("UserName", tweet.getFromUser());
        couchDocument.put("Time", tweet.getCreatedAt().toGMTString());
        // getSource() is the client the tweet was posted from
        couchDocument.put("URL", tweet.getSource());

        return couchDocument;

    }
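
The same mapping can be sketched in Python with plain dicts, which is the form couchdb-python stores. Everything about the input here is an assumption made for illustration: the tweet is a dict with `id`, `text`, `from_user`, `created_at` (a timezone-aware datetime) and `source` keys, and `tweet_to_document` is a made-up name.

```python
from datetime import datetime, timezone


def tweet_to_document(tweet):
    """Build a CouchDB-style dict from a tweet given as a plain dict."""
    return {
        "_id": str(tweet["id"]),  # CouchDB document ids are strings
        "Tweet": tweet["text"],
        "UserName": tweet["from_user"],
        # Store the time in UTC as an ISO 8601 string
        "Time": tweet["created_at"].astimezone(timezone.utc).isoformat(),
        "Source": tweet["source"],
    }


if __name__ == "__main__":
    # Made-up sample tweet
    sample = {
        "id": 42,
        "text": "hello couch",
        "from_user": "someone",
        "created_at": datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc),
        "source": "web",
    }
    print(tweet_to_document(sample))
```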


Now we can try to write the Twitter Search results to the CouchDB document collection with the following function.

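    /**
     * Searches Twitter and saves every returned tweet to the given database.
     *
     * @param strTweetQury the Twitter search query
     * @param strdbName name of the target CouchDB database
     * @throws TwitterException if the Twitter search fails
     */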

    public static void writeTweetToCDB(String strTweetQury, String strdbName)
            throws TwitterException {
        QueryResult tweetResults = getTweets(strTweetQury);
        Database dbInstance = connectCouchDB(strdbName);
        dbInstance.getAllDocuments();
        for (Tweet tweet : tweetResults.getTweets()) {
            Document document = tweetToCouchDocument(tweet);
            dbInstance.saveDocument(document);
        }

    }
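
One thing to watch for: because the tweet id is used as the document id, running the same search twice tries to save documents that already exist, and CouchDB rejects a new document whose id is already taken with a conflict error. A sketch of one way around this in Python, skipping ids that are already stored; `save_new_documents` is a made-up name, and `db` is assumed to behave like a couchdb-python `Database` (supports `in` and `save()`).

```python
def save_new_documents(db, documents):
    """Save only documents whose _id is not in db yet; return how many were saved."""
    saved = 0
    for document in documents:
        if document["_id"] in db:
            continue  # already stored by an earlier run
        db.save(document)
        saved += 1
    return saved
```

This costs one extra lookup per tweet, which is fine for a small search result; for larger batches a bulk request would be the more usual choice.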

Now it is time to execute our project. Add the following lines to the main() and run the project.

        String query = "java";
        String dbName = "javatweets";
        System.out.println("Started");
        writeTweetToCDB(query, dbName);
        System.out.println("Finished");


That is all!

The entire code is available in my Bitbucket repo.

Happy Hacking !!!!!!!!


CSV to CouchDB data importing, a Python hack

Last month I was playing with Apache CouchDB: just some introductory stuff, map reduce, etc. Soon I received some linguistic data in .csv format as part of a project I was managing, and there was a need to analyze it. Usually we used MySQL or spreadsheets to store and analyze such data. Suddenly I thought, why can't I do it with CouchDB? There was no direct option for importing CSV data into CouchDB. I searched the web and ended up with a hint. Manilal, a friend of mine, also pointed to the same hint: http://www.apacheserver.net/Load-CSV-file-into-couchdb-at1056996.htm .

Soon I created a small script to do the job, that is, to load a CSV file into CouchDB. The script is available in my Bitbucket repo: https://bitbucket.org/jagan/misc/src/84cefb61c86a/csv2couch.py . It is a quick solution, and maybe you have a better version! I thought putting it on the web might help somebody else.
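
For readers who only want the idea, here is a minimal Python 3 sketch of the approach. It is not the csv2couch.py script from the repo: `csv_to_documents` is a made-up name, and the sample data is invented. It assumes the first CSV line holds the column names.

```python
import csv
import io


def csv_to_documents(csv_file):
    """Yield one dict per CSV row, keyed by the header line."""
    for row in csv.DictReader(csv_file):
        yield dict(row)


if __name__ == "__main__":
    # Made-up sample standing in for a real file opened with open(path, newline="")
    sample = io.StringIO("word,tag\nrun,VB\ncat,NN\n")
    for document in csv_to_documents(sample):
        print(document)  # with couchdb-python this would be db.save(document)
```

Each yielded dict has no `_id`, so CouchDB would assign one when the document is saved. All values stay strings, as the csv module does not guess types.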


Happy Hacking !!!
