Tuesday, September 27, 2011

Informatica & Hadoop... solutions for the future?

Distributed computing using Hadoop has taken the IT industry by storm in the last few years.  After getting almost "adopted" by Yahoo, Hadoop has progressed quite fast, and is now maturing slowly but steadily.

More and more enterprise solution providers are announcing their support for the Hadoop platform, hoping to get a piece of the big data business pie.  So it is no surprise that Informatica, the leader in the data integration solutions space, has also announced a tie-up with Cloudera to port the Informatica platform to Hadoop.

Though the exact details are yet to come out, the possibilities are endless.  With Hadoop (and its inherent distributed computing based on map/reduce), Informatica can realistically think of processing big data in practical time frames.

For one of my customers, I deal with about 200 million rows of data per day in a single job.  Besides the issues with tuning the query in Oracle, the Informatica component itself takes hours to run.  With map/reduce in place, I hope to get that down to minutes, Oracle issues notwithstanding.
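
To make the map/reduce idea concrete, here is a rough sketch in Python of how a big aggregation job could be split into chunks, summarised per chunk (the "map" step), and then merged (the "reduce" step).  This is only an illustration and not how Informatica or Hadoop actually implement it; the row layout (customer id, amount) and the names map_chunk, merge_totals and run_job are made up for the example.  On a real cluster each chunk would be handled by a different node in parallel, whereas here the chunks are simply processed one after another.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: compute partial per-customer totals for one chunk of rows."""
    partial = {}
    for customer_id, amount in rows:
        partial[customer_id] = partial.get(customer_id, 0) + amount
    return partial


def merge_totals(left, right):
    """Reduce step: combine two partial results into one."""
    merged = dict(left)
    for customer_id, amount in right.items():
        merged[customer_id] = merged.get(customer_id, 0) + amount
    return merged


def run_job(rows, chunk_size):
    """Split rows into chunks, map each chunk, then reduce the partials.

    Hypothetical driver: a real cluster would run map_chunk on many
    nodes at once; this version runs the chunks sequentially.
    """
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(merge_totals, partials, {})


if __name__ == "__main__":
    # Tiny made-up sample standing in for the daily load.
    sample = [("a", 10), ("b", 5), ("a", 7), ("c", 1), ("b", 3)]
    print(run_job(sample, chunk_size=2))
```

The point is that each chunk can be summarised without looking at any other chunk, so the expensive part of the job can be spread over as many machines as are available.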

Although word about Hadoop is spreading quite fast, its adoption (from buzzword to actual usage in the enterprise) is not as fast.  To aid their cause, Informatica and Cloudera have started an interesting series of webinars called "Hadoop Tuesdays".  It's free to join, and they get experts to talk about various issues around Hadoop, big data and Informatica.  It's been very useful and informative so far.