As I mentioned earlier, I have been busy experimenting with a columnar database, InfoBright.
The experience so far is mixed. In some scenarios data loading is very good, as much as 40k rows per second, whereas through other channels it drops to a poor 500 rows per second. When I use their built-in loader, it's lightning fast, but when I load from Pentaho or Informatica, it's measly.
Apparently, the drivers and the compatibility of the third-party tools play a big role in the performance you can attain.
The fact that they don't publish any native driver is a huge bottleneck. Informatica doesn't even have a native connector for MySQL, InfoBright's better-known cousin (InfoBright's core engine is built on open-source MySQL).
One thing came out of this experience for sure: my liking for open source grew stronger and stronger. While hunting for the reason behind the slow performance from Pentaho, I even managed to get the source code of the plugin (transformation) that Pentaho uses for InfoBright. It's pure Java, and it was a powerful, eye-opening feeling to see the code behind the tool. I felt as if I had the power, the choice, to make a difference and make it better. :)
I am currently exploring other options; one of them is creating a dump file (read: CSV) of the data and then launching the command-line tool to load it into the target database. I don't like it, but let's see if there is any better (read: faster) way around...
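A minimal sketch of that dump-and-load workaround, in Python, might look like the following. It writes rows to a CSV file and builds a bulk-load statement for the command-line client. It assumes the target accepts MySQL's `LOAD DATA INFILE` syntax (InfoBright's engine is MySQL-based, but exactly which options its loader honours is an assumption here). The function names, the table `sales_fact` and the database `target_db` are hypothetical placeholders.

```python
import csv
import os
import tempfile


def dump_to_csv(rows, path):
    """Write an iterable of row tuples to a CSV file and return the row count."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        # Fields containing commas or quotes get wrapped in double quotes.
        writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def build_load_statement(csv_path, table):
    """Build a MySQL-style LOAD DATA statement for the dumped file.

    Assumption: the target database accepts MySQL's LOAD DATA INFILE syntax.
    The table name is inserted as-is, so it must be trusted input.
    """
    # Escape backslashes and single quotes so the path is a valid SQL string literal.
    safe_path = csv_path.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"LOAD DATA INFILE '{safe_path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n';"
    )


if __name__ == "__main__":
    rows = [(1, "alpha", 3.5), (2, "beta, gamma", 4.0)]
    path = os.path.join(tempfile.gettempdir(), "sales_dump.csv")  # hypothetical file name
    n = dump_to_csv(rows, path)
    stmt = build_load_statement(path, "sales_fact")  # hypothetical table
    print(n, "rows written")
    print(stmt)
    # The statement could then be handed to the command-line client, e.g.:
    # subprocess.run(["mysql", "target_db", "-e", stmt], check=True)
```

Whether `LOAD DATA INFILE` or `LOAD DATA LOCAL INFILE` is needed depends on whether the CSV sits on the database server or on the client machine.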