Friday, October 14, 2011

Data Lineage: what is that?

It is one of those buzzwords that keeps doing the rounds every once in a while. Almost every enterprise wants this kind of analysis done, and it is almost always hard to find people with the knowledge or experience to do it.

For the unaware, Data Lineage is, in very short words, the study of data from its source to its eventual target. Much as we would trace a family tree, we trace the ancestry of the data we are dealing with.

Starting from its source, the data travels through different subsystems, sometimes going through transformations and thus possibly changing shape along the way.
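
To make that journey a bit more concrete, here is a minimal sketch in Python of a value that carries its own lineage as it moves from a source, through a couple of transformations, to a target. Everything in it is an assumption made for illustration: the names TrackedValue, extract, transform and load, and the systems "billing_db" and "warehouse", are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedValue:
    """A value bundled with the ordered list of steps it has passed through."""
    value: object
    lineage: list = field(default_factory=list)


def extract(value, source):
    # Record the (hypothetical) source system as the first lineage entry.
    return TrackedValue(value, [f"source:{source}"])


def transform(tracked, step_name, func):
    # Apply func to the value and append the step name to a new lineage list;
    # the input TrackedValue is left unchanged.
    return TrackedValue(func(tracked.value),
                        tracked.lineage + [f"transform:{step_name}"])


def load(tracked, target):
    # Record the (hypothetical) target system as the final lineage entry.
    return TrackedValue(tracked.value, tracked.lineage + [f"target:{target}"])


# The data changes shape on the way: padded string -> trimmed string -> float.
raw = extract(" 42.50 usd ", "billing_db")
trimmed = transform(raw, "trim", str.strip)
amount = transform(trimmed, "parse_amount", lambda s: float(s.split()[0]))
final = load(amount, "warehouse")

print(final.value)
print(" -> ".join(final.lineage))
```

Reading the lineage list from left to right answers the basic lineage question for that one value: where it came from, what was done to it, and where it ended up. Real lineage analysis usually works at the level of tables, columns and jobs rather than single values, so treat this only as a toy model of the idea.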

Informatica published a very interesting blog post on this back in 2007, which is fairly informative.

