Wednesday, 19 January 2011

My take on why businesses have problems with ETL tools

Check out this very nice piece by Rick about the reasons why companies have failed to get the most out of their ETL tools.

My take is from the other side of the fence. As a business user I'm often frustated by ETL tools and have been known to campaign against them for the following reasons:

> ETL tools have been too focussed on Extract-Transform-Load and too little focused on actual data integration. I have complex integration challenges that are not necessarily a good fit for the ETL strategy and sometimes I feel like I'm pushing a square peg into a round hole.

> It's still very challenging to generate reusable logic inside ETL tools and this really should be the easiest thing in the world (ever heard the mantra Don't Repeat Yourself!). Often the hoops that have to be jumped through are more trouble than they are worth.

> Some ETL tools are a hodge podge of technologies and approaches with different data types and different syntaxes wherever you look. (SSIS I'm looking at you! This still is not being addressed in Denali.)

> ETL tools are too focused on their own execution engines and fail miserably to take advantage of the processing power of columnar and MPP databases by running processes on the database. This is understandable in open source tools (database specific SQL may be a bridge too far) but in commercial tools it's pathetic.

> Finally, where is the ETL equivalent of SQL? Why are we stuck with incompatible formats for each tool. The design graphs in each tool look very similar and the data they capture is near identical. Even the open source projects have failed to utilise a common format. Very poor show. This is the single biggest obstacle to more widespread ETL. Right now it's much easier for other parts of the stack to stick with SQL and pretend that ETL doesn't exist.

Disqus for @joeharris76