@joeharris76: Micro-review of InformaticaCloud for Amazon Redshift

I recently finished an evaluation of Informatica Cloud's platform as a candidate to take over our Redshift ETL workload.

I'm posting this 'thumbnail' for the sake of my own memory as much anything else. :)

TL;DR - It's very _close_ to being an ideal solution but, sadly, has some flaws we can't work around right now.

Good:

Very straightforward agent based architecture. Very simple to install and consume.
Reliably syncs/replicates *simple* data from source DBs via flat files and cleans up after itself.
Performs simple and complex joins on the data source. This is a *major* weakness for competitors as they can't do this.
All functionality is wizard driven and probably consumable by line of business users.

Bad:

Cannot create SQL based transforms without uploading PowerCenter mappings. This is too much to ask of LOB users.
Extracts land as plain text on the agent host before gzipping. This is a *very* slow approach. They should pipe directly to gzip.
Logging is *extremely* verbose. Every single row is written to the success or error. You'll quickly create many GB of logs.
Upsert operations update *every* matching row in Redshift. This is handled as a delete plus and insert within Redshift, which means we then need a very slow vacuum to reclaim space.
Finally, the deal breaker, cannot handle `timestamp with timezone` data types. In 2014. Redshift cannot handle these either but I need the ETL platform to be able to work with them.

Monday, 8 December 2014

Micro-review of InformaticaCloud for Amazon Redshift

1 comment:

Disqus for @joeharris76