Big Data Needs More ETL
Because NoSQL stores and Hadoop cannot readily perform ad-hoc joins and data aggregation, more ETL is required to pre-compute data, in the form of new data sets or materialized views, to support end-user query patterns. In the NoSQL world, it is common to see the same event stored in several rows and/or collections, each aggregated along different dimensions and at different levels of granularity.
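A minimal sketch of this pre-computation, assuming a handful of in-memory events stands in for a real NoSQL or Hadoop job. The field names (`ts`, `product`, `region`, `amount`), the `materialize` helper and the view names are hypothetical, chosen only for illustration. Each view is another redundant copy of the same events, aggregated by a different key and granularity.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events; field names are illustrative only.
events = [
    {"ts": "2024-03-01T09:15:00", "product": "A", "region": "EU", "amount": 10.0},
    {"ts": "2024-03-01T09:45:00", "product": "B", "region": "US", "amount": 5.0},
    {"ts": "2024-03-01T14:05:00", "product": "A", "region": "US", "amount": 7.5},
    {"ts": "2024-03-02T08:30:00", "product": "A", "region": "EU", "amount": 2.5},
]


def materialize(rows, key_fn):
    """Pre-aggregate event amounts into a view keyed by key_fn(row) (hypothetical helper)."""
    view = defaultdict(float)
    for row in rows:
        view[key_fn(row)] += row["amount"]
    return dict(view)


def day(row):
    # Daily granularity, e.g. "2024-03-01".
    return datetime.fromisoformat(row["ts"]).date().isoformat()


def hour(row):
    # Hourly granularity, e.g. "2024-03-01T09".
    return datetime.fromisoformat(row["ts"]).strftime("%Y-%m-%dT%H")


# The same events, stored three times with different dimensions and granularity.
views = {
    "by_day_product": materialize(events, lambda r: (day(r), r["product"])),
    "by_hour_region": materialize(events, lambda r: (hour(r), r["region"])),
    "by_product": materialize(events, lambda r: r["product"]),
}

if __name__ == "__main__":
    for name, view in views.items():
        print(name, view)
```

A query such as "sales of product A on 2024-03-01" then becomes a key lookup, `views["by_day_product"][("2024-03-01", "A")]`, instead of a join and aggregation at read time. The cost is that every new query pattern adds another view the ETL pipeline must build and keep current.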
Most dimensional data must be "denormalized" into the related events or facts so that it can be accessed without a join. This means Big ETL now carries a larger payload, because it must also materialize these additional views. Process orchestration, error recovery, and data quality also become more critical than ever, to ensure that the required data redundancy introduces no anomalies, such as redundant copies of the same data that disagree.
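The sketch below illustrates both points under stated assumptions. The `products` dimension, the `sales` facts, and the `denormalize` and `reconcile` helpers are all hypothetical. `denormalize` copies dimension attributes into every fact row so readers never need a join. `reconcile` is one simple data-quality check: it confirms that a pre-aggregated view still sums to the same total as the source facts it was built from. Real pipelines would typically check far more than totals.

```python
# Hypothetical dimension table: product attributes keyed by product id.
products = {
    "A": {"name": "Widget", "category": "Tools"},
    "B": {"name": "Gadget", "category": "Toys"},
}

# Hypothetical fact rows that reference the dimension only by id.
sales = [
    {"sale_id": 1, "product": "A", "amount": 10.0},
    {"sale_id": 2, "product": "B", "amount": 5.0},
    {"sale_id": 3, "product": "A", "amount": 7.5},
]


def denormalize(facts, dim, key):
    """Copy dimension attributes into each fact so no join is needed at read time."""
    out = []
    for fact in facts:
        attrs = dim.get(fact[key])
        if attrs is None:
            # Surface orphaned facts instead of silently dropping them.
            raise KeyError(f"fact {fact} references unknown {key} {fact[key]!r}")
        out.append({**fact, **attrs})
    return out


def reconcile(facts, view, amount="amount", tol=1e-9):
    """Return True if the view's total matches the source facts' total."""
    source_total = sum(f[amount] for f in facts)
    view_total = sum(view.values())
    return abs(source_total - view_total) <= tol


wide_sales = denormalize(sales, products, "product")

# A view aggregated by category, built from the denormalized rows.
by_category = {}
for row in wide_sales:
    by_category[row["category"]] = by_category.get(row["category"], 0.0) + row["amount"]

if __name__ == "__main__":
    print(wide_sales)
    print(by_category)
    print("view consistent with source:", reconcile(sales, by_category))
```

If a partially failed job left the view with only the `"Tools"` total, `reconcile` would return `False`. That is the kind of anomaly that orchestration and error recovery must catch before the redundant copy is served to users.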