Best comparison: Tez vs Impala vs Drill vs Spark vs Flink

Impala: Shipped by Cloudera, MapR, Oracle and Amazon since 2013, Impala is an open source tool developed by Cloudera to combat the slowness of Hive/MapReduce and to serve as an interactive SQL-on-Hadoop solution. Impala includes a processing engine that is derived from Google Dremel and does not build on MapReduce.

Impala processes data in memory and is faster than Hive/MapReduce. It initially lacked Hive’s breadth of capabilities, but has added many features over time, such as UDFs, COMPUTE STATS and window functions for aggregation. Impala does not support mid-query fault tolerance. It supports data stored in HDFS, Apache HBase and Amazon S3. Impala is best used with Parquet. Opinions differ on how it compares with Hive on Tez: depending on who you talk to, some believe Impala is better, while others believe Hive on Tez is.

Tez: It originated from a Microsoft research paper and was implemented mainly by Hortonworks. In July 2014, Tez became a top-level Apache project. Its main goal is to improve the performance of the jobs that Hive and Pig would otherwise run on MapReduce. It ships with Hortonworks and is supported by Microsoft HDInsight and some third-party applications such as Datameer v5.0 or later. Tez uses Directed Acyclic Graphs (DAGs) and does everything MapReduce does, but faster. Tez enabled interactive SQL for Hive. Tez is best used for queries on poorly defined heterogeneous data in Hive. It is tightly integrated with MapReduce and, unfortunately, inherited the same rigidity as MapReduce. The best use case for Tez is heavy Hive users who want to speed up their Hive queries.
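To make the DAG idea a little more concrete, here is a minimal Python sketch. It is a toy illustration only and does not use the Tez API: the function run_dag, the task names and the sample rows are all hypothetical. It shows a small query expressed as one graph of tasks that run in dependency order and hand their results to each other in memory, where a chain of separate MapReduce jobs would typically write intermediate results to HDFS between steps.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps, source):
    """Run every task once, in dependency order, keeping results in memory.

    tasks:  task name -> function that takes a list of upstream results
    deps:   task name -> list of upstream task names
            (a task with no upstream tasks reads `source` instead)
    """
    results = {}
    # static_order() yields each task only after all of its upstream tasks.
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        upstream = [results[d] for d in deps[name]]
        inputs = upstream if upstream else [source]
        results[name] = tasks[name](inputs)
    return order, results


# Hypothetical input and task graph, for illustration only.
rows = [5, 20, 30]

tasks = {
    "scan": lambda ins: list(ins[0]),
    "filter": lambda ins: [x for x in ins[0] if x > 10],
    "count": lambda ins: len(ins[0]),
    "total": lambda ins: sum(ins[0]),
    "report": lambda ins: {"rows": ins[0], "total_over_10": ins[1]},
}

deps = {
    "scan": [],
    "filter": ["scan"],
    "count": ["scan"],
    "total": ["filter"],
    "report": ["count", "total"],  # order matches ins[0], ins[1] above
}

order, results = run_dag(tasks, deps, rows)
print(order)
print(results["report"])
```

Note that "scan" runs once and feeds both "filter" and "count". That kind of sharing is awkward to express as a linear chain of map and reduce steps, which is the sort of flexibility a DAG model is meant to provide.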

Drill: Led by MapR, Drill became an Apache top-level project in December 2014. Like Impala, Apache Drill is based on Google’s Dremel; both are native massively parallel processing query engines on read-only data. To its advantage, and in contrast to Impala, Drill uses a schema-free document model similar to MongoDB’s, so it can query non-relational data easily. Drill can discover metadata dynamically and, unlike Impala, does not have to use Hive’s metastore. However,

Full report : Tez vs Impala vs Drill vs Spark vs Flink

