Apache Storm vs. Apache Spark

Apache Storm

  • Storm is a distributed real-time computation system.
  • Apache Storm is a task-parallel, continuous computation engine.
  • Storm defines its workflows as directed acyclic graphs (DAGs) called “topologies”, which run until they are shut down by the user or hit an unrecoverable failure.
  • Storm does not natively run on top of typical Hadoop clusters. It uses Apache ZooKeeper and its own master/minion worker processes to coordinate topologies, master and worker state, and message guarantee semantics.
  • Both Yahoo! and Hortonworks are working on libraries for running Storm topologies on top of Hadoop 2.x YARN clusters.
  • Storm can also run on top of the Mesos scheduler, both natively and with help from the Marathon framework.
  • Regardless, Storm can still read files from HDFS and/or write files to HDFS.
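
To make the "topology" idea concrete, here is a minimal sketch in plain Python. It is not the Storm API: the Topology class, add_bolt, emit, and the node names are all hypothetical, chosen only to show a DAG of processing steps that handles each tuple as soon as it arrives. A real topology reads from an unbounded source and is distributed across workers; this one walks a short list in a single process.

```python
from collections import defaultdict

# Hypothetical, simplified model of a topology. This is NOT the Storm API.
class Topology:
    def __init__(self):
        self.bolts = {}                 # node name -> processing function
        self.edges = defaultdict(list)  # node name -> downstream node names

    def add_bolt(self, name, func, sources=()):
        """Register a processing node and wire it to its upstream nodes."""
        self.bolts[name] = func
        for src in sources:
            self.edges[src].append(name)

    def emit(self, source, tup, sink):
        """Push one tuple through every node downstream of `source`."""
        for name in self.edges[source]:
            for out in self.bolts[name](tup):
                if self.edges[name]:
                    self.emit(name, out, sink)   # keep flowing down the DAG
                else:
                    sink.append((name, out))     # leaf node: collect output


def split_sentence(sentence):
    # One sentence in, one tuple per word out.
    return sentence.split()


counts = defaultdict(int)

def count_word(word):
    # Keeps running state and emits the updated count for this word.
    counts[word] += 1
    return [(word, counts[word])]


topology = Topology()
topology.add_bolt("split", split_sentence, sources=["sentences"])
topology.add_bolt("count", count_word, sources=["split"])

results = []
# Stand-in for an unbounded source: each sentence is processed on arrival.
for sentence in ["storm runs topologies", "storm runs continuously"]:
    topology.emit("sentences", sentence, results)
```

Each sentence flows through the whole graph before the next one is read, which is the tuple-at-a-time behavior the bullets describe; different nodes doing different work is what "task parallel" refers to.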

Apache Spark


    • Apache Spark is a fast, general-purpose engine for large-scale data processing.
    • Apache Spark is a data-parallel, general-purpose batch processing engine.
    • Workflows are defined in a style reminiscent of MapReduce, but Spark is much more capable than traditional Hadoop MapReduce.
    • Spark's Streaming API allows continuous processing via short-interval batches.
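
The "short-interval batches" idea can be sketched in plain Python as well. This is not the Spark Streaming API: micro_batches, the sample events, and the one-second interval are hypothetical, and the sketch assumes events arrive ordered by timestamp. It only illustrates the principle of cutting a stream into small batches and then running the same batch operation on each one.

```python
from collections import Counter

def micro_batches(events, interval):
    """Group (timestamp, value) events into consecutive fixed-length intervals.

    Assumes events are ordered by timestamp. Intervals with no events
    are yielded as empty batches.
    """
    batch, window_end = [], None
    for ts, value in events:
        if window_end is None:
            window_end = ts + interval       # first window starts at first event
        while ts >= window_end:
            yield batch                      # close the current interval
            batch, window_end = [], window_end + interval
        batch.append(value)
    if batch:
        yield batch                          # flush the final partial interval


# Hypothetical sample stream: (seconds, word).
events = [(0.0, "a"), (0.4, "b"), (1.1, "a"), (2.5, "c")]

batches = list(micro_batches(events, interval=1.0))

# The same batch-style operation (a word count) is applied to every batch.
word_counts = [Counter(batch) for batch in batches]
```

Compared with a tuple-at-a-time engine, results become available only once an interval closes, so the batch interval sets a floor on latency.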