How to configure hardware for Spark?

    1. Run Spark on the same nodes as HDFS. The simplest way to do this is to run Spark in YARN mode (see the first sketch after this list).
    2. Start with a minimal cluster (e.g. 1 master and 2 worker nodes) if you are not sure about the data volume.
    3. Since Spark keeps data in memory, we recommend up to 256 GB of RAM for each worker node.
    4. Increase the executor memory for each user, as the default in Cloudera, for example, is less than 10 GB (see the second sketch after this list).
    5. For low-latency data stores like HBase, it may be preferable to run computing jobs on different nodes than the storage system to avoid interference.
    6. Choose the "memory and disk" storage level when persisting the intermediate results of your transformations, whether in your own code or your ETL (see the third sketch after this list).
    7. Note that the Java VM does not always behave well with more than 200 GB of RAM. If you purchase machines with more RAM than this, you can run multiple worker JVMs per node.
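As a rough illustration of point 1, the sketch below is a small Spark application aimed at a YARN cluster that is co-located with HDFS, so executors run on the same nodes that hold the data blocks they read. It assumes HADOOP_CONF_DIR (or YARN_CONF_DIR) already points at the cluster configuration, and the HDFS path is a placeholder; in practice the master is usually supplied via spark-submit --master yarn rather than hard-coded.

    import org.apache.spark.sql.SparkSession

    // Sketch only: a Spark job targeting YARN so executors are scheduled on
    // the same nodes that host the HDFS blocks they read.
    object YarnOnHdfsExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("yarn-colocated-with-hdfs")
          .master("yarn") // usually passed via spark-submit --master yarn instead
          .getOrCreate()

        // Placeholder path; reading from HDFS keeps computation close to the data.
        val lines = spark.read.textFile("hdfs:///data/sample.txt")
        println(s"line count = ${lines.count()}")

        spark.stop()
      }
    }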
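For points 3 and 4, memory is configured per executor JVM. The sketch below, written as you would run it in spark-shell or a small application, raises the executor and driver heaps above the small defaults shipped with some distributions. The 16g / 4g / 4-core values are illustrative assumptions, not figures from this article, and driver memory in particular is normally set on the spark-submit command line because the driver JVM is already running by the time application code executes.

    import org.apache.spark.sql.SparkSession

    // Sketch only: illustrative memory settings, tune to your own nodes.
    val spark = SparkSession.builder()
      .appName("tuned-executor-memory")
      .config("spark.executor.memory", "16g") // heap size per executor JVM
      .config("spark.executor.cores", "4")    // cores per executor
      .config("spark.driver.memory", "4g")    // better set via spark-submit --driver-memory
      .getOrCreate()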
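For point 6, "memory and disk" corresponds to Spark's MEMORY_AND_DISK storage level: partitions that do not fit in memory spill to local disk instead of being recomputed. The sketch below caches an intermediate DataFrame that is reused by two actions; the paths and filter condition are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("memory-and-disk-cache").getOrCreate()

    // Placeholder input; any expensive intermediate result is a candidate for caching.
    val events  = spark.read.parquet("hdfs:///warehouse/events")
    val cleaned = events.filter("status = 'OK'")

    // Spill to local disk when a partition does not fit in memory, instead of recomputing it.
    cleaned.persist(StorageLevel.MEMORY_AND_DISK)

    println(cleaned.count())                              // first action materialises the cache
    cleaned.write.parquet("hdfs:///warehouse/events_ok")  // second action reuses the cached data

    cleaned.unpersist()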