Why I Choose Scala for Apache Spark Projects
Apache Spark currently supports multiple programming languages, including Java, Scala, and Python. Word on the street is that Spark 1.4, expected in June, will add R language support too. Which language to choose for a Spark project is a common question on various forums and mailing lists.
The answer is quite subjective. Each team has to decide based on its own skill set, use cases, and ultimately personal taste. For me, Scala is the language of choice.
First of all, I eliminate Java from the list. Don’t get me wrong, I love Java. I have been working with Java for more than 14 years. However, when it comes to a big data project like Spark, Java is just not suitable. Compared to Python and Scala, Java is too verbose: to achieve the same goal, you have to write many more lines of code. Java 8 makes things better by introducing lambda expressions, but it is still not as terse as Python or Scala. Most importantly, Java does not ship with a REPL (Read-Eval-Print Loop) interactive shell. That’s a deal breaker for me. With an interactive shell, developers and data scientists can explore their datasets and prototype their applications easily, without a full-blown development cycle. It is a must-have tool for big data projects.
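To make the terseness point concrete, here is a small sketch of the kind of collection pipeline Spark code is full of, written in Scala's functional style. The object and method names (`Terseness`, `sumOfEvenSquares`) are my own illustrative choices, not from any particular codebase; the equivalent pre-Java-8 code would need an explicit loop, a mutable accumulator, and several more lines.

```scala
// Illustrative sketch: sum the squares of the even numbers in a range.
// One chained expression replaces a loop, an if, and a mutable counter.
object Terseness {
  def sumOfEvenSquares(xs: Seq[Int]): Int =
    xs.filter(_ % 2 == 0)   // keep even numbers: 2, 4, 6, 8, 10
      .map(x => x * x)      // square each: 4, 16, 36, 64, 100
      .sum                  // add them up

  def main(args: Array[String]): Unit =
    println(sumOfEvenSquares(1 to 10)) // prints 220
}
```

The same chained `filter`/`map` shape carries over almost verbatim to Spark's RDD and Dataset APIs, and it can be typed line by line into an interactive shell, which is exactly the exploratory workflow described above.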
Now it comes down to Python vs. Scala. Both have succinct syntax. Both are object-oriented as well as functional. Both have passionate support communities.
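As a small illustration of Scala blending the two paradigms, here is a hedged sketch (the `Employee` class and `totalByDept` helper are hypothetical names I made up for this example): data is modeled with an object-oriented case class, then aggregated with functional combinators.

```scala
object OoPlusFp {
  // OO side: a case class models the data as typed objects.
  case class Employee(name: String, dept: String, salary: Double)

  // FP side: groupBy and map aggregate those objects without mutation.
  def totalByDept(staff: Seq[Employee]): Map[String, Double] =
    staff.groupBy(_.dept).map { case (dept, es) =>
      dept -> es.map(_.salary).sum
    }

  def main(args: Array[String]): Unit = {
    val staff = List(
      Employee("Ann", "eng", 100.0),
      Employee("Bob", "eng", 90.0),
      Employee("Cid", "ops", 80.0)
    )
    println(totalByDept(staff)("eng")) // prints 190.0
  }
}
```

Python supports a similar mix with classes plus comprehensions, which is part of why the comparison between the two languages is so close.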
I ultimately choose Scala for the following reasons: