Advanced Analytics with Spark
Authored by a substantial portion of Cloudera’s Data Science team (Sean Owen, Sandy Ryza, Uri Laserson, Josh Wills), Advanced Analytics with Spark (currently in Early Release from O’Reilly Media) is the newest addition to the pipeline of ecosystem books by Cloudera engineers. I talked to the authors recently.
Why did you decide to write this book?
Mostly to fill a gap between what many people need to know to be productive with large-scale analytics on Apache Hadoop in 2015 and the resources that are out there. There are plenty of books on machine learning theory, and plenty of references covering how to use Hadoop ecosystem tools. However, far less material targets the overlap between the two and focuses on use cases and examples rather than serving as a manual. So the book is a modest attempt to meet that need, which we see come up frequently among customers and in the community.
Who is the intended reader?