Spark + Scala Boilerplate Project

After setting up a few Spark + Scala projects I decided to open-source a boilerplate sample project that you can import right into IntelliJ and build with one command.

Usually I write Apache Spark code in Python, but there are a few times I prefer to use Scala:

  • When functionality isn’t in PySpark yet.
  • It’s easier to include dependencies in the JAR file instead of installing on cluster nodes.
  • Need that extra bit of performance.
  • Even more reasons here on StackOverflow.

One of the downsides to using Scala over Python is setting up the initial project structure. With PySpark, a single “.py” file does the trick. Using this boilerplate project will make using Spark + Scala just as easy. Grab the code and run “sbt assembly” and you’ll have a JAR file ready to use with “spark-submit”.

Check it out here: https://github.com/mikestaszel/spark-scala-boilerplate

 

Mike Staszel