One of the biggest, most time-consuming parts of data science is analysis and experimentation. One of the most popular tools to do so in a graphical, interactive environment is Jupyter.
Combining Jupyter with Apache Spark (through PySpark) merges two extremely powerful tools. AWS EMR lets you set up all of these tools with just a few clicks. In this tutorial I’ll walk through creating a cluster of machines running Spark with a Jupyter notebook sitting on top of it all.