Jupyter Notebook extension for Apache Spark integration.
Includes a progress indicator for the current Notebook cell if it invokes a Spark job. Queries the Spark UI service on the backend to get the required Spark job information.
This is really neat. No more checking another tab for job progress when running cells in a notebook!
You’ll definitely want to read this if you’re using AWS Kinesis with Apache Spark to stream data, it’s been extremely valuable:
New tiny GitHub project: https://github.com/mikestaszel/spark_cluster_vagrant
Over the past few weeks I’ve been working on benchmarking Spark as well as learning more about setting up clusters of Spark machines both locally and on cloud providers.
I decided to work on a simple
Vagrantfile that spins up a Spark cluster with a head node and however many worker nodes desired. I’ve seen a few of these but they either used some 3rd party box, had an older version of Spark, or only spun up one node.
By running only one command I could have a fully-configured Spark cluster ready to use and test. Vagrant also easily extends beyond simple Virtualbox machines to many providers, including AWS EC2 and DigitalOcean and this
Vagrantfile can be extended to provision clusters on those providers.
Check it out here: https://github.com/mikestaszel/spark_cluster_vagrant