Installing Spark on Ubuntu in 3 Minutes

One thing I hear often from people starting out with Spark is that it’s too difficult to install. Some guides are for Spark 1.x and others are for 2.x. Some guides get really detailed with Hadoop versions, JAR files, and environment variables. Here’s yet another guide on how to install Apache Spark, …

Apache Spark on Google Colaboratory

Google recently launched a preview of Colaboratory, a new service that lets you edit and run IPython notebooks right from Google Drive, for free! It’s similar to Databricks, which is worth a try if you’re looking for a better-supported way to run Spark in the cloud, launch clusters, and much more. Google …

Execute Interactive Programs From Go

I had a hard time figuring out how to make a Go program execute a command and hand the console over to that command. I wanted my program to launch an SSH session. I recently started working on a tool to help me SSH into EC2 instances (more details coming in a future blog post). The goal was to …

Writing Huge CSVs Easily and Efficiently With PySpark

I recently ran into a use case that the usual Spark CSV writer didn’t handle very well — the data I was writing had an unusual encoding, odd characters, and was really large. I needed a way to use the Python unicodecsv library with a Spark dataframe to write to a huge output CSV file. I don’t …