AWS Certified Solutions Architect · August 8, 2018 · amazon aws ec2 linux

Quick post – I’ve been busy studying for the AWS Certified Solutions Architect – Associate exam for the past few weeks – good news, I passed it a few days ago! Shoot me a note if you ever need some solutions architected. I primarily did this because I…

azssh: Easily manage EC2 instances · June 16, 2018 · aws go golang

azssh is a small command-line utility I wrote a few months ago to help with managing EC2 instances. My workflow on EC2 consists of starting and stopping instances and sometimes SSHing in to run some commands. That’s what this utility does – starts and stops EC2 instances, tells…

Vowpal Wabbit Docker Image · May 17, 2018 · docker linux machine learning ml vowpal vowpal wabbit

Vowpal Wabbit is a really fast machine learning system. A few months ago I put together a Docker image of Vowpal Wabbit, making it easy to run on any platform. It’s been sitting up on Github and the Docker Hub, but I forgot to write a blog post!…

Spark + Scala Boilerplate Project · April 16, 2018 · github scala spark

After setting up a few Spark + Scala projects, I decided to open-source a boilerplate sample project that you can import right into IntelliJ and build with one command. Usually I write Apache Spark code in Python, but there are a few times I prefer to use Scala: When functionality isn…

Fixing WordPress Jetpack Connection Errors · April 12, 2018 · apache bugs php wordpress

I recently migrated my WordPress installation from an old Debian 8 Google Cloud instance to Debian 9. I decided to do the installation myself this time instead of using a Bitnami image for greater control. I couldn’t get certbot (a Let’s Encrypt client for free SSL…

Apache Spark on Google Colaboratory · March 7, 2018 · apache colab data google pyspark python spark tutorial

Google recently launched a preview of Colaboratory, a new service that lets you edit and run IPython notebooks right from Google Drive – free! It’s similar to Databricks – give that a try if you’re looking for a better-supported way to run Spark in the cloud,…

Execute Interactive Programs from Go · February 21, 2018

I had a hard time figuring out how to make a Go program execute a command and make that program take over the console. I wanted my program to launch an SSH session. I recently started working on a tool to help me SSH into EC2 instances (more details coming…

Writing Huge CSVs Easily and Efficiently with PySpark · February 5, 2018 · data pyspark python spark

I recently ran into a use case that the usual Spark CSV writer didn’t handle very well – the data I was writing had an unusual encoding, odd characters, and was really large. I needed a way to use the Python unicodecsv library with a Spark dataframe to…

Jupyter Notebooks with PySpark on AWS EMR · October 16, 2017 · aws emr jupyter pyspark python spark

One of the biggest, most time-consuming parts of data science is analysis and experimentation. One of the most popular tools for doing this in a graphical, interactive environment is Jupyter. Combining Jupyter with Apache Spark (through PySpark) merges two extremely powerful tools. AWS EMR lets you set up all of…

Vowpal Wabbit - Ramdisk vs. EBS-Optimized SSD · October 9, 2017 · data data science machine learning ml vowpal vowpal wabbit

Recently I started playing around with Vowpal Wabbit and various data sets. Vowpal Wabbit promises to be really fast, so fast that, according to the author, disk I/O is one of the most common bottlenecks. I did a quick test to see if using a RAM disk would make…