  • The only package I ever install using Homebrew is TA-Lib. Everything else I need is already available. TA-Lib isn’t updated often and it’s easy to install by compiling from source.
    How-To: This guide will work on macOS 11.1 (Big Sur). It works perfectly on an M1 (or Intel) Mac. First, download TA-Lib from SourceForge. Then run:
        tar xf ta-lib-0.4.0-src.tar.gz
        cd ta-lib
        ./configure --prefix=/usr/local
        make
        sudo make install
    That’s all it takes.
    Published January 3, 2021
  • Quick post mostly for my own reference since I always need to re-learn how to do this. This used to be more difficult in older versions of Spark, but when using Spark 2.4 or later, all you have to do is:
        wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar -P $SPARK_HOME/jars/
        wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar -P $SPARK_HOME/jars/
    That’s all there is to it. The s3a:// prefix should work now for reading and writing data using Spark.
    spark Published November 29, 2020
  • I finished a thing in my free time! Chipee! Goals I’ve always wanted to write an emulator and this is the first time I actually got around to finishing one! My goals were to learn about how to write an emulator and to re-learn C. It’s been years since I wrote any halfway decent C. I’ve also never done anything using SDL, sound, or even a window with graphics. Why CHIP-8?
    development Published January 1, 2020
  • If you’re doing any production-level work in AWS, you should be using AWS CloudFormation. It’s really easy to get started. Let’s walk through the basics. Why use CloudFormation? Here’s a common scenario: creating an EC2 instance and assigning an Elastic IP address. Let’s say it’s for a web server. Great! That’s easy. Just spin up an EC2 instance. Choose the correct image, size, security groups, VPC, subnet, keypair, and so on.
    aws Published March 28, 2019
  • Another quick post: found this in the AWS Console UI. If you ever need to share your AWS Canonical ID with someone, e.g. to share S3 buckets, you can find it by using various APIs, but I was also able to find it using the AWS Console UI. Open the S3 console, select a bucket you own, and view the Access Control List in the Permissions tab; the Canonical ID is shown there.
    aws Published February 28, 2019
  • One thing I hear often from people starting out with Spark is that it’s too difficult to install. Some guides are for Spark 1.x and others are for 2.x. Some guides get really detailed with Hadoop versions, JAR files, and environment variables. Here’s yet another guide on how to install Apache Spark, condensed and simplified to get you up and running with Apache Spark 2.3.1 in 3 minutes or less.
    development spark Published September 19, 2018
  • Google recently launched a preview of Colaboratory, a new service that lets you edit and run IPython notebooks right from Google Drive, free! It’s similar to Databricks, so give that a try if you’re looking for a better-supported way to run Spark in the cloud, launch clusters, and much more. Google has published some tutorials showing how to use TensorFlow and various other Google APIs and tools on Colaboratory, but I wanted to try installing Apache Spark.
    spark python Published March 7, 2018
  • I had a hard time figuring out how to make a Go program execute a command and make that program take over the console. I wanted my program to launch an SSH session. I recently started working on a tool to help me SSH into EC2 instances (more details coming in a future blog post). The goal was to automatically open up an SSH session into an EC2 instance. It’s easy to execute a program like ssh, but the input and output of that program are lost.
    development go Published February 21, 2018
  • I recently ran into a use case that the usual Spark CSV writer didn’t handle very well: the data I was writing had an unusual encoding, odd characters, and was really large. I needed a way to use the Python unicodecsv library with a Spark dataframe to write to a huge output CSV file. I don’t know how I missed this RDD method before, but toLocalIterator was the cleanest, most straightforward way I found to get this working.
    spark Published February 5, 2018
  • I recently finished reading Data Science from Scratch by Joel Grus. This book is a great introduction to data science concepts. It uses real code to demonstrate complex Python, data analytics, data science, and machine learning concepts. I’m really glad I picked up this book as the first book I’ve read about machine learning. There was a great combination of mathematics, statistics, and real applications of machine learning algorithms. The book starts out with a quick introduction to Python, followed by an in-depth review of all the math you need for the code to make sense.
    books Published July 10, 2017
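
For the toLocalIterator post above, here is a minimal sketch of the idea, not the code from that post. It assumes Python 3 and swaps in the standard-library csv module for unicodecsv; the function name write_rows_to_csv and its parameters are hypothetical. The helper takes any iterator of row tuples, so rows are written one at a time instead of being collected into memory first.

```python
import csv
from typing import Iterable, Sequence


def write_rows_to_csv(rows: Iterable[Sequence], path: str, header: Sequence[str]) -> int:
    """Stream rows into a single CSV file and return the number of data rows written.

    `rows` can be any iterator of tuple-like rows. The name and signature are
    illustrative assumptions, not part of Spark or of the original post.
    """
    count = 0
    # newline="" lets the csv module control line endings; UTF-8 covers odd characters.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:
            # Each row is written as soon as it arrives from the iterator.
            writer.writerow(row)
            count += 1
    return count
```

With a PySpark dataframe named df (assumed here), the call would look something like write_rows_to_csv(df.rdd.toLocalIterator(), "out.csv", df.columns). Spark Row objects are tuples, so csv.writer should accept them directly.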