Update DAGs on EC2

This past weekend I decided to spin up a quick Airflow deployment for some personal scripts I wanted executed on a schedule. I didn’t have scale in mind and I didn’t have robustness in mind. I had speed. I’ll review how I’m deploying my version-controlled DAG code to my EC2 instance. I want to highlight, this … Read more

Comms As A Data Engineer

Comms as a Data Engineer can be tough. Should you email a group of people? Should you dump a message in a public Slack channel they frequent? Should you follow up daily, weekly, etc? It’s a lot of manual labor. I don’t like manual work. Also, I hate email. This seems to be a common … Read more

Apache Airflow DAG Factories

What in the world are Apache Airflow DAG Factories and why should you use them? Let’s go into what they are, why they’re used, and how they could make your life easier. We’ll also go into the nitty-gritty of how to design and build one. Also, before I jump into this post, shout … Read more

How To Clone A Git Repo In Python – Updated

python

So, a loooong time ago I wrote this post on how to clone a Git repo in Python 3. I used subprocess that first time around to run git commands. I was essentially trying to run git commands in Python explicitly. But there’s a better way to do this. It’s prettier and it’s easier to read. There’s … Read more

Cast an array of items using lambdas in Python.

python

Let’s cast an array of items using lambdas in Python. It’ll look cleaner than its for-loop counterpart. Lambdas are a great way to clean up super verbose code and are helpful when doing array manipulation. Before we begin, if you don’t know what a Python lambda is, please check out this article we recently wrote. My most … Read more
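To make the for-loop comparison concrete, here is a minimal sketch. It is not the code from the full post; the sample list and variable names are made up for illustration, and it assumes the goal is casting strings to integers.

```python
# Hypothetical sample data: numbers that arrived as strings.
raw_items = ["1", "2", "3"]

# For-loop version: build the cast list one item at a time.
cast_with_loop = []
for item in raw_items:
    cast_with_loop.append(int(item))

# Lambda version: map a small anonymous function over the list.
cast_with_lambda = list(map(lambda item: int(item), raw_items))
```

Both lists end up holding the same integers; the lambda version just gets there in one line.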

Run Apache Airflow Locally in Docker

We’ll walk you through how to run Apache Airflow locally in Docker. The first chunk of this post will cover how to get Airflow standing, the second will go into some nuance and will answer several whys. My Setup Getting Airflow Standing Docker Airflow Looking for more detailed info? Check out Airflow’s official docs. Happy … Read more

Execute Bash Commands and Return Results in Python

bash

I’ve had numerous cases where I’ve needed to execute bash commands and return the results in Python for some additional manipulation. My most recent example involved me working with BigQuery schemas. Getting schemas from the CLI was easy peasy, but I’m not the best at Bash programming so I naturally turned to something more familiar … Read more
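As a rough idea of what this can look like, here is a minimal sketch using the standard library's subprocess module. The helper name run_bash and the echo command are assumptions for illustration, not the approach from the full post, and it assumes bash is available on your PATH.

```python
import subprocess


def run_bash(command):
    """Run a command string through bash and return its stdout (hypothetical helper)."""
    completed = subprocess.run(
        ["bash", "-c", command],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()


# Placeholder command; swap in whatever CLI call you need.
result = run_bash("echo hello")
```

From there, result is a plain Python string you can split, parse, or feed into whatever manipulation comes next.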

Apache Airflow DAG is Failing Silently

python

So your Apache Airflow DAG is failing silently. Are you running an ETL on a huge dataset? This is a symptom of an Airflow instance without sufficient memory. Dig into your instance’s logs and you’ll probably see an evicted worker if you’re running your instance’s workers on Kubernetes. You’ll see similar logs wherever you run … Read more

Functional Annotations in Python 3.x

python

Have you used Functional Annotations in Python 3.x? Maybe you’ve heard them mentioned? Regardless, let’s explore what they are and how they help us. Because if they don’t help us, then we probably shouldn’t care. The Problem You’re programming and don’t know what thisRandomFunction should return. Maybe it’s a bool, maybe it’s a string, who … Read more
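For a quick taste, here is a minimal sketch of an annotated function. The function is_even is a made-up example, not one from the full post.

```python
def is_even(number: int) -> bool:
    """Return True when number is divisible by two."""
    return number % 2 == 0


# Annotations are stored on the function object and can be inspected.
annotations = is_even.__annotations__
print(annotations)
```

The annotations tell the reader (and tools like type checkers) that the function takes an int and returns a bool. Python itself does not enforce them at runtime.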