Airflow DAG Factory

Posts about Apache Airflow or Data Engineering

What in the world are Apache Airflow DAG Factories and why should you use them? Let’s go into what they are, why they’re used, and how they could make your life easier. We’ll also go into the nitty-gritty of how to design and build one. Also, before I jump into this post, shout … Read more

GitPython Clone

python

So, a loooong time ago I wrote this post on how to clone a Git repo in Python 3. I used subprocess that first time around to run git commands. I was essentially trying to run git commands in Python explicitly. But, there’s a better way to do this. It’s prettier, it’s easier to read. There’s … Read more
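To make the contrast in that teaser concrete, here is a minimal sketch of both approaches side by side. It is not taken from the post itself: the helper names (`build_clone_command`, `clone_with_subprocess`, `clone_with_gitpython`) and the example URL are made up for illustration. The GitPython half assumes the third-party `GitPython` package is installed and relies on its `Repo.clone_from` call.

```python
import subprocess
from pathlib import Path


def build_clone_command(repo_url, dest, branch=None):
    """Return the argv list for `git clone` without running it (hypothetical helper)."""
    cmd = ["git", "clone"]
    if branch is not None:
        # --branch checks out the named branch or tag instead of the default one.
        cmd += ["--branch", branch]
    cmd += [repo_url, str(dest)]
    return cmd


def clone_with_subprocess(repo_url, dest, branch=None):
    """Clone by shelling out to the git CLI; assumes git is on PATH."""
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(build_clone_command(repo_url, dest, branch), check=True)
    return Path(dest)


def clone_with_gitpython(repo_url, dest, branch=None):
    """Clone through GitPython; assumes `pip install GitPython` has been run."""
    # Imported lazily so this module still loads when GitPython is missing.
    from git import Repo

    # Extra keyword arguments are forwarded to `git clone` as options.
    kwargs = {"branch": branch} if branch is not None else {}
    return Repo.clone_from(repo_url, str(dest), **kwargs)


if __name__ == "__main__":
    # Placeholder URL; swap in a real repository before calling either clone function.
    print(build_clone_command("https://example.com/some/repo.git", "repo", branch="main"))
```

The subprocess version leaves you assembling command-line arguments by hand, while the GitPython version hands back a `Repo` object you can keep working with in Python.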

Extract Domain from URL in BigQuery Using NET.REG_DOMAIN

Posts about BigQuery

This post will show you how to pull a domain from a full website path in BigQuery. So let’s set the stage for a hypothetical. You own a URL shortener company. You want to partner with a website for whatever reason. You decide that you want to do analysis over the data you’ve streamed or … Read more

How to Use the HAVING Clause in BigQuery (With Examples)

Posts about BigQuery

In what situation would you want to use BigQuery’s HAVING clause outside of an interview? We’ll go over a couple of use cases and how I use it as a Data Engineer for Reddit. My Setup What Is BigQuery? BigQuery is a data warehouse as a service. Google handles your compute, your storage, and does … Read more

Essential Terminal Commands for Beginners (Mac, Linux, Windows)

bash

Let’s go over some terminal commands for beginners. This assumes you have basic computer knowledge and might have some sort of interest in software development. Terminal commands, especially those for Linux systems, will get you pretty far. My Setup macOS Opening Your Terminal On a Mac hit Command+Space. This will open a search window. My … Read more