How to Refresh and Update Airflow DAGs on EC2

This past weekend I decided to spin up a quick Airflow deployment for some personal scripts I wanted executed on a schedule. I didn’t have scale in mind, and I didn’t have robustness in mind. I had speed in mind. I’ll review how I’m deploying my version-controlled DAG code to my EC2 instance. I want to highlight, this …
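Because the Airflow scheduler periodically rescans its DAGs folder, one lightweight way to refresh DAGs on an EC2 host is to keep that folder as a git clone and pull new commits into it. The sketch below assumes exactly that layout; the repository path (Airflow's default `~/airflow/dags`) and the `main` branch name are hypothetical placeholders, and this is not necessarily the deployment method the post describes.

```python
import subprocess
from pathlib import Path

# Hypothetical: the DAG repo is cloned into the folder that airflow.cfg's
# dags_folder points at (by default $AIRFLOW_HOME/dags, i.e. ~/airflow/dags).
DAGS_REPO = Path.home() / "airflow" / "dags"


def pull_command(repo: Path, branch: str = "main") -> list[str]:
    """Build a git command that fast-forwards the repo to origin/<branch>."""
    # --ff-only refuses to create a merge commit if the EC2 copy has diverged
    return ["git", "-C", str(repo), "pull", "--ff-only", "origin", branch]


def refresh_dags(repo: Path = DAGS_REPO, branch: str = "main") -> None:
    """Update the DAG files in place; raises CalledProcessError if git fails."""
    subprocess.run(pull_command(repo, branch), check=True)
```

Calling `refresh_dags()` from a cron job or over SSH after each push keeps the folder current; the scheduler then picks up the changed files on its next scan, with no restart required for ordinary DAG edits.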

How to Cast Array Elements Using Lambda and Map in Python

Let’s cast an array of items using lambdas in Python. It’ll look cleaner than its for-loop counterpart. Lambdas are a great way to clean up super verbose code and are helpful when doing array manipulation. Before we begin, if you don’t know what a Python lambda is, please check out this article we recently wrote. My most …
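A minimal sketch of the idea, assuming the "array" is a plain Python list of numeric strings (for example, values read from a CSV): `map()` applies the lambda to every element, and `list()` materializes the lazy result. The for-loop version is included for comparison.

```python
# Example input: numeric strings, as you might get from a file or user input
values = ["1", "2", "3", "4"]

# map() calls the lambda on each element; list() collects the lazy map object
as_ints = list(map(lambda x: int(x), values))

# The equivalent for-loop, for comparison
as_ints_loop = []
for v in values:
    as_ints_loop.append(int(v))

# Since int and float are already callables, the lambda can be dropped entirely
as_floats = list(map(float, values))

print(as_ints)  # [1, 2, 3, 4]
```

The lambda earns its keep when the conversion needs extra logic, such as `lambda x: int(x.strip())`; for a bare type conversion, passing the type itself is shorter.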

How to Run Apache Airflow Locally in Docker

We’ll walk you through how to run Apache Airflow locally in Docker. The first chunk of this post covers how to get Airflow standing; the second goes into some of the nuances and answers several whys. Sections: My Setup, Getting Airflow Standing, Docker, Airflow. Looking for more detailed info? Check out Airflow’s official docs. Happy …
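For a quick local test, the official `apache/airflow` image can start everything in one container with `airflow standalone`. The sketch below builds that `docker run` command from Python so it stays in the same language as the other examples; the `latest` tag and the host port are assumptions (pin a specific tag in practice), and this is not necessarily the setup the post itself uses.

```python
import subprocess

# Assumption: the official image; replace "latest" with a pinned version tag.
IMAGE = "apache/airflow:latest"


def docker_run_command(image: str = IMAGE, port: int = 8080) -> list[str]:
    """Build a `docker run` command that starts Airflow in standalone mode."""
    return [
        "docker", "run", "--rm",
        "-p", f"{port}:8080",  # map a host port to the web UI's port 8080 in the container
        image,
        "standalone",  # the image's entrypoint passes this on as `airflow standalone`
    ]


def start_airflow(image: str = IMAGE, port: int = 8080) -> None:
    """Run the container in the foreground; requires Docker to be installed."""
    subprocess.run(docker_run_command(image, port), check=True)
```

`airflow standalone` uses a local SQLite metadata database and prints a generated admin password on startup, which makes it fine for kicking the tires but not for anything long-lived.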

How to Upload a Pandas DataFrame to DynamoDB with Python

So you’re trying to upload a Pandas DataFrame to DynamoDB using Python? Let’s take a step back first. Why are we using DynamoDB? What is DynamoDB? DynamoDB is a non-relational, fully managed database service offered by AWS, Amazon’s cloud computing arm. So why would you go the DynamoDB route vs MySQL, Postgres, …
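One detail that trips up most DataFrame uploads: boto3's DynamoDB resource rejects Python `float` values and expects `Decimal` instead. The sketch below assumes the rows come from `df.to_dict(orient="records")`, converts floats to `Decimal`, drops NaN/infinite values (which DynamoDB cannot store as numbers), and writes the items with boto3's `batch_writer`. The helper names and the table name you pass in are hypothetical; the table must already exist with a key schema that matches your columns.

```python
import math
from decimal import Decimal


def to_dynamodb_value(value):
    """Convert one cell into a type boto3's DynamoDB resource accepts."""
    # numpy scalars (e.g. numpy.int64) expose .item(), which returns a plain Python value
    if hasattr(value, "item") and callable(value.item):
        value = value.item()
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None  # not representable as a DynamoDB number
        return Decimal(str(value))  # str() avoids binary-float noise like 0.1000000000000000055
    return value


def records_to_items(records):
    """Turn row dicts (e.g. from df.to_dict(orient="records")) into DynamoDB items."""
    items = []
    for row in records:
        converted = {k: to_dynamodb_value(v) for k, v in row.items()}
        # Omit missing values so they become absent attributes rather than nulls
        items.append({k: v for k, v in converted.items() if v is not None})
    return items


def upload_items(items, table_name):
    """Write items in batches. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    table = boto3.resource("dynamodb").Table(table_name)
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
```

Usage would look like `upload_items(records_to_items(df.to_dict(orient="records")), "my-table")`. `batch_writer` groups the puts into batch requests behind the scenes, which is considerably faster than calling `put_item` on the table once per row.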

How to Write to an Excel File in Python Using OpenPyXL

Prep: First, let’s make sure you have the OpenPyXL library installed. If you’re already in Python and don’t want to needlessly exit the interpreter, then type the following: This command should list out all the methods of OpenPyXL if it is in fact installed on your machine. Otherwise, we’ll have to turn to pip. Assuming …
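As a hedged alternative to listing the module's contents (not necessarily the post's own command), the standard library's `importlib.util.find_spec` can check for `openpyxl` from inside the interpreter without importing it. The sketch also includes a small writer built on openpyxl's `Workbook`, `active`, `append`, and `save`; the `write_rows` helper and the example file name are hypothetical.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` is importable, without actually importing it."""
    return importlib.util.find_spec(package) is not None


def write_rows(path: str, rows) -> None:
    """Write an iterable of row tuples to the first sheet of a new workbook."""
    from openpyxl import Workbook  # only needed when actually writing

    wb = Workbook()
    ws = wb.active  # a new Workbook starts with one worksheet
    for row in rows:
        ws.append(row)  # appends each tuple as the next row, starting at row 1
    wb.save(path)
```

In the interpreter, `is_installed("openpyxl")` returning `False` means it's time for `pip install openpyxl`; once it returns `True`, something like `write_rows("example.xlsx", [("name", "score"), ("Ada", 95)])` produces a two-row sheet.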