Removing Duplicate Elements From An Array in BigQuery

In this post, we’ll be removing duplicate elements from an array in BigQuery. Let’s set the stage. It’s a chilly night in the office and marketing reaches out asking for data. PANIK. But they’ve given you a general idea of where that data lives. CALM. You do a quick SELECT * of the table …

Horizontally Scaling A WordPress Website – Part 1 – The Overview

Let’s start off by saying that horizontally scaling a WordPress website is very easy to do but kinda expensive. I’d say it costs $100+ a month to leave this infra standing in AWS. Most of the cost is in the DB and EC2 instances. So, beware of forgetting about infra you stand up …

Apache Airflow DAG is Failing Silently

So your Apache Airflow DAG is failing silently. Are you running an ETL on a huge dataset? This is a symptom of an Airflow instance without sufficient memory. Dig into your instance’s logs and you’ll probably see an evicted worker if you’re running your instance’s workers on Kubernetes. You’ll see similar logs wherever you run …

Listing the Largest N Files or Folders Recursively

Listing the largest N files or folders recursively is handy. Let’s frame a use case. So, your Ubuntu server is tanking. You’re running Jenkins or something else and your job logs have just started piling up. Your UI doesn’t work anymore and the only thing you can do is SSH into your instance. What do you …
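For a rough idea of what such a listing involves, here is a minimal Python sketch. It is not the method from the post (the excerpt is cut off before it gets there); the function name `largest_files` and its parameters are made up for illustration, and it only ranks files, not folders.

```python
import heapq
import os
from typing import List, Tuple


def largest_files(root: str, n: int = 10) -> List[Tuple[int, str]]:
    """Return up to n (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    # os.walk descends into every subdirectory of root.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files that vanish or cannot be read (e.g. broken symlinks).
                continue
    return heapq.nlargest(n, sizes)


if __name__ == "__main__":
    # Hypothetical usage: show the five largest files under the current directory.
    for size, path in largest_files(".", 5):
        print(f"{size:>12}  {path}")
```

Collecting sizes first and then calling `heapq.nlargest` keeps the sketch short; on a very large tree you may prefer to keep only a bounded heap while walking.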

Functional Annotations in Python 3.x

Have you used Functional Annotations in Python 3.x? Maybe you’ve heard them mentioned? Regardless, let’s explore what they are and how they help us. Because if they don’t help us, then we probably shouldn’t care. The Problem: You’re programming and don’t know what thisRandomFunction should return. Maybe it’s a bool, maybe it’s a string, who …
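As a small illustration of the problem described, here is a hedged sketch of an annotated function. The name `this_random_function`, its parameter, and its body are invented for the example and are not taken from the post.

```python
def this_random_function(value: str) -> bool:
    """Hypothetical example: the annotations say str in, bool out."""
    return value.strip().lower() in {"yes", "true", "1"}


# Annotations are stored on the function object; Python does not enforce them at runtime.
print(this_random_function.__annotations__)
```

With the annotation in place, a reader (or a type checker) can see the intended return type without digging through the function body.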

How To Get Started With Apache Airflow?

When Airbnb was scaling rapidly, they faced the problem of organizing complex data pipelines. To combat this and become a data-driven organization, Airbnb launched Apache Airflow, their custom-made open-source platform for managing complex workflows, in 2015. In simple words, Apache Airflow is a platform where you can create, schedule, and monitor complex workflows using simple …

Editing all Elements of a DataFrame According to a Condition

Let’s work on manipulating a DataFrame, specifically editing all elements of a DataFrame according to a condition. You have a horde of data you just imported from a CSV or an Excel doc. You’ve managed to get the data into a Pandas DataFrame using one of the built-in import methods like read_csv …

Remove Your Computer’s Name From Your Bash/Terminal on Ubuntu

Happen to be trying to remove your computer’s name from your bash/terminal on Ubuntu? I thought it was an eyesore when I was writing up some docs and taking screenshots. I did some research and thought I’d pass along what I learned. My Setup: Windows 10 Ubuntu Subsystem. Edit your hidden .bashrc file! So …

What git branch am I in?

What git branch am I in? It’s an age-old question I’ll ask myself maybe once an hour. I’ll make a big change, decide that I need to save the repo before I break something, make a commit (oftentimes with commitizen), and then git push origin … . But where was I pushing …
