So, you’re dumping data into BigQuery and suddenly your storage bill is looking a little spicy. You’ve got partitions, but they’re hanging around forever. This happens. It’s growing pains. I’ve been there. Let’s go over how to set partition expiration so you can automate the cleanup and keep costs down.
Why Bother with Expiration?
You don’t need to keep that raw event log from two years ago sitting in your warehouse. Partition expiration lets you tell BigQuery to auto-delete partitions after a set time. It saves money. It keeps things tidy. Still, some folks forget to set it and wonder why their storage bill is huge. Don’t be that person.
Setting Expiration via SQL
Just use the OPTIONS clause when you create the table. It’s straightforward.
CREATE TABLE `my_project.my_dataset.logs`
PARTITION BY DATE(timestamp_col)
OPTIONS (
  partition_expiration_days = 30
)
AS SELECT * FROM `source_table`;
So, partition_expiration_days is the magic here. Set it to 30 and BigQuery deletes each partition once it’s more than 30 days past its partition date. No cron jobs needed. No messy scripts. Just set it and forget it.
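If you want to verify the option actually stuck, you can query the dataset’s INFORMATION_SCHEMA views. A quick sketch, reusing the project, dataset, and table names from the example above:

```sql
-- Check the table's options; the expiration shows up as one row
SELECT option_name, option_value
FROM `my_project.my_dataset.INFORMATION_SCHEMA.TABLE_OPTIONS`
WHERE table_name = 'logs'
  AND option_name = 'partition_expiration_days';
```

Note that option_value comes back as a string, so don’t be surprised to see "30.0" rather than a number.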
Updating an Existing Table
What if you already have a table running wild? No worries. You can alter it without dropping everything.
ALTER TABLE `my_project.my_dataset.logs`
SET OPTIONS (partition_expiration_days = 7);
Run this and boom, the new rule is in place. One thing to know: the updated expiration applies to existing partitions too, not just future ones, so any partition already older than 7 days becomes eligible for deletion. Double-check your data before you shorten the window.
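Expiration also isn’t instantaneous; BigQuery removes expired partitions in the background, typically within a day or so. If you want old data gone right now, a DML delete keyed on the partitioning column does the job. A sketch reusing the example table, with a hypothetical 90-day cutoff:

```sql
-- Manually purge data older than 90 days
-- Filtering on the partitioning column lets BigQuery prune whole partitions
DELETE FROM `my_project.my_dataset.logs`
WHERE DATE(timestamp_col) < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);
```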
Doing it in the Console
If you prefer clicking buttons, head to the BigQuery console. Open your table details, scroll to partition settings, and set the expiration time. Save and you’re good.
Sometimes the UI is just faster. I still prefer SQL because I can version control it, but hey, use what works for you.
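One more trick worth knowing: you can set a dataset-level default so every new partitioned table picks up an expiration automatically. A rough sketch, assuming the same dataset as above:

```sql
-- New partitioned tables in this dataset default to 30-day partition expiration
-- (tables that already exist keep whatever setting they have)
ALTER SCHEMA `my_project.my_dataset`
SET OPTIONS (default_partition_expiration_days = 30);
```

The default only kicks in at table creation time, and setting partition_expiration_days explicitly on a table still overrides it.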
Wrap Up
So, partition expiration is a no-brainer. Set it, forget it, and watch your storage costs drop. Don’t let old data rot in your warehouse. Happy coding!