Note: This is a re-upload from my previous website; it's actually been ~9 months since I left Facebook.
I take a lot of pride in what I'm able to do at work. I think this dedication pays off, but I'll let you be the judge.
An Accidental Deletion Pipeline
Context: We have accidental data-loss SEVs. Oftentimes, a
significant portion of the time is spent investigating which objects are
recoverable, and then pushing the recovery step by step through metadata
repair, cold-storage restores, etc. This project:
- helped automate the restore process by streamlining all the steps needed to attempt recovery from backups.
- saves on-call time, prevents human mistakes, and is well maintained compared to the ad-hoc scripts team members previously wrote.
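The pipeline above can be sketched roughly like this. Everything here is hypothetical (the blob fields, step names, and recovery criteria are my illustration, not the internal implementation): the idea is simply that each recovery stage becomes a reusable step, and a blob is recovered only if every step succeeds.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    blob_id: str
    metadata_intact: bool = True
    backup_exists: bool = True
    restored: bool = False


# Hypothetical recovery steps; each returns True on success.
def repair_metadata(blob: Blob) -> bool:
    if not blob.metadata_intact:
        blob.metadata_intact = True  # pretend we rebuilt it from a replica
    return blob.metadata_intact


def restore_from_cold_storage(blob: Blob) -> bool:
    if blob.backup_exists:
        blob.restored = True
    return blob.restored


RECOVERY_STEPS = [repair_metadata, restore_from_cold_storage]


def run_recovery(blobs):
    """Run every step against every blob, collecting a recovery report."""
    report = {"recovered": [], "unrecoverable": []}
    for blob in blobs:
        ok = all(step(blob) for step in RECOVERY_STEPS)
        report["recovered" if ok else "unrecoverable"].append(blob.blob_id)
    return report
```

The win over ad-hoc scripting is that the step list is the single, maintained source of truth for "what do we try, and in what order" during a data-loss SEV.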
A Bulk Metadata Editor
Context: We have a lot of blobs. Like... exabytes. Sometimes
we want to edit the metadata of records in the ~100k-1M range. To do
that, you previously had to write filler code in a pipeline meant to
handle state transitions (e.g. active to deleted), not anything along
the lines of "if this blob is in this list of blobs, apply this edit."
This project:
- allows an Everstore engineer to bulk-edit blobs with a simple,
declarative binary, or to extend it with their own logic (like the
example above). It safeguards us from making unexpected changes and
lets the engineer complete a "dry run", which previously wasn't
possible.
- runs these edits extremely quickly in parallel, processing ~100k records per minute.
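A minimal sketch of the shape of such a tool. The ID-list filter, the `edits` dict, the dry-run mode, and the thread-pool parallelism are all my stand-ins for illustration, not Everstore's actual interface:

```python
from concurrent.futures import ThreadPoolExecutor


def bulk_edit(records, target_ids, edits, dry_run=True, workers=8):
    """Apply a declarative `edits` dict to every record whose id is in
    `target_ids`. In dry-run mode, report what would change without
    mutating anything."""
    targets = set(target_ids)

    def edit_one(record):
        if record["id"] not in targets:
            return None
        if dry_run:
            return (record["id"], "would-edit")
        record.update(edits)  # the actual metadata mutation
        return (record["id"], "edited")

    # Each record is edited independently, so the work parallelizes cleanly.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [r for r in pool.map(edit_one, records) if r is not None]
```

The dry run is the safeguard: you get the full list of would-be edits to review before running the same invocation with `dry_run=False`.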
Off-hand efficiency improvements
- Extended the configuration of Cold Storage backups, decreasing
compaction overhead by 35% during high-stress COVID load. In absolute
terms, that's about 1.2 EB of savings.
- Decreased our online queue of pending volume deadings from 3M to 300k by short-circuiting known failures.
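The short-circuiting idea, in a toy form (the queue items, error strings, and the known-failure set are hypothetical): instead of letting every item churn through the retry machinery, anything whose failure cause is already known is pulled aside and resolved immediately.

```python
def drain(pending, known_failures):
    """Split a work queue into items worth retrying and items whose
    failure cause is already known, which can be resolved immediately
    instead of cycling through the queue again."""
    retry, short_circuited = [], []
    for item in pending:
        bucket = short_circuited if item["last_error"] in known_failures else retry
        bucket.append(item)
    return retry, short_circuited
```

Applied to a 3M-item backlog, pulling out the dominant known-failure classes up front is what shrinks the live queue by an order of magnitude.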