Data Repair Work
Sanity Check • No. 016
I’m Ben, and welcome to Sanity Check. The data field is so funny about its job titles. Here, I don’t worry about them. If you don’t mind meandering through the full range of analytic jobs to be done, you just might like this newsletter.
QQ: QUICK QUOTES
Here’s what’s new with me:
🌀 Hurricane Idalia passed by this week. My family left town, but the initial report is that everyone is safe and our home is still standing. My thoughts and prayers are with those who were more severely impacted 🙏
👂 I was accused of eavesdropping on a conversation in Spanish. ¡Pero no comprendo español! (But I don’t understand Spanish!)
💅 Over the weekend I put a fresh coat of paint on RayData.Co. This Substack will continue to be the home of my weekly wanderings. The main site will serve as the host for evergreen content.
🚧 Caution: Opinions still settling, and I’d love to hear yours
Data Repair Work
Data projects, programs, products, whatever the name, are notoriously difficult to maintain, because data can break in so many different ways.
In tackling this complexity I always find it helpful to enumerate the possibilities. I thought of 8 ways a data project could break.
Once you have the possibilities mapped out, you can begin matching each one with a response plan. For example:
A metric definition update could be put through a deprecation process. The new definition lives alongside the old definition for a period of time to help stakeholders adjust to the impact of the change.
With schema changes, you may want to be alerted but not automatically sync the changes; otherwise you could end up with a surprisingly large Fivetran bill.
It is easy to get ahead of unexpected granularity changes by defining surrogate keys and adding “uniqueness” & “not null” tests to the new column.
By having these response plans in place, and even some proactive measures, you will add stability to your data project. With stability comes trust from the business, and with trust you will be given new opportunities to add value!
What other ways do you see data projects break?
What methods do you have in place to respond to a break?
A few interesting articles, podcasts, or websites I recently came across
There’s no clear theme to this week’s selected columns, but I’ve ordered them from most relevant to general interest.
The Beautiful Mess: Maintenance, KTLO, and BAU - John Cutler recently wrangled his thoughts on the stigma associated with “maintenance” or “keeping the lights on.” At the end he shares a model to better balance new feature work with managing the complexity of what has already been delivered. (link)
Building a Business Review Process From Scratch - This presentation is from Data Council earlier in the year, but I just came across it last night. It’s a great blueprint for how data teams can deliver value beyond dashboards. (link)
dbt Plugins - There was a hidden feature in dbt’s 1.6 release - plugins. Nicholas Yager walks through this new capability. I’m going to stew on what new possibilities this opens up. (link)
AI Induced Demand - There is speculation that LLMs are going to replace white collar labor, including data teams. Tomasz Tunguz shares a quick zag: the AI efficiency gains are going to lead to more demand for data teams. (link)
Stratechery Interview on the AI Hype Cycle - If you are a Stratechery subscriber, this interview with Daniel Gross and Nat Friedman was an interesting one. Their discussion of how this technological shift compares to other paradigm shifts stuck with me. Daniel Gross likened it to the iPhone, while Ben Thompson framed it as similar to the introduction of mainframe computers. (link)
Personal Websites - Right after I reworked my website this weekend, Nate Kadlac rolled out a big guide to effective personal websites. Now I’ve got to rework it again!
But seriously, this is a good resource for how a personal site can help in unexpected ways. (link)
Thank you for reading.
Let’s keep it going. 💜
If you enjoyed this edition, would you mind giving the heart below a click?