The Data Engineer’s Struggle In a Fragmented And Politicized World, or… Confronting A Crisis Of Credibility

“Argh, again with shooting the messenger!”, I found myself thinking on Friday when news broke of Trump firing Bureau of Labor Statistics chief Erika McEntarfer in the wake of irksome numbers. “This reeks of elbowing CDC out of COVID reporting channels in 2020”, I further reflected dourly. Thus continues the trend of quashing the purveyors of inconvenient data while ironically and shamelessly doing so under the banner of fighting Fake News. The firehose of current events tempts us to disengage out of fatigue, and the regular drops of even more sensational stories threaten to consume what limited bandwidth remains, but if we wish to sustain a functional democracy and solvent economy then we must keep our eye on the millstone steadily grinding away at key institutions.

Anyone in the trenches during this phase of the Artificial Intelligence hype cycle knows well that, for all the glamor accorded the “AI Experts”, most use cases live downstream of a Herculean effort in the humbler realm of “Data Engineering”, a task that involves not just collecting, curating, normalizing, enriching, and serving data, but also regularly swimming even further upstream to shape its very generation. Reasonable people can disagree about the interpretation of data and how to act on it, but the integrity of that upstream data is crucial to our having any hope of behaving rationally, and politicized attempts to disrupt its flow or bend its interpretation when the conclusions prove inconvenient threaten a broad unraveling of evidence-based decision making.

Perhaps the BLS does warrant some manner of overhaul. I’m not close enough to the problem to know, but given the priors one might reasonably assume that a federal institution founded 140 years ago harbors many archaic processes, carrying on more out of habit than intent as the world has moved on. Yet Trump’s knee-jerk reaction, accompanied by no meaningful explanation, clearly reveals motivated reasoning, as we can fairly assume that no such sacking would have occurred if the numbers had fit his narrative. This decision has all the authenticity of suddenly going quiet on election integrity concerns just as soon as the latest contest has broken your way.

As I ponder my own past struggles in similar realms, one concept that springs to mind involves compartmentalization (expressed as “Separation of Powers” in government architecture and “Separation of Duties” in security engineering), which brings me to reflect on my high school science fair experiences. By dint of fate I found myself employing Methotrexate, a chemotherapy agent, to torture hapless Sordaria fimicola, a fungus made popular for genetic experiments owing to its linear pod of eight spores, colored black or white, whose patterning reveals the rate of cross-over occurring during meiosis, subsequently fixed by mitotic division. My mission involved cultivating this fungus atop a gelatinous nutrient broth, agar, infused with differing concentrations of the drug to assess a hypothesized impact on cross-over rate.

Years ago, during a DoD background investigation, a grad school classmate described my character to an agent as “annoyingly honest”. I imagine, for better or worse, that all the people who have known me over the years would express similar sentiments. And so trust me when I say that I made no conscious attempt to cheat during science fair and yet also look somewhat askance at my former self. How can I know that bleary-eyed under-the-gun me wasn’t subject to some subtle subconscious pressure pushing me toward confirmation bias while counting zillions of fungal pods under a microscope?

I had my hands in too many parts of the process to trust my objectivity: experiment design, data collection, data interpretation, and finally singing and dancing in front of judges passing out ribbons and prize money. A more rigorous process would have involved blinding the counter (me, staring through a microscope) to the actions of the plater (also me, mixing the agar and drug atop which the fungus went) to force unswervingly objective data collection, but I don’t remember that being suggested to me, much less something I would have dreamed up on my own, and so we should not place too much confidence in the conclusions of sixteen-year-old me. Imagine, then, the temptation facing the most powerful person in the world, to cheat at one of the highest-stakes games, if they could exert undue influence on every step of the data pipeline.

Considering a more recent epoch, I find myself pondering my struggles as a government Data Engineer centered on building a “no-code” “flow-based programming” platform, integrating myriad data sources into it, and evangelizing its adoption across the enterprise. Notably, this was occurring against the backdrop of SIGINT analysts for whom state of the art often involved hoarding bespoke Perl scripts and caches of data in private directories — a situation that obscured data provenance, stifled tradecraft propagation, and rendered methodologies brittle and opaque. Naive twenty-five-year-old me had scarce idea how uphill the fight would be.

The politics proved labyrinthine and the incentives perverse. With some regularity I found myself, while selling a transformative tool to stakeholders, channeling the frustrations of Skunk Works legend Ben Rich, who, upon suggesting that half a dozen stealth aircraft could do the work of hundreds of conventional bombers, would find himself rebuffed by generals looking to get their next star — “How am I going to get promoted managing so few planes?”. Data source owners, meanwhile, expressed a not-unreasonable fear of failing to get credit for their work if their place receded into being one black box among many in a complex flow whose underpinnings may fade in the collective consciousness, a serious risk to funding continuity. High-end analysts, lastly, felt their super-star status threatened, sensing a formal software engineering process poised to devour and commoditize their “secret sauce”.

“Show me the incentives and I’ll show you the outcome”, the late Charlie Munger would often quip, an aphorism that a much greener version of myself utterly failed to appreciate while imagining that the hardest problems were technical and that nailing them would assure adoption. To build a successful system that spans hardware, software, and Peopleware entails reconciling a staggeringly large number of competing interests — a challenge that lies at the heart of many of our biggest contemporary conflicts in an increasingly complex world.

With the explosion of COVID now five years behind us, one might hope that we could finally have a sober conversation about what happened, but the acrimonious nature of our political reality, stacked atop fundamental uncertainties, seems to be slowing the absorption of its lessons to a degree I found depressing even before the Groundhog Day moment of McEntarfer’s firing from the BLS last week.

I think we can gauge the intellectual honesty and general rationality of people by how dynamic their behavior was during the pandemic. One extremist camp dug in hard on a position of masking, distancing, shutdowns, and vaccination, centered on “trusting the science” and making what they viewed as pro-social sacrifices. Another group, equally rigid in its dogma, entrenched itself in a static political identity defined by a polar opposite stance on each facet, revolving around the distrust of elites and the primacy of freedom. A third cluster, meanwhile, hewed to the Precautionary Principle, arguing at the outset for an approach defined by risk aversion in the face of existential threats coupled with a paucity of precedent and information, but continually modified as we accumulated data that facilitated the clustering of populations by risk and the development of targeted mitigation tactics.

At best the real-time data we had during the pandemic was terrible. Sadly, our retrospective understanding isn’t that much better. Frankly, the whole episode was a Data Engineer’s worst nightmare, and our lack of progress fills me with dread for the next such cataclysm. Even setting aside the inherent difficulties of analyzing and acting on perfect information in such a crisis, the upstream data generation and transmission were themselves horrendous. Who got tests, what kind, and when? What were the error rates of those tests? What qualified as a “death by COVID”? When health care providers transmitted such telemetry, how much context did they attach, and how well did ingesting systems maintain metadata about provenance and assumptions? How good a job did data brokers do in normalizing data into common formats and making it available in a timely fashion? What mechanisms existed to issue and propagate corrections? These are the gritty details that hamper casual armchair analysis on a sunny day. In a global emergency, flubbing them costs lives and wrecks economies.
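To make the questions about provenance and corrections concrete, here is a minimal sketch of what a single reported test result might carry with it. Every name in it (`TestReport`, `supersedes`, `current_view`, the sample hospital and IDs) is hypothetical and invented for illustration; it does not describe any real reporting system.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TestReport:
    """One reported result plus the context needed to interpret it later (hypothetical schema)."""
    report_id: str
    source: str                       # which provider sent it
    test_type: str                    # e.g. "PCR" or "antigen"; error rates differ by type
    specimen_date: date               # when the sample was taken
    result: str
    received_at: datetime             # when the ingesting system saw it
    supersedes: Optional[str] = None  # report_id that this record corrects, if any


def current_view(reports):
    """Keep every record on file, but hide any that a later correction supersedes."""
    superseded = {r.supersedes for r in reports if r.supersedes}
    return [r for r in reports if r.report_id not in superseded]


original = TestReport("r1", "hospital-a", "antigen", date(2020, 7, 1), "positive",
                      datetime(2020, 7, 2, tzinfo=timezone.utc))
correction = TestReport("r2", "hospital-a", "PCR", date(2020, 7, 1), "negative",
                        datetime(2020, 7, 5, tzinfo=timezone.utc), supersedes="r1")

visible = current_view([original, correction])
```

The point of the design is that a correction never overwrites the original: both records survive, so a downstream analyst can still see what was believed, when, and on whose say-so.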

The hot mess of aforementioned problems, alas, only covers the struggles you will face before politics enters the room. Consider, now, the Data Engineering apocalypse in July 2020, in which the Trump administration declared the CDC’s data collection hub obsolete, decreed that all healthcare providers must redirect their data delivery within forty-eight hours to a brand new system hosted by the HHS and built on short notice under a no-bid contract, and then threatened to curtail the need-based federal support of any providers that failed to comply. Hospitals, already desperately understaffed in the face of a summer surge, had to re-allocate personnel from providing actual healthcare to serving as data entry clerks to copy data between systems. Downstream analytics broke as CDC data feeds began to go dark. Predictable chaos ensued.

There may well have been good reasons to overhaul how data brokering worked in this ecosystem. And, yes, there is nothing like a good crisis to force long overdue change. Yet the ham-fisted way in which events unfolded in the midst of an enormous disaster suggests cynical and opportunistic behavior. Just a few months earlier, Trump had asserted that COVID would be a big nothing burger. Just a few months later, there would inconveniently be an election in the midst of an approval rating dip. And all the while George Floyd protests were roiling the streets of most major cities. How convenient, then, might it be to abruptly break the data flows that would likely have painted a dire picture of a summer surge taxing a healthcare system already stretched to the breaking point? Such are the temptations of regimes everywhere flirting with authoritarianism when confronted with bad news.

To behave like this, however, further mortgages our future when we are already saddled with crippling debt. And at a moment when the interest burden of our Treasury notes stands poised to eclipse defense spending, even without considering that Social Security obligations are a form of debt, we can brook no confidence-undermining dalliances. We must wake up to the looming crisis of credibility.

I get that Trump doubtless finds bad news from the BLS infuriating at this moment of tariff horse trading, as if his weak hole cards were suddenly revealed not just to others but also himself during a tense game of No-Limit Texas Hold’em, but peremptorily firing the bearer of bad news just because it doesn’t fit his chosen narrative only makes a bad situation worse. The consequences of undermining confidence in what are supposed to be objective reporting bodies, to say nothing of independent central bankers, exemplified by the recent protracted and ruinous debacles in such places as Greece and Argentina, are clear — bond markets will demand increasingly punitive yields which will in turn drive borrowing costs through the roof and thereby foment a debt death spiral from which you cannot escape as you attempt in vain to roll over increasingly expensive obligations.

We may be tempted to shrug this off — the new acting commissioner is a well-respected longtime employee of the bureau and the cadre of career professionals at the BLS this week is the same as last week — but this abrupt termination doubtless sets a chilling precedent, albeit one as yet poorly understood. Think of how the disappearance of the sun from our solar system, banished at the snap of the fingers of a trickster god, would not be felt for eight minutes, owing to field propagation delays. So, too, are the effects belatedly felt in large organizations when key players and cultural cornerstones suddenly cease to be. Worse still, imagine how long it might take to assess the damage to an organization operating in a technical realm exhibiting a high degree of bi-temporality, one where a huge delay exists between reality and your lagging understanding of it.
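For readers unfamiliar with bi-temporality, the idea is that every fact carries two clocks: the period it describes and the date on which we learned (or revised) it. A rough sketch follows, with entirely made-up figures and a hypothetical helper named `as_known_on`; it is an illustration of the concept, not any agency's actual method.

```python
from datetime import date

# Each row: (period the figure describes, date it was published, value).
# All dates and values are invented for illustration.
facts = [
    (date(2024, 1, 1), date(2024, 2, 10), 100),  # first estimate for January
    (date(2024, 1, 1), date(2024, 3, 10), 80),   # later revision of the same month
    (date(2024, 2, 1), date(2024, 3, 10), 120),  # first estimate for February
]


def as_known_on(facts, period, known_on):
    """Return the value for `period` as it was understood on `known_on`, or None."""
    candidates = [(published, value)
                  for described, published, value in facts
                  if described == period and published <= known_on]
    if not candidates:
        return None
    # The most recently published figure wins.
    return max(candidates, key=lambda c: c[0])[1]


early = as_known_on(facts, date(2024, 1, 1), date(2024, 2, 15))
late = as_known_on(facts, date(2024, 1, 1), date(2024, 4, 1))
```

Here `early` and `late` answer the same question about the same month and still differ, which is exactly the lag between reality and our understanding of it that makes damage in such a realm slow to surface.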

I fear, to borrow language from Ray Dalio’s tome The Changing World Order, that we have long since transitioned from the “rich and you know it” phase to the subsequent “poor but think you are rich” delusion, now finding ourselves teetering on the precipice of a brutal reckoning, the excruciating “poor and you know it” denouement. There may still be time to right this ship but surely politicizing, and thereby sabotaging, the institutions that underpin third-party confidence in our grand experiment is not the way to do it. You can fool some of the people some of the time, and you might even be able to fool all of the people some of the time, but, if history is any guide, then in the fullness of time the bond market will ruthlessly tally all of your sins, and the Iron Bank will not heed your cries of Fake News.

