All posts by awgibbs

Three Options

Over a lifetime of wrangling projects and relationships one repeatedly encounters a decision point offering three options for managing sub-optimal outcomes.  These options exist in all contexts and carry similar costs, benefits, and risks.  Maintaining awareness of them and choosing deliberately between them makes all the difference.  These options are:

  1. Renovate
  2. Tolerate
  3. Separate

I imagine that many of my most expensive mistakes in life stem from losing sight of this or failing to prioritize making such a hard decision.

In my time at Bridgewater Associates I had many opportunities to reflect on this meta-challenge, perhaps provoked in substantial measure by staring at the Dot Collector (jump to 9:00) and thinking on the oft thin line between “Problems – Not Tolerating Them”, “Determination”, and “Practical Thinking” in many contexts, a realm where “Seeing Multiple Possibilities” and “Dealing With Ambiguity” hold central importance.  A high tolerance for pain is a powerful but dangerous personality trait.  One must take great care to separate “can” from “should” here.

For the sake of brevity I will focus on the application of this thinking to Software Engineering in a highly entrepreneurial context, a domain with which I have wrestled for most of my career at many different kinds of employers.

The key elements where one will have investments and opportunities include:

  1. People
  2. Product
  3. Process
  4. Technology
  5. Market

Wisdom involves constantly making an explicit choice among renovation, toleration, and separation for matters in each of these realms.  Meanwhile the criteria by which one ought reason about ongoing approaches include the following (made concrete in the sketch after this list):

  1. Present Value
  2. Future Value
  3. Support Cost
  4. Opportunity Cost
  5. Risk Profile
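To make the weighing concrete, here is a toy sketch in Python.  The criteria weights and option scores are illustrative placeholders, not a formula I actually use; the point is merely to force an explicit, side-by-side comparison rather than letting the decision happen by default.

    # Toy decision matrix: score each option against the five criteria.
    # All weights and scores are illustrative placeholders.
    CRITERIA = {                  # weight = how much the criterion matters
        "present_value": 0.2,
        "future_value": 0.3,
        "support_cost": 0.2,      # scored as relief: higher = cheaper to support
        "opportunity_cost": 0.2,  # scored as relief: higher = less forgone
        "risk_profile": 0.1,      # higher = less risky
    }

    options = {  # 0-10 scores for, say, a creaky in-house build system
        "renovate": {"present_value": 6, "future_value": 8, "support_cost": 4,
                     "opportunity_cost": 3, "risk_profile": 6},
        "tolerate": {"present_value": 6, "future_value": 3, "support_cost": 5,
                     "opportunity_cost": 7, "risk_profile": 5},
        "separate": {"present_value": 2, "future_value": 7, "support_cost": 8,
                     "opportunity_cost": 8, "risk_profile": 4},
    }

    for name, scores in options.items():
        total = sum(CRITERIA[c] * scores[c] for c in CRITERIA)
        print(f"{name}: {total:.1f}")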

As a Maker who takes pride in one’s work, therein resides a strong desire for continual and unending improvement, but the optimal capture of value generally comes from an implementation that falls far short of perfection.  As a Dreamer who can imagine the applicability of one’s work to many problems, therein resides a tendency to keep fighting for one’s envisioned utopia, but there is no validation of your ideas quite like users, funding, and revenue.  As an Entrepreneur one needs to be both of these things, but within reason, tempered by humility, practicality, and data-driven analysis that underpin ruthless prioritization, fanatical focus, and judicious risk management.

Dijkstra perhaps nails a central problem in insisting that ‘we should not regard [lines of code] as “lines produced” but as “lines spent”’.  With every line of code we produce we create a maintenance burden, increase the cognitive load to add other new features, forgo countless other opportunities, decrease the system’s reliability, and add complexity and risk around security.  Sometimes Good Enough is truly Good Enough.  Use that third-party tool that gives you 90% of what you need and move on.  Maybe tolerate that annoying but rare bug with a frustrating but bearable manual remediation approach.  Or, which is a much harder pill to swallow, but sometimes the right choice: Burn it down.

Letting go of things is hard.  Firing people, abandoning products, scrapping processes, ditching technologies, and leaving markets HURTS.  But being willing to do so is key in being able to innovate.

If you want to win, then you have to be agile.  If you want to be agile, then you have to be able to pivot quickly _and_ progress rapidly.  This entails a combination of:

  1. Continually making the right foundational investments
  2. Maturing processes in sync with value realization
  3. Aggressively pruning fruitless approaches to free resources

Tech has gotten incredibly complicated, most problems have multiple acceptable solutions, and having your engineers slog through duplicative, non-differentiating, infrastructure-level sludge as you progress through a series of application-layer pivots is horrifically expensive.  Some foundational investments in DevOps and Data Engineering are therefore critical.  Bias toward continual renovation here, but don’t prematurely optimize and don’t be afraid to give up on tech that isn’t working.

Change control and automated testing are wonderful things for maintaining agility but treat them as a double-edged sword.  They are wonderful for being able to move fast _without_ breaking things once you have proven value creation whose disruption would seriously damage client relations.  They can prove a pointless encumbrance when you don’t even know what you ought be building and nobody cares about it yet.  Unless you find yourself subject to undue risk or drag, be prepared to tolerate shortcomings in these realms.

When wrangling application-layer development: fail fast, fail often, and do so in a highly informed way.  In the short-term, leverage usage telemetry and analytics to understand how people are employing your system and where their engagement breaks down.  In the medium-term, gauge interest by users’ willingness to become paying customers.  In the long-term, customer retention and market capitalization tell the story.
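As a sketch of what that short-term instrumentation can look like, here is a minimal, hypothetical usage-telemetry emitter in Python.  The event names, fields, and stdout sink are stand-ins for whatever analytics pipeline you actually run:

    import json, time, uuid

    def track(user_id, event, **properties):
        """Emit one structured usage event; in real life this would go
        to an analytics collector or queue rather than stdout."""
        record = {
            "event_id": str(uuid.uuid4()),
            "ts": time.time(),
            "user_id": user_id,
            "event": event,
            "properties": properties,
        }
        print(json.dumps(record))  # stand-in for the real sink

    # Instrument the funnel where engagement tends to break down.
    track("u123", "report_opened", report="weekly_summary")
    track("u123", "report_export_clicked", fmt="csv")
    track("u123", "report_export_abandoned", step="column_mapping")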

Give people what they want badly enough to pay for.  Time-box your bets on things where you can’t seem to get that signal.  Focus and prioritize your efforts as if those were your most important tasks because they are.  Always remember all of your options.  Steer clear of the Sunk Cost Fallacy.  Be willing to set fire to what is not working and use its light to guide your way.

Layered Convergence

Enormous benefits follow when software engineers share a tech stack.  Thus upon finding a fragmented ecosystem a newly arrived technical leader will experience great temptation to rush convergence.  A gentle touch and an iterative approach, however, will likely yield the best results with the least resistance.

Instead of attempting to fast-forward to the desired end-state, consider layering in convergence with these stages:

  1. Shared component technology
  2. Shared implementation patterns
  3. Shared software libraries
  4. Shared managed services

Taking this iterative approach, one more evolutionary than revolutionary, not only allows incremental value capture along the journey, but also yields a groundswell of support and a multitude of use cases while minimizing risk and speculation at the earlier stages.

Consider the benefits that begin to accrue and subsequently compound:

  1. Local technical guilds
  2. Key-man risk reduction
  3. Employee/project mobility
  4. Operations support sharing
  5. Burn-out risk reduction
  6. Tool chain maturity
  7. Application feature focus
  8. Feature delivery acceleration
  9. Compliance burden reductions
  10. Disaster recovery robustness

Why not capture as many of these benefits as soon as possible?  Continually enlist the help of others and you will be perpetually delighted at how fast your vision becomes reality.  Accelerate that recruitment by showing people a better way and encouraging their willing participation.

Channeling

My cats abhor my new Dyson vacuum cleaner.


I need but pick it up to trigger fierce hissing and scurrying.  I, however, keep finding reasons to love it.

With every use it reminds me of the need to clean its filter and of the attendant procedure.  Pop it open, carry it to the sink, and there again one finds embedded, language-agnostic instructions.

This represents subtle genius.  Users lose paper instruction manuals.  Online documentation is out-of-sight and out-of-mind.  Put it in plain sight, however, and you channel users toward the desired behavior.  Customer satisfaction goes up.  Support costs go down.  Everybody wins.

A few months ago I found myself overhauling the DevOps pipeline at Finite State.  To that end I crafted a DSL against which a code generator runs, stamping out the CloudFormation to instantiate a CodePipeline in AWS with everything you need to generate a production system, a test system, or a fully realistic dev-int environment.
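I won’t reproduce the actual DSL here, but the shape of the idea is simple.  The following is a deliberately tiny, hypothetical sketch in Python: a declarative description of pipelines gets stamped out into CloudFormation-style JSON.  The names and properties are illustrative and heavily simplified, not the real generator:

    import json

    # Hypothetical mini-DSL: each entry declares one pipeline flavor.
    PIPELINES = [
        {"name": "prod",    "branch": "master",  "approval": True},
        {"name": "test",    "branch": "develop", "approval": False},
        {"name": "dev-int", "branch": "develop", "approval": False},
    ]

    def render(pipeline):
        """Expand one declaration into a CloudFormation resource stub."""
        logical_id = pipeline["name"].title().replace("-", "") + "Pipeline"
        return {
            logical_id: {
                "Type": "AWS::CodePipeline::Pipeline",
                "Properties": {
                    "Name": "app-" + pipeline["name"],
                    # ...stages, sources, and approval gates derived
                    # from the declaration would be expanded here...
                },
            }
        }

    template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": {}}
    for p in PIPELINES:
        template["Resources"].update(render(p))
    print(json.dumps(template, indent=2))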

Usage is fairly simple and yet when a new engineer shows up things like this can prove quite mysterious.  What to do?  Exactly what an enterprising colleague of mine did totally reflexively without being asked: he wrote detailed instructions and homed them in the README.md of our GitHub mono-repo so they are the first thing any developer bumps into when arriving at our system.

Make it impossible for your users to be confused or to do the wrong thing.  The dividends you will reap are enormous.

Don’t Make Me Think

Consider the following two user interfaces…

2019 Dodge Grand Caravan [photo]

2019 Subaru WRX [photo]

The Caravan I just employed as an Enterprise rental from Stamford to Columbus.  Its fold-down seats are kind of awesome, allowing it to transform effortlessly from minivan to cargo van.  It proved perfect for transporting a mix of precious belongings and my feline companions, the latter of which I wanted sharing a climate-controlled space with me.

The WRX I enjoy as my everyday car.  I acquired it last fall when, heartbreakingly, I lost my beloved Audi S4 in a flood.  The WRX remains one of a vanishingly small number of cars one can acquire in the US market that offer both AWD and a fully manual transmission.

But these are not the features under consideration today.

Imagine that you find yourself on the road in your rented Caravan just shy of Columbus and suddenly in an epic downpour.  The 18-wheeler in front of you brakes hard and engages its hazard lights.  You likewise brake hard and reach to engage your own hazard lights when…  fuck, where the hell is the button?  You look up and in your mirrors see another 18-wheeler barreling toward you, oblivious to the emergent conditions.  In desperation you slam on the gas and veer rightward into the breakdown lane.  The following 18-wheeler belatedly realizes the crisis it has created and veers leftward.  The two of you collaborate to thread the needle, catastrophe avoided by the narrowest of margins.

Suppose you find yourself in such a moment while driving the WRX and wish to engage your hazard lights.  Your right hand’s fingers relax, your triceps contracts, your hand finishes opening, and your palm mashes the giant red button that inhabits a space all its own.  DONE.  Crisis (hopefully) averted.

Consider, now, the same situation, but you are piloting instead the Caravan, as I was.  You look for the hazard light button in the conventional region but it is nowhere to be found.  It is hiding.  Its red matches the color that indicates heat for the climate control system.  It is less than half the size of adjacent buttons serving far less desperate purposes.  It sits well below the plane where one’s eyes naturally travel, and reaching it requires that your arm first drop, then thrust, then poke with a single finger.

Which UX would you prefer in an emergency while piloting an unfamiliar vehicle?

How many lives have the UX designers of Dodge cost with the careless placement of a single button?

Your choices as an engineer can yield weighty consequences even if you never get to see them directly.  Have empathy for both the novice and the expert, the casual user and the crisis-beset operator…  Your efforts to mind the details may make all the difference.

Time Well Spent

Facebook continues to improve its ad targeting on me.  I’m not sure how to feel about that, but Timeular is nonetheless interesting.

I imagine Timeular would exhibit a strong Observer Effect.  That may or may not be a good thing.  Depending on the kind of work you are instrumenting, it may squeeze out wasted time, or it may serve as harassment that prevents attainment of Flow.  For many folks, passive analysis of digital exhaust streams may prove more effective.

I recently read Silence: In The Age Of Noise by Erling Kagge after a Lunch With The FT article piqued my curiosity.  Memorable among the stories was an interview with a SpaceX manager who noted that the only times he could perform deep thinking were in the toilet, in the shower, on his commute, etc.  It made me reflect on my evolving work patterns through time and their implications.

For the majority of my career, up until ~2.5 years ago, I did the bulk of my work physically located in a SCIF and digitally located on networks that were ruthlessly segmented.  With the benefit of hindsight, I look back on this arrangement as wonderful. While security concerns drove the arrangement, the benefits to knowledge work proved substantial.

You could not bring cell phones into the building.  You could not connect to the Internet from your primary workstation.  Want to use your cell phone?  Walk out to your car.  Want to use the Internet?  Use a physically distinct workstation.  This probably sounds crazy if you have not lived it, but actually it is kind of awesome in its own quirky way.  By imposing a transaction cost on context switching, the environment discouraged flitting between work modalities in a way that destroys focus.

I remember telling people during this time of my life that I did some of my best work sitting in the toilet at the office.  I might wander there in a trance-like state, having loaded a complex problem into my head but not yet worked out a solution, and sit in a sensory deprivation chamber while I cogitated.  Now, thanks to the technology of Apple and Facebook, as well as the reduced paranoia of a non-governmental employer, I can use my toilet time to watch cat videos or read about North Korea’s nuclear ambitions.  Using that time as I perhaps ought takes a conscious effort and serious discipline.

For many years I took for granted the cognitive boundaries that my employer engineered for me.  Now I must engineer them myself.

Simplicity Begets Complexity

Aeons ago, in a pre-cloud era of my professional life, I found myself bootstrapping a software system as a (mostly) solo developer.  It would ultimately prove successful beyond my imagination, bringing along an assortment of people and systems (the key ingredients to real and lasting success), but it had humble beginnings.  It commenced with the scaffolding of Ruby On Rails, which apart from ActiveRecord quickly fell away, but its PostgreSQL core persisted.

A colleague recently remarked that “PostgreSQL is the second-best choice for everything”.  That resonated with me.  As you bring a new system into existence, a needlessly complex tech stack thwarts progress.  Finding a way to leverage a single piece of coherent tech as the bedrock yields enormous benefits at the outset.  Doing so also entails substantial peril as you find yourself outgrowing yet addicted to that foundation.

git pull; rake db:migrate; touch tmp/restart.txt

During the earliest days, and for a long time, that was pretty much the deploy process for the aforementioned system.  I was able to move fast.

Of course even in that one line an insidious race condition lurks.  The disk in your environment gets new code, but for a transient period the processor nodes keep running old code from memory, and some of that running code may be out-of-sync with your database’s layout, which may cause data loss or corruption.

But…  Move fast and break things!  That’s a fine philosophy when working on a prototype.  It may even be fine for relatively mature products with low criticality and/or under certain definitions of “break”.  And certainly failing to have anyone care enough about your software to notice that it broke often proves the bigger risk.

Eventually, though, reality catches up and you begin to harden your system iteratively.  For me, in this particular adventure, that meant continually rearchitecting for looser coupling.

Durable message queues came to sit between my PostgreSQL and other systems.  Message processing became more decoupled, with distinct “ingest” and “digest” phases ensuring that we never lost data by first just writing down a raw document into a blob field and subsequently unpacking it into a richer structure.  Substantive changes to data formats rolled out in two or more phases to provide overlapping support that prevented data loss while allowing zero(-ish) downtime deploys.  An assortment of queues, caches, reporting faculties, and object stores accreted within the storage engine.
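A hedged sketch of that ingest/digest split, in Python with psycopg2 and hypothetical table and column names; the real system was considerably more involved, but the invariant is the same: durably write the raw document first, unpack it later, and make the unpacking rerunnable:

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # connection details elided

    def ingest(raw_bytes):
        """Phase 1: just write the raw document down.  Cheap, durable,
        and safe even if the unpacking code is buggy or mid-deploy."""
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO raw_documents (payload) VALUES (%s) RETURNING id",
                (raw_bytes,),
            )
            return cur.fetchone()[0]

    def digest(doc_id):
        """Phase 2: unpack into a richer structure.  Idempotent, so a
        crash or redeploy between phases loses nothing."""
        with conn, conn.cursor() as cur:
            cur.execute("SELECT payload FROM raw_documents WHERE id = %s",
                        (doc_id,))
            doc = json.loads(bytes(cur.fetchone()[0]))
            cur.execute(
                "INSERT INTO measurements (doc_id, sensor, value) "
                "VALUES (%s, %s, %s) ON CONFLICT DO NOTHING",
                (doc_id, doc["sensor"], doc["value"]),
            )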

And so PostgreSQL found itself variously serving as data lake, data warehouse, data mart, and data bus.  This simplified certain classes of problems a lot.

Imagine that you are implementing backup-and-restore procedures.  Wouldn’t it be great if your entire state store was one engine that you could snapshot easily?

Imagine that you are a very small team supporting all of your own ops.  Wouldn’t it be great if you only had a single tech product at the core of your system to instrument and grok?

PostgreSQL was that thing.

It was wonderful.

And it was awful.

Consolidating on that one piece of tech proved pivotal in getting the system off the ground.  Eventually, however, the data processing volume, variety, and velocity ballooned, and having everything running on PostgreSQL took years off of my life (and apologies to those on whom I inflicted a similar fate).

Load shocks to internal queues made Swiss cheese of on-disk data layouts.  Reporting processes would bulldoze caches.  Novel data would confound the query planner and cause index scans through the entirety of massive tables.  The housekeeping “vacuum” processes would compete with real-time mission data processing while also running the risk of failing to complete before hitting a transaction wrap-around failure (and once caused what was possibly the most stressful day of my career).
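The cure for that particular terror is watching transaction ID age before it becomes an emergency.  Here is a minimal sketch of such a check in Python with psycopg2; the alert threshold is a made-up illustration, chosen to fire well before PostgreSQL’s roughly two-billion-XID wrap-around horizon:

    import psycopg2

    THRESHOLD = 1_000_000_000  # illustrative; alert long before wrap-around

    conn = psycopg2.connect("dbname=app")  # connection details elided
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT datname, age(datfrozenxid) AS xid_age "
            "FROM pg_database ORDER BY xid_age DESC"
        )
        for datname, xid_age in cur.fetchall():
            status = "PAGE SOMEONE" if xid_age > THRESHOLD else "ok"
            print(f"{datname}: {xid_age:,} ({status})")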

“It’s always vacuum”, I wrote on a sticky-note that I affixed to the monitor of the guy who often took the brunt of front-line support.  “Always” was only slightly hyperbolic, true often enough to be a useful heuristic.

So simple, yet ultimately so complex.  We ended up spending a lot of time doing stabilization work.  It was stressful.  But at least we knew it was engineering muscle going into a proven and useful product.

Fast-forward to the present.

I have of late been building systems in a cloud-first/cloud-native fashion within AWS that anticipates and preempts an assortment of the aforementioned challenges.  The allure of Serverless, high availability, extreme durability, and elastic scalability is strong.  These things come, however, at a cost, and often an insidious one.  The raw body of tech knowledge you need to understand grows linearly with the number of pieces of component tech you have fielded.  The systems integration challenges, meanwhile, grow geometrically, and holding someone accountable for fixing a problem at component boundaries proves maddening.

When a CloudFormation stack tries to delete a Lambda that is attached to a VPC via a shared ENI and unpredictably hangs for forty minutes because of what is likely a reference counting bug and unclear lines of responsibility, who you gonna call?  And when the fabric of your universe unravels without warning or transparency because you have embraced Serverless in all its glory and your cloud provider rolls out a change that is supposed to just be an implementation detail but that bleeds up to the surface, what you gonna do?

This other way of building apps, leveraging a suite of purpose-focused tools and perhaps riding atop the PaaS-level offerings of a cloud provider, can provide an enormous amount of lift, raise the scalability ceiling, and relieve you from stressing over certain classes of unpredictability.  It does, however, come at the risk of front-loading a lot of technical complexity when you are struggling to prove your application’s viability.

In some cases the best approach may be to employ a very simple tech stack while structuring the logical boundaries of your code base in a way that anticipates an eventual lift-and-shift to a more heterogeneous architecture.  In other cases wisdom may lie in reducing guarantees about correctness or timeliness, at least for a while.  Know the application you are building, remain cognizant of where you are in the application’s life cycle, make risk-based choices from that context, and beware the gravity well that you are creating.

If today you can get by on a simple snapshot-and-restore scheme and tomorrow move to a journal-and-replay approach, perhaps you will have had the best of both worlds.
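As a minimal sketch of that migration path, consider the difference in Python; the file names and event shapes are hypothetical, and the point is only that an append-only journal lets you rebuild state by replay rather than restore it wholesale:

    import json

    JOURNAL = "events.jsonl"  # hypothetical append-only journal file

    def record(event):
        """Append one event; the journal, not in-memory state, is the
        source of truth."""
        with open(JOURNAL, "a") as f:
            f.write(json.dumps(event) + "\n")

    def replay():
        """Rebuild state from scratch by re-applying every event."""
        state = {}
        with open(JOURNAL) as f:
            for line in f:
                event = json.loads(line)
                state[event["key"]] = event["value"]
        return state

    record({"key": "widgets", "value": 3})
    record({"key": "widgets", "value": 5})
    assert replay()["widgets"] == 5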

And, remember: It’s always vacuum.

Stop

Last Tuesday when I learned of Trump’s crude caricature of Christine Blasey Ford’s testimony I felt my stomach turn.  The baseness and puerility, underpinned by a new depth in realpolitik, filled me with shame and disgust.  I want to say “I don’t understand how we got here”, but I think I do.

Now let’s zoom out and survey the larger landscape.

John Oliver is a genius.  I have for a long time enjoyed his show.  Lately, however, I’ve been struggling to partake of it.  It’s not that I disagree thematically with his positions.  Rather, I have increasingly found his presentation of topics disagreeable.  And that brings us to yesterday.

I am inclined to believe that on Saturday we confirmed a rapist to serve as a judge on the highest court in our land.  And yet the next day when I watched the previous week’s episode of Oliver’s “Last Week Tonight” I found the experience nearly as excruciating as Trump’s performance.  I go to Oliver for, among other things, some left-leaning gallows humor.  But I do not recall laughing a single damn time during the whole episode.  I felt like I was watching an imminent train wreck while not knowing what to do about it.  I experienced Oliver’s treatment of the matter as crude, divisive, and generally unhelpful, preaching to a choir that has given up on reaching across the aisle.

The national discourse has sunk to such a level that fears of civil war are not unfounded.  Instead of a calm examination of facts and testimony meshed with the perspective of subject matter experts, we ran a media circus intended to inflame opinion and further polarize an already divided electorate.  Republicans and Democrats alike had very clear agendas and were each weaponizing the Kavanaugh hearings by and large along party lines in a self-serving fashion.

Judges, at least in theory, are supposed to act as dispassionate and logical arbiters of the laws that our legislative branch puts on the books.  If that were true, however, it would seem unlikely that we would find ourselves so embroiled in conversations around the ideology of judges, alternately fretting that judges promoted by a competing ideology are “legislating from the bench” or excited to pack the court with judges of our ilk to lock in our preferred version of reality.

Can we all stop being such unmitigated assholes?  On the current trajectory things end poorly for everyone.