Don’t Make Me Think

Consider the following two user interfaces…

2019 Dodge Grand Caravan:

[Image: dodge_grand_caravan_2019]

2019 Subaru WRX:

[Image: subaru]

The Caravan I just employed as an Enterprise rental from Stamford to Columbus.  Its fold-down seats are kind of awesome, allowing it to transform effortlessly from minivan to cargo van.  It proved perfect for transporting a mix of precious belongings and my feline companions, the latter of which I wanted sharing a climate controlled space with me.

The WRX I enjoy as my everyday car.  I acquired it last fall when, heartbreakingly, I lost my beloved Audi S4 in a flood.  The WRX remains one of a vanishingly small number of cars one can acquire in the US market that offer both AWD and a fully manual transmission.

But these are not the features under consideration today.

Imagine that you find yourself on the road in your rented Caravan just shy of Columbus and suddenly in an epic downpour.  The 18-wheeler in front of you brakes hard and engages its hazard lights.  You likewise brake hard and reach to engage your own hazard lights when…  fuck, where the hell is the button?  You look up and in your mirrors see another 18-wheeler barreling toward you, oblivious to the emergent conditions.  In desperation you slam on the gas and veer rightward into the breakdown lane.  The following 18-wheeler belatedly realizes the crisis it has created and veers leftward.  The two of you collaborate to thread the needle, catastrophe avoided by the narrowest of margins.

Suppose you find yourself in such a moment while driving the WRX and wish to engage your hazard lights.  Your right hand’s fingers relax, your triceps contracts, your hand finishes opening, and your palm mashes the giant red button that inhabits a space all its own.  DONE.  Crisis (hopefully) averted.

Consider, now, the same situation, but you are piloting instead the Caravan, as I was.  You look for the hazard light button in the conventional region but it is nowhere to be found.  It is hiding.  Its red matches the color that indicates heat for the climate control system.  Its size measures less than half that of the adjacent buttons of purpose far less desperate.  Its location is well below the plane of where one’s eyes naturally travel and requires that your arm first drop and then thrust and furthermore poke with a single finger.

Which UX would you prefer in an emergency while piloting an unfamiliar vehicle?

How many lives have the UX designers of Dodge cost with the careless placement of a single button?

Your choices as an engineer can yield weighty consequences even if you never get to see them directly.  Have empathy for both the novice and the expert, the casual user and the crisis-beset operator…  Your efforts to mind the details may make all the difference.

Time Well Spent

Facebook continues to improve its ad targeting on me.  I’m not sure how to feel about that, but Timeular is nonetheless interesting.

I imagine Timeular would exhibit a strong Observer Effect.  That may or may not be a good thing.  Depending on the kind of work you are instrumenting, it may squeeze out wasteful time, and it may serve as harassment that prevents attainment of Flow.  For many folks, passive analysis of digital exhaust streams may prove more effective.

I recently read Silence: In The Age Of Noise by Erling Kagge after a Lunch With The FT article piqued my curiosity.  Memorable among the stories was an interview with a SpaceX manager who noted that the only times he could perform deep thinking were in the toilet, in the shower, on his commute, etc.  It made me reflect on my evolving work patterns through time and their implications.

For the majority of my career, up until ~2.5 years ago, I did the bulk of my work physically located in a SCIF and digitally located on networks that were ruthlessly segmented.  With the benefit of hindsight, I look back on this arrangement as wonderful. While security concerns drove the arrangement, the benefits to knowledge work proved substantial.

You could not bring cell phones into the building.  You could not connect to the Internet from your primary work station.  Want to use your cell phone?  Walk out to your car.  Want to use the Internet?  Use a physically distinct work station.  This probably sounds crazy if you have not lived it, but actually it is kind of awesome in its own quirky way.  By imposing a transaction cost on this context switching, the environment discouraged the kind of flitting between work modalities that destroys focus.

I remember telling people during this time of my life that I did some of my best work sitting in the toilet at the office.  I might wander there in a trance-like state, having loaded a complex problem into my head but not yet worked out a solution, and sit in a sensory deprivation chamber while I cogitated.  Now, thanks to the technology of Apple and Facebook, as well as the reduced paranoia of a non-governmental employer, I can use my toilet time to watch cat videos or read about North Korea’s nuclear ambitions.  Using that time as I perhaps ought takes a conscious effort and serious discipline.

For many years I took for granted the cognitive boundaries that my employer engineered for me.  Now I must engineer them myself.

Simplicity Begets Complexity

Aeons ago, in a pre-cloud era of my professional life, I found myself bootstrapping a software system as a (mostly) solo developer.  It would ultimately prove successful beyond my imagination, bringing along an assortment of people and systems (the key ingredients to real and lasting success), but it had humble beginnings.  It commenced with the scaffolding of Ruby On Rails, which apart from ActiveRecord quickly fell away, but its PostgreSQL core persisted.

A colleague recently remarked that “PostgreSQL is the second best choice for everything”.  That resonated with me.  As you bring a new system into existence, a needlessly complex tech stack thwarts progress.  Finding a way to leverage a single piece of coherent tech as the bedrock yields enormous benefits at the outset.  Doing so also entails substantial peril as you find yourself outgrowing yet addicted to that foundation.

git pull; rake db:migrate; touch tmp/restart.txt

During the earliest days, and for a long time, that was pretty much the deploy process for the aforementioned system.  I was able to move fast.

Of course even in that one line an insidious race condition lurks.  The disk in your environment gets new code while, for a transient period, the processes already in memory keep running the old code, and some of that running code may be out of sync with your database’s layout, which may cause data loss or corruption.
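One way to narrow that window, sketched here under assumptions rather than as what I actually ran, is to have running code compare the schema version it was written against with the version the database reports before it writes.  SQLite and its `PRAGMA user_version` stand in for PostgreSQL, and `EXPECTED_SCHEMA_VERSION`, `SchemaMismatch`, and `guarded_write` are hypothetical names.  The check and the write are still two separate steps, so this shrinks the race without eliminating it.

```python
import sqlite3

# Hypothetical: the schema version this build of the code was written against.
EXPECTED_SCHEMA_VERSION = 2


class SchemaMismatch(RuntimeError):
    """Raised when running code and database layout disagree."""


def schema_version(conn: sqlite3.Connection) -> int:
    # SQLite keeps an integer that an application may use as a schema version.
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    return version


def guarded_write(conn, sql, params=(), expected=EXPECTED_SCHEMA_VERSION):
    """Refuse to write when the layout is not the one this code expects."""
    actual = schema_version(conn)
    if actual != expected:
        raise SchemaMismatch(
            f"code expects schema {expected}, database is at {actual}"
        )
    with conn:  # commit on success, roll back on error
        conn.execute(sql, params)
```

Stale processes then fail loudly after a migration instead of quietly writing rows in yesterday’s shape.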

But…  Move fast and break things!  That’s a fine philosophy when working on a prototype.  It may even be fine for relatively mature products with low criticality and/or under certain definitions of “break”.  And certainly failing to have anyone care enough about your software to notice that it broke often proves the bigger risk.

Eventually, though, reality catches up and you begin to harden your system iteratively.  For me, in this particular adventure, that meant continually rearchitecting for looser coupling.

Durable message queues came to sit between my PostgreSQL and other systems.  Message processing became more decoupled, with distinct “ingest” and “digest” phases ensuring that we never lost data by first just writing down a raw document into a blob field and subsequently unpacking it into a richer structure.  Substantive changes to data formats rolled out in two or more phases to provide overlapping support that prevented data loss while allowing zero(-ish) downtime deploys.  An assortment of queues, caches, reporting faculties, and object stores accreted within the storage engine.
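The ingest/digest split can be sketched roughly as follows.  This is a minimal illustration and not the original system: SQLite stands in for PostgreSQL, and the table names, the JSON payload shape, and the `sensor`/`value` fields are all assumptions made for the example.

```python
import json
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory store; SQLite stands in for PostgreSQL here."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_documents ("
        " id INTEGER PRIMARY KEY,"
        " body BLOB NOT NULL,"
        " digested INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE readings ("
        " raw_id INTEGER NOT NULL REFERENCES raw_documents(id),"
        " sensor TEXT NOT NULL,"
        " value REAL NOT NULL)"
    )
    return conn


def ingest(conn: sqlite3.Connection, payload: bytes) -> int:
    """Phase 1: write the raw bytes down without trying to parse them."""
    with conn:
        cur = conn.execute(
            "INSERT INTO raw_documents (body) VALUES (?)", (payload,)
        )
    return cur.lastrowid


def digest(conn: sqlite3.Connection) -> int:
    """Phase 2: unpack pending raw documents into the richer table.

    A document that fails to parse stays pending, so it can be retried
    once the parser is fixed.  Nothing is discarded.
    """
    done = 0
    pending = conn.execute(
        "SELECT id, body FROM raw_documents WHERE digested = 0"
    ).fetchall()
    for raw_id, body in pending:
        try:
            doc = json.loads(body)
            sensor, value = str(doc["sensor"]), float(doc["value"])
        except (ValueError, KeyError, TypeError):
            continue  # leave the raw row untouched for a later attempt
        with conn:  # unpack and mark as digested in one transaction
            conn.execute(
                "INSERT INTO readings (raw_id, sensor, value) VALUES (?, ?, ?)",
                (raw_id, sensor, value),
            )
            conn.execute(
                "UPDATE raw_documents SET digested = 1 WHERE id = ?", (raw_id,)
            )
        done += 1
    return done
```

The point of the design is that `ingest` has almost nothing that can fail, so a bug in the unpacking logic costs you latency rather than data.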

And so PostgreSQL found itself variously serving as data lake, data warehouse, data mart, and data bus.  This simplified certain classes of problems a lot.

Imagine that you are implementing backup-and-restore procedures.  Wouldn’t it be great if your entire state store was one engine that you could snapshot easily?

Imagine that you are a very small team supporting all of your own ops.  Wouldn’t it be great if you only had a single tech product at the core of your system to instrument and grok?

PostgreSQL was that thing.

It was wonderful.

And it was awful.

Consolidating on that one piece of tech proved pivotal in getting the system off the ground.  Eventually, however, the data processing volume, variety, and velocity ballooned, and having everything running on PostgreSQL took years off of my life (and apologies to those on whom I inflicted a similar fate).

Load shocks to internal queues made Swiss cheese of on-disk data layouts.  Reporting processes would bulldoze caches.  Novel data would confound the query planner and cause index scans through the entirety of massive tables.  The housekeeping “vacuum” processes would compete with real-time mission data processing while also running the risk of failing to complete before hitting a transaction wrap-around failure (and once caused what was possibly the most stressful day of my career).
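For a sense of the wrap-around arithmetic, here is a rough sketch of the headroom calculation one might alarm on.  PostgreSQL transaction IDs are 32 bits wide, which leaves about 2^31 of them usable before old rows must be frozen, and the documented default for `autovacuum_freeze_max_age` is 200 million.  The function name and the `warn_fraction` threshold are hypothetical, and the server stops accepting new transactions somewhat short of the true horizon, so treat the remaining count as optimistic.

```python
# The age itself would come from PostgreSQL, for example:
#   SELECT datname, age(datfrozenxid) FROM pg_database;

XID_HORIZON = 2**31  # roughly two billion transaction IDs of headroom
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # PostgreSQL's documented default


def wraparound_report(xid_age: int, warn_fraction: float = 0.5) -> dict:
    """Summarize how close a database is to transaction ID wrap-around.

    warn_fraction is an illustrative alarm threshold, not a PostgreSQL setting.
    """
    if xid_age < 0:
        raise ValueError("xid_age must be non-negative")
    fraction_used = xid_age / XID_HORIZON
    return {
        "remaining": max(XID_HORIZON - xid_age, 0),
        "fraction_used": fraction_used,
        # Past this age, autovacuum runs freeze passes even if disabled.
        "forced_autovacuum": xid_age > AUTOVACUUM_FREEZE_MAX_AGE,
        "alarm": fraction_used >= warn_fraction,
    }
```

If `forced_autovacuum` has been true for a long time and `remaining` keeps shrinking, vacuum is losing the race against your write load.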

“It’s always vacuum”, I wrote on a sticky-note that I affixed to the monitor of the guy who often took the brunt of front-line support.  “Always” was only slightly hyperbolic, true often enough to be a useful heuristic.

So simple, yet ultimately so complex.  We ended up spending a lot of time doing stabilization work.  It was stressful.  But at least we knew it was engineering muscle going into a proven and useful product.

Fast-forward to the present.

I have of late been building systems in a cloud-first/cloud-native fashion within AWS that anticipates and preempts an assortment of the aforementioned challenges.  The allure of Serverless, high availability, extreme durability, and elastic scalability is strong.  These things come, however, at a cost, often nefarious.  The raw body of tech knowledge you need to understand grows linearly with the number of pieces of component tech you have fielded.  The systems integration challenges, meanwhile, grow geometrically complex, and holding someone accountable to fix a problem at component boundaries proves maddening.

When a CloudFormation stack tries to delete a Lambda that is attached to a VPC via a shared ENI and unpredictably hangs for forty minutes because of what is likely a reference counting bug and unclear lines of responsibility, who you gonna call?  And when the fabric of your universe unravels without warning or transparency because you have embraced Serverless in all its glory and your cloud provider rolls out a change that is supposed to just be an implementation detail but that bleeds up to the surface, what you gonna do?

This other way of building apps, leveraging a suite of purpose-focused tools and perhaps riding atop the PaaS-level offerings of a cloud provider, can provide an enormous amount of lift, raise the scalability ceiling, and relieve you from stressing over certain classes of unpredictability.  It does, however, come at the risk of front-loading a lot of technical complexity when you are struggling to prove your application’s viability.

In some cases the best approach may be to employ a very simple tech stack while structuring the logical boundaries of your code base in a way that anticipates an eventual lift-and-shift to a more heterogeneous architecture.  In other cases wisdom may lie in reducing guarantees about correctness or timeliness, at least for a while.  Know the application you are building, remain cognizant of where you are in the application’s life cycle, make risk-based choices from that context, and beware the gravity well that you are creating.

If today you can get by on a simple snapshot-and-restore scheme and tomorrow move to a journal-and-replay approach, perhaps you will have had the best of both worlds.
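To make the two schemes concrete, here is a toy sketch, assuming an in-memory key-value state and a JSON event format invented for the example.  `Ledger` and its methods are hypothetical names.  Restoring from a snapshot alone loses whatever happened after it was taken; replaying the journal from the snapshot’s position recovers it.

```python
import copy
import json


class Ledger:
    """Toy key-value state supporting both recovery schemes."""

    def __init__(self):
        self.state = {}
        self.journal = []  # append-only list of JSON-encoded events

    def apply(self, event: dict) -> None:
        if event["op"] == "set":
            self.state[event["key"]] = event["value"]
        elif event["op"] == "delete":
            self.state.pop(event["key"], None)
        else:
            raise ValueError(f"unknown op: {event['op']!r}")

    def record(self, event: dict) -> None:
        self.journal.append(json.dumps(event))  # write it down first
        self.apply(event)

    def snapshot(self):
        """Copy the state and note how much of the journal it covers."""
        return copy.deepcopy(self.state), len(self.journal)

    @classmethod
    def restore(cls, snapshot, journal=()):
        """Rebuild from a snapshot, replaying any journal entries after it."""
        state, position = snapshot
        ledger = cls()
        ledger.state = copy.deepcopy(state)
        ledger.journal = list(journal[:position])
        for line in journal[position:]:
            ledger.record(json.loads(line))
        return ledger
```

Calling `Ledger.restore(snap)` is the snapshot-and-restore scheme, and `Ledger.restore(snap, ledger.journal)` is journal-and-replay layered on top of it, which is why starting with the former does not preclude the latter.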

And, remember: It’s always vacuum.

Stop

Last Tuesday when I learned of Trump’s crude caricature of Christine Blasey Ford’s testimony I felt my stomach turn.  The baseness and puerility, underpinned by a new depth in realpolitik, filled me with shame and disgust.  I want to say “I don’t understand how we got here”, but I think I do.

Now let’s zoom out and survey the larger landscape.

John Oliver is a genius.  I have for a long time enjoyed his show.  Lately, however, I’ve been struggling to partake of it.  It’s not that I disagree thematically with his positions.  Rather, I have increasingly found his presentation of topics disagreeable.  And that brings us to yesterday.

I am inclined to believe that on Saturday we confirmed a rapist to serve as a judge on the highest court in our land.  And yet the next day when I watched the previous week’s episode of Oliver’s “Last Week Tonight” I found the experience as excruciating as Trump’s performance.  I go to Oliver for, among other things, some left-leaning gallows humor.  But I do not recall laughing a single damn time during the whole episode.  I felt like I was watching an imminent train wreck while not knowing what to do about it.  I experienced Oliver’s treatment of the matter as crude, divisive, and generally unhelpful, preaching to a choir that has given up on reaching across the aisle.

The national discourse has sunk to such a level that fears of civil war are not unfounded. Instead of a calm examination of facts and testimony meshed with the perspective of subject matter experts we ran a media circus intended to inflame opinion and further polarize an already divided electorate.  Republicans and Democrats alike had very clear agendas and were each weaponizing the Kavanaugh hearings by and large along party lines in a self-serving fashion.

Judges, at least in theory, are supposed to act as dispassionate and logical arbiters of the laws that our legislative branch puts on the books.  If that were true, however, it would seem unlikely that we would find ourselves so embroiled in conversations around the ideology of judges, alternately fretful of judges promoted by a competing ideology as “legislating from the bench” or excited to pack the court with judges of our ilk to lock in our preferred version of reality.

Can we all stop being such unmitigated assholes?  On the current trajectory things end poorly for everyone.

Check Yourself

Last summer I found myself rebooting my flight training at KDXR through Arrow Aviation with Duke Morasco as my instructor.

Things were going pretty well.  I was ~15 flight hours into the process and Duke thought I was about ready to solo.  I felt confident and capable and in control.  “PP-ASEL, here I come!”, or so I thought.

I found myself out for a lesson with Duke on Thursday 10 August 2017 and…  it was an outlier of a lesson.  I wasn’t sure what was up, but it was our worst lesson together.  I had the sense that Duke was agitated and abrupt, out of character from all of our earlier flights, but I reserved substantial probability mass for it having been my fault, the result of some rust having accumulated from a couple of weeks out of town.

I hoped that it was a fluke and scheduled another lesson with Duke on Saturday 12 August.  That lesson would never take place.

On Friday 11 August I received a call from Arrow Aviation.  Duke had been killed in an accident while up in N1727V with another student.

[Image: plane_crash]

I found myself in shock, confused, and light on information.  For a long time I had little to go on, just an assortment of news articles and a preliminary NTSB report.  Was it during take-off or landing?  Was there a mechanical failure or operator error?

Somewhat insensitively Arrow asked if I wanted to schedule with a new instructor on Sunday.  I told them I needed some time to reflect.  Insanely, Arrow had just lost another plane on 30 July during a failed take-off, and I did not feel like tempting fate.

For over a year I found myself wondering what had happened.  At last the NTSB has issued a final report.

Most notably…

According to GPS data, the airplane landed on and then took off from a grass airstrip, climbed about 150 ft, then collided with terrain about 1,000 ft past the end of the runway.

 

… and furthermore…

An examination of the wreckage did not reveal any evidence of a preaccident mechanical malfunction or anomaly. An examination of the flight controls revealed that the wing flaps were in the fully extended (40°) position at impact. The airplane’s operating checklist stated that normal and obstacle clearance takeoffs are performed with wing flaps up, and flap settings greater than 10° are not recommended at any time for takeoff. Upon landing on the grass runway, the flaps should have been retracted as part of the after-landing checklist, then confirmed up as part of the before takeoff and takeoff checklists. It is likely that the flap setting at the time of takeoff resulted in an aerodynamic stall and loss of control during the initial climb.

 

Well, shit.

The student pilot was apparently pretty green.  And it seems like nobody realized that the aircraft was in an excessively high-drag wing configuration prior to take-off.  This, in concert with the natural resistance of a grass-field airstrip, and in conjunction with some nasty trees beyond the threshold, presumably led to a late rotation and inadequate rate of climb that culminated in a panic, stall, and crash.

Damn.

So preventable.

Take your time.  Run your checklists.  Don’t get complacent.

And be wary of relying on “experts”.  They get over-confident or overwhelmed and make mistakes just like everyone else.

This is doubtless good advice in many contexts, professional and recreational.  If what you’re doing is complicated and dangerous, take the time in a calm and quiet moment to codify how you want to operate in every circumstance.  Your future stressed-out self will thank you.

And it’s not just about the operation’s procedures.  It’s about assessing you, the operator. Every aircraft comes with a comprehensive checklist for every stage of flight.  And yet pilots are further counseled to run the IMSAFE checklist against themselves before getting behind the controls.  The risks of illness, medication, stress, alcohol, fatigue, and emotion are all too real.  And some of those items are extremely difficult to gauge.  It’s pretty straightforward to avoid getting into a cockpit while sick, medicated, or drunk.  But how stressed, fatigued, or emotional is too much?

I wonder how to navigate these circumstances when the impacts are less dramatic and more ambiguous than crashing a plane.  How many times have I driven a car when exhausted and distracted?  How many times should I have waited to share an opinion or make a decision until I had attained a better mind-state?

Choices and consequences.

[Image: ntsb_crash_photo]

Thoughtless Development

Back when I was a boy, we ran servers on bare metal and we liked it.

And then there were containers.

And then there was AWS Lambda: “Run code without thinking about servers.”

Dwindling are the folks who might even know what “lsof”, “ps”, “top”, “nc”, “traceroute”, “df”, and “ldd” are, much less when to use them.

Actually, Lambda is pretty great, and I use it a lot, but damn does it make it easy to grow your attack surface and forget that you’ve done so.  And, at the end of the day, there are servers, and that reality has implications for availability and latency in whatever system you are building.

Meanwhile, infra-as-code faculties have proliferated, and many folks are using them, but the siren’s song of infra-as-clicks is quite strong, and the potential to create a non-repeatable mess in the cloud provider of your choice is great.

Be Strong.

But let’s get more concrete…

Today I was in the pantry at the office and on the TV I saw some talking head with a green-screen behind him on which three logos were painted in a repeating pattern: New England Patriots, Dunkin’ Donuts, and…  Zudy: “No Code Apps”.

WTF?

Football, donuts, and faux enterprise software development.  LOL WUT.

Zudy’s marketing hype is intense: “No Code Enterprise Apps; Join The No-Code Evolution; Build game changing apps in days”.

Oh, FFS.  It was bad enough that we had to endure the No SQL shenanigans for about a decade before Make SQL Great Again got legs.  Now we’re going to pretend that we can develop apps without even thinking?

Spoiler alert: creating apps is easy; developing them over time once data has begun accumulating and people have begun broadly using them is hard.

We are witnessing a proliferation of shiny technologies that make it easy to bring new capabilities into existence, with the promise of old baggage being jettisoned, but we are not seeing commensurate faculties to manage and evolve these capabilities as we attempt to navigate a full system lifecycle.

I’m sorry, but the majority of the code written for a mature software system centers on logging, testing, data modeling, exception handling, security hardening, performance tuning, configuration management, release management, and inter-version compatibility.  This is the inescapable bread-and-butter engineering work of taking the kernel of an idea to a robust system that can handle day-to-day usage by an army of users in a way that is not completely maddening.

This is not new.  But the frequency with which products like this crop up is increasing.  We see examples of it in such offerings as Splunk, Phantom, and NiFi.  And yet the well of uncomfortable truths tells us that “you’ll never find a programming language that frees you from the burden of clarifying your ideas”.

But, fear not…  If you get yourself wrapped around the axle, Zudy has an “AppFactory” and is more than happy to “Let Zudy’s experts build your apps for you.”  Congratulations.  You just built yourself a thicket of tech debt and hired some third-rate contract programmers who will hold you hostage in perpetuity.

There are two kinds of enterprises: the kind who create and manage software deliberately and wittingly, and the kind who do so accidentally and unwittingly.  Which will you be?

Rage Against The Machine

At last week’s Strata Conference the buzzword exhibiting the highest frequency count appeared to be “Explainable” as prepended to “Artificial Intelligence”.  We have collectively transcended “can we make it work?” and landed squarely in “why did it make that decision?” territory.

In highly regulated industries the government applies a strong back pressure on non-explainable algorithmic decisions.  This serves as a check against runaway and impenetrable automation of decision making.  Yet clearly not all AI-driven industries that can exert an enormous impact on our lives find themselves subject to such controlling forces.  And from one country to another the degree of regulation for a given industry can vary greatly.

The UAE’s Daman gave an interesting talk on how they applied Natural Language Processing techniques to non-textual data in the healthcare claims adjudication space.  The strategy appeared to enjoy substantial and measurable success.  What creeped me out, though, was their seeming heavy reliance on customer complaints to act as the corrective force on falsely flagging claims as invalid.  The presenter offered the opinion that if a customer did not fight a claim rejection then the claim was probably invalid or unimportant anyway.

This feels like data scientists engaging in cost externalization to customers who exist in a fairly disadvantaged position and who must now fight back against a maddeningly opaque decision engine.  This appeared especially so in the case of Daman, which apparently controls 80% of the health care market in the UAE (cited by one of the presenters as a reason why this particular data set was super cool to work on).

What force would stop such a company from taking the next logical step in profit optimization?  Auto-tune the rejection of valid claims to the sweet spot where statistically customers don’t fight it because getting their due does not justify the cost.

There has been much talk of how we must not allow the “Kill Decision” to fall into the hands of robots in warfare.  How easy it would be to make the same mistake in less sensational contexts.