Why Security And Scalability Are Two Sides Of The Same Coin, or… How To Win Friends And Starve Invading Armies

“Argh, again with the root file system writes!” I lamented recently while helping a client port the enterprise version of an open source product from Heroku to AWS. While configuring it to run on Amazon’s ECS, I had reflexively enabled the readonlyRootFilesystem option, knowing it to be not only good practice generally, but also key to preserving recent hard-won gains in their AWS Security Hub score.

Sadly, this vendor product, like another I had recently wrangled for them, was following the “wrap my legacy server installation process in a container image and call it a day” anti-pattern for providing containerized deployment support, leaving integrators either to accept the related risk or to mitigate it via a handful of unsavory options.
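For readers who have not met the setting in question, here is a minimal sketch of how one might audit for it. It assumes a task definition shaped like ECS task definition JSON, with a containerDefinitions list whose entries may carry a readonlyRootFilesystem flag; the function name, the image names, and the example data are hypothetical, not taken from the client's system.

```python
from typing import Any


def containers_with_writable_root(task_definition: dict[str, Any]) -> list[str]:
    """Return names of containers that do not explicitly set readonlyRootFilesystem to true."""
    return [
        container.get("name", "<unnamed>")
        for container in task_definition.get("containerDefinitions", [])
        if container.get("readonlyRootFilesystem") is not True
    ]


# Hypothetical task definition fragment, for illustration only.
example_task_definition = {
    "family": "vendor-app",
    "containerDefinitions": [
        {"name": "app", "image": "example/vendor-app:1.0", "readonlyRootFilesystem": True},
        {"name": "sidecar", "image": "example/sidecar:1.0"},  # flag omitted, so root is writable
    ],
}

print(containers_with_writable_root(example_task_definition))
```

A check like this can run in a build pipeline, which is one way to notice a writable root file system before a compliance scanner does.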

This pattern, in turn, reflects a broader problem, exemplified by Larry Wall’s observation that C programmers, finding occasion to write Perl programs, would essentially write C programs in Perl. In a nutshell — people often, out of habit, build systems with familiar patterns instead of the locally idiomatic ones, thereby forgoing many of the benefits on offer. Just as Larry wrestled decades ago with recalcitrant C programmers who ported their habits to Perl, so, too, have I struggled with engineers forged in the government and corporate data centers of yore, who continue to build for the cloud as if it were just racks of bare-metal servers in a local data center.

I empathize — I spent roughly the first half of my career in the weird and wonderful world of the U.S. Defense Department during a period that spanned the birth of AWS, the ascendancy of public cloud, and the rise of DevOps as a discipline. A major reason for my reluctant departure from the civil service in 2016 involved a creeping sense of obsolescence, a feeling that the way I was building systems was fast becoming yesterday’s news. From TDYs to conferences and late-night blog binges, I could tell that a massive train was leaving the station, one I had best board before it was too late.

It’s not like the collocated applications of yesteryear wouldn’t have benefited from better distributed systems engineering rigor, but the public cloud, with its potent pairing of promise and peril, rendered certain tactics and techniques far less optional.

Sheltered within a conventional network perimeter, life seemed safer, though that feeling may often have been little more than a comforting lie. Running on physical servers, the underlying compute substrate felt reliable, with downtime (and even restarts) limited mostly to freak events such as power outages, disk overflows, or bungled software patches. Serving relatively well-understood enterprise audiences, workloads proved fairly predictable, a situation where static infrastructure and long planning horizons felt, if not great, then at least tolerable. Blithely relying on infrastructure maintained by and for just your own company, “cross-tenant spillage” wasn’t even in the vocabulary.

Now contrast this to containerized applications run by a cluster scheduler in a public cloud, add to that dependencies on an assortment of shared services for storage and networking, and top it off with a need for elastic horizontal scalability. Small wonder, then, that so-called “lift and shift” activities so often fail spectacularly across the categories of economics, performance, reliability, and security — once tolerable cold-start delays prove punishing, rare data corruptions become vexingly common, borders turn terrifyingly porous, and assumptions about shared state thwart scaling. The application architecture of a cloud native app, if not the exact components, can readily translate from a public cloud to a private data center, but applications whose ancestral DNA traces to private data centers habitually struggle to migrate in the other direction.

Let’s return now to the recurring frustration that first prompted this piece. In the earlier example, after some reverse engineering that included installing diagnostic tools on the offending container, I discovered that its entry point referenced a self-extracting Python application. The later example, while differing in detail, committed fundamentally the same offenses with its just-in-time Java application installer.

It’s not hard to imagine how things became thus: someone bootstrapped an open source project, support emerged for the chosen language’s preferred package manager, a community began to form, a business coalesced around a streamlined product offering, requirements grew messy and diverse, a trickle of requirements turned firehose, shortcuts became tempting, dogs began living with cats, humans were sacrificed… mass hysteria!

Perhaps, during early days, one class of client warranted a bullet-proof single-file push-button server-based installer. Then, some time later, another client insisted that they needed a container image on Docker Hub. Because it sounded like a straightforward task, management gave it to the intern, and the intern, having heard that code reuse is a good thing, just wrapped the self-extracting file in a container image and made that the entry point. Mission Accomplished!

Now let’s elevate our concern to a more strategic context. When dealing with the prospect of an invading cyber army, success requires heeding the key asymmetries, both accepting the inevitability of some manner of compromise and seeking to leverage one’s own advantages.

In the earliest phases of an operation, an attacker need only be right once, but beyond conducting reconnaissance, gaining initial access, and establishing a beachhead, their advantages begin to wane — to evade detection, they must be right or lucky every step of the way; to sustain themselves, they must live off the land or summon reinforcements; to act effectively, they need the ability to move laterally, exfiltrate intelligence, and perhaps receive additional orders.

The defender’s task, then, involves placing the adversary in a tortuous maze, erecting frustrating barriers at chokepoints, stringing up trip wires under watchful eyes, clamping avenues of communication, ruthlessly stripping the land of extraneous resources, and readying a quick reaction force. Victory hinges not on perfection, for no such system exists, but rather on simply making life a bloody nuisance for the would-be invader at every turn.

Yes, surely we could forgo some of this bother if only we lived in a world composed entirely of friendly, ethical, law-abiding actors, but many of the techniques that bolster security also benefit matters of operational safety, system performance, unit economics, and general comprehensibility.

Pare a container image to the bare minimum of installed components and runtime permissions, for instance, and you not only hobble the ability of attackers to engage in shenanigans, but also tamp down complexity, reduce load times, shrink execution requirements, and minimize storage footprints.

Further craft application code running on such lightweight containers to expect chaotic ephemerality, designing for local statelessness and idempotent processing, and for good measure configure such workers to auto-terminate after a fixed duration or a capped number of events, and voilà: you have not only engineered your system for robust safety in the face of hardware faults and made horizontal scalability a trivial exercise, but also forged an infuriatingly tenuous platform for an adversary clinging to network access by their fingernails.
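To make the worker pattern concrete, here is a minimal sketch rather than a production design. It assumes events arrive as (event_id, payload) pairs and that duplicates can be recognized by ID. The names run_worker, max_events, and max_lifetime_seconds are hypothetical, and the in-memory set stands in for whatever external store a genuinely stateless worker would consult.

```python
import time
from collections.abc import Callable, Iterable


def run_worker(
    events: Iterable[tuple[str, str]],
    handle: Callable[[str], None],
    processed_ids: set[str],
    max_events: int = 100,
    max_lifetime_seconds: float = 300.0,
) -> str:
    """Process events until a cap is hit or the source runs dry; return the reason for stopping."""
    started = time.monotonic()
    handled = 0
    iterator = iter(events)
    while True:
        # Check both caps before pulling the next event, so nothing is taken and then dropped.
        if handled >= max_events:
            return "event cap reached"
        if time.monotonic() - started >= max_lifetime_seconds:
            return "lifetime expired"
        try:
            event_id, payload = next(iterator)
        except StopIteration:
            return "queue drained"
        if event_id in processed_ids:
            continue  # duplicate delivery: skipping it keeps processing idempotent
        handle(payload)
        processed_ids.add(event_id)
        handled += 1


seen: set[str] = set()
results: list[str] = []
deliveries = [("a", "one"), ("a", "one"), ("b", "two"), ("c", "three")]
reason = run_worker(deliveries, results.append, seen, max_events=2)
print(reason, results)
```

When the function returns, the process would simply exit and let the scheduler start a fresh replacement, which is what leaves so little time for an intruder to settle in.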

Configure network connectivity and resource permissions to support just what a task needs and you have not only hampered an adversary’s ability to infiltrate tools and commands, exfiltrate intelligence, and spread laterally through your network, but you have also helpfully documented what the heck that damn thing is supposed to do when it is functioning as intended, a feature as valuable to the maintainers of the application as to the designers of anomaly detection analytics.
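As a rough illustration of permissions doubling as documentation, the sketch below declares an egress allowlist in which every rule carries a stated purpose, then flags any observed connection that no rule covers. The EgressRule type, the host names, and the observed data are all hypothetical; real enforcement would live in security groups, network policies, or IAM rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    host: str
    port: int
    purpose: str  # the "documentation" half of the bargain


# Hypothetical allowlist for an imaginary order-processing task.
ALLOWED_EGRESS = (
    EgressRule("orders-db.internal.example", 5432, "read and write order records"),
    EgressRule("queue.internal.example", 443, "poll for new order events"),
)


def unexpected_connections(
    observed: list[tuple[str, int]],
    allowed: tuple[EgressRule, ...] = ALLOWED_EGRESS,
) -> list[tuple[str, int]]:
    """Return observed (host, port) pairs that no rule permits."""
    permitted = {(rule.host, rule.port) for rule in allowed}
    return [connection for connection in observed if connection not in permitted]


observed = [
    ("orders-db.internal.example", 5432),
    ("203.0.113.7", 4444),  # documentation-range address standing in for an attacker's host
]
print(unexpected_connections(observed))
```

The same list that tells a maintainer what the task is for also gives an anomaly detector its definition of normal.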

Capture all of this not just with repeatable container image builds that pin and minimize dependencies, but also with Infrastructure As Code that defines every facet of a system’s provisioning, and you begin to approach DevSecOps nirvana. On the security front, codifying a machine-readable baseline empowers us to detect configuration drift, analyze third-party risk, and proactively sense attacks. In the realms of DevEx, QA, and DevOps, driving the marginal cost of deploying copies of systems to zero accelerates the ability of developers and testers to prototype and validate features while providing operators the ability to roll systems forward and backward with confidence, all while maintaining Separation Of Duties. And, were that not enough, all of this together establishes a solid foundation for automated disaster recovery, a feature equally valuable in the face of fat fingers, Acts Of God, insider threats, and outsider malice.
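Drift detection against a codified baseline can be sketched in a few lines. The example assumes both the declared baseline and the observed state are available as nested dictionaries, which is a simplification; the function name find_drift, the keys, and the values are hypothetical.

```python
from typing import Any


def find_drift(baseline: dict[str, Any], observed: dict[str, Any], prefix: str = "") -> list[str]:
    """Return readable differences between a declared baseline and observed state."""
    findings: list[str] = []
    for key in sorted(baseline.keys() | observed.keys()):
        path = f"{prefix}{key}"
        if key not in observed:
            findings.append(f"missing: {path}")
        elif key not in baseline:
            findings.append(f"unexpected: {path}")
        elif isinstance(baseline[key], dict) and isinstance(observed[key], dict):
            # Recurse into nested sections so findings name the exact setting.
            findings.extend(find_drift(baseline[key], observed[key], prefix=f"{path}."))
        elif baseline[key] != observed[key]:
            findings.append(f"changed: {path} ({baseline[key]!r} -> {observed[key]!r})")
    return findings


# Hypothetical baseline (what the code declares) and observed state (what is running).
baseline = {"task": {"readonly_root": True, "egress_ports": [443]}, "replicas": 2}
observed_state = {
    "task": {"readonly_root": False, "egress_ports": [443, 22]},
    "replicas": 2,
    "debug_shell": True,
}

for finding in find_drift(baseline, observed_state):
    print(finding)
```

Each finding is either an operator's honest mistake or an intruder's footprint, and in both cases someone should know about it.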

Systems integration will always be hard — requirements evolve, context matters, specialization proliferates, complexity explodes, hackers gonna hack, and mistakes will be made. Yet though it may often prove infuriating, it will rarely engender boredom, and the more holistically we can treat the whole affair, the happier we’ll all be — artists, soldiers, managers, investors, customers, and accountants alike.

