Category Archives: Ruminations

The Great Forgetting

To The Atlantic Editors,

While overall I agree with the thesis of Carr’s November 2013 piece, The Great Forgetting, I am disappointed with at least one item of proffered evidence that suffers from multiple substantial flaws. In particular, he cites “one recent study, conducted by Australian researchers, [that] examined the effects of [software] systems used by three international accounting firms”. From the limited presentation of the study, it would seem quite likely to conflate causation and correlation, rendering it largely useless. Furthermore, the attribution of the study merely to unnamed “Australian researchers”, with no mention of a study name, institution, publication, or date, makes it impractical to find the source.

We are told that “two of the firms employed highly advanced software” while a third firm “used simpler software”, the former providing a large degree of decision support functionality compared to the latter. Subjected to “a test measuring their expertise”, we’re told that individuals using the simpler software “displayed a significantly stronger understanding of different forms of risk”. And… What are we to believe about this?

The study’s presentation seems to imply that the overly helpful software atrophied the brains of the workers in the two firms using it. Maybe that is true, but we don’t actually know that such a causal relationship exists. As an alternate explanation, perhaps the two firms that use the (purportedly) more sophisticated software generally hire lower caliber accountants and have decided that more intrusive software is the only way to get acceptable results from them. Furthermore… Was proficiency of individuals measured both before and after exposure to the firms’ software? How long did the individuals use the software? Were the sample sizes large enough to avoid statistical noise? Were there any meaningful controls in place for this study?

Carr has apparently interpreted this study in a way that makes it convenient to weave into the larger narrative of the piece, but as it was presented it fails to support his thesis. Such careless cherry picking undermines a very real and otherwise well articulated issue.

— AWG

StarCraft Life Lessons

After a hiatus of seven years, StarCraft has come roaring back into my life with the release of StarCraft 2. Given the game’s phenomenal richness and complexity, I cannot help but draw parallels between its challenges and those of life writ large.

To succeed at StarCraft, a player must be able to shift between macro and micro management as well as cycle with fanatic discipline through a variety of distinct but intimately related macro management tasks. Focus too long on micro managing battles, and though you may win the present fight, you’ll find yourself without troops to micro manage later. Focus too much on any particular macro management task, and the tasks you neglected will eventually undermine it. The various facets of StarCraft make up a tightly coupled system of systems, all of which must operate well for the player to succeed. So, too, with “real” life…

A solid StarCraft player runs a tight loop in which he or she polls a variety of subsystems and issues commands to them to optimize the system’s state. Such high level queries follow (with a rough code sketch of the loop after the list)…

  • Are adequate workers being built?
  • Do I have idle workers?
  • Do I have adequate supplies?
  • Do I have enough resource flow?
  • Do I have enough production capacity?
  • Am I spending my resources?
  • Do I know what kinds and quantities of units my adversaries are building?
  • Do I have enough expansion bases?
  • Can I see enough of the map?
  • Have my adversaries acquired expansion bases?
  • Do I know where my opponents’ units are?
  • Are my units well positioned?
  • Have I upgraded my units’ strength and abilities?
  • Am I about to be attacked?
  • Am I harassing my adversaries enough?
  • Can I handle cloaked enemy units?
  • Can I handle highly mobile enemy units?
  • Can I handle long range enemy units?
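
Rendered (tongue in cheek) as code, the idea is nothing more than a round-robin polling loop. Here is a minimal Ruby sketch in which every check, threshold, and field name is an invented stand-in rather than anything from an actual game API…

# Tongue-in-cheek sketch of the macro management loop: cycle through every
# check on every pass instead of waiting for one of them to catch fire.
CHECKS = {
  "building enough workers?"   => ->(game) { game[:workers_queued] >= 1 },
  "supply headroom available?" => ->(game) { game[:supply_max] - game[:supply_used] >= 4 },
  "spending resources?"        => ->(game) { game[:minerals_banked] < 600 },
  "scouted the map recently?"  => ->(game) { game[:seconds_since_scout] < 60 }
}

def run_macro_cycle(game_state)
  CHECKS.each do |question, check|
    # A real player issues a corrective command here; we merely report.
    status = check.call(game_state) ? "fine" : "needs attention BEFORE it bites"
    puts "#{question} #{status}"
  end
end

run_macro_cycle(workers_queued: 2, supply_max: 52, supply_used: 50,
                minerals_banked: 850, seconds_since_scout: 90)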

You do not want to switch to a sub-task because it became a problem. Rather, you want to cycle through tasks and hit each one before it becomes an issue. If you’re frantically building Turrets in your base because you’re being overrun by cloaked Banshees, you’re doin’ it wrong. If you’re popping out additional Supply Depots because your supplies are maxed out, you’re being reactive instead of proactive. More generally, if you’re only doing a task because some other task is actively blocking on it then you are doomed.

The exigencies of life have a way of making one focus on the most obviously looming problems at the expense of other issues, at least until those issues start to have concrete impact of their own. Therein lies the path to unhappiness and ineffectiveness. If you’re expending all of your effort on your project at the office, then you’re neglecting your health, neglecting financial management, neglecting continual learning, neglecting to keep up on current events, neglecting your relationships, and so on. Eventually those neglected areas will undermine the thing on which you are attempting to focus, causing you to fail at everything.

To break out of such a rut requires a conscious and concerted effort. Build a loop in which you poll the various facets of your life. Keep asking “Am I making progress in this arena of my life?”. Keep a trail of your progress and/or set and observe concrete milestones. Keep a journal. Keep a log book. Keep shelves of books you have read. And so on…

But don’t go for 300 Actions Per Minute in your daily life… The frenetic pace of StarCraft is not the pace one wants in life as a whole. That said, the regular and proactive monitoring and management of multiple tightly related sub-goals can serve as the underpinnings of a satisfying life.

To be more relevant to a purported software blog… A successful software project presumably also results from a well run “health” management loop…

  • Am I delivering useful features to customers?
  • Am I returning to messy code and cleaning it up?
  • Am I writing enough automated tests?
  • Is there enough documentation?
  • Do I have good configuration management?
  • Am I validating the usability of the user interfaces?
  • Am I exposing good APIs?
  • Is the system going to scale?
  • Am I managing compliance issues successfully?
  • Am I making good use of existing technologies?
  • Am I anticipating the impact of coming technologies?
  • Are my team members in good spirits and growing?
  • Am I aware of my competitors’ status and plans?
  • Am I keeping my partner organizations adequately informed?
  • Am I keeping my management apprised of political and resourcing issues?

— AWG

Presentation Fail

OSCON has reminded me over and over again this week that most people do not know how to give an engaging presentation. Worse still, OSCON by nature attracts a diverse crowd, making “deep dive” talks generally a bad idea, and yet so many presenters dive right into the deep end without offering the audience adequate context. Finally, far too many presenters fail to realize that entertaining an audience vastly improves the chances of educating it.

Three questions inhabit the fore of your audience members’ minds. Why should I care about this issue? What do you have to offer that others haven’t already? Why should I believe you? Sadly, many presenters jump straight into describing some system they built or research they conducted without answering the first question. Consequently, whatever they have to offer for the second question is lost. If they get around to the third question at all, the audience has long since lost interest unless it was already made up of domain experts.

A simple template for success follows:

  1. describe a real world problem that any layman could understand to offer context for why one should care
  2. go through some crude solutions and explain their limitations
  3. describe superior solutions and how they address the limitations of the simpler ones
  4. provide concrete examples of your solutions in practice, offer statistical data that support your thesis, and include rich yet elegantly compact visualizations

Also, please, please, PLEASE do not fill up all of your slides with a bunch of text. Your slides should exist as a conversational backdrop and a visual aid. They should provide complementary presentations of the words coming out of your mouth. If you must, use note cards for cues on where you want to take the conversation, not your slides. Filling your slides with a bunch of prose fragments squanders a valuable information channel.

Lastly, and before giving the presentation to a large audience, conduct the analog of a “hallway usability test”. Grab one of your hapless colleagues or friends, subject him to your presentation, carefully assess his involuntary reactions, and ask for feedback (maybe even quiz him on the key points). Was he bored and fidgety? Did he fail to grasp and retain the key points? If so, then either the presentation is boring or your delivery is flawed. Refine the content. Practice the delivery on another colleague.

Too many people mistake “making slides” for “preparing to give a presentation”. Don’t be one of them. Don’t assume that people will love your talk just because you love the subject of the talk. Have some empathy for your would-be audience and then test your presentation theories. Don’t speak in a monotone. Have some showmanship. Give the crowd opportunities to be active participants. Your audience will thank you.

— AWG

OSCON Tutorial Fail

The notion of a “tutorial” at OSCON (or any O’Reilly conference) always seems like a good idea, but far too often these tutorials crater spectacularly. For the money people are paying, the class experience ought to unfold like polished ballet, and yet it regularly manifests as a stumbling experience of library issues, network problems, general confusion, and mounting frustration.

Every participant brings a laptop. With any luck, people have pre-installed some of the dependencies. Instructors regularly have not finalized their tutorial code until the eleventh hour. They might gloss over “boring” details such as how account management and configurations differ from distro to distro. The session I’m attending at present, Test Driven Database Development, has burned half of its allotted time and yet has accomplished little more than getting the first test case to run for most of the students.

I should not need to have anything on my laptop apart from an SSH client. The conference should host a local VMWare cluster in which each student has a private VM cloned from a VM that a class’s instructor has tested to death. Or maybe I should be able to download a VM from a local file server and fire it up on my laptop (or at least have this as an option).
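
In today’s terms, one way to realize the “grab a pre-baked VM” option would be for each tutorial to ship something like a Vagrantfile alongside its materials. To be clear, this is purely a hypothetical sketch: the conference offered no such thing, and the box name and URL below are placeholders…

# Hypothetical Vagrantfile distributed with the tutorial code; the box name
# and URL are placeholders for an image the instructor has tested to death.
Vagrant.configure("2") do |config|
  config.vm.box     = "oscon-tdd-databases"
  config.vm.box_url = "http://files.example.org/oscon-tdd-databases.box"
end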

Given how things often go, you might end up believing that the conference organizers and/or tutorial presenters haven’t ever managed a real software application or done anything resembling system administration. Package versions matter. Configuration quirks matter. Unambiguous instructions matter. Telling people to download a bunch of packages (often with loose guidance on which version to use), having them install and run those packages on a variety of OSes, and expecting anything other than general mayhem in a classroom full of people is unreasonably optimistic.

Simple things will trip you up. My Postgres instance blew up because of a disagreement between it and the OS about reasonable shared memory buffer sizes after a config file was overwritten while installing a required package. Pulling down the instructor’s code with a “git clone” took forever because the instructor had a large file in his repository and the conference’s wireless network is underprovisioned. My version of pgTAP installed its functions in a schema subtly different than what the instructor’s code expected.

I’m perfectly capable of working through such system administrivia, but in a class environment, competing for the instructor’s time with dozens of other students, and working with a poor network connection, the three hours of class time just goes up in smoke with far too little value captured.

O’Reilly should know better by now.

— AWG

No SQL, No Class, No Thanks

“This is actually the NoSQL Conference,” quipped one of the RailsConf 2010 presenters halfway through his presentation, offering cutting commentary on the NoSQL fanboy aura of the week. This resonated with me since by Wednesday afternoon I was thoroughly sick of the nonsense, my daily dose of schadenfreude provided by his tale of MongoDB crapping itself unceremoniously and comprehensively upon ingesting too much data. “If you drink the KoolAid and believe the hype then you get what you deserve,” he added.

Earlier that day I had sat through HashRocket’s “Million Dollar Mongo” presentation by Durran Jordan and Obie Fernandez. I was thoroughly unimpressed with their arguments in favor of MongoDB. It stunk of bad science and sleight of hand all the way through with a finale of utter juvenility.

Many of their arguments centered on the failings of MySQL. A memorable comment contrasted how awesomely the system they built on MongoDB performed with how atrociously the predecessor MySQL-based system performed. It did not occur to them that comparing a poorly implemented solution built atop MySQL to a (possibly) well implemented solution atop MongoDB did not make for good science. Firstly, the folks building the original system may have had poor knowledge of how to use an RDBMS. Secondly, a rewrite of a system can benefit from the hard learned lessons of its predecessor, assuming it avoids the dreaded Second System Effect.

They went on to mention how their migration code that pulled from the existing MySQL system and placed data into MongoDB caused MySQL to saturate the CPU while MongoDB was all chill. The audience was supposed to be impressed that making a poorly designed relational database put together documents was CPU intensive while dumping files to a disk wasn’t. Oh, the insights…

Meanwhile, a common theme centered on how “schema free data makes migrations painless”. Anyone incapable of smelling the BS in this from a mile away has clearly been relieved of his olfactory senses. If one goes gallivanting about in a schema free world, one of the following must be true…

  1. new code can’t operate on old data
  2. new code must be able to parse all pre-existing data formats
  3. old data must be migrated to mesh with new code

Of these, the first option would cause any business intelligence analyst to burst into flames, the second would cause massive bloat in the data access layer of the code base, and the third option sounds an awful lot like pretending not to have a schema when you clearly do. I noted that a guy a couple rows up from me was typing notes on his laptop, and one bullet was “painless migration? heh… what do you do with your production data?”, so apparently I wasn’t alone.
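
To make the second option concrete, here is a hedged Ruby sketch of what “new code must be able to parse all pre-existing data formats” tends to look like after a “schema free” collection has lived through a few releases. The document shapes and field names are invented for illustration…

# Hypothetical reader for a document whose shape has mutated across releases.
# Every branch below is the schema you claimed not to have.
def customer_name(doc)
  if doc["name"]                                  # format v1: single string
    doc["name"]
  elsif doc["first_name"] || doc["last_name"]     # format v2: split fields
    [doc["first_name"], doc["last_name"]].compact.join(" ")
  elsif doc["contact"] && doc["contact"]["name"]  # format v3: nested contact
    doc["contact"]["name"]
  else
    raise "unrecognized customer document format: #{doc.inspect}"
  end
end

puts customer_name("name" => "Ada Lovelace")
puts customer_name("first_name" => "Grace", "last_name" => "Hopper")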

HashRocket also bemoaned how MySQL locks up the database for unacceptably long periods of time when migrating the schema of a large system. This is indeed a known problem. That said, MySQL is not the be-all and end-all of RDBMSes. Others, such as Postgres, do not have this issue, so comparing MongoDB to MySQL with the intent of proving that document databases are better than relational databases comes across as a pile of logic fail.

I wanted to raise these issues and see how the HashRocket guys would respond, but then the talk took an unexpected turn as the presenters decided to air dirty laundry on stage. Apparently the client for the project under discussion had concerns about MongoDB and asked another firm for a second opinion. This firm purportedly offered the opinion that HashRocket “had made the decision to use MongoDB as an academic exercise”. I guffawed at this and then realized I was the only one in the audience laughing. In any case, hearing that this other firm had voiced exactly the opinion I’d been holding for the entire presentation was almost unbearably ironic. For all the disdain coming from HashRocket about this, earlier in the talk Durran had quipped that he liked the idea of doing a MongoDB solution because it was “an opportunity to escape from the typical boring CRUD Rails app where you’ve got your gems and you’re just gluing things together” (paraphrasing as best I can).

HashRocket then went on to out the other firm, saying that it was represented at RailsConf and naming it as Intridea. Not satisfied with this, they tried to make themselves out as having been horribly wronged by Intridea, claiming that Intridea was undermining the community spirit of the Ruby and Rails world. I wanted to barf, but they got a round of applause from the audience. By the end of this nonsense I was too disgusted to ask any technical questions, so I just kept my mouth shut.

After the next session I ran into a friend in the hallway, and as we walked out of the building to go to lunch I discussed the talk with him. In addition to the generally unprofessional behavior of the presenters, I raised my concern that the whole “painless migrations” thrust was utter nonsense. Riding down the escalator, I noticed that a guy behind us was leaning in close, seemingly to eavesdrop. I continued to talk, not really caring, and then this guy entered the discussion. It turned out he was HashRocket’s Tim Pope, and he had voiced concerns similar to mine at the start of the project. I mentioned the three scenarios from above, one of which one must accept if living in a schema free world. “Aren’t you really just living with a schema and refusing to admit it with a document database?” I asked. “Yeah, when you go to production that advantage pretty much evaporates,” he responded. When I pressed him further on the issue all I got was “I don’t wish to offer further comment on this.”

I’m frankly tired of the loads of terrible-at-SQL developers who hope this movement invalidates their weakness. Relational databases have their problems and limitations, but if you know when and how to use them they are pretty awesome. MongoDB itself readily admits that it is “less well suited” to “traditional business intelligence”, among other things.

Pushing and pulling documents may be easy and fit certain use cases just fine, but the general strategy of “embed everything” has onerous implications for many things. How do you update an embedded entity? You have to do it everywhere. How do you do analysis on embedded entities? You have to rip through all of the containing documents.
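
A hedged Ruby sketch of the update problem follows, with plain in-memory hashes standing in for documents and entirely invented field names…

# Every book embeds its publisher, so renaming the publisher means touching
# every containing document rather than one row in a publishers table.
books = [
  { "title" => "Book A", "publisher" => { "name" => "Acme Press" } },
  { "title" => "Book B", "publisher" => { "name" => "Acme Press" } }
]

books.each do |book|
  book["publisher"]["name"] = "Acme Publishing" if book["publisher"]["name"] == "Acme Press"
end

p books.map { |book| book["publisher"]["name"] }   # => ["Acme Publishing", "Acme Publishing"]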

Let’s not forget that for many tasks computers are far better at optimization than application developers. Decades ago, when relational databases were young and dinosaurs roamed the earth, there was contention over whether the query language for RDBMSes ought to be procedural or declarative. SQL, a declarative language, won out in the end. For the vast majority of cases, it is far better to specify the data you want and let a query planner optimize its retrieval. Yet all of the NoSQL zealots seem determined to abandon this knowledge with their document databases and procedural MapReduce jobs.
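
As a hedged illustration of the trade-off (with the table, fields, and document shapes all invented for the example): the declarative form states the desired result and leaves the retrieval strategy to the query planner, while the procedural form hard-codes a single strategy by hand…

# Declarative: describe the answer and let the planner figure out how to get it.
REVENUE_BY_PUBLISHER_SQL = <<-SQL
  SELECT publisher, SUM(price) AS revenue
  FROM books
  GROUP BY publisher
SQL

# Procedural: spell out the retrieval yourself, document by document.
def revenue_by_publisher(docs)
  docs.reduce(Hash.new(0)) do |totals, doc|
    totals[doc["publisher"]] += doc["price"]
    totals
  end
end

docs = [
  { "publisher" => "Acme Press",  "price" => 30 },
  { "publisher" => "Orbit Books", "price" => 25 },
  { "publisher" => "Acme Press",  "price" => 10 }
]
p revenue_by_publisher(docs)   # => {"Acme Press"=>40, "Orbit Books"=>25}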

SQL isn’t going anywhere anytime soon. RDBMSes and document databases can reasonably co-exist in an architecture, possibly with the latter serving as a caching layer for the former as well as a home for very specific data. I’m cool with the idea of document databases, but I can’t wait for NoSQL to die.

— AWG

VMWare Performance Tricks

The deeper the technology stack runs, the more administrators and engineers struggle to keep everything in good working order. VMWare offers a wonderful suite of benefits, but virtualization opens up a real can of worms, and with the host OS trying to be clever, some really frustrating issues can appear.

At some point in the last few months, a VM on my laptop (an Ubuntu Intrepid Ibex image for some contract work I’m doing) started behaving really poorly. I wasn’t sure whether it was an issue on the host OS (which went from Jaunty to Koala), an issue on the guest OS (on which I installed X), or just a matter of time marching on (fragmentation).

After trying a few different fixes, I am of the opinion that my problem stemmed from CPU frequency scaling on the host OS playing havoc with VMWare. As best I can tell, this was causing the guest OS to be CPU starved, presumably due to a VMWare bug. That said, I (unscientifically) have done four total things since going from a state of fail to a state of grace…

  • found a way to set the scaling governor on the host CPUs
  • tweaked the VM to use only a single CPU
  • defragmented the VM’s image
  • disabled memory page trimming for the VM

Under Ubuntu (Karmic Koala, anyway) one can use the cpufreq-selector program to tweak the scaling governor for the host CPUs. For me, this meant running the following two commands on my dual CPU system…

cpufreq-selector -c 0 -g performance
cpufreq-selector -c 1 -g performance

It’s easy enough to validate that things got set…

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

If you need a list of possible values for the scaling governor…

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors

To make VMWare not suck, I changed the values from “ondemand” to “performance”. The other possibilities were “conservative”, “userspace” and “powersave”, none of which I have yet explored. The downside to “performance” of course is that your laptop is going to blow through its battery. Of course, if you’re plugged into the wall, you probably don’t care as much. You’re just destroying the planet.

At some point before I found the CPU scaling frequency fix I had momentarily convinced myself that the other three tweaks had fixed things. I have the feeling that I was fooled as the result of my laptop’s power management software fiddling with the scaling governors; as confirmation, I note that other folks on the Internet state they have resorted to having scripts that set the governors after they return from a hibernate/suspend operation.

I am happy to have (apparently) fixed my problem, but I’m left with a nagging sensation of our technology stack gradually getting out of control. It has so many moving parts, and in this feature hungry world not nearly enough time gets allocated to robustness and clarity.

It’s digital turtles all the way down.

— AWG

Stored Procedure Management With Rails

Maintaining synchrony between application code and a database schema in a production system can prove a burdensome affair. As soon as version N hits the production system, people start using it (with some luck), and this causes information to pile up in the database against what will ultimately be an obsolete schema. Developers continue to write code for what will become version N+1 of not just the application but also the database schema, and meanwhile users are working against schema version N. Willful ignorance of this will yield a painful day of reckoning when the time comes for release of version N+1.

As with many things in life, an as-you-go approach can greatly reduce pain and risk. Ruby on Rails’ database management package ActiveRecord provides not just a handy DSL for manipulating database schemas but also a way to chunk usage of this DSL into discrete Migrations stamped with version numbers. They end up with file names of the form 20091101123456_table_foobar_created or 20091102356412_table_foobar_adds_column_baz. The numerical piece at the front of the file name gets used as the version number of the migration and, for migrations generated with “ruby script/generate migration”, begins with the current date and time.
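
For the uninitiated, such a migration is just a Ruby class living in a file whose name carries that timestamp. A minimal sketch follows, in the pre-Rails-3 up/down style and with entirely hypothetical table and column names…

# db/migrate/20091102123456_table_foobar_adds_column_baz.rb (hypothetical)
class TableFoobarAddsColumnBaz < ActiveRecord::Migration
  def self.up
    add_column :foobars, :baz, :integer, :default => 0
  end

  def self.down
    remove_column :foobars, :baz
  end
end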

The version numbers govern the order in which a Rake task applies them, and the structure of the numbers handily means that a listing of the containing directory renders them in the order in which the task will apply them. In addition to the tables that an application declares explicitly, Rails also manages a table called schema_migrations, containing just a single “version” column, that it uses to track which migrations it has already processed. Upon invocation of “rake db:migrate”, a Rake task looks at all of the migrations contained in an application’s “db/migrate” directory, sorts them by version number, reconciles this list against the schema_migrations table, then runs (in order) the migrations not yet applied, and lastly updates the schema_migrations table to record the newly processed migrations.

This usually results in fairly painless production environment upgrades. A simple migration validation involves taking a snapshot of the production database, placing it in a test environment, running “rake db:migrate”, and lastly ensuring that things are as they should be. The Rake task applies only the migrations necessary to bring the database schema from version N to version N+1 by examining the schema_migrations table in the production system and finding the migrations it needs to run. Likewise, the code sandboxes of multiple contributing developers typically benefit from this process.

All of this is quite lovely, but recall that Rails’ philosophy is very much against usage of “smart” databases, so much so that foreign key management in ActiveRecord Migrations (ARMs) is enabled by a plug-in that does not live in core Rails, and the only way to manage stored procedures in Migrations is to place their declarations within “execute %{}” blocks that simply run arbitrary SQL to create or change them. This works, for sure, but starts to exhibit friction when viewed in the context of versioning and change tracking.
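
A hedged sketch of what that looks like in practice follows, Postgres flavored and with the procedure body invented purely for illustration (the file name borrows the naming convention discussed at the end of this post)…

# db/migrate/20091103123456_proc_bar_caches_calculated_baz_value.rb (hypothetical)
class ProcBarCachesCalculatedBazValue < ActiveRecord::Migration
  def self.up
    execute %{
      CREATE OR REPLACE FUNCTION bar() RETURNS trigger AS $$
      BEGIN
        NEW.baz := NEW.quantity * NEW.unit_price;  -- cache the derived value
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;
    }
  end

  def self.down
    execute %{ DROP FUNCTION IF EXISTS bar(); }
  end
end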

Stored procedures inhabit a realm somewhere between data and application code. They are chunks of procedural code, but they live in the database and thus get manipulated in the same fashion as database tables. This raises the issue of where to place them in the code base. A developer will likely initially place them within ARMs that do various other things, e.g. create a migration that creates a table and also creates a stored procedure to operate on it and then binds it with a trigger declaration. This works, insofar as it gets the procedure installed, but the error of this may become evident later when the procedure requires a change to its definition.

Where should the developer apply the change?

If the last migration to manipulate the procedure has not yet been published, then it is possible to simply go back and edit it within that migration. What to do, though, if that migration has already made it to production, or even just been applied by other developers to their sandboxes? There needs to be another migration that redefines the procedure… This seems somewhat icky as it entails a copy-paste from the earlier migration into a new one, an action that will prevent usage of the version control system’s diffing faculties. This isn’t so bad, as one can resort to using a stand-alone diff operation, but this diff in the best case gets cluttered and in the worst case proves unreadable due to the copy-pasted file being diffed against a file that has more than just the stored procedure declaration in it.

Thus the temptation to pull out the stored procedure into a separate file that the VCS can version like other application code and that an ARM can reference indirectly… Sadly, this introduces subtle bugs and thus cannot serve as a safe solution. Whereas “rake db:migrate” applies the files in the db/migrate directory in a specific order, anything referenced by those files is just a file hanging out in a directory. This means that a migration could pick up a later version of a stored procedure prematurely.

Consider the following… In version N of the application (a VCS version, not a release version) there is a particular revision of a stored procedure. In version N+1 a developer tweaks this stored procedure. In version N+2 a developer has an ARM that manipulates the database in a way that hinges on the behavior of the stored procedure. In version N+3 a developer again tweaks the stored procedure. If a developer whose sandbox is at revision N invokes the VCS’ “update” command, and furthermore if the management strategy involves having a stored procedure not stored as a literal in an ARM but rather as being the contents of some other file referenced by an ARM, then the ARM for version N+2 will operate under the auspices of the stored procedure at revision N+3 (a future revision), potentially yielding incorrect results. So having a stored procedure live in a file referenced by an ARM can’t work for folks who care to guarantee correctness.

This brings us to the only workable solution, a hybrid of the two aforementioned approaches. Specifically, one must have the stored procedure be in a file of its own, and that file must be an ARM. The former property ensures clean diff operations, and the latter ensures that procedures are updated when they ought to be and no sooner. The actual diff operation entails using a stand-alone diff as opposed to the VCS’s diff, but this proves tolerable enough (the author furthermore recommends the KDE program “kompare”).

Of course, one might already be well into a Rails project by the time such a realization occurs. As such, a bit of historical revisionism may be in order. There may be an ARM that creates a table, then creates a stored procedure, then creates a trigger to bind the stored procedure. To get the proverbial house in order, break this ARM into three distinct ARMs: one that creates the table, a second that creates the stored procedure, and a third that binds the stored procedure with a trigger. There is, however, a gotcha in all this. Recall the way in which “rake db:migrate” figures out which migrations to apply. To hack around this, one must do the ARM-break-apart operation and then fake out the Rake task by inserting the version numbers of those newly created migrations into the schema_migrations table, thus preventing the Rake task from blowing up when, say, it cannot create a table that already exists.
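
The fake-out itself is just a matter of recording the new version numbers by hand against each existing database, for example from script/console. The timestamps below are hypothetical placeholders for whatever the newly split-out ARMs are actually named…

# Run once per existing database (sandbox or production snapshot) so that
# rake db:migrate skips the freshly split-out migrations.
%w(20091104123456 20091104123457 20091104123458).each do |version|
  ActiveRecord::Base.connection.execute(
    "INSERT INTO schema_migrations (version) VALUES ('#{version}')"
  )
end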

Having done all this, the VCS may prove a little confounded in its commentary. This fix-up basically “explodes” one file into three. When the author first attempted this with Git as his VCS, a “git status” command reported there as being three operations, namely a move, a rename, and a copy. Git makes a valiant effort to be clever by implementing “code following”, but doing something really wacky such as this leaves it making some moderately humorous guesses as to what the hell has happened.

Coming to the end of our journey, let us consider the implications of homing stored procedures in ARMs. Whereas in the case of storing functions in “regular” files one can find them easily, storing them in ARMs poses the “where the hell is the latest definition?” problem. Naming conventions to the rescue… A simple but consistent way of naming ARMs makes answering such a question trivial. Use names such as table_foo_adds_column_baz, proc_bar_caches_calculated_baz_value, and so forth, and finding the file containing the latest definition for the “bar” function becomes a simple matter of “ls db/migrate | grep proc_bar | tail -n 1”.

You just have to know how to use your tools.

— AWG