Category Archives: Ruminations

OSCON Tutorial Fail

The notion of a “tutorial” at OSCON (or any O’Reilly conference) always seems like a good idea, but far too often these tutorials crater spectacularly. For the money people are paying, the class experience ought to unfold like a polished ballet, and yet it regularly devolves into a stumble through library issues, network problems, general confusion, and mounting frustration.

Every participant brings a laptop. With any luck people have pre-installed some number of dependencies. Instructors regularly have not finalized their tutorial code until the eleventh hour. They might gloss over “boring” details such as how account management and configurations differ from distro to distro. The session I’m attending at the moment, Test Driven Database Development, has burned half of its allotted time and yet has accomplished little more than getting the first test case to run for most of the students.

I should not need to have anything on my laptop apart from an SSH client. The conference should host a local VMWare cluster in which each student has a private VM cloned from a VM that a class’s instructor has tested to death. Or maybe I should be able to download a VM from a local file server and fire it up on my laptop (or at least have this as an option).

Given how things often go, you might end up believing that the conference organizers and/or tutorial presenters haven’t ever managed a real software application or done anything resembling system administration. Package versions matter. Configuration quirks matter. Unambiguous instructions matter. Telling people to download a bunch of packages (often with loose guidance on which version to use), having them install and run those packages on a variety of OSes, and expecting anything other than general mayhem in a classroom full of people is unreasonably optimistic.

Simple things will trip you up. My Postgres instance blew up because of a disagreement between it and the OS about reasonable shared memory buffer sizes after a config file was overwritten while installing a required package. Pulling down the instructor’s code with a “git clone” took forever because the instructor had a large file in his repository and the conference’s wireless network is underprovisioned. My version of pgTAP installed its functions in a schema subtly different from the one the instructor’s code expected.

I’m perfectly capable of working through such system administrivia, but in a class environment, where I’m competing for the instructor’s time with dozens of other students and working over a poor network connection, the three hours of class time just go up in smoke with far too little value captured.

O’Reilly should know better by now.

— AWG

No SQL, No Class, No Thanks

“This is actually the NoSQL Conference,” quipped one of the RailsConf 2010 presenters halfway through his presentation, offering cutting commentary on the NoSQL fanboy aura of the week. This resonated with me since by Wednesday afternoon I was thoroughly sick of the nonsense, my daily dose of schadenfreude provided by his tale of MongoDB crapping itself unceremoniously and comprehensively upon ingesting too much data. “If you drink the Kool-Aid and believe the hype then you get what you deserve,” he added.

Earlier that day I had sat through HashRocket’s “Million Dollar Mongo” presentation by Durran Jordan and Obie Fernandez. I was thoroughly unimpressed with their arguments in favor of MongoDB. It stunk of bad science and sleight of hand all the way through with a finale of utter juvenility.

Many of their arguments centered on the failings of MySQL. A memorable comment concerned how awesomely the system they built on MongoDB performed and how atrociously the predecessor MySQL-based system performed. It did not occur to them that comparing a poorly implemented solution built atop MySQL to a (possibly) well implemented solution atop MongoDB did not make for good science. Firstly, the folks building the original system may have had poor knowledge of how to use an RDBMS. Secondly, a rewrite of a system can benefit from the hard-learned lessons of its predecessor, assuming it avoids the dreaded Second System Effect.

They went on to mention how their migration code that pulled from the existing MySQL system and placed data into MongoDB caused MySQL to saturate the CPU while MongoDB was all chill. The audience was supposed to be impressed that making a poorly designed relational database put together documents was CPU intensive while dumping files to a disk wasn’t. Oh, the insights…

Meanwhile, a common theme was how “schema free data makes migrations painless”. Anyone incapable of smelling the BS in this from a mile away has clearly been relieved of his olfactory senses. If one goes gallivanting about in a schema free world, one of the following must be true…

  1. new code can’t operate on old data
  2. new code must be able to parse all pre-existing data formats
  3. old data must be migrated to mesh with new code

Of these, the first option would cause any business intelligence analyst to burst into flames, the second would cause massive bloat in the data access layer of the code base, and the third option sounds an awful lot like pretending not to have a schema when you clearly do. I noted that a guy a couple rows up from me was scribbling notes on his laptop, and one bullet was “painless migration? heh… what do you do with your production data?”, so apparently I wasn’t alone.
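To make the second option concrete, here is a minimal sketch (in Ruby, with entirely hypothetical document fields) of what reader code starts to look like once several generations of “schema free” documents coexist in the store:

# Every reader has to recognize every vintage of document ever written.
def shipping_address(order_doc)
  if order_doc['shipping_address']           # current format
    order_doc['shipping_address']
  elsif order_doc['address']                 # last year's format
    { 'street' => order_doc['address'], 'zip' => order_doc['zip'] }
  elsif order_doc['addr_line1']              # original format
    { 'street' => order_doc['addr_line1'], 'zip' => order_doc['postal'] }
  end
end

Multiply that by every field that has ever changed shape and the “painless” part evaporates.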

HashRocket also bemoaned how MySQL locks up the database for unacceptably long periods of time when migrating the schema for a large system. This is indeed a known problem. That said, MySQL is not the be-all and end-all of RDBMSes. Others, such as Postgres, do not have this issue, so comparing MongoDB to MySQL with the intent of proving that document databases are better than relational databases comes across as a pile of logic fail.

I wanted to raise these issues and see how the HashRocket guys would respond, but then the talk took an unexpected turn as the presenters decided to air dirty laundry on stage. Apparently the client for the project under discussion had concerns about MongoDB and asked another firm for a second opinion. This firm purportedly offered the opinion that HashRocket “had made the decision to use MongoDB as an academic exercise”. I guffawed at this and then realized I was the only one in the audience laughing. In any case, hearing that this other firm had voiced exactly the opinion I’d been holding for the whole presentation was deliciously ironic. For all the disdain coming from HashRocket about this, earlier in the talk Durran had quipped that he liked the idea of doing a MongoDB solution because it was “an opportunity to escape from the typical boring CRUD Rails app where you’ve got your gems and you’re just gluing things together” (paraphrasing as best I can).

HashRocket then went on to out the other firm, noting that it was represented at RailsConf and naming it as Intridea. Not satisfied with this, they tried to make themselves out as having been horribly wronged, saying that Intridea was undermining the community spirit of the Ruby and Rails world. I wanted to barf, but they got a round of applause from the audience. By the end of this nonsense I was too disgusted to ask any technical questions so I just kept my mouth shut.

After the next session I ran into a friend in the hallway, and as we walked out of the building to go to lunch I discussed the talk with him. In addition to the generally unprofessional behavior of the presenters, I raised my concern that the whole “painless migrations” thrust was utter nonsense. Riding down the escalator, I noticed that a guy behind us was leaning in close, seemingly to eavesdrop. I continued to talk, not really caring, and then this guy entered the discussion. It turned out he was HashRocket’s Tim Pope, and he had voiced concerns similar to mine at the start of the project. I mentioned the three scenarios from above, one of which one must accept if living in a schema free world. “Aren’t you really just living with a schema and refusing to admit it with a document database?” I asked. “Yeah, when you go to production that advantage pretty much evaporates,” he responded. When I pressed him further on the issue all I got was “I don’t wish to offer further comment on this.”

I’m frankly tired of the loads of terrible-at-SQL developers who hope this movement invalidates their weakness. Relational databases have their problems and limitations, but if you know when and how to use them they are pretty awesome. MongoDB itself readily admits that it is “less well suited” to “traditional business intelligence”, among other things.

Pushing and pulling documents may be easy and fit certain use cases just fine, but the general strategy of “embed everything” has onerous implications for many things. How do you update an embedded entity? You have to do it everywhere. How do you do analysis on embedded entities? You have to rip through all of the containing documents.

Let’s not forget that for many tasks computers are far more capable of optimization than application developers. Decades ago, when relational databases were young and dinosaurs roamed the earth, there was contention over whether the query language for RDBMSes ought to be procedural or declarative. SQL, a declarative language, won out in the end. For the vast majority of cases, it is far better to specify the data you want and let a query planner optimize its retrieval. Yet all of the NoSQL zealots seem determined to abandon this knowledge with their document databases and procedural MapReduce jobs.

SQL isn’t going anywhere anytime soon. RDBMSes and document databases can reasonably co-exist in an architecture, possibly with the latter serving as a caching layer for the former as well as a home for very specific data. I’m cool with the idea of document databases, but I can’t wait for NoSQL to die.

— AWG

VMWare Performance Tricks

The deeper the technology stack runs, the more administrators and engineers struggle to keep everything in good working order. VMWare offers a wonderful suite of benefits, but virtualization opens up a real can of worms, and with the host OS trying to be clever some really frustrating issues can appear.

At some point in the last few months, a VM on my laptop (an Ubuntu Intrepid Ibex image for some contract work I’m doing) started behaving really poorly. I wasn’t sure whether it was an issue on the host OS (which went from Jaunty to Koala), an issue on the guest OS (on which I installed X), or just a matter of time marching on (fragmentation).

After trying a few different fixes, I am of the opinion that my problem stemmed from CPU frequency scaling on the host OS playing havoc with VMWare. As best I can tell, this was causing the guest OS to be CPU starved, presumably due to a VMWare bug. That said, I (unscientifically) have done four total things since going from a state of fail to a state of grace…

  • found a way to set the scaling governor on the host CPUs
  • tweaked the VM to use only a single CPU
  • defragmented the VM’s image
  • disabled memory page trimming for the VM

Under Ubuntu (Karmic Koala, anyway) one can use the cpufreq-selector program to tweak the scaling governor for the host CPUs. For me, this meant running the following two commands on my dual CPU system…

cpufreq-selector -c 0 -g performance
cpufreq-selector -c 1 -g performance

It’s easy enough to validate that things got set…

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor

If you need a list of possible values for the scaling governor…

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors

To make VMWare not suck, I changed the values from “ondemand” to “performance”. The other possibilities were “conservative”, “userspace” and “powersave”, none of which I have yet explored. The downside to “performance” of course is that your laptop is going to blow through its battery. Of course, if you’re plugged into the wall, you probably don’t care as much. You’re just destroying the planet.

At some point before I found the CPU scaling frequency fix I had momentarily convinced myself that the other three tweaks had fixed things. I have the feeling that I was fooled as the result of my laptop’s power management software fiddling with the scaling governors; as confirmation, I note that other folks on the Internet state they have resorted to having scripts that set the governors after they return from a hibernate/suspend operation.

I am happy to have (apparently) fixed my problem, but I’m left with a nagging sensation of our technology stack gradually getting out of control. It has so many moving parts, and in this feature hungry world not nearly enough time gets allocated to robustness and clarity.

It’s digital turtles all the way down.

— AWG

Stored Procedure Management With Rails

Maintaining synchrony between application code and a database schema in a production system can prove a burdensome affair. As soon as version N hits the production system, people start using it (with some luck), and this causes information to pile up in the database against what will ultimately be an obsolete schema. Developers continue to write code for what will become version N+1 of not just the application but also the database schema, and meanwhile users are working against schema version N. Willful ignorance of this will yield a painful day of reckoning when the time comes for release of version N+1.

As with many things in life, an as-you-go approach can greatly reduce pain and risk. Ruby On Rails’ database management package ActiveRecord provides not just a handy DSL for manipulating database schemas but also a way to chunk usage of this DSL into discrete Migrations stamped with version numbers. They end up with file names of the form 20091101123456_table_foobar_created or 20091102356412_table_foobar_adds_column_baz. The numerical piece at the front of the file name gets used as the version number of the migration and, if generated with “ruby script/generate migration”, is a timestamp beginning with the current date.

The version numbers govern the order in which a Rake task applies them, and the structure of the numbers handily means that a listing of the containing directory renders them in the order the task will apply them. In addition to the tables that an application declares explicitly, Rails also manages a table called schema_migrations that contains just a single “version” column, which it uses to track which migrations it has already processed. Upon invocation of “rake db:migrate”, a Rake task looks at all of the migrations contained in an application’s “db/migrations” directory, sorts them by version number, reconciles this list against the schema_migrations table, then runs (in order) the migrations not yet applied, and lastly updates the schema_migrations table to record the newly processed migrations.
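To make the mechanics concrete, here is a minimal sketch of what the second migration named above might contain (Rails 2.x syntax, hypothetical table and column names); the numeric prefix of its file name is what gets recorded in schema_migrations once the Rake task has applied it:

# 20091102356412_table_foobar_adds_column_baz.rb
class TableFoobarAddsColumnBaz < ActiveRecord::Migration
  def self.up
    add_column :foobars, :baz, :integer
  end

  def self.down
    remove_column :foobars, :baz
  end
end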

This usually results in fairly painless production environment upgrades. A simple migration validation involves taking a snapshot of the production database, placing it in a test environment, running “rake db:migrate”, and lastly ensuring that things are as they should be. The Rake task applies only the migrations necessary to bring the database schema from version N to version N+1 by examining the schema_migrations table in the production system and finding the migrations it needs to run. Likewise, the code sandboxes of multiple contributing developers typically benefit from this process.

All of this is quite lovely, but recall that Rails’ philosophy is very much against usage of “smart” databases, so much so that foreign key management in ActiveRecord Migrations (ARMs) is enabled by a plug-in that does not live in core Rails, and the only way to manage stored procedures in Migrations is to place their declarations within “execute %{}” blocks that simply run arbitrary SQL to create or change them. This works, for sure, but starts to exhibit friction when viewed in the context of versioning and change tracking.

Stored procedures inhabit a realm somewhere between data and application code. They are chunks of procedural code, but they live in the database and thus get manipulated in the same fashion as database tables. This raises the issue of where to place them in the code base. A developer will likely initially place them within ARMs that do various other things, e.g. create a migration that creates a table and also creates a stored procedure to operate on it and then binds it with a trigger declaration. This works, insofar as it gets the procedure installed, but the error of this may become evident later when the procedure requires a change to its definition.
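A hypothetical example of such a do-everything migration (Postgres syntax assumed, all names invented) might look like this:

class TableWidgetsCreatedWithTouchProc < ActiveRecord::Migration
  def self.up
    create_table :widgets do |t|
      t.string :name
      t.timestamps
    end

    # The stored procedure and its trigger ride along in the same migration.
    execute %{
      CREATE OR REPLACE FUNCTION widgets_touch() RETURNS trigger AS $$
      BEGIN
        NEW.updated_at := now();
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;
    }

    execute %{
      CREATE TRIGGER widgets_touch_trigger
      BEFORE UPDATE ON widgets
      FOR EACH ROW EXECUTE PROCEDURE widgets_touch();
    }
  end

  def self.down
    execute %{ DROP TRIGGER widgets_touch_trigger ON widgets }
    execute %{ DROP FUNCTION widgets_touch() }
    drop_table :widgets
  end
end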

Where should the developer apply the change?

If the last migration to manipulate the procedure has not yet been published, then it is possible to simply go back and edit it within that migration. What to do, though, if that migration has already made it to production, or even just been applied by other developers to their sandboxes? There needs to be another migration that redefines the procedure… This seems somewhat icky as it entails a copy-paste from the earlier migration into a new one, an action that will prevent usage of the version control system’s diffing faculties. This isn’t so bad, as one can resort to using a stand-alone diff operation, but this diff in the best case gets cluttered and in the worst case proves unreadable due to the copy-pasted file being diffed against a file that has more than just the stored procedure declaration in it.

Thus the temptation to pull out the stored procedure into a separate file that the VCS can version like other application code and that an ARM can reference indirectly… Sadly, this introduces subtle bugs and thus cannot serve as a safe solution. Whereas “rake db:migrate” applies the files in the db/migrations directory in a specific order, anything referenced by those files is just a file hanging out in a directory, outside that ordering. This means that a migration could prematurely apply a later version of a stored procedure.

Consider the following… In version N of the application (a VCS version, not a release version) there is a particular revision of a stored procedure. In version N+1 a developer tweaks this stored procedure. In version N+2 a developer has an ARM that manipulates the database in a way that hinges on the behavior of the stored procedure. In version N+3 a developer again tweaks the stored procedure. If a developer whose sandbox is at revision N invokes the VCS’ “update” command, and furthermore if the management strategy involves having a stored procedure not stored as a literal in an ARM but rather as being the contents of some other file referenced by an ARM, then the ARM for version N+2 will operate under the auspices of the stored procedure at revision N+3 (a future revision), potentially yielding incorrect results. So having a stored procedure live in a file referenced by an ARM can’t work for folks who care to guarantee correctness.

This brings us to the only workable solution, a hybrid of the two aforementioned approaches. Specifically, one must have the stored procedure be in a file of its own, and that file must be an ARM. The former property ensures clean diff operations and the latter ensures that procedures are updated when they ought to be and no sooner. The actual diff operation entails using a stand-alone diff as opposed to the VCS’s diff, but this proves tolerable enough (the author furthermore recommends the KDE program “kompare”).
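Under the naming convention described below, such a dedicated ARM might look like the following sketch (hypothetical names, Postgres syntax assumed); the next revision of the procedure would go into a new ARM containing nothing but the next CREATE OR REPLACE, so a stand-alone diff of the two files shows only the procedure’s changes:

# proc_widgets_touch_stamps_updated_at
class ProcWidgetsTouchStampsUpdatedAt < ActiveRecord::Migration
  def self.up
    execute %{
      CREATE OR REPLACE FUNCTION widgets_touch() RETURNS trigger AS $$
      BEGIN
        NEW.updated_at := now();
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;
    }
  end

  def self.down
    raise ActiveRecord::IrreversibleMigration
  end
end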

Of course, one might already be well into a Rails project by the time such a realization occurs. As such, a bit of revisionist history may be in order. There may be an ARM that creates a table, then creates a stored procedure, then creates a trigger to bind the stored procedure. To get the proverbial house in order, break this ARM into three distinct ARMs: one that creates the table, a second that creates the stored procedure, and a third that binds the stored procedure with a trigger. There is, however, a gotcha in all this. Recall the way in which “rake db:migrate” figures out which migrations to apply. To hack around this, one must do the ARM-break-apart operation and then fake out the Rake task by inserting the version numbers of those newly created migrations into the schema_migrations table, thus preventing the Rake task from blowing up when, say, it cannot create a table that already exists.
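One way to do the fake-out, sketched from script/console (the version strings below are placeholders for the prefixes of the newly split ARMs):

%w(20091101123456 20091101123457 20091101123458).each do |version|
  ActiveRecord::Base.connection.execute(
    "INSERT INTO schema_migrations (version) VALUES ('#{version}')"
  )
end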

Having done all this, the VCS may prove a little confounded in its commentary. This fix-up basically “explodes” one file into three. When the author first attempted this with Git as his VCS, a “git status” command reported there as being three operations, namely a move, a rename, and a copy. Git makes a valiant effort to be clever by implementing “code following”, but doing something really wacky such as this leaves it making some moderately humorous guesses as to what the hell has happened.

Coming to the end of our journey, let us consider the implications of homing stored procedures in ARMs. Whereas in the case of storing functions in “regular” files one can find them easily, storing them in ARMs poses the “where the hell is the latest definition?” problem. Naming conventions to the rescue… A simple but consistent way of naming ARMs makes answering such a question trivial. Use names such as table_foo_adds_column_baz, proc_bar_caches_calculated_baz_value, and so forth, and finding the file containing the latest definition for the “bar” function becomes a simple matter of “ls db/migrations | grep proc_bar | tail -n 1”.

You just have to know how to use your tools.

— AWG

Rail Rash

I’m not supposed to be writing this essay right now. I’m supposed to be well into a three-hour bike ride. Instead, five minutes into the ride, I hit one of the rails of an old freight siding at too small an angle while traveling at a non-trivial velocity and ate some pavement. Though I was apparently not sufficiently skilled to avoid the fall, if there’s one thing I know how to do it’s hit the ground under control, a skill perhaps learned through some combination of falling while skiing, diving while playing volleyball, and slide tackling while serving as a soccer goalie. I had the presence of mind to shoot my hands out, which took the brunt of the fall yet were unscathed thanks to the gloves I destroyed. My left elbow, left forearm, left knee and left hip took the secondary impact, as did various portions of my bike as evidenced by assorted things being scuffed or knocked out of alignment. My arm got pretty shredded and bled a bit, my hip was almost fully protected from abrasion (though bruised) by my cycling pants, which did not give way, and my knee managed to tear through the pants and get a minor abrasion but suffer more from impact. I’m happy to report that my instinct to arch my back and pull my head backward means I still have all of my teeth and original facial characteristics (for better or worse). I also had occasion to remember why I carry a folding set of Allen wrenches, which proved quite useful in doing field realignments.

But that’s not the main focus of this story. It’s supposed to be a tale of frustration with another kind of rail, specifically Ruby On Rails. For a while I’ve been meaning to document some of my experiences with ROR’s ActiveRecord, its module for object/relational mapping. While AR has not caused me substantial physical pain in the fashion that another kind of rail recently did, it has been the source of much recent anguish.

In the course of writing a Rails app intended to be part user interface, part data warehouse, and part task management system, I’ve repeatedly stretched AR to its breaking point or at the very least stumbled into some of its darker corners. I’ve managed to get it to fail in two ways, firstly by generating syntactically invalid SQL, and secondly by generating semantically invalid SQL. Syntax fail is annoying, but in a very immediate sort of way. You know you’ve got a problem the moment you try to run the code. Semantics fail, however, is far more nefarious, causing subtle misbehavior that may not manifest right away or even fly under the radar when it does.

AR performs reasonably well when loading from a single table or when managing simple relationships. I was, however, enticed by one of AR’s slicker recent offerings, named_scope. A named_scope provides you with an auto-generated function that returns a collection of entities narrowed by the “scope”, which can be defined to have one or more conditions. There’s even a nifty plug-in that auto-generates the negated version of all named_scopes. And, most excitingly, you can chain invocations of named_scopes and it narrows the returned collection by all of the scopes, building a comprehensive SQL “WHERE” clause under the hood. This sounds great until you realize that AR is not nearly smart enough to do this correctly.
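For reference, a minimal sketch of what that looks like (Rails 2.x syntax, hypothetical model and column names):

class Order < ActiveRecord::Base
  has_many :line_items

  named_scope :shipped, :conditions => { :status => 'shipped' }
  named_scope :with_expensive_items,
              :joins => :line_items,
              :conditions => ['line_items.price > ?', 100]
end

# Chained scopes get merged into a single query with a combined WHERE clause.
Order.shipped.with_expensive_items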

One major failure comes from the named_scope implementation not being well synchronized with other query logic. If you have the misfortune of using a named_scope with a :joins attribute and calling “find” with an :include attribute that pulls relatives from one or more of the same tables via a has_one declaration, AR passes syntactically invalid SQL to the database with duplicate table aliases. This results from Rails defaulting to loading has_one relatives with a LEFT OUTER JOIN whereas has_many is done with a distinct query with a “WHERE id IN (…)” clause. There is a workaround for this, but it is undocumented and extremely obscure. You have to override X class method Y always to return Z. Doing so makes AR always use the “WHERE id IN (…)” style, thus avoiding the problem. At least this bug was generating syntactically invalid SQL and thus causing an immediate crash.

Another obvious failure to synchronize regions of the AR code base manifests when you have a :joins clause in a named_scope and also in an invocation of “find” with the intent to apply an :order parameter to joined entities. This somehow confuses AR under certain circumstances and causes the result set to contain duplicate rows. The only solution to this, short of ditching the :joins in one place or the other, is apparently to add a :select parameter to the “find” invocation of the form “DISTINCT table_name.*”. Unlike the former bug, this one generates semantically invalid SQL which the database is happy to execute for you and then return a subtly incorrect result set.
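Continuing the hypothetical example from above, the workaround described here looks roughly like this:

# Without the explicit :select, the overlapping joins can yield duplicate rows.
Order.with_expensive_items.find(:all,
  :select => 'DISTINCT orders.*',
  :joins  => :line_items,
  :order  => 'line_items.price DESC')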

As if this subtle misbehavior were not bad enough, it gets worse. Suppose you declare multiple named_scopes on an entity, multiple such scopes have :joins attributes that reference the same tables, and these scopes furthermore have :conditions that reference the joined table. If you use just one such scope, things will occur as you expect. If you use two or more of them, however, AR performs multiple distinct joins, leaves the table name in the first join unaliased, and aliases each subsequent join in an unpredictable fashion. This leads you into the trap of having the :conditions on different joins all reference the table from the first join, causing very unpredictable results (semantics fail again). There does not appear to be any good resolution for this if your conditions are not simple equality tests (meaning that you have no choice but to use the string form of :conditions). Your only apparent option is to rewrite the :conditions to use not a join but a subquery. I would have thought that the :conditions parameter would have allowed the string form to contain a macro reference to the joined table that would be expanded by AR once the alias had been determined. No such luck…
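A sketch of the shape of that subquery rewrite, declared inside the hypothetical Order model; because the condition no longer references the joined table at all, it cannot be bitten by whatever alias AR assigns:

named_scope :with_expensive_items,
            :conditions => ['orders.id IN (SELECT order_id FROM line_items WHERE price > ?)', 100]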

Another maddening limitation of AR is the restriction that :joins parameters may only be used for inner join operations. If you wish to perform an outer join, then instead of referencing other tables by their symbolic identifiers, you’re stuck having your :joins parameter be a literal string that contains one or more join operations. Theoretically you’re supposed to be able to pass an array that mixes symbolic table references and literal strings, but due to an apparent bug that causes the documentation to disagree with the code’s behavior, passing an array that mixes strings and symbols causes AR to treat them all as symbols, the result of which is error messages like “ActiveRecord cannot find relationship ‘LEFT OUTER JOIN bar on bar.id = foo.bar_id'”. The solution to this problem is to use one big string :joins that contains both your inner and outer joins. The problem with this, however, is that AR fails to realize that it needs to alias table names in a way that does not collide, meaning that an unfortunate combination of a named_scope and a “find” operation will generate (semi-thankfully) syntactically invalid SQL. Again, the best solution appears to be to rewrite the :conditions in your named_scope to use a subquery instead of a join.
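For completeness, the single-string form (again with invented table names) looks like:

Order.find(:all,
  :joins => "INNER JOIN line_items ON line_items.order_id = orders.id " +
            "LEFT OUTER JOIN coupons ON coupons.id = orders.coupon_id")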

ActiveRecord has many charms, but it also falls flat on its face in many circumstances. At this point, I feel sufficiently well versed in its foibles that I can avoid them while leveraging its strengths. This means leaning more heavily on the database than I would have expected was required with Rails. Subqueries and database views can come to your rescue, and perhaps for many things those are the more elegant solutions, but being forced to use them, and more to the point being forced to realize their necessity by suffering repeated AR failures to behave as one might expect, is a long way from ideal. AR has enough escape hatches that you can work around its limitations by using various string forms of arguments instead of symbolic forms, but when you do this AR finds itself unsure of what you’re doing and the two of you start to trip over one another, necessitating more “fixes”, in a fashion that often feels like a race to the bottom in which you don’t really get to use many of AR’s niceties.

To far too great an extent, ActiveRecord feels like it is slapping together SQL, throwing it over the wall to the RDBMS, and just hoping for the best, then washing its hands of the matter if things go less than perfectly.

  — AWG

(Un)Focused

“There’s no shortage of things on which I’m asking our R&D and acquisition folks to focus.”

— Ashton Carter, September 2009 (the Pentagon’s top acquisition, technology and logistics executive)

If such a thing as institutional Attention Deficit Hyperactivity Disorder exists, then the DoD most certainly suffers from it. Solving challenging technical problems requires a deep understanding that comes only with ruthlessly sustained focus. There is, however, no real cap on the number of inchoate messes that one can generate, modulo public outrage at cost-overruns or a battlefield failure so epic that even the best propaganda engine could not hope to paper over it.

Despite what the US federal government’s recent profligacy might lead a person to believe, financial resources are finite and creditors will eventually call us to account. Even if cost were of little concern, there exists a finite labor pool of the kind of technical and managerial talent required to convert money into useful systems.

Fools believe that pouring more money into the fire results in a linear increase in productivity without bound. More reasonable folk assume a decreasing ROI for each unit of currency spent beyond some threshold. In fact, it’s even worse than this. Cross another threshold and one not only sees terrible bang-for-buck, but total production for the system actually begins to plummet.

This decline in productivity takes root in three things. First, a burgeoning budget results in increased scrutiny, causing engineers and their managers to spend an inordinate amount of time justifying their choices and defending funding. Second, as the size of an organization grows, the quality of employees almost inevitably declines, causing top-flight engineers to expend their brain-cycles not thinking deep thoughts but rather cajoling, pleading and fighting to prevent others from making poor design choices or executing shoddy implementations. Third, as costs increase, high level management aims to increase efficiency by compartmentalizing organizations into service-oriented specialty shops, causing bureaucratic churn that results in a combination of waste and cronyism-based coalition building.

Sometimes a timely infusion of cash provides the necessary activation energy for a project to attain escape velocity. Beyond a point, though, less is more. Patience, humility, determination, discipline and mastery… These are the qualities we need in our engineers and managers. Too often people are enticed by the shiny in the race for the next promotion, the campaign for the next election, that comfy private sector sinecure, or the opportunity simply to avoid rocking the boat because that’s the easy way to coast to retirement. We can’t solve all of everyone’s problems forever right now. The choice is between providing decent solutions of moderate duration for some problems in a moderate amount of time or trying to do it all while delivering nothing at great cost.

— AWG

iBrick

Oh, the joys of getting to the gate at the airport 3.5 hours before your scheduled takeoff… $7.95 to use the wireless network? Really? FINE! See if I ever fly through here again, DTW. In any case, I’d been putting off upgrading my iPhone to the 3.0 software due to a fear of bricking my phone in the midst of three weeks of travel. What better thing to do with my 3.5 hours? Sprang for it, bricked my iPhone… FAIL.

I had a nagging feeling that this might happen. I’m running Ubuntu Linux as the host OS on my laptop. You can’t install the iTunes software under Linux, so I’m running it under WinXP as a guest OS on VMWare. Theoretically this should be fine, but somewhere in the back of my mind something was saying “don’t do it!”. I did it anyway, albeit at a point in time where the nuisance of bricking my phone was minimized. I won’t say it was a mistake, but things didn’t go as smoothly as I had hoped.

I plugged my iPhone into a USB port and iTunes suggested that I update the software so I got on with it. After a gigantic download on a not-so-fast network, iTunes got to the business of doing the flash. That’s when things went wrong. Apparently the update process causes the iPhone to reboot, which in turn causes a reconnect to the attached system, thus precipitating a pissing contest between the host and guest OSes. iTunes under WinXP saw the device go away (which it did), then VMWare started throwing errors/warnings about a module on the host OS having claimed the device, requiring manual intervention to reconnect it to the guest OS. By this time the flash had failed and I was the proud owner of a brick. iTunes informed me that I could do a recover operation. Gritting my teeth, I proceeded. I got the same conflict again. Also, VMWare managed to wedge my whole machine, requiring a battery-pull and unplugging to recover.

I did a Google search on the terms “iphone”, “vmware”, and “upgrade”. I ended up here, which gave me the hint that I needed to give my kernel modules a stern talking-to. Another Google search on the terms “linux”, “prevent”, “kernel”, “module”, and “load” yielded this. Following my nose, I added the following two lines to /etc/modprobe.d/blacklist.conf:

  • blacklist usbhid
  • blacklist snd-usb-audio

After this I gave the recovery process another whirl and, voila, I had an iPhone that allows portrait-mode text messaging as well as copy-paste, albeit an iPhone suffering amnesia and still unwilling to tether. I’d been meaning to clean up my contact list for a few years. There was even a number for a friend who died a couple of years ago that I couldn’t bring myself to delete. Now it’s done.

Expected task time: 15 minutes

Actual task time: about 1.5 hours

Also, my flight is delayed for another hour. The bar down the hall is looking awfully tempting but that would likely cost me my power outlet. Decisions, decisions…

— AWG

Early And Often

Presently in the midst of taking the 50th annual two-week Human Factors Engineering short course offered by the University of Michigan, I find myself struck by parallels between HF and another domain in which I have non-trivial interest and knowledge, namely security. If you mingle in security research circles, you likely find two maxims oft repeated. First, “security is a process, not a product”. Second, “security has to be baked into a system”. Substitute “human factors” for “security” and one captures the frustrations of human factors engineers as voiced by many of the lecturers in my present course. Security experts dread having a “complete” system thrown over the wall to them with the odious directive to “make the system secure”. Likewise, HF folks chafe at the prospect of having a “complete” system tossed in their laps with the directive to “make it like the iPhone”. Given the (not so?) well known principle that fundamental defects become exponentially more expensive to repair as a project progresses, one marvels at the software development world’s need to have such independent “revelations” in various sub-disciplines.

The parallels do not end there. Practitioners of both disciplines have notorious difficulty quantifying costs, benefits, and risks. Leaky Abstractions furthermore pose myriad problems. As such, early and ongoing consideration seems the key to success. Iterative and Agile development practices are nothing new, but the industry regularly forgets to exert sufficient pressure on various sub-systems to shake out significant flaws before they become cripplingly expensive. User interfaces, security, networking, storage systems… As elegantly as we may modularize our systems, the inherent leakiness of abstraction boundaries between these sub-systems manages to burn us again and again. Consequently, we must aggressively and regularly conduct usability tests, performance tests, and penetration tests. One cannot overestimate the criticality of regular integration testing and sanity checking. Even seemingly simple systems are hard to get right.

— AWG

UI Tomfoolery

Brevity is the soul of wit, or so we’re told, though BP Oil wasn’t listening when crafting the user interface for their gasoline pumps. I’m put in mind of Joel Spolsky’s article about the umpteen different ways that one can instruct Windows Vista to “sleep”.

While not quite so prolific as this, BP’s pumps still manage to offer the user a bewilderingly non-orthogonal set of “OK” buttons. These include such ambiguously different options as “Enter”, “Start”, and “Yes”. You’re expected to hit “Enter” after entering your credit card’s billing zip code. After you’ve lifted the nozzle for the desired grade you must press “Start”. If you desire a receipt at the end of the transaction, the “Yes” button is your friend.

Invariably I manage to use the wrong button at every step. More infuriatingly still, pressing the wrong “OK” button or pressing the right button with insufficient pressure yields indistinguishable results. A little bit of tactile and/or audio feedback would go a long way.

For the love of god, Don’t Make Me Think. I want a big, fat, obvious “Go” button at every step. I want to unthinkingly jam a button that clearly means “I’m done entering my zip code”, “I have selected my gasoline grade”, or “give me a receipt”. Either give me a single physical “Proceed” button or dynamically render a more descriptive “Go” button at every step without leaving other ambiguous options available. Don’t make my distracted brain struggle to ascertain which of three “Just Do It” buttons is the correct one to get what I want. Never mind that these three buttons are scattered all over the damn place, perhaps in a crude attempt to hide that they are really all the same button…

Updated 09/06/2010 to include new sighting on the same machine…

Also, thanks for trashing the Gulf Coast, BP… You guys rock! I’m semi-embarrassed to buy gas at your station but you’re the only one in my neighborhood for some reason.

— AWG

Overhead

A long time ago in a galaxy far away, I was a high school student. In hindsight, this involved many artificially tedious things that I took for granted in the same way that I accepted various laws of Physics. The want of superior technology resulted in all manner of needless inefficiencies and cumbersome overhead. Such impedances ultimately drove down the rate at which I could ram knowledge into the confines of my skull. One particularly good example thereof involves my studies of the French language. Specifically, looking up words that I did not know entailed an inordinate amount of work.

When reading a piece of text in a foreign language, one can choose from two modes of operation when dealing with unknown words. Either one can stop at every unknown word and look it up, or one can read all the way through the article without worrying overly much about comprehension, recording unknown words along the way, translating them in batch mode at the end, and then reading through the article again. Both have their advantages and drawbacks. The as-you-go approach avoids an extra reading, but the regular interruption to go to a dictionary proves highly disruptive. The batching approach reduces disruption, but in some ways it results in extra work, such as looking up words you might have understood from context if some other word had been known sooner rather than later.

This represents a false dichotomy, thrust upon the student by the non-availability of technology that reduces the expense of context switches. Instant translation that does not force a disruptive modality changes the rules of the game. The appearance of online dictionaries improved things significantly. Not having to thumb through a dictionary reduces the context switch overhead enormously. It does, however, still involve an unfortunate amount of clumsiness and flow disruption. Ideally, when reading a sentence that contains an unknown word, one ought to be able, with a single keystroke, to start a parallel process that brings up a translation without disrupting completion of the sentence being read. Having to copy and paste a word into a web browser and then wait for the response soon becomes irksome.

I have for some time now had a decent solution to this problem embodied in an Emacs extension. During my vacation of this week I decided to improve and generalize it. The reader may find the full code at the end of this essay.

The extension makes use of the W3M library for browsing in Emacs and a modicum of custom elisp to glue everything together. The extension exposes a simple plug-in system of its own that allows the user to integrate with different translation systems. The code exhibits sufficient compactness and descriptiveness that its mechanics ought to be mostly self-evident.

The end result is that I can press F5 and enter a URL, say http://www.lemonde.fr, and commence browsing French articles within Emacs. When I encounter a French word that I don’t know, I put the cursor over it and press F6. My current window stays open so I can continue to read. Eventually my definition appears and I peruse it. When finished with it, I press F8 and the definition vanishes. The whole process is the same except for hitting F7 instead of F6 if I happen to be perusing, say, http://www.welt.de.

One mildly obnoxious thing about the W3M library is the inability to specify a callback function to be invoked when an asynchronous W3M function completes. Since I wanted my translation plug-ins to have such a callback faculty, I had to switch the W3M library’s mode of operation from asynchronous to synchronous. A word to the wise for library writers: if you expose an asynchronous API, you should always provide a hook via which “after” callbacks can be invoked. Failure to do so renders the lives of API users needlessly difficult.


(add-to-list 'load-path "~/emacs-w3m-1.4.4")
(require 'w3m-load)

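;; Switch emacs-w3m from asynchronous to synchronous operation, so that code
;; run after a page fetch (such as a translator's post-load hook) sees the
;; loaded page.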
(defcustom w3m-async-exec nil
  "*Non-nil means execute the w3m command asynchronously in Emacs process."
  :group 'w3m
  :type 'boolean)

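;; Copy the word under the cursor into the kill ring and return it.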
(defun grab-word-under-cursor ()
  (forward-char 1)
  (backward-word 1)
  (mark-word 1)
  (copy-region-as-kill (point) (mark))
  (car kill-ring))

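;; Registry of translation plug-ins keyed by "source:target" language pair;
;; each entry holds a URL constructor and an optional post-load hook.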
(setq translator-hash (make-hash-table :test 'equal))

(defun add-translator (source-lang target-lang url-constructor post-load-hook)
  (puthash (concat source-lang ":" target-lang)
           (cons url-constructor (cons post-load-hook nil))
           translator-hash))

(defun get-url-constructor (source-lang target-lang)
  (car (gethash (concat source-lang ":" target-lang) translator-hash)))

(defun get-post-load-hook (source-lang target-lang)
  (car (cdr (gethash (concat source-lang ":" target-lang) translator-hash))))

(defun translate-word (word source-lang target-lang)
  (split-window-vertically)
  (other-window 1)
  (w3m-goto-url-new-session
   (apply (get-url-constructor source-lang target-lang) word nil))
  (let ((post-load-hook (get-post-load-hook source-lang target-lang)))
    (if (not (equal nil post-load-hook)) (apply post-load-hook nil))))

(defun translate-current-word (source-lang target-lang)
  (translate-word (grab-word-under-cursor) source-lang target-lang))

(add-translator
 "french" "english"
 (lambda (word) (concat "http://www.wordreference.com/fren/" word))
 nil)

(add-translator
 "german" "english"
 (lambda (word) (concat "http://dictionary.reverso.net/german-english/" word))
 (lambda () (search-forward "See also:") (recenter 0) (beginning-of-line)))

(defun translate-current-word-french-to-english ()
  (interactive)
  (translate-current-word "french" "english"))

(defun translate-current-word-german-to-english ()
  (interactive)
  (translate-current-word "german" "english"))

(defun kill-and-close ()
  (interactive)
  (kill-this-buffer)
  (delete-window (selected-window)))

(global-set-key [(f5)] 'w3m-goto-url)
(global-set-key [(f6)] 'translate-current-word-french-to-english)
(global-set-key [(f7)] 'translate-current-word-german-to-english)
(global-set-key [(f8)] 'kill-and-close)

— AWG