Maintaining synchrony between application code and a database schema in a production system can prove a burdensome affair. As soon as version N hits the production system, people start using it (with some luck), and this causes information to pile up in the database against what will ultimately be an obsolete schema. Developers continue to write code for what will become version N+1 of not just the application but also the database schema, and meanwhile users are working against schema version N. Willful ignorance of this will yield a painful day of reckoning when the time comes for release of version N+1.
As with many things in life, an as-you-go approach can greatly reduce pain and risk. Ruby On Rails’ database management package ActiveRecord provides not just a handy DSL for manipulating database schemas but also a way to chunk usage of this DSL into discrete Migrations stamped with version numbers. They end up with file names of the form 20091101123456_table_foobar_created or 20091102356412_table_foobar_adds_column_baz. The numerical piece at the front of the file name gets used as the version number of the migration and has the current date as its initial piece if generated with”ruby script/generate migration”.
The version numbers govern the order in which a Rake task applies them and the structure of the numbers handily results in a listing of the containing directory rendering them in the order that the task will apply them. In addition to the tables that an application declares explicitly, Rails also manages a table called schema_versions that contains just a single “version” column that it uses to track which migrations it has already processed. Upon invocation of”rake db:migrate”, a Rake task looks at all of the migrations contained in an application’s “db/migrations” directory, sorts them by version number, reconciles this list against the the schema_migrations table, then runs (in order) the migrations not yet applied, and lastly updates the schema_migrations table to record the newly processed migrations.
This usually results in fairly painless production environment upgrades. A simple migration validation involves taking a snapshot of the production database, placing it in a test environment, running “rake db:migrate”, and lastly ensuring that things are as they should be. The Rake task applies only the migrations necessary to bring the database schema from version N to version N+1 by examining the schema_migrations table in the production system and finding the migrations it needs to run. Likewise, the code sandboxes of multiple contributing developers typically benefit from this process.
All of this is quite lovely, but recall that Rails’ philosophy is very much against usage of “smart” databases, so much so that foreign key management in ActiveRecord Migrations (ARMs) are enabled by a plug-in that does not live in core Rails, and the only way to manage stored procedures in Migrations is to place their declarations within “execute %{}” blocks that simply run arbitrary SQL to create or change them. This works, for sure, but starts to exhibit friction when viewed in the context of versioning and change tracking.
Stored procedures inhabit a realm somewhere between data and application code. They are chunks of procedural code, but they live in the database and thus get manipulated in the same fashion as database tables. This raises the issue of where to place them in the code base. A developer will likely initially place them within ARMs that do various other things, e.g. create a migration that creates a table and also creates a stored procedure to operate on it and then binds it with a trigger declaration. This works, insofar as it gets the procedure installed, but the error of this may become evident later when the procedure requires a change to its definition.
Where should the developer apply the change?
If the last migration to manipulate the procedure has not yet been published, then it is possible to simply go back and edit it within that migration. What to do, though, if that migration has already made it to production, or even just been applied by other developers to their sandboxes? There needs to be another migration that redefines the procedure… This seems somewhat icky as it entails a copy-paste from the earlier migration into a new one, an action that will prevent usage of the version control system’s diffing faculties. This isn’t so bad, as one can resort to using a stand-alone diff operation, but this diff in the best case gets cluttered and in the worst case proves unreadable due to the copy-pasted file being diffed against a file that has more than just the stored procedure declaration in it.
Thus the temptation to pull out the stored procedure into a separate file that the VCS can version like other application code and that an ARM can reference indirectly… Sadly, this introduces subtle bugs and thus cannot serve as a safe solution. Whereas “rake db:migrate”applies the files in the db/migrations directory in a specific order, anything referenced by those files will be a file just hanging out in a directory. This means that a migration could update a stored procedure to a later version prematurely.
Consider the following… In version N of the application (a VCS version, not a release version) there is a particular revision of a stored procedure. In version N+1 a developer tweaks this stored procedure. In version N+2 a developer has an ARM that manipulates the database in a way that hinges on the behavior of the stored procedure. In version N+3 a developer again tweaks the stored procedure. If a developer whose sandbox is at revision N invokes the VCS’ “update” command, and furthermore if the management strategy involves having a stored procedure not stored as a literal in an ARM but rather as being the contents of some other file referenced by an ARM, then the ARM for version N+2 will operate under the auspices of the stored procedure at revision N+3 (a future revision), potentially yielding incorrect results. So having a stored procedure live in a file referenced by an ARM can’t work for folks who care to guarantee correctness.
Things brings us to the only workable solution, a hybrid of the two aforementioned solutions. Specifically, one must have the stored procedure be in a file of its own, and that file must be an ARM. The former property ensures clean diff operations and the latter property ensures that procedures are updated when they ought to be and no sooner. The actual diff operation entails using a stand-alone diff as opposed to the VCS’s diff but this proves tolerable enough (the author furthermore recommends the KDE program “kompare”).
Of course, one might already be well into a Rails project by the time such a realization occurs. As such a bit of history revisionism may be in order. There may be an ARM that creates a table, then creates a stored procedure, then creates a trigger to bind the stored procedure. To get the proverbial house in order, break this ARM into three distinct ARMs, one that firstly creates the table, another that secondly creates the stored procedure, and finally one that binds the stored procedure with a trigger. There is, however, a gotcha in all this. Recall the way in which”rake db:migrate” figures out which migrations to apply. To hack around this, one must do the ARM-break-apart operation and then fake out the Rake task by inserting the version numbers of those newly created tasks into the schema_versions table, thus preventing the Rake task from blowing up when, say, it cannot create a table that already exists.
Having done all this, the VCS may prove a little confounded in its commentary. This fix-up basically “explodes” one file into three. When the author first attempted this with Git as his VCS, a “git status” command reported there as being three operations, namely a move, a rename, and a copy. Git makes a valiant effort to be clever by implementing “code following”, but doing something really wacky such as this leaves it making some moderately humorous guesses as to what the hell has happened.
Coming to the end of our journey, let us consider the implications of homing stored procedures in ARMs. Whereas in the case of storing functions in “regular” files one can find them easily, storing them in ARMs poses the “where the hell is the latest definition?” problem. Naming conventions to the rescue… A simple but consistent way of naming ARMs makes answering such a question trivial. Use names such as table_foo_adds_column_baz, proc_bar_caches_calculated_baz_value, and so forth, and finding the file containing the latest definition for the “bar” function becomes a simple matter of “ls db/migrations | grep proc_bar | tail -n 1”.
You just have to know how to use your tools.
— AWG