At the Forge

Database Integrity and Web Applications

Reuven M. Lerner

Issue #220, August 2012

Want to improve the integrity of your data? Place constraints in the database, as well as in your application.

NoSQL, the catchall phrase for non-relational databases, is all the rage among Web developers. However, it's somewhat unfair and unhelpful to use the term NoSQL to describe them, given the variety of technologies involved. Even so, there are some fundamental differences between traditional relational databases and their NoSQL counterparts. For one, as the name implies, NoSQL databases don't use the standard SQL query language, and use either their own SQL-like language (for example, MongoDB) or an object-oriented API. Another difference is the lack of two-dimensional tables; whereas SQL databases operate solely with such tables, NoSQL databases eschew them in favor of name-value pairs or hash-like objects. And finally, NoSQL databases typically lack the features that led to the development of relational databases, namely transactions and data integrity.

There's no doubt that the flexibility NoSQL databases offer is attractive on a number of levels. Just as I enjoy working with dynamic languages in which I don't have to declare my variables (or their types) before I use them, it's nice to be able to store objects in my database without having to define the object structure in advance. If I want to add a new field to my Person object, I just do so, and the database magically catches up.

At the same time, there are many cases when I want the database to be tough with me and enforce the integrity of my data. That is, I want to be sure that even if I have made a mistake in my application, or if a user enters a value that shouldn't be allowed, the database won't allow that bad data to be stored. And yes, I believe that it's good to have such checks at the database level, rather than just at the application level—not only because it provides an additional guard against corrupt data, but also because the database often is accessed directly, outside the application itself. I often have to do what I refer to as “database surgery” on applications that are running for my clients, and it's always reassuring to know I cannot make a change manually that would corrupt the data.

Readers of this column know I'm a fan of both Ruby on Rails and of PostgreSQL, and I often use them together on projects. However, because Rails originally was developed under MySQL, which lacks many of the data-integrity aspects of PostgreSQL, the standard Active Record package fails to include such items as foreign keys in its base implementation. This means that although Rails will support PostgreSQL out of the box, it doesn't provide support for foreign keys, let alone data-integrity checks.

So, in this article, I look at some Ruby gems that make it possible to include foreign-key support in your Active Record models. I also describe ways you can take advantage of PostgreSQL's CHECK mechanism to ensure that your data is as safe as possible.

Where Do You Check Your Data?

Before I go into the actual code and implementation, I admit there are several philosophies of where and how to check your data. As I mentioned previously, I prefer to have the data checked at both the database and application levels, although sometimes I've been lazy and used the default in Rails that does so only at the application level.

Checking your data only at the application level is attractive in many ways. It makes the implementation easier, and it often will work just fine. It also, in the case of Rails, allows you to work in a single language (that is, Ruby) and put the integrity checks in your model files, where you're going to be using them.

But as I already said, if I have my checks only in the application, it's quite possible that when I try to access the database from outside my application, I'll end up messing things up, if only accidentally. Now, I'll admit this is the way many Web applications are designed, and it's not a fatal flaw. But it is something that I'll try to avoid in this article.

It would be a bigger mistake to try to do things the other way around—that is, to have the checks only in the database, rather than at the application level. Imagine, for example, if you were to have a “NOT NULL” constraint in the database, such that a particular column could not contain a NULL value. If you didn't protect against such things at the application layer, you might end up trying to put that data into your database. And although the database itself would not be corrupted, the application would generate an error, confusing and frustrating users.

So, upsetting and annoying as it might be, the best way to do things probably is to duplicate some work—adding the constraints on the database and then having them on the application as well. You could argue that you should have the constraints in place in a third location—in the HTML form the user often will use to submit information to the server. Fortunately, there is at least one technique for handling this sort of thing automatically. The client_side_validations gem for Rails copies the validations as best as possible, putting them into JavaScript within the user's views.

Foreign Keys

With this background, let's now assume that I want to implement a simple appointment calendar. In order to implement my calendar, I need to keep track of people (each of whom has a first name, a last name, an e-mail address and a telephone number) and appointments (which will indicate the date/time of the appointment, the person with whom I'm meeting and a note about the meeting itself).

At the database level, the tables will be fairly straightforward. This is how I would create the tables if I were doing so manually:

CREATE TABLE People (
    id SERIAL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL,
    phone TEXT NOT NULL,
    PRIMARY KEY(id)
);

CREATE TABLE Appointments (
    id SERIAL,
    person_id INTEGER REFERENCES People NOT NULL,
    notes TEXT NOT NULL,
    PRIMARY KEY(id)
);

The important thing here, for the purposes of this discussion, is the REFERENCES People clause in the definition of the Appointments table. With the REFERENCES clause in place, I can do the following:

INSERT INTO People (first_name, last_name, email, phone)
    VALUES ('Reuven', 'Lerner', 'reuven@lerner.co.il',
    ↪'847-230-9795');
INSERT INTO Appointments (person_id, notes) VALUES (1, 'Meet with
 ↪myself');

Now, here's what happens if I try to delete myself from the People table:

atf=# DELETE FROM People WHERE id = 1;
ERROR:  update or delete on table "people" violates foreign key
constraint "appointments_person_id_fkey" on table "appointments"
DETAIL:  Key (id)=(1) is still referenced from table "appointments".

Because Appointments.person_id is a foreign key to People.id, I cannot remove a row from People if Appointments refers to it. This sort of relational integrity is a great thing for my application. No matter how you slice it, it means I cannot remove people from the system if they are scheduled for an appointment.

Rails Integration

If these tables were part of a Web application that I was developing in Ruby on Rails, I would use database migrations to create them. The migrations would look something like this (if combined into a single file):


class CreatePeopleAndAppointments < ActiveRecord::Migration

  def change
    create_table :people do |t|
      t.string :first_name, :null => false
      t.string :last_name, :null => false
      t.string :email, :null => false
      t.string :phone, :null => false

      t.timestamps
    end

    create_table :appointments do |t|
      t.integer :person_id, :null => false
      t.text :notes, :null => false

      t.timestamps
    end
  end
end

Now, this will give me the table definitions, but it won't give me the foreign keys, meaning that I could corrupt the database by deleting a single “People” record. In order to do that, at least in the default Rails configuration, I need to use an “execute” command in the migration to send explicit SQL to the database.

Fortunately, there is an easier way. The Foreigner gem, written by Matthew Higgins, which works with MySQL as well as with PostgreSQL, adds syntax to let you create and remove foreign keys within your migrations. For example, with Foreigner active—putting it in the Gemfile and then running bundle install—I can add a new migration that does the following:


class AddForeignKey < ActiveRecord::Migration
  def up
    add_foreign_key :appointments, :people
  end

  def down
    remove_foreign_key :appointments, :people
  end
end

Sure enough, after running this migration, I have the foreign key that I was hoping for. Again, it's still important for me to have this check in my Rails model; otherwise, I easily could get myself into a situation that the database forbids but the model allows and, thus, generate a bad error for my users.

Note: if you add the foreign key after you already have populated the database with some data, you might run into trouble. That's because PostgreSQL won't let you add a foreign key if one or more rows fail to abide by the declared constraint. The NOT NULL constraint additionally ensures that you point to some person in the system. In other words, every appointment has to be with someone, and it has to be with someone in the People table.

Now, let's say you already have been working on a project, but you have neglected to define foreign keys. You could go through each table, figure out which is pointing to which, and then handle it accordingly, adding the sorts of migrations shown above. But the Immigrant gem (also written and released by Matthew Higgins) looks at Rails models and adds foreign keys for every has_one and has_many column that points to a belongs_to column in another model.

Additional Checks

Foreigner is a great step forward, helping ensure the integrity of your data. But there is another issue, more subtle than that of foreign keys, to which I'm still susceptible. While I have ensured that the person_id will not be NULL and will point to a record in the People table, I haven't ensured that the People table will contain valid and reasonable values. That is, although the first_name, last_name, e-mail and phone columns will not contain NULLs, they might contain empty strings or values (for example, e-mail addresses) that are invalid.

At the database level, I could handle this sort of problem with a CHECK clause. Such a clause ensures that illegal data—for whatever “illegal” values I want to define—cannot be placed in the database. This could be anything from a certain minimum or maximum length of text string, to a pattern in the text, to minimum or maximum numbers. For example, I often like to indicate that the database may not contain a price lower than 0, and that e-mail addresses need to match a very basic regular expression. (Note that matching e-mail addresses is not for the faint of heart, at least if you want to do it correctly.)

So given my People table, I could define a set of CHECK clauses that would ensure that the first_name field is non-empty. In other words, the first_name cannot be NULL, but it also cannot be the empty string:


ALTER TABLE People ADD CONSTRAINT
    people_first_name_non_empty_chks CHECK (first_name <> '');

Note that although I could add a number of checks within this single constraint, I prefer not to do so. That gives me greater flexibility to add and remove constraints, and it also ensures that when a constraint is violated, PostgreSQL will accurately tell me which one it was.

Now, how can I implement this in my Rails migration, and do I want to? My answer, as you can imagine, is that this would indeed be a good thing to include in the database.

Once again, a Ruby gem comes to the rescue. This one, sexy_pg_constraints, was written by Maxim Chernyak, but it has since forked and is maintained by several other people.

I can include it in my Rails application by adding the following to my Gemfile:

gem 'Empact-sexy_pg_constraints', :require => 'sexy_pg_constraints'

and by uncommenting the following line in config/application.rb:

config.active_record.schema_format = :sql

Simply put, sexy_pg_constraints adds a number of additional attributes that you can pass to an add_column or change_column invocation within your migration. For example, let's say I want to make sure, as before, that the first_name column is never blank—neither NULL nor the empty string. I can do this by saying:


class AddConstraints < ActiveRecord::Migration
  def up
    constrain :people, :first_name, :not_blank => true
  end

  def down
    deconstrain :people, :first_name
  end
end

After I apply this migration, I discover the following in my table definition:

"people_first_name_not_blank" CHECK 
 ↪(length(btrim(first_name::text)) > 0)

In other words, I am checking to ensure that after removing all whitespace from either side of the string, the length is greater than 0. Sounds like it worked to me!

sexy_pg_constraints comes with a large number of options, including whitelisting, blacklisting, matching e-mail addresses and checking the format of data. The only thing this gem is missing, by my estimate, is a way to get the model file to communicate automatically with the database and/or the migrations, so that you don't have to add these in both places manually. Even so, by adding these constraints, you improve the integrity of your data without having to go too far out of the Rails migration framework.

Conclusion

Database constraints are there to save people from themselves, and they are a great feature offered by relational databases. In exchange for a bit of work up front, and for some small performance penalties during runtime, you can ensure that your data remains intact. I would argue that doing this sort of validation is important at both the database and the application levels. A number of Ruby gems, as I explained here, make it possible to do this sort of integration within Ruby on Rails.