The problem of language-to-database mapping.
I urge our readers to have a look at “The Vietnam of Computer Science” by Ted Neward, which compares the quagmire of the Vietnam War to the current quagmire that results from our attempts to blend object-oriented languages with relational or even object-relational databases. A link to Ted's article can be found at www.odbms.org/vietnam.html.
The article discusses a problem called object-relational impedance mismatch. Here's how I'd sum up the problem:
We have two great technologies at our disposal: object-oriented languages and relational databases. Problems occur, however, when you try to blend the two, because neither is designed to work seamlessly with the other. A Query-By-Example style of programming may solve the problem, but this works only for simple database access. Mapping classes to tables may work, but the normalization of databases makes this approach difficult. For example, a “customer” table is not likely to include the city and state where the customer lives. A database administrator (DBA) will likely pull out that data and store it by zip code in another table. Another option is simply to pass query strings to the database and sort out the fields manually. That approach is likely to lead to performance problems, and to breakage if the data types are not handled properly. To confuse matters more, even if you get everything right, there's always the possibility that a DBA will change the database in such a way that it breaks your code.
Languages like PHP include tricks to help smooth out the language-to-database mapping, but even these tricks can be undermined by certain changes to the database schema. The problem lies in the fact that these options are merely tricks, not true database-to-language mapping techniques.
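To make the customer example above concrete, here is a minimal Java sketch of the mismatch. Every name in it (Customer, CustomerRow, ZipRow, toCustomer) is hypothetical and invented for illustration, and an in-memory map stands in for the DBA's zip-code table; it is not taken from any particular mapping tool.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical object model: the developer wants city and state on the customer.
class Customer {
    final String name;
    final String city;
    final String state;

    Customer(String name, String city, String state) {
        this.name = name;
        this.city = city;
        this.state = state;
    }
}

// Hypothetical normalized row: the customer table keeps only a zip code.
class CustomerRow {
    final String name;
    final String zip;

    CustomerRow(String name, String zip) {
        this.name = name;
        this.zip = zip;
    }
}

// Hypothetical row of the separate table that stores city and state by zip code.
class ZipRow {
    final String city;
    final String state;

    ZipRow(String city, String state) {
        this.city = city;
        this.state = state;
    }
}

class MismatchSketch {
    // Hand-written mapping: one lookup (in effect a join) to rebuild each object.
    static Customer toCustomer(CustomerRow row, Map<String, ZipRow> zipTable) {
        ZipRow zip = zipTable.get(row.zip);
        if (zip == null) {
            throw new IllegalStateException("No zip row for " + row.zip);
        }
        return new Customer(row.name, zip.city, zip.state);
    }

    public static void main(String[] args) {
        Map<String, ZipRow> zipTable = new HashMap<>();
        zipTable.put("94105", new ZipRow("San Francisco", "CA"));
        Customer c = toCustomer(new CustomerRow("Ada", "94105"), zipTable);
        System.out.println(c.name + " lives in " + c.city + ", " + c.state);
    }
}
```

If the schema changes, for instance if the zip table is split or renamed, toCustomer has to change with it even though the Customer class itself has not, which is the coupling the summary describes.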
LJ: Does that sum up things fairly well or is there something you want to add?
Ted: There's a lot of issues at stake here, one of which is the fundamental tension between developers and DBAs over “who owns the data”, and more important, the schema corresponding to that data. Developers who own the schema will create something that's comfortable for them to use, but awkward for the DBAs to manipulate, and DBAs who own the schema will create something that's easy for them to maintain and report from, but which in turn creates awkwardness for the developers. To suggest that a technology—any technology—is going to eliminate completely those problems that are fundamentally rooted in politics is somewhat foolish.
Christof: The object/relational, dual-schema approach, which puts data above the application and into the hands of a DBA, has organizational advantages in traditional, centralized enterprise IT environments. There is, however, a whole class of applications where the data store is entirely embedded and invisible to the end user: for example, in device software, in packaged software on your cell phone or PC, in real-time control systems and in SOA applications. These zero-admin database scenarios gain no benefit from having two schemata, but incur all the cost of reconciling these inherently incompatible models. Distributed and mobile software architectures, which are proliferating in a networked world, drive demand for zero-administration database engines, so we will hear a lot more of them in the future.
LJ: I realize that not all databases that call themselves “relational” actually conform to the Codd and Date academic view of relational databases. But, that aside, is this problem limited to relational database mapping to OO languages? Why don't object-relational databases (ORDBMSes) solve this problem? Or do they?
Ted: In a lot of ways, the impedance mismatch isn't limited only to objects and relations; trying to stuff objects into a hierarchical data format (like XML) suffers from some of the same problems. So, to suggest that this is “just” an object-relational problem is misleading. Date's position on this is interesting; he insists that an ORDBMS really doesn't exist, that the fields of an object are, in fact, nothing more than columns in a table (or attributes in a relation, to use his more formal terminology). That also implies that inheritance is nothing more than a simple association between tables, somehow silently joined together in a manner that he doesn't seem to specify clearly (at least, not in the eighth edition of An Introduction to Database Systems). While I'm not going to try to debate relational theory with him, I suspect that his views of the object world are somewhat skewed and therefore not entirely accurate.
LJ: Some have questioned why you drew the comparison between this technical problem and the Vietnam War, and in fact suggested that it's an entirely inappropriate comparison. What are your thoughts on that?
Ted: I have two reactions, really. First of all, to all those people who are offended that someone would draw an analogy to Vietnam that wasn't somehow rooted in war or political conflict, I'm sorry that they're offended, but American involvement in Vietnam ended more than 30 years ago, and it's time that we as a nation grow up and stop nursing old wounds. Yes, bad things happened, and some of them happened to us, and some of them happened because of us. It's high time we start looking at Vietnam critically, instead of in a knee-jerk emotional reaction state.
Second, Vietnam is in many ways a perfect analogy to what goes on with many object/relational-mapping tools, not just because Vietnam is a synonym for “quagmire” these days, but because, according to Robert McNamara's recently published memoirs, American leadership knew that they were getting into a potential quagmire, and thought they could manage it somehow. To many, Vietnam was the definition of unclear goals, but McNamara's memoirs make it clear that America was, in fact, trying to “win” the war, which meant winning “the hearts and minds” of the Vietnamese people. It just wasn't clear how they could accomplish that with the tools available to them. O/R-M is a similar situation: it's clear what we want to have happen; it's just not clear how we can make it work.
LJ: I got the impression last time we spoke that db4o users were well aware of this impedance mismatch, and that's why they contributed code and requests that addressed this very problem rather than pressure you to embrace the SQL model. Do I read that right?
Christof: Yes, what happened was that some managers wanted to see a check mark next to “SQL” in their evaluation spreadsheets. However, OO developers don't want SQL to access their data, unless they have to or are unaware of the alternatives. In fact, SQL is a DBA language, not a developer language. Our developers speak Java and .NET. So, for instance, db4o has “Native Queries”: the ability to query the database with native Java or .NET semantics, which is a type-safe and 100% OO approach.
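The contrast Christof draws between SQL strings and native-language queries can be sketched in plain Java. This is only an illustration of the idea, not db4o's actual Native Queries API: the Order class and the query helper are hypothetical, an in-memory list stands in for the database, and java.util.function.Predicate stands in for whatever predicate type a real engine would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical business object used only for this illustration.
class Order {
    final String customer;
    final double total;

    Order(String customer, double total) {
        this.customer = customer;
        this.total = total;
    }
}

class NativeQuerySketch {
    // Stand-in for a query engine: filters stored objects with a typed predicate.
    static <T> List<T> query(List<T> stored, Predicate<T> condition) {
        List<T> result = new ArrayList<>();
        for (T candidate : stored) {
            if (condition.test(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Order> stored = new ArrayList<>();
        stored.add(new Order("Ada", 250.0));
        stored.add(new Order("Bob", 40.0));

        // String-based query: the compiler cannot check the table name,
        // the column name or the type of the literal.
        String sql = "SELECT * FROM orders WHERE total > 100";

        // Native-style query: the condition is ordinary, compiler-checked Java.
        List<Order> large = query(stored, o -> o.total > 100.0);

        System.out.println(large.size() + " order(s) matched; SQL equivalent: " + sql);
    }
}
```

Renaming the total field would break the lambda at compile time, whereas the SQL string would fail only when it runs.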
LJ: And what about reporting, for instance?
Christof: If you run a distributed application with db4o, you usually don't need reporting (do you run Crystal Reports in your car?). If you still need to link your data at some stage to your back-end RDBMS, then you can use the db4o Replication System (dRS), which uses Hibernate to sync persisted objects into a central relational data warehouse for analysis, backup and so on.
Reporting, specifically, may actually go OO. Several vendors in the Java space (Actuate, Elixir, JasperSoft) and Microsoft in .NET (Visual Studio 2005—ReportViewer) have brought OO reporting tools to the market. And, people may find it easier to report against a plain business object, say “customer”, rather than umpteen normalized tables with cryptic names.
LJ: How exactly does db4o address this impedance mismatch?
Ted: The db4o approach, like other OODBMSes, avoids the impedance mismatch because we're not trying to store anything other than objects in the system. In other words, there's no “mapping”, per se, because there's nothing to map to. (Obviously, internally db4o is doing some storage tricks to avoid blatant inefficiencies, but these are the same tricks that any relational database plays and are, for the most part, entirely black box and removed from the end user's perspective.) This means that the schema of the stored data is that of the objects themselves, thus avoiding the “dual-schema problem” I mentioned in the Vietnam essay.
Christof: Class model == database schema.
LJ: Is it fair to say that you want db4o to appear as an extension of Java, thus avoiding an impedance mismatch? Is that even possible to accomplish?
Ted: I'm not sure I'd say that it's an extension of Java, so much as a mostly transparent persistence system. There have been numerous research projects over the years that have tried to make the persistence entirely transparent, including several within the Java space, and they play interesting tricks like hooking constructors to create persistent objects on “new” calls, and loading objects out of persistent space when invoking non-default constructors, and so on. Most of these haven't made it out of the research space, for a variety of reasons, so I'd be a bit wary of suggesting that db4o “extends” Java (or .NET, for that matter).
Christof: Strictly speaking, you're right, Ted. But if we have a design philosophy, it is exactly that—let's be as transparent as possible. Let's use the semantics and behavior of Java or .NET for persistence wherever we can to make it least intrusive and most intuitive to developers.
LJ: So the learning curve for db4o is not very steep?
Christof: No. In fact, there is a podcast on odbmsjournal.org (Episode 2) that shows that you are up and running with db4o in five minutes, including the download! What's more challenging, though, is that some people have to unlearn bad (=non-OO) habits. They ask, “Where's my primary key?” (There is no primary key in OOP.) So, for us, it is actually easiest to work with young developers, especially in Asia, who have no mental legacy and enjoy ten-times-higher productivity when writing their persistence-related code.
LJ: When I talk to database designers and programmers, I often hear them sing the praises of multi-value databases like PICK. Do you get any demand from your users and developers for multi-value fields?
Ted: My experience has been different—that multi-value databases are awkward and difficult to work with. I think what ultimately drives the discussion is what one's own experience is like, and what you find to be obvious and intuitive to you. For myself, and I think Date would agree with this, multi-value fields are anathema and something to be avoided, because I personally believe pretty strongly in the power of the relational model for data storage and manipulation.
LJ: How does that fit in with O/R mapping? Does that present more problems?
Ted: In some respects, no, because it would be “just” a List (or other collection) stored as an attribute of the class. But in other ways, it would represent a significant problem for O/R-M tools, because now trying to decide if a List inside a class should map to a multi-value field or an association to another table would require yet another annotation/attribute on the field to control the mapping, creating even more coupling between the object model and the database schema.
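The mapping decision Ted describes can be sketched as two ways of flattening the same List. All names here (Contact, toMultiValueRow, toAssociationRows) are hypothetical, and the "|" separator is an arbitrary choice for the illustration; no real O/R-M tool's API is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class with a collection-valued attribute.
class Contact {
    final int id;
    final List<String> phones;

    Contact(int id, List<String> phones) {
        this.id = id;
        this.phones = phones;
    }
}

class ListMappingSketch {
    // Option A: multi-value field. One row; the whole list is packed into one column.
    static String[] toMultiValueRow(Contact c) {
        return new String[] { String.valueOf(c.id), String.join("|", c.phones) };
    }

    // Option B: association. One child row per element, keyed by the owner's id.
    static List<String[]> toAssociationRows(Contact c) {
        List<String[]> rows = new ArrayList<>();
        for (String phone : c.phones) {
            rows.add(new String[] { String.valueOf(c.id), phone });
        }
        return rows;
    }

    public static void main(String[] args) {
        Contact c = new Contact(7, Arrays.asList("555-0100", "555-0101"));
        System.out.println("Multi-value row: " + Arrays.toString(toMultiValueRow(c)));
        for (String[] row : toAssociationRows(c)) {
            System.out.println("Association row: " + Arrays.toString(row));
        }
    }
}
```

The Contact class is identical in both cases, so something outside the class, such as an annotation on the phones field, has to tell a mapping tool which option to use. That extra marker is the added coupling Ted refers to.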
LJ: Does the GPL-ization of Java factor into your work in any way? Does it present problems or opportunities for your business?
Christof: The GPL-ization, in fact, the dual licensing of Java, is great news for db4o on three counts. First, open-sourcing Java certainly fosters the Java ecosystem, from IBM to Eclipse to many open-source projects and startups. Second, with open source, we can look to build a much closer integration of db4o's persistence solution with the Java VM than previously possible. Third, the dual-license model itself, as used by MySQL, db4objects, Trolltech and many others, has received further endorsement as a viable open-source business model, which also makes our life as a company much easier.
LJ: How have your customers reacted to the fact that Java is going GPL? Do you see any increase or decrease in interest in Java? A decrease or increase in interest in db4o?
Christof: We have seen a huge increase in Java users of db4o during the last few months, both in absolute terms and in proportion to our other platform, .NET. We don't really know whether this is due to the open-sourcing of Java, but that would provide a good explanation.
LJ: Thank you so much for taking the time and effort to speak with us.