LJ Archive

MySQL Introduction

David Axmark

Michael xxxxxxx Widenius

Issue #67, November 1999

A look at the MySQL database—where it's been, where it is now, and where it's going.

MySQL's history goes back to 1979 when TcX, the company that developed MySQL, started working with database programs. This first version was a screen builder/reporting tool written in BASIC. At that time, state-of-the-art computers had 4MHz Z80 processors and 16KB of RAM. This tool was moved to UNIX and further developed during the following years. In the mid-1990s, we started having problems with customers who liked the results the tool produced but wanted something they had heard about before (a buzzword was needed). So we started looking at making an SQL (an appropriate buzzword) front end to our low-level libraries. We found mSQL, but it did not work for our purposes. So we started to write an SQL engine from scratch. However, since the mSQL API was useful, we used it as the basis for our own API. This made it easy to port some applications we needed that were available for the mSQL API.

Since this tool would be usable by others, we decided to release it according to the business model pioneered by Peter Deutsch at Aladdin Enterprises with Ghostscript. This copyright is much more free than the mSQL copyright and allows commercial use as long as you don't distribute the MySQL server commercially.

It is not perfectly clear where the name MySQL came from. We have used the prefix “my” for libraries and path names since the mid-1980s. The main MySQL developer's daughter is named My—a fairly common name among Swedish-speaking Finns—so naming our database MySQL was very natural.

In May 1996, MySQL version 1.0 was released to a limited group of four people, and in October 1996, MySQL 3.11.1 was released to the public as a binary distribution for Solaris. A month later, a Linux binary and the source distribution were released. The MySQL release included an ODBC driver in source form. This also included many free MySQL clients ported to MySQL.

Platforms

The initial version of MySQL worked only on Linux and Solaris. The biggest problem in porting to other platforms was that MySQL needed a working POSIX thread library; in January 1997 a modified version of MIT-pthreads was included in the distribution.

Table 1.

MySQL Language APIs

Table 2.

To be able to use MySQL from your favorite language, you need an API. The first MySQL version included only C and Perl APIs. Now there are many (see Table 2). With the exception of the Java API, all of these use the C API to communicate with the MySQL server. So, as you can see, MySQL can be used from many popular languages.

MySQL Benchmarks and crash-me

When we had gotten a nice working system, we wanted to test it against old versions and against other databases, so we started looking for good benchmarks. We found that most benchmarks (like the TCP ones) represent an SQL server's performance as a single number, often as transactions/second. We regard these to be almost worthless, since comparatively few users run applications that do the same thing as these benchmarks. There is usually no way to determine your application's performance from the numbers given by this type benchmark.

The MySQL benchmarks are designed to show how fast a SQL server is for common operations, such as establishing a connection, performing simple inserts or joining two tables using a key. This also makes it possible to calculate loads on a web site when you know the mix of operations. Of course, you need to actually understand your own application to judge its performance with any database.

Over time, we got many requests on the MySQL mailing lists about MySQL's features and how it compares feature-wise with other databases. As Michael (the main developer) didn't want to dig into old inaccurate reference manuals to find this out, he thought of a program that automatically detects what a SQL server has to offer. He also thought it would be a nice test of how stable the MySQL code is when you start to send it “abnormal” queries.

While working with the benchmarks, we needed a list of capabilities for all supported databases. Since doing this by hand was very tedious, we made a tool to do it automatically. While trying early versions of this tool on some different servers, bad things happened—the servers crashed. All this crashing led us to name this tool crash-me. In fact, the only SQL server that has gone through this testing without crashing is Oracle. Of course, all bugs found in MySQL were fixed immediately.

crash-me (and the benchmarks) are implemented as a Perl DBI/DBD program that sends thousands of queries to a database to find out how things work in real life. By doing this, it finds many limits in the server, such as the supported column/query/variable/index lengths.

crash-me is also a nice tool for helping you write portable SQL, since it can provide a list of which functions, types and limits exist in the server you wish to use. Currently, we have crash-me results from Adabas-D, Access, DB2, Empress, Informix, MS-SQL, MySQL, Oracle, PostgreSQL, Solid and Sybase. As the crash-me table is big and very detailed, we will not include it here, but it is available at http://www.mysql.com/crash-me-choose.htmy/.

What operations does the benchmark test? First and foremost, the basic SQL operations are tested: INSERT, UPDATE, DELETE and SELECT. Other tests include a connect followed by a select, and creation of tables and indexes.

The individual tests should give a good indication of how fast an SQL server is for that specific operation. Do not use the “total time” as an overall measure of the value of an SQL server. This is because the tests are not weighted against each other. Some tests are run more times with different options and numbers of rows. An SQL server may be extremely bad at some “unimportant things”, while it's very good at the things for which you actually intend to use it.

We use the total time to compare things like the same database engine on different operating systems. We also use it to see how new versions of MySQL stack up against old ones.

Since all benchmark tables take even more space than the crash-me results, we include only a few observations on how well MySQL runs on different platforms.

  • Linux 2.2 is much faster than Linux 2.0 on a multi-CPU machine. This is because the Perl and the MySQL server run on different processors and the new SMP code is faster.

  • Linux is 7% faster than Windows 98 and 49% faster than NT on the same machine.

  • Windows 98 is 27% faster than NT on the same machine.

  • A Pentium II 400MHz machine running Linux 2.2 is much faster than a Sun Ultrasparc 2/CPU 400MHz machine running Solaris 2.7. The primary reason for this difference is that Linux caches the file system much better than Solaris; this result might be different under higher load. We will include a threaded test in the next generation of benchmarks to test things like this.

  • If you do many inserts on Solaris, you will get only a 22% speed increase by upgrading your processor speed by 100%.

  • The overhead of using MyODBC, and probably any ODBC driver, is at least 19%.

Note that while benchmarking, it was still possible to work on the Linux machine without any problems. However, NT became so slow that it was impossible to do any other work, even simple editing. It took about 30 times longer to start up a new DOS window, and we had to wait 10 seconds or so before typed characters showed up.

There are still many things to be done for both crash-me and the benchmarks. For example, we would like crash-me to report if there are identical functions that do the same thing (such as, instead of CONCAT one can use “||”). Also, many new tests should be added to test which sub-select constructs an SQL server allows. Of course, the documentation and presentation of the results could be much improved.

Both these tools give invaluable information to any developer who uses more than one SQL server. If they do not test the feature you need, please contribute a new test. More test results can be found at www.mysql.com/benchmark.html.

Why MySQL Is Fast

If you checked out the benchmark page above, you will notice that MySQL is many times faster than the competition on almost everything. Why is this so? Some of the speed comes from things that are not supported in MySQL, such as transactions, foreign keys and triggers.

Because the MySQL server is coded mainly by one person with many years of coding experience, very little redundant code is in it. Most of the basic algorithms also come from an era of slow CPUs and small amounts of memory. The algorithms have mostly been extended to use larger caches if there is available memory.

As a result, MySQL has a compact fast design (the code size of the server is less than 1MB on an i386) which normally uses very little memory, but can be configured to take advantage of large amounts of memory.

MySQL has many useful optimizations for maximum speed. Some examples are:

  • Most joins are done in one sweep.

  • Very few normal joins require temporary tables. Joins involving a GROUP BY and an ORDER BY on something other than the GROUP BY part will create a (usually memory-based) temporary table.

  • Memory-based temporary tables have dynamic hashing.

  • MySQL has a user configurable key cache and a record cache to quickly scan tables. Open tables are cached in a table cache.

  • An index optimizer quickly finds which possible index to use with a specific WHERE clause.

  • A range optimizer finds the range for an index which will then be used to find the matching rows. The range optimizer can optimize queries that use a combination of >=, >, =, <=, < and LIKE (of type “keyword%”) on a key column. When many possible indexes can be used, the range optimizer will choose the index that matches the smallest number of rows.

  • A join optimizer rearranges the tables in a SELECT statement in the optimal order. In the rare case the MySQL join optimizer gets this wrong, one can force the optimizer to join the tables in a particular order with the STRAIGHT_JOIN keyword.

  • For each sub-join, a simple WHERE is constructed to get a fast WHERE evaluation for each sub-join and to skip records as soon as possible.

  • A WHERE optimizer removes constant conditions.

  • Early detection finds invalid constant expressions. MySQL quickly detects that some SELECT statements are impossible and returns no rows.

  • SQL functions are implemented through a highly optimized class library. Almost all parsing and calculating is done in a local memory store. No memory overhead is needed for small items, and the normal slow memory allocation and freeing is avoided. Memory is allocated only for unexpectedly large strings by using malloc and free.

  • In queries of type

    SELECT ... WHERE col IN (a,b,c,d,e,f,...)
    

  • the IN part is sorted and checked through binary searching.

  • MIN/MAX on an indexed column is done with one key fetch.The MIN/MAX optimizer can also find the best value when one has specified all preceding index columns in the WHERE condition:

    SELECT MAX(index_part2) FROM tbl_name WHERE
    index_part_1=constant;
    

  • COUNT(*) on a single table without a WHERE is retrieved directly from the table information.

  • If all used columns for some table are numeric and form a leftmost prefix for some key, the values may be retrieved from the index tree rather than the data file for greater speed.

  • Indexes are used to find rows matching

    WHERE index_column LIKE "prefix%".
    

  • GROUP BY and DISTINCT are optimized by creating a temporary HEAP table with the GROUP BY (or whole row) as a unique index.

  • INSERT DELAYED inserts rows in a queue to the SQL server, which is useful if you are using MySQL for logging and can't afford to wait until the INSERT is completed.

  • Index compression is used to get smaller and faster indexes.

No Transactions and Many Simultaneous Users

One of the design decisions that has resulted in the greatest number of questions is the lack of transactions. There are, of course, applications that require transactions to work, but a wide range of applications work very well without them.

Some people believe that since MySQL does not support transactions, it cannot support many simultaneous users. Each MySQL client gets a dedicated thread in the MySQL server, which allows different users to access the same tables at the same time. All MySQL operations are atomic: no other users can change the result for a running query.

When designing MySQL, we had a greater need for speed than for transactions. It's no use having transactions if the SQL server becomes so slow it's unusable for what you need to get done.

Another common misunderstanding is that transactions provide extra robustness through the redo logs. This extra security can be had by simple means in MySQL. That is, do normal backups and then apply the update log to the backup. The update log is a file containing all SQL statements that change any data.

MySQL also has external utility programs to check, optimize and repair individual tables.

MySQL User Base

As MySQL has about 50 mirrors over the world, and we don't get download statistics from them, it's hard to tell how many MySQL installations are out there.

The WWW and FTP log at http://www.mysql.com/ gives us the information shown in Tables 3 and 4; all counts are based on the number of distinct IPs.

Table 3.

Table 4.

In the Linux community, many sites use MySQL as a back end for dynamic web pages. Among those are http://slashdot.org/, http://freshmeat.net/ and http://www.linux.com/.

On linux.com, every page does somewhere between 10 and 20 queries to the database. And linux.com does anywhere between 500K and 800K page views per day. They run MySQL on its own server, a dual Xeon system with huge amounts of RAM and hard-disk space.

While writing this, I asked Linux Journal what they use as a web back end, and learned they also use MySQL. Among the awards we have been given, we highly value the “Most Used Database” 1998 award we got from Linux Journal's readers.

Where MySQL Is Today

  • Client/server

  • Multi-threaded, multi-user and very fast

  • APIs to many different languages

  • A good, free ODBC driver

  • Very portable

  • Many different column types which support all ANSI 92 and all ODBC 2.50 types as well as some new ones

  • Support for almost all ODBC 3.0 and SQL ANSI92 functions

  • Full support for SQL GROUP BY and ORDER BY clauses; support for group functions (COUNT, AVG, STD, SUM, MAX and MIN)

  • Ability to mix tables from different databases in the same query

  • Very flexible privilege system where privilege is based on host and user

  • Support for LEFT OUTER JOIN with both ANSI SQL and ODBC syntax

  • Fixed-length and variable-length records

  • Handles large databases; at TcX, we are using MySQL with some databases that contain over 50 million records.

  • Very robust with no memory leaks; all reported memory leaks have been in non-MySQL libraries, most notably some versions of glibc.

  • Ability to configure many different character sets, e.g., Japanese/Chinese

  • Error messages available in many languages

  • Many utilities and much contributed software

  • MySQL is extensively documented. Most questions can be resolved by reading the MySQL manual. We try to document everything to avoid getting too many questions on the MySQL mailing lists. The current manual has recently been improved considerably, thanks to the great work done by Paul DuBois.

  • Many small, extremely useful extensions that help you get your work done

New in the Latest Development Version (3.23)

  • Binary portable table format—it is now possible to copy MySQL table files between different architectures.

  • More and longer indexes—maximum is 32 which can be 500 bytes long (16/128 previously).

  • Even better index compression—it is faster and uses even less disk space.

  • Indexes on BLOB/TEXT columns just like a CHAR column.

  • Support for tables greater than 4GB on file systems which support files that big. The new limit is about 9 million terrabytes.

  • Has better fragmentation handling for the dynamic row format.

  • Added in-memory tables with hashed keys—an extremely fast way to have lookup tables.

  • Allows true floating-point columns with values such as 1.0E+10.

  • Includes example C code for a procedure that analyses the result from a SELECT.

  • Faster SELECT DISTINCT handling has been added.

  • Added much useful information in SHOW TABLE STATUS.

  • CREATE TABLE (...) SELECT * from a,c where something. This creates a table using data from a SELECT in one step. The data types and field names are automatically generated from the SELECT.

  • Removed the old limitation with big GROUP BY queries (with SQL_BIG_TABLES=0) that resulted in a “table is full” error.

  • Loads BLOBS from files with the LOAD_FILE function.

  • COUNT(DISTINCT) is supported.

Future

We have no intention of stopping development of MySQL. Over time, MySQL will be 100% ANSI 92 compatible. As we still want MySQL to be fast, we will always give the user the option of specifying the removal of features which make a normal SQL server slow.

As an example, the GRANT system will not have any speed impact unless you use this to restrict table or column access.

The current “TODO” list can be found in the MySQL manual at www.mysql.com/doc.html. Everything in this list is in the order we plan to implement it.

The MySQL License

We have worked many years with GNU/BSD and other programs from the Net and have always believed that programs should be available in source. Because of this, we chose to use the same license as Aladdin GhostScript for the MySQL server on UNIX, and we made the client's completely free.

By the time this article reaches publication, there should be an old version of MySQL (3.20) with a GPL copyright available. We will continue releasing old versions under the GPL.

This means that for normal (even commercial) internal use on UNIX systems, MySQL costs nothing. You do not have to pay us if you do not want to. A license is required only if:

  1. You sell the MySQL server directly or as a part of another product or service.

  2. You charge for installing and maintaining a MySQL server at some client's site.

  3. You include MySQL in a distribution that is non-redistributable, and you charge for some part of that distribution.

  4. You use MySQL on a Win32 (Windows 95, 98, NT or Windows 2000) system.

For circumstances under which a MySQL license is required, you need a license per machine that runs the mysqld server. However, a multiple-CPU machine needs only one license, and there is no restriction on the number of MySQL servers that can run on one machine, or on the number of clients concurrently connected to a server running on that machine.

The following points set forth the philosophy behind our licensing policy:

  • The SQL client library should be totally free so that it can be included in commercial products without limitation.

  • People who want free access to the software into which we have put much work can have it, as long as they do not try to make money directly by distributing it for profit.

  • People who want the right to keep their own software proprietary, but also want the value from our work, can pay for the privilege.

  • That means normal in-house use is free. But if you use MySQL for something important to you, you may want to help further its development by purchasing a support contract or by contributing documentation, code samples or something else.

  • Our policy is that no one should have to pay for normal upgrades. In the future, we may require a new license for major upgrades with major new features (like transaction support). This means that in the long run, MySQL will be a very good investment compared to other databases.

As Win32 is a highly commercial OS with very high development costs (and development pains), we see no other alternative than to provide MySQL-Win32 only to paying customers, users who have helped us with MySQL in some way or users who think they can contribute to any part of MySQL. If we did this in any other way, we could not afford to continue developing MySQL on Win32 or even keep this version up to date with the UNIX version. In effect, we let users who run Win32 pay for the development of tools of our other operating systems.

David Axmark (david@detron.se) lives in Uppsala, Sweden with his plants and computers. He has been working as a software consultant for over 15 years.

Michael Widenius lives in Helsingfors, Finland with his wife and his two kids My and Max. He also has been working as a software consultant for over 15 years.

Among the things both authors have worked with are software for a one-card computer used by power companies, a video-rental system, a state-of-the-art market research system, advanced business graphics, a word processor that could handle Z80 Assembler+Basic, and a full operating system for an 8-bit computer (Z80) and many other other projects.

LJ Archive