Book HomeMySQL and mSQLSearch this book

6.3. SQL Datatypes

In a table, each column has a type. As we mentioned earlier, a SQL datatype is similar to a datatype in traditional programming languages. While many languages define a bare-minimum set of types necessary for completeness, SQL goes out of its way to provide types such as MONEY and DATE that will be useful to every day users. You could store a MONEY type in a more basic numeric type, but having a type specifically dedicated to the nuances of money processing helps add to SQL's ease of use -- one of SQL's primary goals.

Chapter 15, "SQL Reference", provides a full reference of SQL types supported by MySQL or mSQL. Table 6-1 is an abbreviated listing of the most common types supported in both languages.

Table 6-1. The Most Often Used Datatypes Common to Both MySQL and mSQL

Datatype

Description

INT

An integer value. MySQL allows an INT to be either signed or unsigned, while mSQL provides a distinct type, UINT, for unsigned integers.

REAL

A floating point value. This type offers a greater range and precision than the INT type, but it does not have the exactness of an INT.

CHAR(length)

A fixed-length character value. No CHAR fields can hold strings greater in length than the specified value. Fields of lesser length are padded with spaces. This type is likely the most commonly used type in any SQL implementation.

TEXT(length)

A variable length character value. In mSQL, the given length is used as a suggestion as to how long the strings being stored will be. You may store larger values, but at a performance cost. Under MySQL, TEXT is just one of many variable-length datatypes.

DATE

A standard date value. While the format for storing a date differs between MySQL and mSQL, both database engines are capable of using the DATE type to store arbitrary dates for the past, present, and future. Both database engines are Y2K compliant in their date storage.

TIME

A standard time value. This type stores the time of day independent of a particular date. When used together with a date, a specific date and time can be stored. MySQL additionally supplies a DATETIME type that will store date and time together in one field.

NOTE

MySQL supports the UNSIGNED attribute for all numeric types. This modifier forces the column to accept only positive (unsigned) numbers. Unsigned fields have an upper limit that is double that of their signed counterparts. An unsigned TINYINT -- MySQL's single byte numeric type -- has a range of to 255 instead of the -127 to 127 range of its signed counterpart.

Both database engines provide more types than those mentioned above. MySQL, in particular, is very rich in the number of datatypes it supports. In day-to-day programming, however, you will find yourself using mostly the types mentioned earlier. With mSQL, choosing a datatype is pretty much as simple as picking the type that most closely resembles the data you want to store. The size of the data you wish to store, however, plays a much larger role in designing MySQL tables.

6.3.1. Numeric Types

Before you create a table, you should have a good idea of what kind of data you wish to store in the table. Beyond obvious decisions about whether your data is character-based or numeric, you should know the approximate size of the data to be stored. If it is a numeric field, what is its maximum possible value? What is its minimum possible value? Could that change in the future? If the minimum is always positive, you should consider an unsigned type. You should always choose the smallest numeric type that can support your largest conceivable value. If, for example, we had a field that represented the population of a state, we would use an unsigned INT field. No state can have a negative population. Furthermore, in order for an unsigned INT field not to be able to hold a number representing a state's population, that state's population would have to be roughly the population of the entire Earth.

6.3.2. Character Types

Managing character types is a little more complicated. Not only do you have to worry about the minimum and maximum string lengths, but you also have to worry about the average size, the amount of variation likely, and the need for indexing. For our current purposes, an index is a field or combination of fields on which you plan to search -- basically, the fields in your WHERE clause. Indexing is, however, much more complicated than this simplistic description, and we will cover indexing later in the chapter. The important fact to note here is that indexing one character fields works best when the field is fixed length. In fact, mSQL does not even provide an indexible variable-length character field! If there is little -- or, preferably, no -- variation in the length of your character-based fields, then a CHAR type is likely the right answer. An example of a good candidate for a CHAR field is a country code. The ISO provides a comprehensive list of standard two-character representations of country codes (US for the U.S.A., FR for France, etc.).[9] Since these codes are always exactly two characters, a CHAR(2) is always the right answer for this field.

[9]Don't be lulled into believing states/provinces work this way. If you want to write an application that works in an international environment and stores state/province codes, make sure to make it a CHAR(3) since Australia uses three-character state codes. Also note that there is a 3-character ISO country-code standard.

A value does not need to be invariant in its length to be a candidate for a CHAR field. It should, however, have very little variance. Phone numbers, for example, can be stored safely in a CHAR(13) field even though phone number length varies from nation to nation. The variance simply is not that great, so there is no value to making a phone number field variable in length. The important thing to keep in mind with a CHAR field is that no matter how big the actual string being stored is, the field always takes up exactly the number of characters specified as the field's size -- no more, no less. Any difference between the length of the text being stored and the length of the field is made up by padding the value with spaces. While the few potential extra characters being wasted on a subset of the phone number data is not anything to worry about, you do not want to be wasting much more. Variable-length text fields meet this need.

A good, common example of a field that demands a variable-length datatype is a web URL. Most web addresses can fit into a relatively small amount of space -- http://www.ora.com, http://www.hughes.com.au, http://www.mysql.com -- and consequentially do not represent a problem. Occasionally, however, you will run into web addresses like:

http://www.winespectator.com/Wine/Spectator/_notes|5527293926834323221480431354?Xv11=&Xr5=&Xv1=&type-region-search-code=&Xa14=flora+springs&Xv4=.

If you construct a CHAR field large enough to hold that URL, you will be wasting a significant amount of space for most every other URL being stored. Variable-length fields let you define a field length that can store the odd, long-length value while not wasting all that space for the common, short-length values. MySQL and mSQL each take separate approaches to this problem.

6.3.2.1. Variable-length character fields in MySQL

If you are using only mSQL, you can skip this section. The advantage of variable-length text fields under MySQL is that such fields use precisely the minimum storage space required to store an individual field. A VARCHAR(255) column that holds the string "hello world," for example, only takes up 12 bytes (one byte for each character plus an extra byte to store the length).

NOTE

In opposition to the ANSI standard, VARCHAR in MySQL fields are not padded. Any extra spaces are removed from a value before it is stored.

You cannot store strings whose lengths are greater than the field length you have specified. With a VARCHAR(4) field, you can store at most a string with 4 characters. If you attempt to store the string "happy birthday," MySQL will truncate the string to "happ." The downside of the MySQL approach to variable-length text fields over the mSQL approach is that there is no way to store the odd string that exceeds your designated field size. Table 6-1 shows the storage space required to store the 144 character Wine Spectator URL shown above along with an average-sized 30 character URL.

Table 6-1. The Storage Space Required by the Different MySQL Character Types

Datatype

Storage for a 144 Character String

Storage for a 30 Character String

Maximum String Size

CHAR(150)

150

150

255

VARCHAR(150)

145

31

255

TINYTEXT(150)

145

31

255

TEXT(150)

146

32

65535

MEDIUMTEXT(150)

147

33

16777215

LONGTEXT(150)

148

34

4294967295

If, after years of uptime with your database, you find that the world has changed and a field that once comfortably existed as a VARCHAR(25) now must be able to hold strings as long as 30 characters, you are not out of luck. MySQL provides a command called ALTER TABLE that enables you to redefine a field type without losing any data.

ALTER TABLE mytable MODIFY mycolumn LONGTEXT

6.3.2.2. Variable-length character fields in mSQL

You can skip this section if you are only interested in MySQL. Variable-length character fields in mSQL enable you to define a field's length to be the size of the average character string length it will hold. While every value you insert into this field will still take up at least the amount you specify, it can hold more. The database does this by creating an overflow table to hold the extra data. The downside of this approach comes in the form of performance and the inability to index variable-length fields.

Let's take a moment to examine the impact of different design choices with mSQL. In order to store all of the above URLs in a CHAR field, we would need to have a CHAR(144) column. Under this scenario, the four URLs in question would take up 576 bytes (144x3), even though you are only actually storing 216 bytes of data. The other 360 bytes is simply wasted space. If you multiple that times thousands or millions of rows, you can easily see how this becomes a serious problem. Using a variable-length TEXT(30) field, however, only 234 bytes (30x3+144) are required to store the 216 bytes of data. Only 18 bytes are wasted. That is a 41% savings!

6.3.3. Binary Datatypes

mSQL has no support for binary data. MySQL, on the other hand, provides a set of binary datatypes that closely mirror their character counterparts. The MySQL binary types are CHAR BINARY, VARCHAR BINARY , TINYBLOB , BLOB , MEDIUMBLOB, and LONGBLOB . The practical distinction between character types and their binary counterparts is the concept of encoding. Binary data is basically just a chunk of data that MySQL makes no effort to interpret. Character data, on the other hand, is assumed to represent textual data from human alphabets. It thus is encoded and sorted based on rules appropriate to the character set in question. Specifically, MySQL sorts binary in a case-insensitive, ASCII order.

6.3.4. Enumerations and Sets

MySQL provides two other special kinds of types with no mSQL analog. The ENUM type allows you specify at table creation a list of possible values that can be inserted into that field. For example, if you had a column named fruit into which you wanted to allow only "apple," "orange," "kiwi," or "banana," you would assign this column the type ENUM:

CREATE TABLE meal(meal_id INT NOT NULL PRIMARY KEY,
                  fruit ENUM(`apple', `orange', `kiwi',
                             `banana'))

When you insert a value into that column, it must be one of the specified fruits. Because MySQL knows ahead of time what valid values are for the column, it can abstract them to some underlying numeric type. In other words, instead of storing "apple" in the column as a string, it stores it as a single byte number. You just use "apple" when you call the table or when you view results from the table.

The MySQL SET type works in the same way, except it lets you store multiple values in a field at the same time.

6.3.5. Other Kinds of Data

Every piece of data you will ever encounter can be stored using numeric or character types. Technically, you could even store numbers as character types. Just because you can do so, however, does not mean that you should do so. Consider, for example, storing money in the database. You could store that as an INT or a REAL. While a REAL might seem more intuitive -- money requires decimal places, after all -- an INT actually makes more sense. With floating point values like REAL fields, it is often impossible to capture a number with a specific decimal value. If, for example, you insert the number 0.43 to represent $0.43, MySQL and mSQL may store that as 0.42999998. This small difference can be problematic when applied to a large number of mathematical operations. By storing the number as an INT and inserting the decimal into the right place, you can be certain that the value represents exactly what you intend it to represent.

Isn't all of that a major pain? Wouldn't it be nice if MySQL and mSQL provided some sort of datatype specifically suited to money values? MySQL and, to a lesser degree, mSQL both provide special datatypes to handle special kinds of data. MONEY is an example of one of these kinds of data. DATE is another. For a full description of all datatypes, see Chapter 17, "MySQL and mSQL Programs and Utilities".



Library Navigation Links

Copyright © 2001 O'Reilly & Associates. All rights reserved.