
6.2. Handling LONG/LOB Data

The DBI requires some additional information from you to fetch LONG/LOB (long/large object) datatypes back from a database. As we discussed earlier in the section on the LongReadLen and LongTruncOk attributes, the DBI is unable to determine how large a buffer to allocate when fetching columns containing LOB data. Therefore, we cannot simply issue a SELECT statement and expect it to work.

Selecting LOB data is straightforward and essentially identical to selecting a column of any other datatype, with the important exception that you should set at least the LongReadLen attribute prior to preparing the statement that will return the LOB. For example:

### We're not expecting binary data of more than 512 KB...
$dbh->{LongReadLen} = 512 * 1024;

### Select the raw media data from the database
$sth = $dbh->prepare( "
            SELECT mega.name, med.media_data
            FROM megaliths mega, media med
            WHERE mega.id = med.megaliths_id
       " );
$sth->execute();
while ( ($name, $data) = $sth->fetchrow_array ) {
    ...
}

Without the all-important setting of LongReadLen, the fetchrow_array() call would likely fail when fetching the first row, because the default value for LongReadLen is very small -- typically 80 or less.

What happens if there's a rogue value in the database that is longer than LongReadLen? How would the code in the previous example cope?

When the length of the fetched LOB data exceeds the value of LongReadLen, an error occurs unless you have set the LongTruncOk attribute to a true value. The DBI defaults LongTruncOk to false to ensure that accidental truncation is an error.

But there's a potential problem here if RaiseError is not enabled. How does the snippet of code above behave if it tries to fetch a row with a LOB field that exceeds the value of LongReadLen? fetchrow_array() returns an empty list if an error occurs while fetching a row, but it also returns an empty list when there is no more data to fetch. The while loop simply ends and whatever code follows it is executed. If the loop should have fetched 50 records, it might stop after 45 because the 46th record was too big. Without error checking, you may never realize that you're missing some rows! The same applies to loops using the other fetchrow methods, such as fetchrow_hashref().

Few people remember to check for errors after fetch loops, and that's a common cause of problems with code that handles LONG/LOB fields. Even when you're not handling special datatypes, it's always a good idea to check for errors after fetch loops, or to let the DBI do it for you by enabling RaiseError, as we discussed in Chapter 4, "Programming with the DBI".
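As a minimal sketch of that advice, assuming RaiseError is not enabled, you can check the statement handle's error status once the loop has exited:

### Fetch loop as before
while ( ($name, $data) = $sth->fetchrow_array ) {
    ...
}
### An empty list from fetchrow_array() may mean "no more rows" or
### "an error occurred", so check the statement handle before moving on
die "Fetch terminated early: " . $sth->errstr if $sth->err;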

Getting back to our little snippet of code, let's assume that we are happy for values longer than LongReadLen to be silently truncated without causing an error. The following code stub would correctly handle this eventuality:

### We are interested in the first 512 KB of data
$dbh->{LongReadLen} = 512 * 1024;
$dbh->{LongTruncOk} = 1;    ### We're happy to truncate any excess

### Select the raw media data from the database
$sth = $dbh->prepare( "
            SELECT mega.name, med.media_data
            FROM megaliths mega, media med
            WHERE mega.id = med.megaliths_id
          " );
$sth->execute();
while ( ($name, $data) = $sth->fetchrow_array ) {
    ...
}

The only change, apart from comments, is the addition of a line setting the LongTruncOk attribute to a true value.

The ability to truncate overly large LOB data is quite useful for text and some forms of binary data, but not for others. Streaming media that is interpreted on a temporal basis doesn't unduly suffer from being truncated: you can still view or listen to the stream up to the point of truncation. However, binary files such as ZIP files, which store a checksum at the very end, are rendered useless when truncated. With this sort of data, enabling LongTruncOk is not recommended, as it allows truncated, and hence corrupted, data to be returned with no indication that there's a problem. In that situation, you can't tell whether the column originally contained corrupt data or whether it has been truncated by the DBI. Caveat emptor!
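If you do enable LongTruncOk but would still like some warning that data may have been cut short, one rough heuristic, offered here purely as a sketch, is to compare the length of each fetched value against LongReadLen. A value that completely fills the buffer has quite possibly been truncated, although a value that is legitimately exactly that long will trigger a false alarm:

### Heuristic truncation check: a value that exactly fills the
### LongReadLen buffer may well have been cut short
while ( ($name, $data) = $sth->fetchrow_array ) {
    warn "media data for '$name' may be truncated\n"
        if defined $data and length($data) >= $dbh->{LongReadLen};
    ...
}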

One thing to be aware of when writing portable code to fetch LOB data from a database is that the format of that data may vary on a per-database and per-datatype basis. For example, in Oracle, a column with a LONG RAW datatype, rather than a simple LONG type, is passed to and from the database encoded as a pair of hexadecimal digits for each byte. So after fetching the hex string, you'd need to decode it using pack("H*", ...) to get the original binary value. For historical reasons, for these datatypes, the LongReadLen attribute refers to the length of the binary data, so hex-encoded strings of up to twice that length may be fetched.
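A short sketch of that conversion, assuming the driver hands the LONG RAW value back as a string of hex digits, might look like this:

### Convert the hex-digit string returned by the driver back into
### the original binary data, two hex digits per byte
$binary_data = pack( "H*", $hex_string );

### And the reverse conversion, from binary data to a hex string
$hex_string = unpack( "H*", $binary_data );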

The DBI currently defines no way to fetch LONG/LOB values piece-wise, in other words, piece-by-piece. That means you're limited to fetching values that will fit into your available memory. It also means you can't stream the data out while still fetching it from the database. Some drivers do implement an unofficial blob_read() method, so take a look at your driver documentation if you need piece-wise fetches.
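Purely as an illustration, a chunked read using such a driver-specific method might look like the following sketch, where $lob_field_index stands for whichever column number your driver expects. The argument order shown (field index, byte offset, chunk length) follows the unofficial blob_read() interface, but the exact semantics, the field numbering, and any required LongReadLen setup vary by driver, so check your driver's documentation before relying on this:

### Sketch only: read the LOB column of the current row in 64 KB
### chunks via the unofficial, driver-specific blob_read() method
my $offset = 0;
while ( my $chunk = $sth->blob_read( $lob_field_index, $offset, 64 * 1024 ) ) {
    ### each chunk could be written or streamed out here as it arrives
    $offset += length $chunk;
}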

6.2.1. Inserting and Updating LONG/LOB Columns

Some databases let you insert into LONG/LOB columns using SQL statements with literal strings, like this:

INSERT INTO table_name (key_num, long_description) VALUES (42, '...')

Ignoring portability for the moment, that's fine for simple short textual strings, but soon runs into problems for anything else. Firstly, most databases have a limit on the maximum length of an SQL statement, and it's usually far shorter than the maximum length of a LONG/LOB column. Secondly, most databases have limits on which characters can be included in literal strings. The DBD driver's quote() method will do its best, but it's often not possible to put all possible binary data values into a string. Finally, coming back to portability, many databases are strict about data typing and just don't let you assign literal strings to LONG/LOB columns.

So how do we avoid these problems? Here's where placeholders come to our aid once again. We discussed placeholders in some detail in Chapter 5, "Interacting with the Database", so we'll only cover LONG/LOB issues here.

To use placeholders, we'd implement the statement above using the DBI as:

use DBI qw(:sql_types);

$sth = $dbh->prepare( "
    INSERT INTO table_name (key_num, long_description) VALUES (?, ?)
" );
$sth->bind_param( 1, 42 );
$sth->bind_param( 2, $long_description, SQL_LONGVARCHAR );
$sth->execute();

Passing SQL_LONGVARCHAR as the optional TYPE parameter to bind_param() gives the driver a strong hint that you're binding a LONG/LOB type. Some drivers don't need the hint, but it's always a good idea to include it.

The DBI currently defines no way to insert or update LONG/LOB values piece-wise, in other words, piece by piece. That means you're limited to handling values that will fit into your available memory.
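Putting this together, a fuller sketch of inserting binary data read from a file might look like the following. The filename here is invented for illustration, and SQL_LONGVARBINARY (also exported by the :sql_types tag) may be a more appropriate hint than SQL_LONGVARCHAR for raw binary columns, if your driver supports it:

use DBI qw(:sql_types);

### Slurp the whole image into memory; since piece-wise inserts
### aren't supported, the value has to fit in memory anyway
open my $fh, '<', 'stone_circle.jpg' or die "Unable to open file: $!";
binmode $fh;
my $image_data = do { local $/; <$fh> };
close $fh;

### Insert the binary data via a placeholder, hinting at the type
$sth = $dbh->prepare( "
    INSERT INTO media (megaliths_id, media_data) VALUES (?, ?)
" );
$sth->bind_param( 1, 42 );
$sth->bind_param( 2, $image_data, SQL_LONGVARBINARY );
$sth->execute();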


