
18.26. Reading and Writing Compressed Files

18.26.1. Problem

You want to read or write compressed files.

18.26.2. Solution

Use PHP's zlib extension to read or write gzip'ed files. To read a compressed file:

$zh = gzopen('file.gz','r') or die("can't open: $php_errormsg");
while (false !== ($line = gzgets($zh,1024))) {
    // $line is the next line of uncompressed data, up to 1024 bytes 
}
gzclose($zh) or die("can't close: $php_errormsg");

Here's how to write a compressed file:

$zh = gzopen('file.gz','w') or die("can't open: $php_errormsg");
// gzwrite( ) returns the number of uncompressed bytes written, or false on failure
if (false === gzwrite($zh,$s)) { die("can't write: $php_errormsg"); }
gzclose($zh)                or die("can't close: $php_errormsg");

18.26.3. Discussion

The zlib extension contains versions of many file-access functions, such as fopen( ), fread( ), and fwrite( ) (called gzopen( ), gzread( ), gzwrite( ), etc.), that transparently compress data when writing and uncompress data when reading. The compression format that zlib uses is compatible with the gzip and gunzip utilities.
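As a minimal sketch of that transparency, the following writes a string with gzwrite( ), reads it back with gzread( ), and then looks at the raw bytes on disk. The scratch file from tempnam( ) and the variable names are assumptions made for the illustration, not part of the recipe:

```php
<?php
// Hypothetical scratch file; tempnam() only supplies an unused name.
$path = tempnam(sys_get_temp_dir(), 'gz');
$text = "first line\nsecond line\n";

// Write: the data is compressed on its way to disk.
$zh = gzopen($path, 'wb');
$written = gzwrite($zh, $text);   // number of uncompressed bytes accepted
gzclose($zh);

// Read: the data is uncompressed on its way back.
$zh = gzopen($path, 'rb');
$read_back = gzread($zh, 4096);   // up to 4096 uncompressed bytes
gzclose($zh);

// The bytes on disk are a gzip stream, so they begin with the
// gzip magic number 0x1f 0x8b, which is what gunzip looks for.
$raw = file_get_contents($path);
unlink($path);

var_dump($read_back === $text);
var_dump(substr($raw, 0, 2) === "\x1f\x8b");
```

Both var_dump( ) calls should report true: the string survives the round trip, and the file itself is something gunzip can recognize.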

For example, gzgets($zh,1024) works like fgets($fh,1024). It reads up to 1023 bytes, stopping earlier if it reaches EOF or a newline. For gzgets( ), this means 1023 uncompressed bytes.
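The length limit is easier to see with a small number. This sketch, which assumes a throwaway file and a length of 16 rather than 1024, reads one short line and one line that is longer than the limit:

```php
<?php
// Hypothetical scratch file for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'gz');

// One short line, then a 40-byte line (39 characters plus a newline).
$zh = gzopen($path, 'w');
gzwrite($zh, "short\n" . str_repeat('x', 39) . "\n");
gzclose($zh);

$zh = gzopen($path, 'r');
$first  = gzgets($zh, 16);  // stops at the newline: "short\n"
$second = gzgets($zh, 16);  // no newline in range: length - 1 bytes
$third  = gzgets($zh, 16);  // continues in the same long line
gzclose($zh);
unlink($path);

printf("%d %d %d\n", strlen($first), strlen($second), strlen($third));
```

The first call returns the whole short line including its newline. The second and third each return 15 bytes, because the limit of 16 is reached before a newline appears.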

However, gzseek( ) works differently from fseek( ). It moves to a specified number of uncompressed bytes from the beginning of the file stream (the equivalent of the SEEK_SET argument to fseek( )). In a file opened for writing, only forward seeks are supported, and the gap is filled with a sequence of compressed zeroes. In a file opened for reading, seeking in either direction is emulated by decompressing data up to the target position, so it can be very slow, particularly when seeking backwards.
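A short sketch of seeking in a file opened for reading, assuming a throwaway file that holds the lowercase alphabet. Offsets count uncompressed bytes, not bytes in the compressed file:

```php
<?php
// Hypothetical scratch file for the demonstration.
$path = tempnam(sys_get_temp_dir(), 'gz');

$zh = gzopen($path, 'w');
gzwrite($zh, 'abcdefghijklmnopqrstuvwxyz');
gzclose($zh);

$zh = gzopen($path, 'r');
gzseek($zh, 10);             // 10 uncompressed bytes from the start
$forward = gzread($zh, 3);   // the three bytes starting at offset 10
gzseek($zh, 2);              // backward seek: works when reading, but emulated
$backward = gzread($zh, 3);  // the three bytes starting at offset 2
gzclose($zh);
unlink($path);

print "$forward $backward\n";
```

On a 26-byte file the cost of the backward seek is invisible. On a large file, each such seek can mean decompressing from the beginning again, so sequential reads are the safer pattern.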

The zlib extension also has some functions to create compressed strings. The function gzencode( ) compresses a string and gives it the correct headers and formatting to be compatible with gunzip. Here's a simple gzip program:

$in_file = $_SERVER['argv'][1];
$out_file = $_SERVER['argv'][1].'.gz';

$ifh = fopen($in_file,'rb')  or die("can't open $in_file: $php_errormsg");
$ofh = fopen($out_file,'wb') or die("can't open $out_file: $php_errormsg");

$encoded = gzencode(fread($ifh,filesize($in_file)))
                             or die("can't encode data: $php_errormsg");

if (false === fwrite($ofh,$encoded)) { die("can't write: $php_errormsg"); }
fclose($ofh)                 or die("can't close $out_file: $php_errormsg");
fclose($ifh)                 or die("can't close $in_file: $php_errormsg");

The guts of this program are the lines:

$encoded = gzencode(fread($ifh,filesize($in_file)))
                             or die("can't encode data: $php_errormsg");
if (false === fwrite($ofh,$encoded)) { die("can't write: $php_errormsg"); }

The compressed contents of $in_file are stored in $encoded and then written to $out_file with fwrite( ).

You can pass a second argument to gzencode( ) that indicates compression level. Set no compression with 0 and maximum compression with 9. If you omit the argument, the level defaults to -1, which tells the zlib library to use its own default (level 6 in current zlib releases). To adjust the simple gzip program for maximum compression, the encoding line becomes:

$encoded = gzencode(fread($ifh,filesize($in_file)),9)
                             or die("can't encode data: $php_errormsg");
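To get a feel for what the level argument buys you, this sketch compresses the same string at several levels and prints the resulting sizes. The sample text is made up and highly repetitive, so the numbers are illustrative only; real data will compress differently:

```php
<?php
// Made-up sample data; repetitive text makes the level differences obvious.
$data = str_repeat("The quick brown fox jumps over the lazy dog.\n", 500);

$sizes = array();
foreach (array(0, 1, 9) as $level) {
    $sizes[$level] = strlen(gzencode($data, $level));
}
$sizes['default'] = strlen(gzencode($data));  // level argument omitted

printf("%-13s %6d bytes\n", 'uncompressed', strlen($data));
foreach ($sizes as $level => $bytes) {
    printf("level %-7s %6d bytes\n", $level, $bytes);
}
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input once the gzip header and trailer are added. Higher levels trade CPU time for smaller output, which matters most when you compress large files or compress on every request.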

You can also compress and uncompress strings without the gzip-compatibility headers by using gzcompress( ) and gzuncompress( ).
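As a rough illustration of the difference, the following round-trips a string through gzcompress( ) and gzuncompress( ) and compares the leading bytes of the result with the output of gzencode( ). The sample string is arbitrary:

```php
<?php
// Arbitrary sample string for the demonstration.
$original = str_repeat('PHP Cookbook ', 100);

$packed   = gzcompress($original);   // zlib format (RFC 1950), no gzip header
$restored = gzuncompress($packed);

$gzipped  = gzencode($original);     // gzip format, readable by gunzip

// The round trip returns the original string.
var_dump($restored === $original);

// Only the gzencode() output starts with the gzip magic number 0x1f 0x8b.
var_dump(substr($gzipped, 0, 2) === "\x1f\x8b");
var_dump(substr($packed, 0, 2) === "\x1f\x8b");
```

The two formats are not interchangeable: gunzip will not accept the output of gzcompress( ), and gzuncompress( ) will not accept the output of gzencode( ). Use gzcompress( ) when both ends are your own code, such as when storing compressed strings in a database.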

18.26.4. See Also

Section 18.27 for a program that extracts files from a ZIP archive; documentation on the zlib extension at http://www.php.net/zlib; you can download zlib at http://www.gzip.org/zlib/; the zlib algorithm is detailed in RFCs 1950 (http://www.faqs.org/rfcs/rfc1950.html) and 1951 (http://www.faqs.org/rfcs/rfc1951.html).




Copyright © 2003 O'Reilly & Associates. All rights reserved.