Perl CookbookPerl CookbookSearch this book

1.17. Reformatting Paragraphs

1.17.1. Problem

Your string is too big to fit the screen, and you want to break it up into lines of words, without splitting a word between lines. For instance, a style correction script might read a text file a paragraph at a time, replacing bad phrases with good ones. Replacing a phrase like utilizes the inherent functionality of with uses will change the length of lines, so it must somehow reformat the paragraphs when they're output.

1.17.2. Solution

Use the standard Text::Wrap module to put line breaks at the right place:

use Text::Wrap;
@output = wrap($leadtab, $nexttab, @para);

Or use the more discerning CPAN module, Text::Autoformat, instead:

use Text::Autoformat;
$formatted = autoformat $rawtext;

1.17.3. Discussion

The Text::Wrap module provides the wrap function, shown in Example 1-3, which takes a list of lines and reformats them into a paragraph with no line more than $Text::Wrap::columns characters long. We set $columns to 20, ensuring that no line will be longer than 20 characters. We pass wrap two arguments before the list of lines: the first is the indent for the first line of output, the second the indent for every subsequent line.

Example 1-3. wrapdemo

  #!/usr/bin/perl -w
  # wrapdemo - show how Text::Wrap works
  @input = ("Folding and splicing is the work of an editor,",
            "not a mere collection of silicon",
            "and",
            "mobile electrons!");
  use Text::Wrap qw($columns &wrap);
  $columns = 20;
  print "0123456789" x 2, "\n";
  print wrap("    ", "  ", @input), "\n";

The result of this program is:

01234567890123456789
    Folding and
  splicing is the
  work of an
  editor, not a
  mere collection
  of silicon and
  mobile electrons!

We get back a single string, with newlines ending each line but the last:

# merge multiple lines into one, then wrap one long line
use Text::Wrap;
undef $/;
print wrap('', '', split(/\s*\n\s*/, <>));

If you have the Term::ReadKey module (available from CPAN) on your system, you can determine your window size so you can wrap lines to fit the current screen size. If you don't have the module, sometimes the screen size can be found in $ENV{COLUMNS} or by parsing the output of the stty(1) command.

The following program tries to reformat both short and long lines within a paragraph, similar to the fmt(1) program, by setting the input record separator $/ to the empty string (causing <> to read paragraphs) and the output record separator $\ to two newlines. Then the paragraph is converted into one long line by changing all newlines and any surrounding whitespace to single spaces. Finally, we call the wrap function with leading and subsequent tab strings set to the empty string so we can have block paragraphs.

use Text::Wrap      qw(&wrap $columns);
use Term::ReadKey   qw(GetTerminalSize);
($columns) = GetTerminalSize( );
($/, $\)  = ('', "\n\n");   # read by paragraph, output 2 newlines
while (<>) {                # grab a full paragraph
    s/\s*\n\s*/ /g;         # convert intervening newlines to spaces
    print wrap('', '', $_); # and format
}

The CPAN module Text::Autoformat is much more clever. For one thing, it tries to avoid "widows," that is, very short lines at the end. More remarkably, it correctly copes with reformatting paragraphs that have multiple, deeply nested citations. An example from that module's manpage shows how the module can painlessly convert:

In comp.lang.perl.misc you wrote:
: > <CN = Clooless Noobie> writes:
: > CN> PERL sux because:
: > CN>    * It doesn't have a switch statement and you have to put $
: > CN>signs in front of everything
: > CN>    * There are too many OR operators: having |, || and 'or'
: > CN>operators is confusing
: > CN>    * VB rools, yeah!!!!!!!!!
: > CN> So anyway, how can I stop reloads on a web page?
: > CN> Email replies only, thanks - I don't read this newsgroup.
: >
: > Begone, sirrah! You are a pathetic, Bill-loving, microcephalic
: > script-infant.
: Sheesh, what's with this group - ask a question, get toasted! And how
: *dare* you accuse me of Ianuphilia!

into:

In comp.lang.perl.misc you wrote:
: > <CN = Clooless Noobie> writes:
: > CN> PERL sux because:
: > CN>    * It doesn't have a switch statement and you
: > CN>      have to put $ signs in front of everything
: > CN>    * There are too many OR operators: having |, ||
: > CN>      and 'or' operators is confusing
: > CN>    * VB rools, yeah!!!!!!!!! So anyway, how can I
: > CN>      stop reloads on a web page? Email replies
: > CN>      only, thanks - I don't read this newsgroup.
: >
: > Begone, sirrah! You are a pathetic, Bill-loving,
: > microcephalic script-infant.
: Sheesh, what's with this group - ask a question, get toasted!
: And how *dare* you accuse me of Ianuphilia!

simply via print autoformat($badparagraph). Pretty impressive, eh?

Here's a miniprogram that uses that module to reformat each paragraph of its input stream:

use Text::Autoformat;
$/ = '';
while (<>) {
    print autoformat($_, {squeeze => 0, all => 1}), "\n";
}

1.17.4. See Also

The split and join functions in perlfunc(1) and Chapter 29 of Programming Perl; the manpage for the standard Text::Wrap module; the CPAN module Term::ReadKey, and its use in Recipe 15.6 and the CPAN module Text::Autoformat



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.