JavaScript: The Definitive Guide

10.3. The RegExp Object

As mentioned at the beginning of this chapter, regular expressions are represented as RegExp objects. In addition to the RegExp( ) constructor, RegExp objects support three methods and a number of properties. An unusual feature of the RegExp class is that it defines both class (or static) properties and instance properties. That is, it defines global properties that belong to the RegExp( ) constructor as well as other properties that belong to individual RegExp objects. RegExp pattern-matching methods and properties are described in the next two sections.

The RegExp( ) constructor takes one or two string arguments and creates a new RegExp object. The first argument to this constructor is a string that contains the body of the regular expression -- the text that would appear within slashes in a regular expression literal. Note that both string literals and regular expressions use the \ character for escape sequences, so when you pass a regular expression to RegExp( ) as a string literal, you must replace each \ character with \\. The second argument to RegExp( ) is optional. If supplied, it indicates the regular expression flags. It should be g, i, m, or a combination of those letters. For example:

// Find all five digit numbers in a string. Note the double \\ in this case.
var zipcode = new RegExp("\\d{5}", "g"); 
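The effect of the doubled backslash is easiest to see by comparing the two forms side by side. The following is only an illustrative sketch; the variable names are made up for this example:

```javascript
// The same pattern written as a literal and as a string passed to RegExp().
var fromLiteral = /\d{5}/g;
var fromString = new RegExp("\\d{5}", "g");

// Both objects hold the same pattern text: \d{5}
var sameSource = (fromLiteral.source === fromString.source);

// With a single backslash, the string literal "\d" is just the letter "d",
// so this pattern looks for five consecutive d's rather than five digits.
var mistaken = new RegExp("\d{5}", "g");
var matchesDigits = mistaken.test("90210");    // false
var matchesLetters = mistaken.test("ddddd");   // true
```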

The RegExp( ) constructor is useful when a regular expression is being dynamically created and thus cannot be represented with the regular expression literal syntax. For example, to search for a string entered by the user, a regular expression must be created at runtime with RegExp( ).
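As a rough sketch of this technique, the code below builds a pattern from a string at runtime. The escapeRegExp( ) helper is hypothetical (it is not a built-in function), and userInput stands in for whatever the user actually typed. The helper is needed because user text may contain characters, such as +, that have special meaning in a regular expression:

```javascript
// Hypothetical helper: put a backslash in front of every character that
// has a special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Assumed sample value standing in for text entered by the user.
var userInput = "1+1=2";

// Build the pattern at runtime; the "i" flag makes the search case-insensitive.
var userPattern = new RegExp(escapeRegExp(userInput), "i");

var found = userPattern.test("We know that 1+1=2.");   // true
var falseHit = userPattern.test("11=2");               // false
```

Without the escaping step, the + in the input would be treated as a repetition character, and the pattern would match "11=2" instead of the literal text.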

10.3.1. RegExp Methods for Pattern Matching

RegExp objects define two methods that perform pattern-matching operations; they behave similarly to the String methods described earlier. The main RegExp pattern-matching method is exec( ). It is similar to the String match( ) method described above, except that it is a RegExp method that takes a string, rather than a String method that takes a RegExp. The exec( ) method executes a regular expression on the specified string. That is, it searches the string for a match. If it finds none, it returns null. If it does find one, however, it returns an array just like the array returned by the match( ) method for nonglobal searches. Element 0 of the array contains the string that matched the regular expression, and any subsequent array elements contain the substrings that matched any parenthesized subexpressions. Furthermore, the index property contains the character position at which the match occurred, and the input property refers to the string that was searched.
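A short illustration of the array that exec( ) returns may help here. The pattern and the sample text are invented for this sketch:

```javascript
// A nonglobal pattern with two parenthesized subexpressions.
var datePattern = /(\d{4})-(\d{2})/;
var dateText = "Released 2003-11 in print";

var match = datePattern.exec(dateText);
// match[0] is "2003-11"   (the text that matched the whole pattern)
// match[1] is "2003"      (first parenthesized subexpression)
// match[2] is "11"        (second parenthesized subexpression)
// match.index is 9        (position at which the match begins)
// match.input is dateText (the string that was searched)

// When there is no match, exec() returns null rather than an empty array.
var noMatch = datePattern.exec("no date here");
```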

Unlike the match( ) method, exec( ) returns the same kind of array whether or not the regular expression has the global g flag. Recall that match( ) returns an array of matches when passed a global regular expression. exec( ), by contrast, always returns a single match and provides complete information about that match. When exec( ) is called for a regular expression that has the g flag, it sets the lastIndex property of the regular expression object to the character position immediately following the matched substring. When exec( ) is invoked a second time for the same regular expression, it begins its search at the character position indicated by the lastIndex property. If exec( ) does not find a match, it resets lastIndex to 0. (You can also set lastIndex to 0 at any time, which you should do whenever you quit a search before you find the last match in one string and begin searching another string with the same RegExp object.) This special behavior allows us to call exec( ) repeatedly in order to loop through all the regular expression matches in a string. For example:

var pattern = /Java/g;
var text = "JavaScript is more fun than Java!";
var result;
while((result = pattern.exec(text)) != null) {
    alert("Matched `" + result[0] + "'" +
        " at position " + result.index +
        "; next search begins at " + pattern.lastIndex);
} 

The other RegExp method is test( ). test( ) is a much simpler method than exec( ). It takes a string and returns true if the string matches the regular expression:

var pattern = /java/i;
pattern.test("JavaScript");  // Returns true 

Calling test( ) is equivalent to calling exec( ) and returning true if the return value of exec( ) is not null. Because of this equivalence, the test( ) method behaves the same way as the exec( ) method when invoked for a global regular expression: it begins searching the specified string at the position specified by lastIndex, and if it finds a match, it sets lastIndex to the position of the character immediately following the match. Thus, we can loop through a string using the test( ) method just as we can with the exec( ) method.
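The loop described above might be sketched as follows, reusing the sample text from the exec( ) example. The positions array is introduced here only to record what happens on each pass:

```javascript
var javaPattern = /Java/g;
var sample = "JavaScript is more fun than Java!";
var positions = [];

// Each successful call to test() advances lastIndex past the match,
// so the next call resumes the search from there.
while (javaPattern.test(sample)) {
    positions.push(javaPattern.lastIndex);
}

// positions is now [4, 32]. The final, failed call to test()
// reset javaPattern.lastIndex to 0.
```

Note that test( ) reports only whether a match was found. If the matched text or its starting position is needed, exec( ) is the better choice.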

The String methods search( ), replace( ), and match( ) do not use the lastIndex property as exec( ) and test( ) do. In fact, the String methods simply reset lastIndex to 0. If you use exec( ) or test( ) on a pattern that has the g flag set and you are searching multiple strings, you must either find all the matches in each string, so that lastIndex is automatically reset to 0 (this happens when the last search fails), or you must explicitly set the lastIndex property to 0 yourself. If you forget to do this, you may start searching a new string at some arbitrary position within the string rather than from the beginning. Finally, remember that this special lastIndex behavior occurs only for regular expressions with the g flag. exec( ) and test( ) ignore the lastIndex property of RegExp objects that do not have the g flag.
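The following sketch shows how a leftover lastIndex value can cause a search of a second string to fail, and how resetting the property avoids the problem. The strings and variable names are made up for illustration:

```javascript
var digits = /\d+/g;

// The first search succeeds and leaves lastIndex just past "12345".
var first = digits.test("order 12345 shipped");   // true
var indexAfterFirst = digits.lastIndex;           // 11

// The second search starts at position 11, which is beyond the end
// of the new string, so it fails even though the string contains a digit.
var second = digits.test("item 7");               // false

// Resetting lastIndex before switching strings gives the expected result.
digits.test("order 12345 shipped");
digits.lastIndex = 0;
var fixed = digits.test("item 7");                // true

// A pattern without the g flag ignores lastIndex entirely.
var plain = /\d+/;
plain.lastIndex = 11;
var unaffected = plain.test("item 7");            // true
```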

10.3.2. RegExp Instance Properties

Each RegExp object has five properties. The source property is a read-only string that contains the text of the regular expression. The global property is a read-only boolean value that specifies whether the regular expression has the g flag. The ignoreCase property is a read-only boolean value that specifies whether the regular expression has the i flag. The multiline property is a read-only boolean value that specifies whether the regular expression has the m flag. The final property is lastIndex, a read-write integer. For patterns with the g flag, this property stores the position in the string at which the next search is to begin. It is used by the exec( ) and test( ) methods, as described in the previous section.
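As a brief illustration, the code below reads each of the five properties from a sample pattern and then sets lastIndex by hand to control where the next search begins. The pattern and text are arbitrary examples:

```javascript
var re = new RegExp("java", "gi");

var src = re.source;          // "java"
var isGlobal = re.global;     // true, because of the g flag
var noCase = re.ignoreCase;   // true, because of the i flag
var multi = re.multiline;     // false, since no m flag was given
var start = re.lastIndex;     // 0 for a newly created RegExp

// lastIndex is the only read-write property. Setting it to 3 makes the
// next exec() call skip the "Java" at the start of the string.
re.lastIndex = 3;
var hit = re.exec("JavaScript and java");
// hit[0] is "java", hit.index is 15, and re.lastIndex is now 19
```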




Copyright © 2003 O'Reilly & Associates. All rights reserved.