LJ Archive

Keeping Up with Python: the 2.2 Release

Wesley J. Chun

Issue #99, July 2002

Python 2.2 resolves some well known deficiencies of the language and introduces some new powerful constructs that are key strengths of other object-oriented languages.

Python 2.2 made its debut at the end of 2001, and its first bug-fix version 2.2.1 was recently released from the core developers at PythonLabs. The 2.2.x family is full of new features and capabilities, some considered significant additions and improvements to the language. These updates give Python developers a significant boost in terms of flexibility.

Python is a simple yet robust language combining the ease of scripting tools with the application-building power of compiled object-oriented programming languages. With Jython, the Java-compiled edition of the Python interpreter, Java programmers are discovering a tool that raises their productivity and development speed to a new level.

You can stay up-to-speed on these changes by reading the PEPs (Python Enhancement Proposals), which are created to give any reasonable idea an ear from the Python community. Before consideration is made for any update to the language, the problems and proposed solutions are presented along with the rationale for, and details behind, the change. Not only can you get the exact details on a PEP at the web site (see Resources), but you can also find out the status of a PEP. After reaching a consensus, a subset of PEPs is approved and slated for each release. For example, the changes in 2.2 (meaning the entire 2.2.x set of releases) consist primarily of five major PEPs: 234, 238, 252, 253 and 255.

For starters, 2.2 begins the process of unifying Python integers and long integers. Integer calculations can no longer raise overflow errors because they will automatically be cast into longs if the value overflows. Statically nested scopes, introduced in 2.1 and now standard, free Python from its restrictive two-scope model (see PEP 227). Previously, one had to put from __future__ import nested_scopes at the start of the script to enable nested scopes. Now that directive is no longer necessary as it has become standard. Unicode support has also been upgraded for UCS-4 (32-bit unsigned integers; see PEP 261). Minor updates to the Python Standard Library include a new e-mail package, a new XML-RPC module, the ability to add IPv6 support to the socket module and the new hot-shot profiler (PEP).

The most significant changes and additions to 2.2 are iterators and generators, changing the division operator and unifying types and classes.

Iterators

Iterators give the programmer the ability to traverse or “iterate through” elements of a data set. They are especially useful when the items of such sets are of differing types. Python has already simplified part of this programming process, as its sequence data types (lists, strings, tuples) are already heterogenous, and iterating through them is as simple as a “for” loop without having to create any special mechanism.

The new iteration support in Python works seamlessly with Python sequences but now also allows programmers to iterate through nonsequence types, including user-defined objects. An additional benefit is the improvement of iteration through other Python types.

Now, that all sounds good, but why iterators in Python? In particular, PEP 234 cites that the enhancement will:

  • Provide an extensible iterator interface.

  • Bring performance enhancements to list iteration.

  • Allow for big performance improvements in dictionary iteration.

  • Allow for the creation of a true iteration interface as opposed to overriding methods originally meant for random element access.

  • Be backward-compatible with all existing user-defined classes and extension objects that emulate sequences and mappings.

  • Result in more concise and readable code that iterates over nonsequence collections (mappings and files for instance).

Iterators can be created directly by using the new iter() built-in function or implicitly for objects that come with their own iteration interface. For example, lists have a built-in iteration interface, so “for eachItem in myList” will not change at all.

Calling iter(obj) returns an iterator for that type of object. An iterator has a single method, next(), that returns the next item in the set. A new exception, StopIteration, signals the end of the set.

Iterators do have restrictions, however. You can't move backward, go back to the beginning or copy an iterator. If you want to iterate over the same objects again (or simultaneously), you have to create another iterator object.

Sequences

As mentioned before, iterating through Python sequence types is as expected:

>>> myTuple = (123, 'xyz', 45.67)
>>> i = iter(myTuple)
>>> i.next()
123
>>> i.next()
'xyz'
>>> i.next()
45.67
>>> i.next()
Traceback (most recent call last):
  File "", line 1, in ?
StopIteration

If this had been an actual program, we would have enclosed the code inside a try-except block. Sequences now automatically produce their own iterators, so a “for” loop:

for i in seq:
    do_something_to(i)
under the covers now really behaves like this:
fetch = iter(seq)
while 1:
    try:
        i = fetch.next()
    except StopIteration:
        break
    do_something_to(i)
However, your code doesn't need to change because the “for” loop itself calls the iterator's next() method.

There is also another form of the iter() built-in function, iter(callable, sentinel), which returns an iterator as before. The difference is that each call to the iterator's next() method will invoke callable() to obtain successive values and raise StopIteration when the value sentinel is returned.

Dictionaries

Dictionaries and files are two other Python data types that received the iteration makeover. A dictionary's iterator traverses its keys. The idiom “for eachKey in myDict.keys()” can be shortened to “for eachKey in myDict”, as shown in Listing 1.

Listing 1. Looping through a Dictionary

In addition, three new built-in dictionary methods have been introduced to define the iteration: myDict.iterkeys() (iterate through the keys), myDict.itervalues() (iterate through the values) and myDict.iteritems() (iterate through key/value pairs). Note that the “in” operator has been modified to check a dictionary's keys. This means the Boolean expression myDict.has_key(anyKey) can be simplified as “anyKey in myDict”.

Files

File objects produce an iterator that calls the readline() method. Thus, they loop through all lines of a text file, allowing the programmer to replace essentially “for eachLine in myFile.readlines()” with the more simplistic “or eachLine in myFile”:

>>> myFile = open('config-win.txt')
>>> for eachLine in myFile:
...     print eachLine,   # comma suppresses extra \n
...
[EditorWindow]
font-name: courier new
font-size: 10
>>> myFile.close()

Classes

You can also create custom iterators for your own classes. This allows you to avoid the hack of overloading the __getitem__() special class method. Overloading __getitem__() implies the user can ask for any subscript in any order. But some objects do not logically allow this. Using an iterator rather than overloading __getitem__() makes explicit what the user can or cannot do.

To add iteration to your classes, override the __iter__() special method to return itself (making the object its own iterator). Then override the next() method:

def __iter__(self):
    return self
def next(self):
    # return next item or raise StopIteration

We can tweak our code for a similar example. This time, we choose to return a random element from the sequence (Listing 2). This example demonstrates some unusual things we can do with custom class iterations. One is infinite iteration. Because we read the sequence nondestructively, we never run out of elements, so we never need to raise StopIteration.

Listing 2. Custom Class Iterations

In Listing 3, we create an iterator object using our class, but rather than iterating through one item at a time, we give the next() method an argument telling how many items to return.

Listing 3. Creating an Iterator Object Using Our Class

Now let's try it out:

>>> a = AnyIter(range(10))
>>> i = iter(a)
>>> for j in range(1,5):
>>>     print j, ':', i.next(j)
1 : [0]
2 : [1, 2]
3 : [3, 4, 5]
4 : [6, 7, 8, 9]

Mutable Objects and Iterators

Before we move on to generators, remember that interfering with mutable objects while you are iterating them is not a good idea. This was a problem before iterators appeared. One popular example of this is to loop through a list and remove items from it if certain criteria are met (or not):

for eachURL in allURLs:
    if not eachURL.startswith('http://'):
        allURLs.remove(eachURL)            # YIKES!!

All sequences are immutable except lists, so the danger occurs only there. A sequence's iterator only keeps track of the Nth element you are on, so if you change elements around during iteration, those updates will be reflected as you traverse through the items. If you run out, then StopIteration will be raised, but you can continue with the iteration if you add items to the end and resume, as shown in Listing 4.

Listing 4. Iteration Example

When iterating through keys of a dictionary, you must not modify the dictionary. Using a dictionary's keys() method is okay because keys() returns a list that is independent of the dictionary.

But iterators are tied much more intimately with the actual object and will not let us play that game anymore:

>>> myDict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> for eachKey in myDict:
...   print eachKey, myDict[eachKey]
...   del myDict[eachKey]
...
a 1
Traceback (most recent call last):
  File "", line 1, in ?
RuntimeError: dictionary changed size during
              iteration

This will help prevent buggy code. For full details on iterators, see PEP 234.

Generators

Generators extend from the idea of iterators. The main motivation for generators, however, comes from a different angle: they allow saving state across function calls. Static variables, such as in C functions, have the ability to maintain their value across multiple calls of that function. This partially solves the state problem, but what would be really nice would be to yield a value just like an iterator but be able to freeze execution only to resume exactly where you left off when it is called again. This is exactly what generators do. They represent the idea of merging iteration with state along with functions that are resumable. When they do, they pick up right where they left off, keeping intact all the state information they need to deliver the next item. Note that we use the term yield here for two reasons: to hint that it's not a true return (along with frame object stack pop) and to introduce the new keyword yield.

For backward compatibility (in case there is code out there that uses yield as an identifier), you must include the “from __future__ import generators” directive to use generators. Generators will become standard soon (2.3?) so importing will not be a necessity. Generators behave in another manner similar to iterators: when a real return or end-of-function is reached and there are no more values to yield, a StopIteration exception is raised. Here's a simple example:

def simpleGen():
    yield 1
    yield '2 --> punch!'

Now that we have our function, let's call it and get a generator object:

>>> myG = simpleGen()
>>> myG.next()
1
>>> myG.next()
'2 --> punch!'
>>> myG.next()
Traceback (most recent call last):
  File "", line 1, in ?
    myG.next()
StopIteration
or more aptly: for eachItem in simpleGen(): print eachItem. Of course that was a silly example. I mean, why not use a real iterator for that? More motivation comes from being able to iterate through a sequence that requires the power of a function, rather than static objects already sitting in some sequence.

In the following example, we are going to create a random iterator that takes a sequence and returns a random item from that sequence:

from random import randint
def randIter(seq):
    while len(seq) > 0:
        yield seq.pop(randint(0, len(seq)-1))

The difference is that each item returned is also consumed from that sequence, sort of like a combination of list.pop() and random.choice():

>>> for eachItem in randIter([123, 'xyz',
    45.678, 9j]):
...     print eachItem
...
'xyz'
9j
45.678
123
Table 1 is a summary of the differences between iterators and generators. You can find more details on both iterators and generators in their respective PEPs (234 and 255).

Table 1. Differences between Iterators and Generators

Initiating the Process of Changing the Division Operator

This is perhaps the most controversial update to Python so far. There are many pros and cons, but finally those who believe in true division have won out. To highlight this change let's define (or redefine) some division terms and their functionality with integer and floating-point operands.

Classic Division

When presented with integer operands, classic division truncates the decimal place, returning an integer (see the “Floor Division” section below). When given a pair of floating-point operands, it returns the actual floating-point quotient (see the “True Division” section). Here is an example of what Python's division has been and still is today (actually a mix of true and floor division):

>>> 1 / 2          # perform integer result (floor)
0
>>> 1.0 / 2.0      # returns real quotient
0.5

True Division

This is the case where the result should always be the actual quotient, regardless of the type of the operands. This is the big change that is to come our way when Python 3.0 nears reality. For now, to take advantage of true division, one must give the from __future__ import division directive. Once that happens, the division operator ( / ) performs only true division:

>>> from __future__ import division
>>>
>>> 1 / 2               # returns real quotient
0.5
>>> 1.0 / 2.0           # returns real quotient
0.5

Floor Division

A new divisor operator ( // ) has been created that always truncates the fraction and rounds it to the next smallest whole number toward the left on the number line, regardless of the operands' numeric types. This operator works starting in 2.2 and does not require the __future__ directive above.

>>> 1 // 2          # floors result, returns integer
0
>>> 1.0 // 2.0      # floors result, returns float
0.0
>>> -1 // 2         # move left on number line
-1

Without getting into the arguments of this change, the feeling is that perhaps Python's division operator has been flawed since the beginning, especially because Python is a strong choice as a first programming language for people who aren't used to floor division. One of the examples Guido uses in his “What's New in Python 2.2” ZPUG talk is:

def velocity(distance, totalTime):
    rate = distance / totalTime
This is bad because this function is not numeric-type-independent. Your results with a pair of floats certainly differs from that of sending in a pair of integers. To bridge the dichotomy, you must resolve the following intransitivity in your head:
>>> 1 == 1.0
1
>>> 2 == 2.0
1
>>> 1 / 2 == 1.0 / 2.0            # classic division
0
If you use Python's new model of division, the universe is at peace once again:
>>> from __future__ import division
>>> 1 / 2 == 1.0 / 2.0            # true division
1
>>> 1 // 2 == 1.0 // 2.0          # floor division
1
While this seems like the proper and right thing to do, one cannot help but be concerned with the code breakage it may lead to. Fortunately, the Python developers have kept this in mind, as this change will not be permanent until Python 3.0, which is still years away. Those who desire the new division can import it or start Python with the -Qnew command-line option. There are a few options to turn on warnings to prepare for the upcoming new division.

You can get more information from PEP 238, but dig through the comp.lang.python archives for the heated debates. Table 2 summarizes the division operators in the various releases of Python and the differences in operation when you import division (from __future__).

Table 2. Division Operator Summary

Merging Types and Classes

Merging Python types and classes has been on the want list for quite a while. Programmers are dismayed to discover that they cannot subclass existing data types, such as a list, to customize for their applications.

To learn more, it can't hurt to look through both the PEPs involved and a tutorial Guido wrote specifically for those who want to get up to speed quickly on the new style classes without having to wade through all the intricate details found in the PEPs (see Resources). We will also give you a teaser class that extends a Python list with enhanced stack features.

This example, stack2.py, is motivated by one of the iterator examples above (see also Example 6.2 at the Core Python Programming web site).

#!/bin/env python
'stack2.py -- subclasses and extends a list'
class Stack(list):
  def __init__(self, *args):
      list.__init__(self, args)   # call base class
                                  # constructor
  def push(self, *args):
      for eachItem in args:       # can push multiple
          self.append(eachItem)   # items
  def pop(self, n=1):
      if n == 1:                  # pop single item
          return list.pop(self)
      else:                       # pop multiple items
          return [ list.pop(self) for i in range(n) ]

Below is the output we get from flexing our newfound capabilities:

>>> from stack2 import Stack
>>> m = Stack(123, 'xyz')
>>> m
[123, 'xyz']
>>> m.push(4.5)
>>> m
[123, 'xyz', 4.5]
>>> m.push(1+2j, 'abc')
>>> m
[123, 'xyz', 4.5, (1+2j), 'abc']
>>> m.pop()
'abc'
>>> m.pop(3)
[(1+2j), 4.5, 'xyz']
>>> m
[123]
In addition to being able to subclass built-in types, other highlights of the new style classes include:
  • “Cast” functions being factories.

  • New __class__, __dict__, and __bases__ attributes.

  • __getattribute__() Special Method (smarter than __getattr__()).

  • Class descriptors.

  • Class properties.

  • Static methods.

  • Class methods.

  • Superclass method calls.

  • Cooperative methods.

  • New diamond diagram name resolution.

  • Fixed set of allowed class attributes with Slots.

For more information on the new style classes and the unification of types and classes, see both PEPs 252 and 253 as well as the aforementioned tutorial by Guido.

Conclusion

Although all these new features and weakness resolutions bring Python pretty far down the path, there are those who claim that they violate Python's simplistic nature. If you're stictly a purist, that is probably a valid consideration. However, by finally purging some of the annoyances and adding a few more powerful constructs to the language, we are probably better off than we were before. These changes will not have a negative impact on existing code, and those that do, such as the change in the division operator, are at least not required for some time and allow for a more painless transition.

Finally, see Resources for a couple of other high-level documents, such as Andrew Kuchlin's “What's New in Python 2.2” and the slide presentation from one of Guido's talks last fall at a Python user group meeting. Python 2.2.1 can be downloaded at the main Python language home page. Happy hacking!

Resources

email: cyberweb@rocketmail.com

Wesley J. Chun, author of Core Python Programming, has over a decade of programming and instructional experience. Chun helped build Yahoo! Mail and Yahoo! People Search using Python and is currently employed by Synarc, a service company in clinical trials utilizing Python to develop applications that allow radiologists to perform patient assessments.

LJ Archive