20.4 Stacks and Messaging Protocol (Advanced Perl Programming)

Whew! We have now finished a reasonably in-depth look at all the value types offered by Perl. The next half of this chapter is devoted to understanding the data structures, API, and protocol used between caller and called subroutines.

We mentioned earlier that the argument stack is the data structure used for passing parameters and results between functions. Figure 20.11 shows the stack after calling foo(10,20), which in turn has called bar("hello", 30.2, 100).

Figure 20.11: Argument and mark stack after foo has been called and foo has just called bar

How does bar know how many parameters it should pick up from the top of stack? Well, Perl keeps track of the stretches of the argument stack using another stack called a markstack (a stack of bookmarks, in a sense). bar knows the parameters meant for it by simply computing the difference between the current top of stack and the bookmark stored at the top of markstack. This stretch of the stack corresponds to bar's @_ array. Conversely, when bar is ready to return, it dumps one or more results in its stretch of stack, and foo knows how many scalars have been returned by looking at the markstack.
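
The bookkeeping described above can be sketched with a toy model in plain C. This is not Perl's implementation: the names ToyStacks, toy_push_mark, toy_push, toy_items, and toy_arg are invented for illustration, and the stack holds plain ints instead of SV pointers. It only shows how a mark recorded before the arguments are pushed lets the callee work out how many arguments it received.

```c
#include <assert.h>

#define TOY_MAX 64

/* Toy model only: Perl's real stacks hold SV* and grow dynamically. */
typedef struct {
    int args[TOY_MAX];   /* argument stack                         */
    int sp;              /* index of the top element, -1 if empty  */
    int marks[TOY_MAX];  /* mark stack: saved values of sp         */
    int mark_sp;         /* index of the top mark, -1 if empty     */
} ToyStacks;

void toy_init(ToyStacks *s) { s->sp = -1; s->mark_sp = -1; }

/* Caller: remember where the callee's stretch begins. */
void toy_push_mark(ToyStacks *s) {
    assert(s->mark_sp + 1 < TOY_MAX);
    s->marks[++s->mark_sp] = s->sp;
}

/* Caller: push one argument. */
void toy_push(ToyStacks *s, int value) {
    assert(s->sp + 1 < TOY_MAX);
    s->args[++s->sp] = value;
}

/* Callee: argument count = top of stack minus the newest mark. */
int toy_items(const ToyStacks *s) {
    assert(s->mark_sp >= 0);
    return s->sp - s->marks[s->mark_sp];
}

/* Callee: the n-th argument of the current stretch (like $_[n]). */
int toy_arg(const ToyStacks *s, int n) {
    assert(n >= 0 && n < toy_items(s));
    return s->args[s->marks[s->mark_sp] + 1 + n];
}
```

Pushing a mark and two values, then a second mark and three values, makes toy_items report 3 for the inner call while the outer call's two arguments stay untouched below the second mark.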

All these manipulations happen transparently when you are in script space. But if you write C routines that are called by Perl (extending Perl) or call Perl functions from C (embedding Perl), there are some details to contend with. Although tools such as XS and SWIG help you write extensions easily, you will find that the following sections will pave the way for even more powerful and intuitive extensions (intuitive, that is, for the script writer).

20.4.1 Calling a Perl Subroutine

Let's start with the case in which you call a Perl subroutine from C, normally done when you embed the Perl interpreter in your application. Table 20.6 contains the macros (defined in pp.h) that you will need to use, in the order given. These macros may be difficult to remember on one reading, but the good news is that they are called exactly in the same order every time, and they sort of grow on you after a while.

Table 20.6: Macros Used in Calling a Perl Routine (Embedding)
Function/Macro	Description
dSP	Declare a few variables used by the following macros.
ENTER	Start scope.
SAVETMPS	All mortal variables created after this call will be deleted when `FREETMPS` is called. See explanation of `tmps_stack` in the next section.
PUSHMARK	Remember the current top of `stack` (updates `markstack`). `ENTER`, `SAVETMPS`, and `PUSHMARK` are called to prepare the stack for a subroutine call.
XPUSHs(SV*)	Now you can push any number of arguments onto the stack. If you push newly created SVs, you can mark them as mortal, and Perl will automatically delete them at end of scope.
PUTBACK	Indicates that all arguments have been pushed in. `PUSHMARK` and `PUTBACK` bracket the arguments, in a sense. At this stage, the Perl procedure can be called using `perl_call_pv` or `perl_call_sv`. (See the following example.)
SPAGAIN	Refreshes the local copy of the stack pointer, because the stack may have been reallocated during the call. Like `PUSHMARK`, it provides the opening bracket for the returned results. Even if there aren't any returned results, you must call it anyway.
POPi POPl POPn POPp POPs	Pop a scalar from the stack and return the appropriate type: integer, long, double, pointer (typically to string), or SV. `perl_call_pv` returns the number of result parameters pushed onto the stack, and you must take care to call these macros only that many times. Keep in mind that the `POP` macros return the results in the reverse of the order in which the Perl procedure pushed them onto the stack.
PUTBACK	Call this after all result parameters have been popped.
FREETMPS	See `SAVETMPS`.
LEAVE	Ends scope. See `ENTER`.

The code snippet shown next illustrates how to invoke a Perl procedure called add, with two input parameters 10 and 20, and how to retrieve the results. Note again that the macros are used in the order given in Table 20.6.

#include <EXTERN.h>
#include <perl.h>
void foo() {
    int n;         /* number of parameters returned by add       */
    dSP;  
    ENTER;         /* Tell perl we are entering a new scope      */
    SAVETMPS;      /* Ensure that FREETMPS will free only those
                      mortals created after this stmt            */
    PUSHMARK(sp);  /* Remember the current stack pointer. sp is 
                     declared by dSP                             */
    /* Push arguments */
    XPUSHs(sv_2mortal(newSViv(10)));         /* push an integer  */
    XPUSHs(sv_2mortal(newSViv(20)));         /* push another     */
    PUTBACK;                                 /* End of arguments */

    /* Call subroutine by name, and expect it to return a scalar */
    n = perl_call_pv ("add", G_SCALAR); 

    SPAGAIN;                  /* Start looking at return results */

    /* Retrieve returned value from stack */
    if (n == 1)
        printf ("Result: %d \n", (int) POPi);
    /* Closing details                                           */ 
    PUTBACK;            /* Finished removing results from stack  */
    /* Time to clean up and leave ..                             */
    FREETMPS;  /* Frees the two mortal parameters passed to add  */
    LEAVE;     /* Leave scope                                    */
}

This is all you need to understand the section "Easy Embedding API," which implements the perl_call_va convenience function introduced in Chapter 19.
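
Because results come off the stack last-first (see the POP entry in Table 20.6), a caller expecting several values usually fills its own array from the back. The sketch below shows only that indexing idea, in plain C with ints. toy_pop and toy_collect are hypothetical names, not Perl API. In real embedding code the pops would be POPi, POPs, and so on, issued between SPAGAIN and PUTBACK.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the argument stack: *sp points at the top element. */
int toy_pop(int **sp) {
    int value = **sp;
    (*sp)--;
    return value;
}

/* Pop n results and store them so that out[0] is the first value the
 * callee pushed, even though it is the last one popped.              */
void toy_collect(int **sp, size_t n, int *out) {
    size_t i;
    for (i = n; i > 0; i--)
        out[i - 1] = toy_pop(sp);
}
```

If the callee pushed 1, 2, 3 in that order, toy_collect leaves out[] holding 1, 2, 3 and the stack pointer back where it was before the results were pushed.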

20.4.2 The Called Side: Hand-Coding an XSUB

Having seen what it takes to call a Perl subroutine, let's look at the stack from the viewpoint of a called subroutine. This is precisely the situation that all XSUBs are in, and after this section, you'll be able to completely understand the code produced by SWIG and xsubpp.

First, let's settle the issue of how Perl discovers your XSUB. That is, if someone writes "add($a,$b,$c)" in a script, how does Perl know to call the C procedure add, or my_add, or whatever? Well, you have to create a binding between a subroutine name (as known in script space) and a C procedure, using the procedure newXS like this:

extern XS(add);             /* XS macro explained in Table 20.7 next */
newXS("add", add, "add.c"); /* Filename given for debugging reasons  */

For a module called foo, XS and SWIG generate a procedure called boot_foo, which uses newXS to bind all XSUBs in that module to the corresponding names. The elegant thing about this approach is that boot_foo itself is an XSUB, and if you use dynamic loading, this procedure is called by the DynaLoader module at run-time.

XSUBs use the macros (defined in XSUB.h) listed in Table 20.7 to examine the stack and return results.

Table 20.7: Macros Used to Manipulate the Stack (Extending)
Function/Macro	Description
XS	Supplies the standard signature required for your XSUB. For example, the procedure `foo` should be declared thus: XS(foo) { }
dXSARGS	Defines some local variables used by the other macros. The important one is an integer called `items`, which contains the number of parameters pushed onto the stack by the caller.
SV* ST(n)	Retrieves the nth parameter (an SV*) from the stack. `ST(0)` refers to the first parameter (`$_[0]`), and `ST(items-1)` is the last parameter.
XSRETURN(n)	Indicates that you have left n result parameters on the stack and returns. In the typical case in which you have only one value to return, you can use one of the more convenient macros listed below.
XSRETURN_NO XSRETURN_YES XSRETURN_UNDEF	Issues `XSRETURN(1)` after leaving an SV on the stack with a value of 0, 1, or undef.
XSRETURN_EMPTY	The same as `XSRETURN(0).`
XSRETURN_IV (int) XSRETURN_NV (double) XSRETURN_PV (char *)	Leaves a new mortal scalar with the appropriate value type. This scalar will be deleted when the caller invokes `FREETMPS`.

The following snippet shows the hand-coded XSUB add, which adds all its input parameters and returns the result:

#include <EXTERN.h>
#include <perl.h>
#include <XSUB.h>
XS(add)                              /* All XSUBs have this signature */
{
    int sum = 0;
    dXSARGS;                         /* Defines 'items' and initializes
                                      * it with the number of params  */
    if (items == 0) 
        XSRETURN_IV(0);    /* Return 0 if param list is empty        */

    for (--items ; items >= 0 ; --items) {
        if (SvIOK(ST(items)))        /* If SV contains an integer    */
           sum += SvIV(ST(items));   
    }
    XSRETURN_IV (sum);
}

20.4.2.1 Returning a variable list of results

The subroutine in the preceding example returns one parameter. Returning multiple parameters is straightforward too. The following example shows how a null-terminated array of strings (argv) is converted to an equal number of result parameters on the stack:

int i = 0;
for ( ; *argv; argv++, i++) {
    ST(i) = sv_2mortal(newSVpv(*argv,0));
} 
XSRETURN(i);

As you can see, returned parameters occupy the stretch of argument stack between ST(0) and ST(n-1). XSRETURN adjusts the markstack so that the caller can see the number of scalars being returned. It is important to note that the preceding code does not modify the input arguments that happen to live in the same stretch; it updates the stack to point to new SVs. (Remember that the stack is an array of SV*s.) To modify an input parameter directly, you would instead write:

sv_setpv(ST(i), "hello"); /* Like modifying $_[i] */

While functions such as read do this, I recommend that you refrain from taking advantage of it, and create new SVs instead. In addition, to save the calling code from worrying about memory management or reference counting issues, give that responsibility to Perl by making these new values mortal. They will then be automatically deleted at the end of scope.
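
The difference between replacing a stack slot and writing through it is ordinary pointer aliasing. The following plain C sketch uses a hypothetical ToyValue struct in place of an SV and an array of pointers in place of the argument stack. It is an analogy, not Perl code, and the function names are invented.

```c
#include <assert.h>

typedef struct { int value; } ToyValue;   /* stand-in for an SV */

/* Like ST(i) = new_sv: the slot now points elsewhere, and the
 * caller's original value is untouched.                          */
void toy_replace_slot(ToyValue **stack, int i, ToyValue *fresh) {
    stack[i] = fresh;
}

/* Like sv_setpv(ST(i), ...): writes through the pointer, so the
 * caller's own variable changes (the $_[i] aliasing effect).     */
void toy_modify_in_place(ToyValue **stack, int i, int new_value) {
    stack[i]->value = new_value;
}
```

After toy_replace_slot the caller's variable still holds its old value. After toy_modify_in_place it does not, which is the side effect the paragraph above advises against.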

20.4.2.2 Ensuring that the stack is big enough

The ST macro refers directly to the corresponding spot on the stack. Because the stack may not have been extended enough to accommodate the argument in the macro, you cannot arbitrarily say, for example, ST(100) without risking a crash. The EXTEND macro ensures that the stack is big enough to hold your data:

EXTEND(sp, 100); /* Extend stack by 100 elements */

This macro can be used in both the caller and the called subroutines. The variable sp (the stack pointer) is defined automatically for you (by the dSP and dXSARGS macros). ST() could have used av_store() to automatically extend the stack, but that would be considerably slower.
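
To see why the capacity check is left to the programmer, here is a toy growable stack in plain C. ToyStack, toy_extend, and toy_store are invented names, and the growth policy is an arbitrary choice. Perl's own EXTEND works on SV* slots and is not reproduced here. The point is only that the check is a separate, explicit step done once, after which raw indexing is safe and cheap.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int   *slots;     /* heap-allocated storage                  */
    size_t capacity;  /* number of slots currently allocated     */
    size_t base;      /* index where the current stretch begins  */
} ToyStack;

/* Make sure slots base .. base+count-1 exist.
 * Returns 0 on success, -1 if memory could not be obtained. */
int toy_extend(ToyStack *s, size_t count) {
    size_t needed = s->base + count;
    int *grown;
    if (needed <= s->capacity)
        return 0;                          /* already big enough     */
    grown = realloc(s->slots, needed * sizeof *grown);
    if (grown == NULL)
        return -1;
    s->slots = grown;                      /* storage may have moved */
    s->capacity = needed;
    return 0;
}

/* Unchecked store, in the spirit of ST(n) = ...; the caller must
 * have called toy_extend first.                                  */
void toy_store(ToyStack *s, size_t n, int value) {
    assert(s->base + n < s->capacity);     /* debug-only safety net  */
    s->slots[s->base + n] = value;
}
```

Note that toy_extend may move the storage, so any raw pointer into the old block would be stale afterwards. The toy sidesteps this by addressing slots through an index.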

There's an alternative approach. If you reset the stack pointer back to the bottom of your stretch of stack, you can then use the XPUSHs macro, which automatically extends the stack as needed:

i = 0;
sp -= items;          /* Resets stack pointer to beginning */
for ( ; *argv; argv++, i++) {
    /* Push fresh mortal string-valued scalars */
    XPUSHs(sv_2mortal(newSVpv(*argv, 0)));
}
XSRETURN(i);

This is precisely the strategy adopted by the PPCODE directive in XS, as we shall soon see. As I said earlier, this code doesn't modify the input parameters; it simply replaces those pointers in the stack with new ones. Note that if we forgot to reset the stack pointer, we would be piling stuff on top of the input parameters, and all hell would break loose when this procedure returns.

20.4.3 Inside Other Stacks

Let us take a brief look at the stacks available inside Perl (besides the argument and mark stacks) to understand what the macros described in the preceding sections do internally. Unless you are curious about these kinds of details, you can safely skip this section without loss of continuity.

Save stack (savestack)

This stack is used as a repository for storing all pieces of global information that are liable to change within a nested scope. To safely squirrel away an integer, for example, Perl uses a macro called SSPUSHINT (in scope.h). This macro pushes three pieces of information on savestack: the value of the integer, the address of the integer, and the fact that an integer has been stored. The value of this integer can now be changed freely within a nested scope. At the end of the current scope, Perl pops the savestack and knows that because an integer has been stored, it must also have stored the old pointer and value. Thus the original integer is efficiently restored.

A statement such as local($a) is implemented by saving the GV corresponding to "a" and its scalar value on the save stack; the scalar value is replaced with a new scalar. When the scope ends, the GV and its scalar pointer are automatically restored.

Scope stack (scopestack)

The scope stack is used to remember positions along the save stack that correspond to different scopes (analogous to the markstack providing bookmarks for the argument stack). When the scope ends (upon LEAVE), Perl knows exactly how many objects to pop off the save stack and restores them to their former values.
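
A toy version of the save and scope stacks, in plain C, may make the division of labor clearer. All names here (ToySave, toy_enter, toy_save_int, toy_leave) are hypothetical, and only integers are handled. Perl's savestack also records what kind of thing was saved, which this sketch omits.

```c
#include <assert.h>

#define TOY_MAX 32

typedef struct { int *addr; int old_value; } ToySaved;

typedef struct {
    ToySaved saved[TOY_MAX];  /* save stack: what to restore          */
    int      save_top;        /* number of entries on the save stack  */
    int      scopes[TOY_MAX]; /* scope stack: save_top at each enter  */
    int      scope_top;       /* number of open scopes                */
} ToySave;

void toy_save_init(ToySave *s) { s->save_top = 0; s->scope_top = 0; }

/* Like ENTER: bookmark the current height of the save stack. */
void toy_enter(ToySave *s) {
    assert(s->scope_top < TOY_MAX);
    s->scopes[s->scope_top++] = s->save_top;
}

/* Record an integer's address and current value before changing it. */
void toy_save_int(ToySave *s, int *addr) {
    assert(s->save_top < TOY_MAX);
    s->saved[s->save_top].addr = addr;
    s->saved[s->save_top].old_value = *addr;
    s->save_top++;
}

/* Like LEAVE: undo everything saved since the matching toy_enter,
 * newest first.                                                    */
void toy_leave(ToySave *s) {
    int floor;
    assert(s->scope_top > 0);
    floor = s->scopes[--s->scope_top];
    while (s->save_top > floor) {
        ToySaved *e = &s->saved[--s->save_top];
        *e->addr = e->old_value;
    }
}
```

With nested scopes, each toy_leave restores only the integers saved since its own toy_enter, which is the role the scope stack's bookmarks play.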

Temporaries stack (tmps_stack)

When you create a mortal variable or mark a variable as mortal (using sv_2mortal or local, in script space), Perl pushes this SV onto this stack (without touching its reference count). At the end of scope, it decrements the reference count of all temporary variables pushed onto the stack in that scope. Recall that my variables (lexicals) sit in CV-specific scratchpads, so they never touch the temporaries stack.
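
The mortal mechanism can likewise be modelled in a few lines of plain C. This is a simplified sketch with invented names (ToySV, ToyTmps, toy_mortal, toy_savetmps, toy_freetmps). It keeps only a reference count and a flag recording whether the value has been freed, and it ignores everything else a real SV carries.

```c
#include <assert.h>

#define TOY_MAX 32

typedef struct { int refcnt; int freed; } ToySV;

typedef struct {
    ToySV *tmps[TOY_MAX];  /* temporaries stack                     */
    int    top;            /* number of entries in use              */
    int    floor;          /* entries below this belong to callers  */
} ToyTmps;

/* Drop one reference; mark the value freed when none remain. */
void toy_dec(ToySV *sv) {
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0)
        sv->freed = 1;
}

/* Like sv_2mortal: schedule a later decrement, leave refcnt alone. */
ToySV *toy_mortal(ToyTmps *t, ToySV *sv) {
    assert(t->top < TOY_MAX);
    t->tmps[t->top++] = sv;
    return sv;
}

/* Like SAVETMPS: mortals created from now on are ours to free.
 * Returns the previous floor so the caller can restore it later. */
int toy_savetmps(ToyTmps *t) {
    int old = t->floor;
    t->floor = t->top;
    return old;
}

/* Like FREETMPS: decrement every mortal above the floor. */
void toy_freetmps(ToyTmps *t) {
    while (t->top > t->floor)
        toy_dec(t->tmps[--t->top]);
}
```

A mortal whose count was 1 is freed by toy_freetmps, while one that something else still references (count 2) merely drops to 1 and survives. Mortals created before toy_savetmps are left alone.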

Return stack (retstack)

Before calling a subroutine, Perl remembers the starting opcode of the statement following that subroutine call by pushing it on the retstack.

Context stack (cxstack)

This stack keeps track of the context information for the current block, such as the block label and the CV to execute when last, redo, or next are invoked. These are restored to the previous elements when the block is exited. I do not know why there are two stacks to deal with scope-related context information.

