Recursive User-Written Functions
User-written functions created using Proc FCMP can be recursive, i.e. they can call themselves. Here is a simple example: a subroutine that reduces any run of consecutive commas in a character string to a single comma.
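A minimal sketch of such a subroutine, reconstructed from the description in this section rather than copied from an original listing, might look like this. The parameter name STRING is an assumption; the routine name DECOMMA, the package CHAR and the library WORK.FUNCS are the ones named in the text.

```sas
/* Sketch only: compile DECOMMA into package CHAR of WORK.FUNCS */
proc fcmp outlib=work.funcs.char;
   subroutine decomma(string $);
      outargs string;                        /* STRING is updated in place for the caller */
      string = tranwrd(string, ',,', ',');   /* replace each pair of commas with one comma */
      if index(string, ',,') > 0 then
         call decomma(string);               /* multiple commas remain, so recurse */
   endsub;
run;
```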
This is defined as a subroutine rather than a function because the only parameter is a character string. It is more efficient to pass character strings and arrays by reference, which is what a subroutine does for the parameters named in its OUTARGS statement. With a function, they would be passed by value. The OUTARGS statement specifies which parameters are to be returned to the caller.
Subroutine DECOMMA calls the standard SAS function TRANWRD to replace each pair of commas with a single comma. Then it looks at the new string, and if any multiple commas remain, it calls itself to get rid of them.
The OUTLIB option on the Proc FCMP statement specifies that the subroutine definition is to be written to a “package” called CHAR in function library WORK.FUNCS (which is physically just a data set); the CMPLIB system option makes this function library available for use, for example in DATA steps.
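As an illustration of the CMPLIB side, a calling DATA step might look like the sketch below. It assumes DECOMMA has already been compiled into WORK.FUNCS as described; the variable name S and the test string are made up for the example.

```sas
/* Make the compiled routines in WORK.FUNCS visible to later steps */
options cmplib=work.funcs;

data _null_;
   length s $ 40;            /* hypothetical test variable */
   s = 'a,,b,,,c,,,,d';
   call decomma(s);          /* S is modified in place via OUTARGS */
   put s=;                   /* two passes of TRANWRD should leave a,b,c,d */
run;
```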
The output from this example is:
What made DECOMMA easy to implement was that the only variable it needed to use was a parameter. When a function or subroutine uses local variables, life can become more complicated. There is no concept of the “scope” of such a variable. When the routine calls itself, the value of such a variable is liable to be overwritten by a value from the next level down. The routine must therefore be structured so that no non-parameter variables need to have their values preserved when it calls itself.
The next example tackles the problem of factorising an integer. We would like to input a number and get back a character string containing its complete factorisation, e.g. for 108 the result should be “2 x 2 x 3 x 3 x 3”. Here is a subroutine that does what is necessary, with the number and the string as its two parameters.
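One way such a subroutine could be written is sketched here. It is an assumption-laden reconstruction, not the original listing: the package name MATH and the flag FOUND are hypothetical, the loop is written as a DO WHILE, and CATX is used so that a blank FACTORS string and a non-blank one are handled by the same statement. The names N, FACTORS, I, Q and THISFACTOR follow the text.

```sas
/* Sketch only: package name MATH is a hypothetical choice */
proc fcmp outlib=work.funcs.math;
   subroutine factorise_sub(n, factors $);
      outargs factors;                  /* FACTORS is built up in the caller's variable */
      found = 0;
      thisfactor = n;                   /* if no divisor is found, N itself is the last factor */
      i = 2;
      do while (found = 0 and i * i <= n);
         if mod(n, i) = 0 then do;
            found = 1;
            thisfactor = i;             /* smallest factor of N */
            q = n / i;                  /* what is left to factorise */
         end;
         else i = i + 1;
      end;
      /* CATX skips a blank FACTORS, so the first factor gets no leading ' x ' */
      factors = catx(' x ', factors, put(thisfactor, best12.));
      /* Nothing local is needed after this call, so overwriting is harmless */
      if found then call factorise_sub(q, factors);
   endsub;
run;
```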
The DO loop looks for the smallest factor of the parameter N; if it finds one, then Q is set to N divided by that factor. The rest of the logic is all about whether we found a factor, and whether the FACTORS string already included any factors. In all cases, the FACTORS string is updated as appropriate. If a factor has been found, then FACTORISE_SUB calls itself recursively to find all the others. This recursive call will corrupt the values of such variables as I and THISFACTOR, but that doesn’t matter, since at the higher level they are not needed any more.
This subroutine is a little awkward to use. The caller has to declare a character variable long enough to contain the factors string, and has to initialise it to blank before calling the subroutine. (The subroutine cannot tell whether or not it is being called by itself, so would be unable to initialise the string for itself.)
However, we could define a function that calls it:
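A wrapper along these lines would do it. This is a sketch under assumptions: the function name FACTORISE and package name MATH are hypothetical, and FACTORISE_SUB is taken to be already compiled into WORK.FUNCS so that CMPLIB can resolve the call.

```sas
/* Let Proc FCMP find the previously compiled FACTORISE_SUB */
options cmplib=work.funcs;

proc fcmp outlib=work.funcs.math;
   function factorise(n) $ 1024;        /* returns a string of up to 1024 characters */
      length factors $ 1024;
      factors = ' ';                    /* initialise here, so the caller need not */
      call factorise_sub(n, factors);
      return (factors);
   endsub;
run;
```

The wrapper owns both the declaration and the initialisation of the string, which are exactly the two chores the bare subroutine leaves to its caller.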
This function takes a single numeric parameter, and returns the factorisation string in a variable of up to 1024 characters. One advantage of having it set up as a function is that it can then be used within Proc FORMAT to define quite an interesting format. (A subroutine could not be used here.)
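A sketch of such a format and a DATA step that uses it is given below. It assumes a SAS release in which Proc FORMAT accepts a function as a label, and that the wrapper function is called FACTORISE and lives in WORK.FUNCS; the format name FACTORS, the variable F and the range of test values are invented for the illustration.

```sas
options cmplib=work.funcs;

proc format;
   /* Every value is formatted by calling the (hypothetical) FACTORISE function */
   value factors other = [factorise()];
run;

data _null_;
   do n = 100 to 110;
      f = put(n, factors.);   /* no LENGTH given for F, so the format's default width applies */
      put n= f=;
   end;
run;
```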
The format uses a default length of 40 if no length has been defined for the target string – as occurs in the example DATA step. The output from the DATA step is:
A production version of this code (if ever required!) would need some parameter validation, and careful consideration of string lengths.