DOSUBL, Macros and Gini Coefficients
The DOSUBL function is a powerful facility for running code “on the side”. Within a DATA step, a DOSUBL call can, for example, run one or more SQL queries to define macro variables which can then be used later in the same DATA step. One way of using DOSUBL is to have it call a macro, which defines the code to be executed.
The example here uses a macro that calculates a Gini coefficient, a statistic that SAS does not make particularly convenient to calculate. Its best-known application is as a measure of how unequally incomes are distributed. According to some, Gini is an acronym for “Generalised Inequality Index”, but in fact it comes from the name of its inventor, the Italian statistician Corrado Gini. Here is a macro to calculate a Gini coefficient from income values held in a data set.
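What follows is a minimal sketch of such a macro, not necessarily the original code. It assumes the trapezoidal (Lorenz curve) form of the weighted Gini coefficient. The work table name `_GINI_SORTED` and the column names `INC`, `WGT`, `TOTWGT`, `TOTINC` and `CUMWGT` are illustrative choices. Rows with a missing income or a non-positive weight are dropped, which is also an assumption.

```sas
%macro gini(dsn, income, weight);
    /* Sort by income and attach the two grand totals to every row    */
    /* (PROC SQL remerges the summary statistics with the detail).    */
    proc sql noprint;
        create table work._gini_sorted as
        select &income                as inc,
               &weight                as wgt,
               sum(&weight)           as totwgt,
               sum(&weight * &income) as totinc
        from &dsn
        where &income is not missing and &weight > 0
        order by inc;
    quit;

    data _null_;
        set work._gini_sorted end=eof;
        /* Sum statements: CUMWGT and NUM start at 0 and are retained. */
        cumwgt + wgt;                          /* running total of weights */
        /* Weighted income times (previous + current running weight).  */
        num + wgt * inc * (2 * cumwgt - wgt);
        /* Gini = NUM / (total weight * total weighted income) - 1.    */
        /* The 'G' argument stores the result in the global table.     */
        if eof then
            call symputx('gini', num / (totwgt * totinc) - 1, 'G');
    run;
%mend gini;
```

As a quick check of the formula, two equally weighted incomes of 0 and 1 give NUM = 3, a total weight of 2 and a total weighted income of 1, so the result is 3/2 - 1 = 0.5. The sketch does not guard against a total weighted income of zero.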
The macro takes three parameters: the name of a data set, and the names of an income variable and a weighting variable that it contains. It creates a macro variable called GINI, which holds the value of the Gini coefficient. The details of the calculations are not particularly illuminating, but they are what Mr Gini wanted. A PROC SQL step sorts the data by income value and calculates the total of the weights and the total of the weighted incomes. A DATA step then accumulates a running total of the weights, and a variable NUM, which is a running total of the weighted incomes, each weighted by (essentially) the running total of the weights. At end of file, a final calculation is done and the result is saved in GINI.
The DATA step below shows the DOSUBL function being used to invoke this macro twice, to calculate Gini coefficients for two different income variables from data set GINIDATA.INCOMES.
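A minimal sketch of such a DATA step is shown here. It assumes that a macro `%GINI(data set, income variable, weight variable)` has already been compiled in the session and that the GINIDATA library is assigned. The output data set name `WORK.GINI_RESULTS` and the variable names `GINI_INIT` and `GINI_FINAL` are hypothetical.

```sas
data work.gini_results;
    /* Single quotes stop the macro call being resolved when this step */
    /* is compiled; DOSUBL runs it "on the side" at execution time.    */
    rc = dosubl('%gini(ginidata.incomes, incinit, weight)');
    /* SYMGET returns text, so convert the value to a number. */
    gini_init = input(symget('gini'), best32.);

    rc = dosubl('%gini(ginidata.incomes, incfinal, weight)');
    gini_final = input(symget('gini'), best32.);

    drop rc;
run;
```

There is no SET statement, so the step executes once and writes a single observation holding the two coefficients.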
The parameter to DOSUBL is a text string which can contain multiple SAS statements; here it contains simply a macro invocation, which keeps it readable. Data set GINIDATA.INCOMES contains two income variables, called INCINIT and INCFINAL, and a weighting variable called WEIGHT. After each DOSUBL call, the value of the GINI macro variable is obtained using SYMGET. Note that GINI has to be a global macro variable, so that its value is still available to SYMGET once the macro has finished. The resulting data set looks something like:
The Gini coefficient takes values between 0 and 1, with 0 indicating perfect equality and 1 perfect inequality. These results show final incomes (Gini=0.41) to be a lot more equal than initial incomes (Gini=0.79), suggesting that factors such as taxation are having a significant effect. NB The test data used here is fictitious!
Superficially, the DOSUBL calls may appear similar to calls of the CALL EXECUTE routine, but there is an important difference: with DOSUBL the code is run immediately, and the macro variable it creates is available immediately, i.e. later in the same DATA step. (With CALL EXECUTE, any SAS steps that are generated are not run until after the DATA step has finished.)
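The difference in timing can be seen in a small sketch like the one below. The macro variable name `STAMP` and its values are purely illustrative. The same kind of generated step is submitted both ways, and SYMGET is called straight afterwards in each case.

```sas
%let stamp = before;

data _null_;
    /* CALL EXECUTE only queues the generated step; it runs after */
    /* this DATA step ends, so SYMGET should still see "before".  */
    call execute('data _null_; call symputx("stamp", "from_execute"); run;');
    via_execute = symget('stamp');
    put via_execute=;

    /* DOSUBL runs the generated step straight away, so the new */
    /* value should already be visible to SYMGET.               */
    rc = dosubl('data _null_; call symputx("stamp", "from_dosubl"); run;');
    via_dosubl = symget('stamp');
    put via_dosubl=;
run;

/* The queued CALL EXECUTE step has run by this point. */
%put NOTE: after the step, stamp=&stamp;
```

A single `%LET` is deliberately not used as the generated code here, because CALL EXECUTE executes macro statements immediately and queues only the SAS statements they produce. That is also why a macro containing PROC SQL and DATA steps, such as the Gini macro, cannot deliver its result within the calling DATA step when invoked through CALL EXECUTE.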
Another similar facility is the RUN_MACRO function, but that is less attractive. RUN_MACRO is only available within user-defined functions created using PROC FCMP, and the macros it calls are required to be non-standard in some respects (such as the quoting of character parameters).