Amadeus cookies policy - you'll see this message only once.

Amadeus use cookies on this website. They help us to know a little bit about you and how you use our website, which improves the browsing experience and marketing - both for you and for others. They are stored locally on your computer or mobile device. To accept cookies, continue browsing as normal. Or, go to the privacy policy for more information.

Data Mining: Transforming Variables

Data manipulation is an important part of the data mining process. In order to ensure that all of the data is used effectively, it can sometimes be useful to transform the variables. This can reduce the variances in the data and correct incorrect data within the data set. Within SAS Enterprise Miner™ software the Transform Variables node can be used to manipulate variables. The Transform Variables node allows you to make variable transformations that can maximise the correlation with the target variable and normalising variables. This node can also be used to create interaction variables from class variables.

Transform Variables Node

Within SAS Enterprise Miner, the Transform Variable node can be found within the modify tab:

Data Mining Removing Variables Image 1
The Transform Variables node can be dragged on to a SAS Enterprise Miner diagram and joined using an arrow to direct the flow of the data through the system:

Data Mining Transforming Variables Image 2

To view and modify the properties of the Transform Variable node, click on the node and the properties will appear in the left pane.

The ‘Default Methods’ for interval and class variables is to perform no transformation. The default method for missing values is not to treat any missing values as a level.

In order to choose variables for the transformation you can use the ‘…’ button highlighted below to bring up the variables window.

Data Mining Transforming Variables Image 3

The Formulas window is a useful tool to view any initial trends within the data set. This window displays a histogram to show a plot of the data within the data set. If a histogram shows a significant skewness on the data, this could indicate that a transformation could improve further modelling.

To view the histogram for each of the variables you can click on the name of the variable:

Data Mining Transforming Variables Image 4

You can see from the histogram above that the Div variable is not normally distributed. This could indicate that the Div variable should be transformed within the data set. Once you have identified the variable that you would like to transform, the formulas window can be closed.

To transform a variable within the data set you will need to choose the ‘…’ next to the Variables property to open up the Variables – Trans window.

Data Mining Transforming Variables Image 5

The Variables – Trans window allows you to choose the method to use to transform the variable. A number of options are available for this as listed below:

  • Best
  • Bucket
  • Centring
  • Equalise
  • Exponential
  • Inverse
  • Log 
  • Log 10
  • Max Correlation
  • Max Normal
  • Multiple
  • Optimal Binning
  • Optimal Max Equalise
  • Quantile
  • Range Standardisation
  • Square
  • Square Root
  • Standardise

By default, no transformation methods are specified for interval or class variables and missing values are not treated as a level within the data set.

Data Mining Transforming Variables Image 6

To identify the most suitable transformation method for the Div, the ‘Explore…’ button can be helpful.

Data Mining Transforming Variables Image 7

In order to analyse the data fully, in this case it may be beneficial to split the leagues and divisions as follows:

Current Country Region
AE America East
AW America West
NE Nebraska East
NW Nebraska West

This can be completed using the SAS Code editor, which can be found under the Train heading as below:

Data Mining Transforming Variables Image 8
The substring function has been used to extract the country code and two new variables are created within the data set.

Data Mining Transforming Variables Image 9

Choose 'OK' to apply the code and run the Transform Variable node by right-clicking and selecting 'Run'.

On adding any further nodes to the diagram, you will now be able to use the new variables.

Data Mining Transforming Variables2

The new variables can be viewed by right-clicking the Regression Node and choosing 'Edit Variables'.

Data Mining Transforming Variables LAST2