Creating a Stratified Sample of Data Using Proc SURVEYSELECT

Proc SURVEYSELECT is a SAS/STAT procedure for taking samples from datasets using a variety of sampling methods. It has wide-ranging applications, including selecting a random sample of people to survey, forming a control group of customers to assess the effectiveness of a marketing campaign, and creating a set of validation data to confirm a statistical model.

The dataset SASHELP.CARS contains data about a variety of different makes and models of car manufactured in Asia, Europe and the USA.

The following code uses Proc SURVEYSELECT to create a sample of approximately one third of the whole dataset whilst preserving the overall proportion of cars from each region of origin in the sample. Note that the dataset must be sorted by any variables used in the STRATA statement, and that the SEED option is set in the Proc SURVEYSELECT statement so that the results can be reproduced:

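/* work.cars is an assumed name for the sorted copy of SASHELP.CARS */
proc sort data = sashelp.cars out = work.cars;
  by origin;
run;

/* Sample about one third of each origin stratum */
proc surveyselect data = work.cars out = work.control
    method = srs samprate = (0.333 0.333 0.333) seed = 123456789;
  strata origin;
run;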

Running a quick Proc FREQ on both the original dataset (n=428) and the Control dataset (n=143) shows that the proportion of cars from each region of origin is similar in both datasets:

Origin   Original   Control
Asia     36.92%     37.06%
Europe   28.74%     28.67%
USA      34.35%     34.27%
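
As a minimal sketch, the comparison above could be produced with two one-way frequency tables on the stratification variable. It assumes the sample was written to work.control as in the sampling step, and that origin is the variable of interest:

```sas
/* Distribution of origin in the full dataset */
proc freq data = sashelp.cars;
  tables origin;
run;

/* Distribution of origin in the stratified sample */
proc freq data = work.control;
  tables origin;
run;
```

The Percent column of each table gives the figures to compare side by side.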

Having created the control group, the cars in the control dataset can be excluded from the rest of the data by using a gating (subsetting) IF in a DATA step MERGE with appropriate BY variables.
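
One way this exclusion might look is sketched below. It is an illustration rather than the original author's code: it does not rely on any existing column to identify a row, so it adds a hypothetical row key, car_id, before sampling. The dataset names work.cars_keyed and work.remainder are likewise assumptions made for this example.

```sas
/* Add a hypothetical unique key so sampled rows can be matched back later */
data work.cars_keyed;
  set sashelp.cars;
  car_id = _n_;
run;

/* Sort by the strata variable, then draw the stratified sample */
proc sort data = work.cars_keyed;
  by origin;
run;

proc surveyselect data = work.cars_keyed out = work.control
    method = srs samprate = (0.333 0.333 0.333) seed = 123456789;
  strata origin;
run;

/* MERGE needs both inputs sorted by the BY variable */
proc sort data = work.cars_keyed;
  by car_id;
run;

proc sort data = work.control;
  by car_id;
run;

/* Keep only the rows that are in the full data but not in the control group */
data work.remainder;
  merge work.cars_keyed (in = in_all)
        work.control    (in = in_control keep = car_id);
  by car_id;
  if in_all and not in_control;  /* gating IF: all other rows are dropped */
run;
```

The keep = car_id option stops the extra variables that Proc SURVEYSELECT writes to its output (such as the selection probability and sampling weight) from being carried into work.remainder. Given the counts reported above, the result should contain the 428 - 143 = 285 cars that were not sampled.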

For further information please consult the SAS Help documentation.