Amadeus cookies policy - you'll see this message only once.

Amadeus use cookies on this website. They help us to know a little bit about you and how you use our website, which improves the browsing experience and marketing - both for you and for others. They are stored locally on your computer or mobile device. To accept cookies, continue browsing as normal. Or, go to the privacy policy for more information.

Zero Filling

There are many occasions when it is desirable to create a template data set containing variables initialised to zero. Where as this technique is possible using traditional data step techniques it can be more efficiently performed by using Proc SQL.

The example we will demonstrate looks at joining quality of life questionnaire data which in this case has a fixed structure of 10 questions. When filled in by 10 subjects in our intent to treat population we would ideally like to end up with a data set with 100 observations. Unfortunately, there may well be situations where a subject only answers 6 or 8 questions and in these circumstances we need to ensure we still end up with 10 observations for that subject, with the remaining questions initialised to 0.

We start with a data set containing our actual quality of life responses:

data qol_actual;
  input subjid question_no question;
  datalines;
1 1 2
1 2 10
1 3 4
1 4 3
1 5 10
1 6 10
1 7 6
1 8 6
2 1 9
2 2 6
2 3 9
2 4 1
2 5 10
3 1 3
3 2 3
3 3 7
3 4 10
3 5 3
3 6 7
3 7 5
3 8 6
3 9 3
3 10 5
4 1 3
4 2 4
4 3 5
4 4 7
4 5 2
4 6 2
4 7 9
4 8 3
4 9 10
4 10 10
5 1 2
5 2 7
6 1 3
6 2 7
6 3 5
6 4 1
6 5 4
6 6 8
6 7 2
6 8 2
6 9 6
6 10 3
7 1 6
7 2 8
7 3 10
7 4 6
7 5 2
8 1 4
8 2 7
8 3 2
8 4 2
8 5 3
8 6 7
8 7 5
8 8 1
8 9 3
8 10 5
9 1 9
9 2 9
9 3 9
9 4 3
9 5 4
9 6 4
9 7 4
9 8 8
9 9 6
9 10 7
10 1 9
10 2 6
10 3 8
10 4 2
10 5 1
10 6 9
10 7 7
10 8 1
10 9 9
10 10 4
run;

Knowing the total number of questions we can construct a template data set with each question initialised to missing:

data qol_template;
  length question_no 8;
  retain question 0;
  do question_no = 1 to 10;
    output;
  end;
run;
Quality of life template
question_ Obs no question
1 1 0 2 2 0 3 3 0 4 4 0 5 5 0 6 6 0 7 7 0 8 8 0 9 9 0 10 10 0

We can now use Proc SQL to perform a Cartesian product using the CROSS JOIN with our demographic data set. Cartesian products merge each row from one table with each row from the other. This will produce a data set which repeats the structure created in Qol_template for each subject.

data demog;
input age gender $ height weight subjid;
datalines;
22 M 64 60 1
26 F 87 74 2
29 M 55 62 3
28 M 84 74 4
23 F 84 88 5
30 M 87 76 6
34 F 78 72 7
38 F 69 56 8
56 F 52 60 9
52 F 66 69 10
run;
proc sql; create table allscores as select d.subjid, q.* from in.demog as d cross join qol_template as q order by subjid, question_no; quit;

Using the DATA step we can now join our actual quality of life data back onto this template, overwriting the initial values where present but otherwise ensuring that questions not answered are initialised to 0. This ensures that our final table always has 10 observations per subject.

data actualscores;
  merge allscores in.Qol_actual;
  by subjid question_no;
run;
             Actual Scores
question_ Obs subjid no question
1 001 1 2 2 001 2 10 3 001 3 4 4 001 4 3 5 001 5 10 6 001 6 10 7 001 7 6 8 001 8 6 9 001 9 1 10 001 10 1 11 002 1 9 12 002 2 6 13 002 3 9 14 002 4 1 15 002 5 10 16 002 6 0 17 002 7 0 18 002 8 0 19 002 9 0 20 002 10 0 21 003 1 3