Sorted

Here are some assorted tips related to PROC SORT. First, let's look at the detection of duplicates.

[Image: PROC SORT step using the DUPOUT= option]

Here we use the DUPOUT= option, which works together with NODUPKEY or NODUPRECS, to write duplicates to a separate data set. Suppose that four records share one particular key: the first of them goes into the main SORTED data set, and the other three go into DUPES. If instead we wanted all four of the duplicates to go into the DUPES data set, we could do it like this:

[Image: code that sends all four duplicate records to DUPES]
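To make the difference between these two behaviours concrete, here is a small Python model of where each record ends up. It is not SAS code: it assumes records are (key, value) pairs with duplicates detected on the key alone, and that the sort is stable, as PROC SORT is under its default EQUALS option. The function names are hypothetical.

```python
from collections import Counter

# A record is a hypothetical (key, value) pair; duplicates are detected on key only,
# as with NODUPKEY and a BY statement on the key variable.
Record = tuple[str, int]


def sort_keep_first(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Model of NODUPKEY with DUPOUT=: first record per key kept, the rest to dupes."""
    # sorted() is stable, like PROC SORT's default EQUALS option, so records
    # sharing a key stay in their original relative order.
    ordered = sorted(records, key=lambda r: r[0])
    kept: list[Record] = []
    dupes: list[Record] = []
    for rec in ordered:
        if kept and kept[-1][0] == rec[0]:
            dupes.append(rec)  # same key as the record just kept
        else:
            kept.append(rec)  # first record seen for this key
    return kept, dupes


def sort_split_all_dupes(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Send every record whose key occurs more than once to dupes."""
    ordered = sorted(records, key=lambda r: r[0])
    counts = Counter(r[0] for r in ordered)
    unique = [r for r in ordered if counts[r[0]] == 1]
    dupes = [r for r in ordered if counts[r[0]] > 1]
    return unique, dupes


data = [("B", 1), ("A", 1), ("A", 2), ("C", 1), ("A", 3), ("A", 4)]
kept, dupes = sort_keep_first(data)
# kept  == [("A", 1), ("B", 1), ("C", 1)]
# dupes == [("A", 2), ("A", 3), ("A", 4)]
unique, all_dupes = sort_split_all_dupes(data)
# unique    == [("B", 1), ("C", 1)]
# all_dupes == [("A", 1), ("A", 2), ("A", 3), ("A", 4)]
```

With four records for key "A", the first approach keeps one of them and sends three to the duplicates list; the second sends all four there and keeps only keys that occur once.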

Beware, incidentally, of the NODUPRECS (or NODUP) option. It removes a duplicate record only if it immediately follows an identical record in the sorted data set, so it is guaranteed to remove all duplicate records only if you sort BY all variables (for example, BY _ALL_).
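The pitfall is easy to reproduce in a simplified Python model (not SAS code) of the adjacent-record comparison. The function name and sample rows are hypothetical; the stable sort stands in for PROC SORT's default EQUALS behaviour.

```python
from typing import Any, Callable

Row = tuple[str, int]


def sort_nodup_adjacent(rows: list[Row], by: Callable[[Row], Any]) -> list[Row]:
    """Model of NODUPRECS: sort, then drop a row only if it equals the row kept just before it."""
    ordered = sorted(rows, key=by)  # stable, like the default EQUALS option
    result: list[Row] = []
    for row in ordered:
        if not result or row != result[-1]:
            result.append(row)
    return result


rows = [("A", 2), ("A", 1), ("A", 2)]

# BY key only: the two identical ("A", 2) rows are separated by ("A", 1), so both survive.
by_key = sort_nodup_adjacent(rows, by=lambda r: r[0])
# by_key == [("A", 2), ("A", 1), ("A", 2)]

# BY all variables: identical rows become adjacent and the duplicate is removed.
by_all = sort_nodup_adjacent(rows, by=lambda r: r)
# by_all == [("A", 1), ("A", 2)]
```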

Sorting can be greedy in terms of workspace. If you need to use less, one option is TAGSORT; however, this can significantly increase runtime. An alternative technique is to split your data set into smaller chunks, sort each chunk, and then put them back together with an interleaving DATA step (a SET statement naming the sorted chunks, followed by a BY statement):

[Image: splitting the data set, sorting each chunk, and interleaving the results]

With the file split into two halves, each of these sorts uses about half as much workspace as sorting the whole file at once, while the total runtime should be virtually unchanged.
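The ordering logic behind split, sort and interleave can be sketched in Python, with `heapq.merge` playing the part of the interleaving DATA step. This in-memory model only illustrates why the result is correctly ordered; it does not reproduce the workspace saving. The function name and the contiguous chunking scheme are assumptions for illustration.

```python
import heapq
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def chunked_sort(records: list[T], key: Callable[[T], Any], n_chunks: int = 2) -> list[T]:
    """Sort contiguous chunks separately, then interleave them in key order."""
    if not records:
        return []
    size = -(-len(records) // n_chunks)  # ceiling division: rows per chunk
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    # Each chunk is sorted on its own (the separate PROC SORT steps).
    sorted_chunks = [sorted(chunk, key=key) for chunk in chunks]
    # heapq.merge reads the sorted chunks in step and emits rows in key order,
    # taking ties from earlier chunks first, much like SET part1 part2; BY key;
    return list(heapq.merge(*sorted_chunks, key=key))


orders = [("C", 1), ("A", 1), ("B", 1), ("A", 2)]
merged = chunked_sort(orders, key=lambda r: r[0])
# merged == [("A", 1), ("A", 2), ("B", 1), ("C", 1)]
```

Because the chunks are contiguous and ties are taken from the earlier chunk first, the merged output matches what a single stable sort of the whole input would produce.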

Finally, in data warehousing applications you may sometimes see sort-related error messages when using the Type II Loader transform in DI Studio. These can arise when SAS and the database being loaded disagree about the sort order (sometimes the issue is simply one of case sensitivity). You can eliminate such problems by specifying the SAS system option SORTPGM=SAS, which makes SAS responsible for all sorting.
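Why case sensitivity alone can cause such a disagreement can be seen in a few lines of Python: the same values come out in different orders under a case-sensitive and a case-insensitive comparison, so data sorted one way looks unsorted to a process expecting the other. This is a simplified model; which collation a particular database uses depends on its configuration, and the helper `is_sorted` is hypothetical.

```python
from typing import Any, Callable

names = ["apple", "Banana", "cherry", "Apple"]

# Case-sensitive order by character code, as SAS typically uses on ASCII platforms:
# all uppercase letters sort before all lowercase ones.
case_sensitive = sorted(names)
# case_sensitive == ["Apple", "Banana", "apple", "cherry"]

# Case-insensitive collation, as some databases are configured to use.
case_insensitive = sorted(names, key=str.casefold)
# case_insensitive == ["apple", "Apple", "Banana", "cherry"]


def is_sorted(seq: list[str], key: Callable[[str], Any] = lambda s: s) -> bool:
    """True if seq is already in non-decreasing order under the given key."""
    return all(key(a) <= key(b) for a, b in zip(seq, seq[1:]))


# Each side sees the other's "sorted" output as out of order.
print(is_sorted(case_sensitive, key=str.casefold))  # False
print(is_sorted(case_insensitive))  # False
```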