Linguistic Sorting

The way Proc SORT orders data is not always ideal, from the programmer's point of view. However, the SORTSEQ option has some useful features which deserve to be more widely known. Let's look first of all at the normal situation.

Linguistic Sorting Image 1

The sorted version of this data set begins:

Linguistic Sorting Image 2

The test data set is arranged so that the first character of the NUMTXT string alternates between lower case "n" and upper case "N". Because the default sort is case sensitive, all the "N"s come before all the "n"s. And of course the numeric parts of the string are compared character by character, as text, so their order bears little relation to the actual values they represent.
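The original listing survives only as an image, so the following is a minimal sketch of a comparable test rather than the author's code. The data set names TEST and SORTED_DEFAULT, the loop counter I and the range 1 to 20 are all assumptions; only the variable name NUMTXT comes from the text.

```sas
/* Hypothetical test data: NUMTXT is "n" or "N" followed by a number,
   with the case of the first letter alternating from row to row. */
data test;
  length numtxt $8;
  do i = 1 to 20;
    if mod(i, 2) = 1 then numtxt = cats('n', i);
    else numtxt = cats('N', i);
    output;
  end;
  drop i;
run;

/* Default sort: character values are compared using the
   session's native collating sequence, so case matters. */
proc sort data=test out=sorted_default;
  by numtxt;
run;

proc print data=sorted_default;
run;
```

On an ASCII-based system this should list every "N" value ahead of every "n" value, with N10 ahead of N2.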

The SORTSEQ option accepts many values, such as "Danish", that take account of the alphabetical order used by particular natural languages. However, it also accepts the value "Linguistic", which gives case-insensitive sorting:

Linguistic Sorting Image 3

Linguistic Sorting Image 4
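As a hedged sketch of the step just described, the same sort can be requested with SORTSEQ=LINGUISTIC. The data set names and the way the test data is built are assumptions, not taken from the original listing.

```sas
/* Hypothetical test data: "n"/"N" prefix alternating, then a number. */
data test;
  length numtxt $8;
  do i = 1 to 20;
    if mod(i, 2) = 1 then numtxt = cats('n', i);
    else numtxt = cats('N', i);
    output;
  end;
  drop i;
run;

/* Linguistic collation: "n" and "N" are no longer split into two
   separate groups, but the digits are still compared as text. */
proc sort data=test out=sorted_ling sortseq=linguistic;
  by numtxt;
run;

proc print data=sorted_ling;
run;
```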

Yes, that's case insensitive now. But the numbers are still in a pretty useless order. If only there were a sub-option that would fix that... There is: numeric collation, written as NUMERIC_COLLATION=ON in parentheses after SORTSEQ=LINGUISTIC.

Linguistic Sorting Image 5

Linguistic Sorting Image 6
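A minimal sketch of a sort with numeric collation switched on follows. The data set names and the test data are again assumptions; the sub-option is passed in parentheses after LINGUISTIC.

```sas
/* Hypothetical test data: "n"/"N" prefix alternating, then a number. */
data test;
  length numtxt $8;
  do i = 1 to 20;
    if mod(i, 2) = 1 then numtxt = cats('n', i);
    else numtxt = cats('N', i);
    output;
  end;
  drop i;
run;

/* NUMERIC_COLLATION=ON compares runs of digits inside the string
   by numeric value, so a 2 sorts ahead of a 10. */
proc sort data=test out=sorted_num
          sortseq=linguistic(numeric_collation=on);
  by numtxt;
run;

proc print data=sorted_num;
run;
```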

Another sub-option, ALTERNATE_HANDLING=SHIFTED, tells Proc SORT to ignore spaces - so that a value like "N 4", for example, would still come between "n3" and "n5".
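The two sub-options can be combined, separated by a space inside the parentheses. The sketch below assumes a small hypothetical data set, SPACED, containing one value with an embedded space; it is an illustration under those assumptions, not the author's example.

```sas
/* Hypothetical data: the second value contains an embedded space.
   Formatted input ($CHAR8.) keeps the space inside the value. */
data spaced;
  input numtxt $char8.;
  datalines;
n5
N 4
n3
;
run;

/* NUMERIC_COLLATION=ON orders the digits by value;
   ALTERNATE_HANDLING=SHIFTED is intended to stop the space
   from affecting the main comparison. */
proc sort data=spaced out=spaced_sorted
          sortseq=linguistic(numeric_collation=on
                             alternate_handling=shifted);
  by numtxt;
run;

proc print data=spaced_sorted;
run;
```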