Using Macros to Input Multiple CSV Files into a Data Set
This tip highlights how simple it can be to read in multiple CSV files and output them as a single SAS Data Set utilising the FILENAME command and the PIPE device-type keyword.
The PIPE device-type keyword in the FILENAME statement allows SAS to execute a command outside of SAS and redirect its results to SAS.
Here, we use the FILENAME PIPE combination to run the "DIR" command which is a Windows command which returns the files and directory listings of a specified file path. This example concatenates multiple CSV files using just two data steps; the first data set records all of the data set names whilst the second data step performs multiple concatenations.
A high school uses CSV files to track the grades of students in separate classes. Each class has their own CSV file which details the grades of the students. Providing these files are saved in a single location, the data can be easily combined:
We assume that all CSV files follow the same format - they consist of: gender, first name, surname and the grades the students have attained in five different subjects.
Three files have been created to simulate three different classes; 5J, 5L and 5M. Each file has been saved as a CSV file in the same directory C:Exam_Results. A macro variable is then created in SAS which points to this directory:
The macro variable is then utilised in a filename statement along with the PIPE device-type keyword to read from the windows “DIR†command results to identify all of the CSV files within this location. The /b option allows for only the filenames to be picked up from the directory and does not include the directory path (See Note).
The data set file_list can now be created. This data set contains the names of each of the files that will be input into the main data set. This data set identifies the three CSV files saved within the C:Exam_Results directory.
A separate data statement can then be issued in order to read all of the file names from the file_list data set. Each file name identified is input into the filepath variable to detect its location. The do loop ensures that the data set inputs every variable within each CSV file until the end of the file_list data set is reached. This inputs all of the students' details into the all_students data set and also includes a classname variable derived from the filename to identify the class of each student.
The output data set all_students contains the contents of all three of the CSV files in the order in which they are input. This data can be used to perform further analysis.
Note: In order to use the filename pipe statement you will need to ensure that XCMD is enabled. This can be checked using a proc options.
If XCMD is enabled the following will be visible within the log: