Retrieving Files from an SFTP Site
In an age where secure transfer of data becomes increasingly important, we regularly find ourselves being asked to retrieve data via Secure File Transfer Protocol (SFTP). To do this via SAS we could go down the route documented on the SAS Support website, i.e. using a filename statement with the SFTP access method. However, this relies on the use of PuTTY software to enable SAS to access files on the OpenSSH server and requires public key authentication to be set up. This is complicated and not always practical.
An alternative approach is presented here. It uses PuTTY utility psftp.exe to allow files to be copied from the SFTP site, to a location where they can be read by SAS. PSFTP, the PuTTY SFTP client, is a tool for transferring files securely between computers using an SSH connection.
PuTTY can be downloaded from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
It is freeware, developed and maintained by a group of volunteers. It should be downloaded to: C:\Program Files (x86)\PuTTY
With psftp.exe, files from the SFTP server can be retrieved to the local server, where they can be read to SAS as normal. The basic approach is to use SAS programmatically, in combination with batch files to:
- Test the access to the SFTP server
- List the files on the SFTP server and output a txt file with the content
- Manipulate the content from the list to create a further txt file containing ‘get’ commands
- Submit the file with the ‘get’ commands to the SFTP server
1. Test the access to the SFTP server
This can be done without the need for a batch file, using a systask command to call psftp.exe and submit a dummy batch file to the SFTP server. The systask command, calls psftp.exe, provides the server login details and submits a batch command file ‘dummy’ to the server. No actual batch file is required, and there is no output from this step. However, a status macro variable is set indicating the status of the task. This is set to ‘0’ if a connection is achieved and ‘1’ if it is not. A typical macro is:
Note the use of the ECHO N command prior to calling psftp.exe. This is used in this example and all subsequent batch files that call psftp.exe. This is required as the servers host key may not be cached in registry. Without this command you may see in the SAS log:
The server's host key is not cached in the registry. You have no guarantee that the server is the computer you think it is.
The server's dss key fingerprint is: ssh-dss 1024 somekey
With the echo command, you are answering ‘N’ to the question:
The server's host key is not cached in the registry. You have no guarantee that the server is the computer you think it is. The server's dss key fingerprint is: [ssh-dss 1024 somekey] If you trust this host, enter "y" to add the key to PuTTY's cache and carry on connecting. If you want to carry on connecting just once, without adding the key to the cache, enter "n". If you do not trust this host, press Return to abandon the connection. Store key in cache? (y/n)
By answering ‘N’ you get a consistent approach and are not relying on the fingerprint being cached.
2. List the files on the SFTP server and output .txt files with the content
To list the files on the SFTP server, a batch file is used to call psftp.exe, and issue ‘ls’ commands to the SFTP server via a .txt file. The output from the ‘ls’ commands is piped to a further .txt file, using the unnamed pipe access method. Example code is:
Batch_LS.cmd contains the following line:
echo N | "C:\Program Files (x86)\PuTTY\psftp.exe" sftp_site -l "username" -pw
"password" -b "D:\SAS\Programs\sftp_scripts\batch_ls.txt"
The commands in file batch_ls.txt are submitted to the server. These are typically:
cd dir_of_interest - navigate to the directory you require
ls - list the contents
cd next_dir - change to another directory
ls - list the contents
cd .. - go up a level
cd next_dir2 - change to another directory
ls - list the contents
The result of the listing commands are piped to file batch_LS_STDOUT.txt. This contains a typical listing of the directories and files and includes a datetime stamp for each file.
A typical SAS log after the DATA step is:
So note, there is no indication of the success of the step and automatic macro variable &syserr is not set in case of an error.
3. Manipulate the content from the list to create a further .txt file containing ‘get’ commands
The next step is to read the generated listing file batch_LS_STDOUT.txt to SAS, and manipulate it to include just the files required. Then generate a further .txt file batch_get.txt which contains commands to navigate to the correct place on the SFTP server, indicate the local storage location and get the files from the SFTP server, e.g.
lcd D:\SAS\Data\1.Inputs - local storage location for files
A DATA step with a file statement is used to generate the batch_get.txt file. For example:
4. Submit the file with the ‘get’ commands to the SFTP server
The final step is to submit the file containing the get statements to the SFTP server and retrieve the files. Typical code for this is:
Batch_GET.cmd contains the following line:
echo N | "C:\Program Files (x86)\PuTTY\psftp.exe" sftp_site -l "username" -pw "password" -b "D:\SAS\Programs\sftp_scripts\batch_get.txt"
The commands in file batch_get.txt files will be issued to the SFTP server via psftp.exe and the files copied from SFTP server to the local computer.