Project 2: FTPS Client

Implement a real protocol using two sockets

Description

The goal of this project is to deepen your ability to write network code. At this point you can open a socket and send and receive simple messages. In this project, you will implement a client for a more complex protocol, that has more features, and uses two sockets rather than one.

Specifically, in this project you will develop a client for the File Transfer Protocol Secure (FTPS) protocol. FTPS is an extension of the FTP protocol that adds TLS for additional security. We have setup an FTPS server for you to use when developing and debugging your client. The server is available at ftps://ftp.3700.network.

What is FTP

Developed in 1971, the File Transfer Protocol (FTP) is one of the oldest protocols still in common use today. FTP’s development predates TCP/IP, yet the protocol still works fairly well on the modern internet.

FTP is a client-server oriented protocol for uploading, downloading, and managing files. The FTP server’s job is to listen for incoming connections from clients and then respond to their requests. These requests include all common file and directory operations, such as listing the files in a directory, making and deleting directories, uploading files from the client machine to the server, and downloading files from the server to the client machines. FTP clients connect to FTP servers and issue commands to tell the server what to do.

Because the FTP protocol is so old, it has many, many features, some of which are vestigial and no longer make sense on the modern internet (e.g., uploading a file in 36-bit, EBCDIC format directly to a line-feed printer), and others that are so esoteric that they are rarely used and supported. Wikipedia has an extensive article on the FTP protocol, as well as a list of all FTP protocol commands. Fortunately, a FTP client only needs to support a minimum, basic set of commands in order to function. We outline the necessary functionality for the FTP client you will develop below. Our reference implementation is roughly 300 lines of Python3, including self-documentation and extensive error checking.

How does FTP relate to FTPS

As noted above, FTPS is an extension of the basic FTP protocol that leverages TLS to add (some) security to file transfers. FTP is to FTPS what HTTP is to HTTPS.

There are many ways to combine FTP and TLS. The method used by our server is specifically known as explicit FTPS, sometimes called FTPES. Our server does not support the implicit method of combining FTP and TLS. We detail precisely how explicit FTPS works below.

Note that FTPS is not the same as SFTP. SFTP also aims to add encryption to the FTP protocol, but it achieves this by tunnelling the FTP protocol over the SSH protocol. You will not be writing an SFTP client in this project.

FTPS Server Decorum

We have set up a FTPS server for you to help you develop and debug your client. It is available at ftps://ftp.3700.network. To use the FTPS server you must login with a username and password. We will email each student to give them a unique username and password. After logging in, the server will give each student access to an isolated folder for uploading and downloading files.

WARNING: DO NOT TEST YOUR FTP CLIENT ON PUBLIC FTP SERVERS
There are many FTP servers out there on the internet. Even though some of these FTP servers are "public" and allow anonymous access (i.e., no password required), we ask that students confine their testing to our server. Since your clients will behave weirdly and violate the FTP protocol during development, you don't want to send this kind of anomalous traffic to unsuspecting FTP servers, as it may alarm their administrators.

WARNING: DO NOT UPLOAD GIANT OR ILLEGAL FILES TO OUR SERVER

Our server does not have unlimited capacity. If we observe a student uploading large files to the server (say, larger than 10 megabytes), those files will be deleted and we will cut off the student's access.

Similarly, students are responsible for the files they upload to the server and are expected to exercise good judgement. Do not upload illegal or inappropriate content to the server. Any student that violates this rule will be reported to the university administration, along with copies of the offending files.

Getting Familiar with FTPS

If you’ve never used an FTP/FTPS client or server before, it may be helpful to use an existing client to familiarize yourself with how FTP works at a high-level. We recommend that beginners start with FileZilla, which is a popular, feature packed, open-source FTP client that works across all major operating systems. Note that FileZilla is a GUI client; in this project you will be building a command line client.

Unix and BSD-compliant systems like Linux and MacOS have a command line FTP client named ftp that is typically pre-installed, or can be installed easily from the command line. This command line ftp client is interactive: running it opens an FTP shell where the user can type FTP commands like “dir” and “cd” to view and change remote filesystem directories, respectively. The “help” command will list available FTP commands, and “help [command name]” provides helpful information about a specific command. Note that on some systems, the default ftp client does not support FTPS. For example, on Ubuntu you can sudo apt install ftp-ssl to replace the default tool with one that supports FTPS.

The curl command line program, which is also available for most Unix and BSD-based systems, supports downloading files from FTPS servers. It has a command line syntax that is most similar (though not identical) to the FTPS client that you will be developing in this assignment.

Your Program

Your goal to write a basic FTPS client application. This client will run on the command line, and must support the following six operations: directory listing, making directories, file deletion, directory deletion, copying files to and from the FTP server, and moving files to and from the FTP server.

Your FTPS client must execute on the command line using the following syntax:

$ ./3700ftp [operation] [param1] [param2]

operation is a string that specifies what operation the user is attempting to perform. Valid operations are ls, mkdir, rm, rmdir, cp, and mv. Each operation requires either one or two parameters, denoted by param1 and param2 on the command line. param1 and param2 are strings that either represent a path to a file on the local filesystem, or a URL to a file or directory on an FTP server.

3700ftp is not required to print anything to STDOUT or STDERR. That said, students may add functionality that prints out FTP protocol messages (e.g., to aid debugging), directory listings, and errors (e.g., network or socket errors, FTP protocol errors, etc.).

Here is an example of a full-fledged 3700ftp implementation. Note that it includes two optional parameters (–verbose and –help); your client is not required to have these optional parameters. The help for this client describes what each operation does, how to interpret the command parameters with respect to each operation, and the URL format used to specify the FTP server’s connection information.

$ ./3700ftp --help
usage: 3700ftp.py [-h] [--verbose] operation params [params ...]

FTPS client for listing, copying, moving, and deleting files and directories on remote FTPS servers.

positional arguments:
operation      The operation to execute. Valid operations are 'ls', 'rm', 'rmdir',
              'mkdir', 'cp', and 'mv'
params         Parameters for the given operation. Will be one or two paths and/or URLs.

optional arguments:
-h, --help     show this help message and exit
--verbose, -v  Print all messages to and from the FTP server

# Available Operations

This FTPS client supports the following operations:

ls <URL>                 Print out the directory listing from the FTPS server at the given URL
mkdir <URL>              Create a new directory on the FTPS server at the given URL
rm <URL>                 Delete the file on the FTPS server at the given URL
rmdir <URL>              Delete the directory on the FTPS server at the given URL
cp <ARG1> <ARG2>         Copy the file given by ARG1 to the file given by
                          ARG2. If ARG1 is a local file, then ARG2 must be a URL, and vice-versa.
mv <ARG1> <ARG2>         Move the file given by ARG1 to the file given by
                          ARG2. If ARG1 is a local file, then ARG2 must be a URL, and vice-versa.

# URL Format and Defaults

Remote files and directories should be specified in the following URL format:

ftps://[USER[:PASSWORD]@]HOST[:PORT]/PATH

Where USER and PASSWORD are the username and password to access the FTPS server,
HOST is the domain name or IP address of the FTPS server, PORT is the remote port
for the FTPS server, and PATH is the path to a file or directory.

HOST and PATH are the minimum required fields in the URL, all other fields are optional.
The default USER is 'anonymous' with no PASSWORD. The default PORT is 21.

# Example Usage

List the files in the FTPS server's root directory:

$ ./3700ftp ls ftps://bob:s3cr3t@ftp.example.com/

List the files in a specific directory on the FTPS server:

$ ./3700ftp ls ftps://bob:s3cr3t@ftp.example.com/documents/homeworks

Delete a file on the FTPS server:

$ ./3700ftp rm ftps://bob:s3cr3t@ftp.example.com/documents/homeworks/homework1.docx

Delete a directory on the FTPS server:

$ ./3700ftp rmdir ftps://bob:s3cr3t@ftp.example.com/documents/homeworks

Make a remote directory:

$ ./3700ftp mkdir ftps://bob:s3cr3t@ftp.example.com/documents/homeworks-v2

Copy a file from the local machine to the FTPS server:

$ ./3700ftp cp other-hw/essay.pdf ftps://bob:s3cr3t@ftp.example.com/documents/homeworks-v2/essay.pdf

Copy a file from the FTPS server to the local machine:

$ ./3700ftp cp ftps://bob:s3cr3t@ftp.example.com/documents/todo-list.txt other-hw/todo-list.txt

(Il)Legal Libraries

Part of the challenge of this assignment is that all FTPS request and response code must be written by the student, from scratch. In other words, you need to implement the FTPS protocol yourself. Students may use any available libraries to create socket connections, handle TLS, and parse URLs. However, you may not use any libraries/modules/etc. that implement FTP or FTPS. Obviously, your code is also not allowed to invoke system tools that implement FTP or FTPS like ftp or curl.

For example, if you were to write your FTPS client in Python3, the following modules would all be allowed: socket, ssl, and urllib.parse. However, the following modules would not be allowed: urllib.request, ftplib, and pycurl.

If students have any questions about the legality of any libraries please post them to Piazza. It is much safer to ask ahead of time, rather than turn in code that uses a questionable library and receive points off for the assignment after the fact.

Implementation Details

In this assignment, you will develop an FTPS client. This client must be able to login to a remote FTPS server and perform several operations on the remote server. In this section, we will explain how to connect to an FTPS server, the format of FTPS protocol requests and responses, and the FTPS commands that your client must support.

Modern FTPS implementations use TCP/IP as their transport and network protocols. Thus, to connect to an FTPS Server, your client will need to open a TCP socket. By default, FTPS servers listen on port 21, although users may override the default port by specifying a different one on the command line. Once your client has connected a TCP socket to a remote server, it will begin exchanging text-formatted requests and responses with the FTPS server.

All FTPS requests take the form:

COMMAND <param> <...>\r\n

COMMAND is typically a three or four letter command, in all caps, that tells the FTPS server to perform some action. Depending on what command is sent, additional parameters may also be required. Note that parameters should not be surrounded by < and > symbols; we use those to denote things in messages that are optional. All FTPS requests end with \r\n.

After each request, the FTPS server will reply with at least one response. Some requests will elicit two responses. Additionally, FTPS servers send a welcome message after the TCP connection is opened, before the client sends any requests. All FTPS responses take the form:

CODE <human readable explanation> <param>.\r\n

CODE is a three digit integer that specifies whether the FTPS server was able to complete the request.

  • 1XX codes indicate that more action is expected (e.g., waiting for a file to download or upload);
  • 2XX codes indicate success;
  • 3XX indicates preliminary success, but more action is required (e.g., your client sent a username, but now a valid password is required);
  • 4XX, 5XX, and 6XX codes indicate that an error has occurred.

More details on FTPS response codes can be found here. Some servers include an optional, human-readable explanation in each response that explain what happened or what is expected of the client. These human-readable messages are useful for debugging purposes. Responses may also include a parameter that is necessary for the client to function (most importantly, for the PASV command, see below). All FTPS responses end with \r\n.

Your FTPS client must be able to send the following FTP commands:

  • AUTH TLS\r\n
    Tells the FTPS server to switch to an encrypted connection using TLS. This is the first request that your client must send to the FTPS server after connecting the socket. After sending this command and reading the response, the client program must wrap their socket in TLS.
  • USER <username>\r\n
    Login to the FTPS server as the specified username. If the user does not specify a username on the command line, then your client may assume that the username is "anonymous". This is the second request that your client must send to the FTPS server.
  • PASS <password>\r\n
    Login to the FTPS server using the specified password. If the user specified a password on the command line, then this is the third request that your client must send to the FTPS server. If the user did not specify a password on the command line then your client may skip this request.
  • PBSZ 0\r\n
    Sets the protection buffer of the connection to zero bytes, which disables it. Your client must set PBSZ immediately after sending USER (and possibly PASS).
  • PROT P\r\n
    Sets the protection-level of your connection to Private, which means all subsequent data transfers must occur over TLS encrypted connections. Your client must set PROT immediately after sending PBSZ.
  • TYPE I\r\n
    Set the connection to 8-bit binary data mode (as opposed to 7-bit ASCII or 36-bit EBCDIC). Your client should set the TYPE before attempting to upload or download any data.
  • MODE S\r\n
    Set the connection to stream mode (as opposed to block or compressed). Your client should set MODE before attempting to upload or download any data.
  • STRU F\r\n
    Set the connection to file-oriented mode (as opposed to record- or page-oriented). Your client should set STRU before attempting to upload or download any data.
  • LIST <path-to-directory>\r\n
    List the contents of the given directory on the FTPS server. Equivalent to ls on the Unix command line.
  • DELE <path-to-file>\r\n
    Delete the given file on the FTPS server. Equivalent to rm on the Unix command line.
  • MKD <path-to-directory>\r\n
    Make a directory at the given path on the FTPS server. Equivalent to mkdir on the Unix command line.
  • RMD <path-to-directory>\r\n
    Delete the directory at the given path on the FTPS server. Equivalent to rm -r on the Unix command line.
  • STOR <path-to-file>\r\n
    Upload a new file with the given path and name to the FTPS server.
  • RETR <path-to-file>\r\n
    Download a file with the given path and name from the FTPS server.
  • QUIT\r\n
    Ask the FTPS server to close the connection.
  • PASV\r\n
    Ask the FTPS server to open a data channel.

Control Channel, Data Channel

The FTP protocol is a bit unusual in that it requires not one, but two socket connections. The first socket that your client will open to the FTP server is known as the control channel. The control channel is typically the connection to port 21 on the FTP server. The control channel is for sending FTP requests and receiving FTP responses. However, no data is uploaded or downloaded on the control channel. To download any data (i.e., a file or a directory listing) or upload any data (i.e., a file) your client must ask the server to open a data channel on a second port.

The FTP command to open a data channel is PASV. The client sends PASV to the FTP server, and it responds with a message that looks something like this:

227 Entering Passive Mode (192,168,150,90,195,149).

Code 227 indicates success. The six numbers in parenthesis are the IP address and port that the client should connect a TCP/IP socket to to create the data channel. The first four numbers are the IP address (192.168.150.90 in this example) and the last two numbers are the port. Port numbers are 16-bits, so the two numbers represent the top and bottom 8-bits of the port number, respectively. In this example, the port number is (195 « 8) + 149 = 50069.

The semantics of the data channel are always the same: once the data transfer is complete the data channel must be closed by the sender. What changes is who closes the channel. If the server is sending data (i.e., a downloaded file or a directory listing) then the server will close the data socket once all data has been sent. This is how the client knows that all data has been received. Alternatively, if the client is sending data (i.e., uploading a file), then the client must close the data socket once all data has been sent. This is how the server knows that all data has been received. If the client wishes to upload or download additional data, e.g., perform multiple operations during a single control channel session, then one PASV data channel must be opened per operation.

Note that the control channel (i.e., the first socket) must stay open while the data channel is open. Once the data channel is closed, the client is free to end the FTPS session by sending QUIT to the FTPS server on the control channel and closing the control socket.

Supporting TLS in the Control Channel

When your client first opens the control channel socket to the FTPS server it will not be encrypted. After reading the FTPS server’s welcome message from the control channel, the first command that your client must send to the server is AUTH TLS, as described above. The AUTH command instructs the server to upgrade the connection using TLS. The FTPS server will respond to the AUTH TLS message in clear text (i.e., no TLS encryption), but all subsequent communication on the control channel must be TLS encrypted.

In other words, after receiving the server’s reply to the AUTH TLS message, your control channel socket must be wrapped in TLS. Pretty much all major programming languages support wrapping an unencrypted, vanilla socket in TLS. Typically, the constructor for the TLS socket takes an unencrypted socket as one of its input parameters; this is where the wrapping occurs.

Supporting TLS on Data Channels

Implementing TLS on data channels is very tricky. The necessary sequence of operations must be done in precisely the correct order, or else either the control or data channel will deadlock.

The correct sequence of operations to open a TLS-encrypted data channel is the following:

  1. Send the PASV command on the control channel, read the response, and store the IP address and port given by the server.
  2. Send the next command on the control channel (e.g., LIST, STOR, or RETR) but do not attempt to read a response from the server.
  3. Open a new socket for the data channel and connect it to the IP address and port obtained from step 1.
  4. Read the server’s response to the command sent in step 2 from the control channel. If the response contains an error code, close the data channel and abort.
  5. Wrap the data channel socket in TLS.
  6. Send or receive data on the data channel as necessary.
  7. Close the data channel.
  8. Read the final response from the server on the control channel.

Getting the sequence of reads and writes on the two sockets is critical. The specification for the FTPS protocol, called RFC 4217, has a helpful diagram showing this process in section 12.7.

Language

You can write your code in whatever language you choose, as long as your code compiles and runs on unmodified Khoury College Linux machines on the command line. Do not use libraries that are not installed by default on the Khoury College Linux machines, or that are disallowed for this project. You may use IDEs (e.g., Eclipse) during development, but do not turn in your IDE project without a Makefile. Make sure you code has no dependencies on your IDE.

Suggested Implementation Approach

When starting work on this project, we recommend implementing the required functionality in the following order.

  • Command Line Parsing. Start by writing a program that successfully implements the required command line syntax and can parse the incoming data, e.g., FTPS URLs.
  • Connection Establishment. Add support for connecting and logging-in to an FTPS server. This includes establishing a TCP control channel, correctly sending AUTH, PBSZ, PROT, USER, PASS, TYPE, MODE, STRU, and QUIT commands. Print out responses from the server to confirm that each command is being received and interpreted correctly. You will need to implement TLS support on your control channel in this step.
  • MKD and RMD. Implement support for making and deleting remote directories. These commands are simpler because they do not require a data channel. Verify that your client is working by using a standard FTPS client to double check the results.
  • PASV and LIST. Implement support for creating a data channel, then implement the LIST command to test it. You will need to implement TLS support on your data channel in this step.
  • STORE, RETR, and DELE. Complete your client by adding support for file upload, download, and deletion. Double check your implementation by comparing it to the results from a standard FTPS client.
  • Double check that your client works successfully on a CCIS Linux machine, e.g., login.ccs.neu.edu

Submitting Your Project

To turn-in your project, you should submit the following three things:

  1. Your thoroughly documented source code that implements your FTPS program.
  2. A Makefile that compiles your code.
  3. A plain-text (no Word or PDF) README.md file. In this file, you should briefly describe your high-level approach, any challenges you faced, and an overview of how you tested your code.

Your README.md, Makefile, source code, etc. should all be placed in the root of a compressed archive (e.g., a .zip or .tar.gz) and then uploaded to Gradescope. Alternatively, you can check these items in to Github and then instruct Gradescope to clone your Github repository.

Double Checking Your Submission

To try and make sure that your submission is (1) complete and (2) will work with our grading scripts, we provide a simple script that checks the formatting of your submission. You can download the script here and invoke it using the following command:

$ ./ftp_fmt_chk.py [path to your project directory]

Note that you may need to chmod +x ftp_fmt_chk.py to make the script executable.

This script will attempt to make sure that the correct files (e.g., README.md and Makefile) are available in the given directory, that your Makefile will run without errors (or is empty), and that after running the Makefile a program named 3700ftp exists in the directory. The script will also try to determine if your files use Windows-style line endings (\r\n) as opposed to Unix-style line endings (\n). If your files are Windows-encoded, you should convert them to Unix-encoding using the dos2unix utility before turning in.

Grading

This project is worth 10% of your final grade. You will receive full credit if (1) your code compiles, runs, and produces the expected output, and (2) you have not used any illegal libraries. All student code will be scanned by plagiarism detection software to ensure that students are not copying code from the Internet or each other.

To test your 3700ftp client, we will ask it to perform a number of tasks on the command line, including uploading and downloading files, creating and deleting directories, and deleting files. In other words, we will exercise its expected functionality. If your client does not obey the command line specification then we will not be able to exercise its functionality, and you will lose points. Similarly, points will be lost for each expected feature that is not implemented correctly.