CS2911
Network Protocols

Resources

  • Dr. Yoder's section: Please see the checklist at the end of the lab.
  • Other sections: Please consult with your instructor for submission instructions.
  • CS2911 Coding Standard

Lab 5 is significantly harder than previous labs. Your code and demo are due at the start of the Lab 6 period. Please work in teams of two unless approved by the instructor. Please submit only one report per team.

Introduction

The goal of this lab is to write a Python program to request and save a web resource, acting as an HTTP client. You will write code from scratch, sending and receiving bytes over a TCP connection rather than using a prebuilt HTTP library.

You will start from the template httpclient.py.

The program has at least the following functions; you will add others.

main()

This method is provided in its entirety in the template.

The provided main function will perform basic tests. You may add others. No user input is needed.

  • This method has no arguments
  • This method is invoked with a main() function call at the end of the program.

get_http_resource(url,file_name)

This method is provided in its entirety in the template.

Using HTTP, request a web resource and store the returned data in the specified file.

  • Arguments:
    • url: string containing URL (including the "http://" protocol declaration and the domain name) for desired resource
    • file_name: string containing name of file in which to store response data
  • Return value:
    • None

make_http_request(host, port, resource, file_name)

Using HTTP, request a web resource and store the returned payload data in the specified file.

  • Arguments:
    • host: bytes object with the ASCII domain name or IP address of the server machine (i.e., host) to connect to
    • port: port number to connect to on host
    • resource: bytes object with the ASCII path/name of resource to get. This is everything in the URL after the domain name, including the first /.
    • file_name: string (str) name of file in which to store the retrieved resource
  • Return value:
    • Integer status code given by server
  • Operation:
    • Send an HTTP request to the given host and port to retrieve the specified resource.
      • The client should recognize and interpret both Content-Length and chunked responses
      • While you need to implement chunking, you do not need to implement the chunk extensions described in the RFC (and the videos)
    • If successful, store the response data in a file with the specified file_name.
    • Return the status code given by the server

It is not necessary to handle or report errors that the server hypothetically could make by not following the protocol. (You may assume the servers we recommend follow the protocol correctly.)
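
As a rough illustration of the message format involved, here is a minimal sketch of building a GET request and pulling the status code out of a status line. The helper names build_get_request and parse_status_code are hypothetical; they are not part of the template, and your team's design may divide the work differently.

```python
def build_get_request(host, resource):
    """Return the bytes of a minimal HTTP/1.1 GET request.

    host and resource are bytes objects, as in make_http_request.
    (Hypothetical helper, not part of the provided template.)
    """
    return b''.join([
        b'GET ', resource, b' HTTP/1.1\r\n',  # request line
        b'Host: ', host, b'\r\n',             # Host header is required in HTTP/1.1
        b'\r\n',                              # blank line ends the header section
    ])


def parse_status_code(status_line):
    """Return the integer status code from a status line without its CRLF.

    Example input: b'HTTP/1.1 200 OK'
    (Hypothetical helper, not part of the provided template.)
    """
    return int(status_line.split(b' ')[1].decode('ASCII'))
```

The request bytes would be passed to sendall() on your connected socket; the status line would come from whatever line-reading helper your team writes.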

Procedure

  1. Work through the design steps from Lab 3
  2. Download the skeleton Python template: httpclient.py
  3. Edit the header of the file to include your team members' names.
  4. Complete the make_http_request method to request, receive, and store the designated resource. You should add other helper methods, but do not change the code provided in the template. As a team, design the methods and the data passed between them. Then divide your efforts, with each team member writing at least one method. Document who writes each method with a Sphinx tag starting with :author:

If this base functionality turns out to be too easy, you may experiment with adding additional functions, but be sure the basic requirements are still met.

Divide up the primary responsibility for parts of the program in an equitable way.

Document each method following the coding standard.

You may use the next_byte() method from Lab 4. Whether or not you use next_byte(), your program should use recv() correctly: it should handle the situation when recv() returns fewer bytes than expected during normal transmission. It is usually best to just use next_byte().
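
To make the recv() point concrete, here is a minimal sketch of two receive helpers. The next_byte() shown here is an assumption about what the Lab 4 version looks like, and read_exactly is a hypothetical name; use your own Lab 4 code if it differs.

```python
def next_byte(data_socket):
    """Read exactly one byte from the socket and return it as a bytes object.

    Assumed to behave like the Lab 4 helper of the same name.
    """
    data = data_socket.recv(1)
    if data == b'':
        # recv() returns b'' only when the peer has closed the connection
        raise ConnectionError('connection closed before the byte arrived')
    return data


def read_exactly(data_socket, count):
    """Call recv() repeatedly until exactly count bytes have been collected.

    (Hypothetical helper, not part of the provided template.)
    """
    received = b''
    while len(received) < count:
        # recv() may return fewer bytes than requested, so ask only for the rest
        piece = data_socket.recv(count - len(received))
        if piece == b'':
            raise ConnectionError('connection closed before all bytes arrived')
        received += piece
    return received
```

Reading one byte at a time is slower than a looped recv(), but it makes it hard to accidentally read too much.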

You do not need to implement a persistent connection. The server will send only one response unless you explicitly request a persistent connection. Nevertheless, your code should be extensible to the persistent case. In other words, when you are reading the message, the program should not attempt to read any bytes past the end of the message, to avoid reading bytes from the second response.
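
One possible way to stop exactly at the end of the message is sketched below for both the Content-Length and chunked cases. All function names are hypothetical, chunk extensions are not handled, and the sketch assumes a well-behaved server; treat it as an illustration of the idea rather than the required design.

```python
def read_exactly(sock, count):
    """Collect exactly count bytes, looping because recv() may return fewer."""
    data = b''
    while len(data) < count:
        piece = sock.recv(count - len(data))
        if piece == b'':
            raise ConnectionError('connection closed mid-message')
        data += piece
    return data


def read_line(sock):
    """Read one byte at a time up to and including CRLF; return the line without it."""
    line = b''
    while not line.endswith(b'\r\n'):
        line += read_exactly(sock, 1)
    return line[:-2]


def read_headers(sock):
    """Read header lines until the blank line; return {lowercased name: value}."""
    headers = {}
    while True:
        line = read_line(sock)
        if line == b'':
            return headers
        name, _, value = line.partition(b':')
        headers[name.strip().lower()] = value.strip()


def read_body(sock, headers):
    """Read the payload, consuming no bytes past the end of this message."""
    if headers.get(b'transfer-encoding', b'').lower() == b'chunked':
        body = b''
        while True:
            # Each chunk starts with its size in hexadecimal on its own line
            size = int(read_line(sock).decode('ASCII'), 16)
            if size == 0:
                # Consume any trailer lines and the final blank line
                while read_line(sock) != b'':
                    pass
                return body
            body += read_exactly(sock, size)
            read_line(sock)  # consume the CRLF that follows the chunk data
    length = int(headers.get(b'content-length', b'0').decode('ASCII'))
    return read_exactly(sock, length)
```

Because every read asks for a known number of bytes or stops at a CRLF, any bytes belonging to a later response stay in the socket untouched.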

Because a student has asked me to state this explicitly: You should not use HTTP libraries when implementing this lab. The purpose of this lab is for your team to write the library from scratch! (International students: This means, like in cooking, to make something from raw ingredients without any pre-made components.)

To test your files, compare the contents of index.html with what you get by right-clicking on the page when visited in your browser and selecting "view page source." Your image is probably correct if it displays correctly in IntelliJ.

Have fun! Ask your instructor if you have any questions!

Just for fun

Many servers these days are HTTPS-only. In July of 2018, Google started marking sites that use HTTP instead of HTTPS as "not secure." As of our Fall 2019 offering of Network Protocols, the use of HSTS on most sites prevents Chrome from sending any plain HTTP requests at all. If you have never visited a site, your browser will get a 301 Moved Permanently status code redirecting you to the HTTPS version of the site. After this, your browser will remember that the site uses HSTS and refuse to talk to the site over plain HTTP at all. Many sites are also on a preloaded list in Chrome, so Chrome will NEVER use plain HTTP with them. This is important for security because it helps prevent man-in-the-middle attacks, which we will discuss at the end of the quarter. (Indeed, many sites started transitioning to HTTPS-only in 2016.)

This is very good from a security perspective, but it means if you want to connect to sites, you need to be able to handle HTTPS! It turns out this isn't hard to do. Here are some tips on how to do it if you would like to try it:

HTTPS is simply HTTP wrapped in a TLS socket, offered on port 443 instead of port 80. To connect to an HTTPS server, first establish an ordinary TCP connection to port 443 as you would always do. Then, wrap your TCP socket in a TLS socket using the following code:

import ssl
context = ssl.create_default_context()
ssl_socket = context.wrap_socket(sock, server_hostname=host.decode('ASCII'))

where sock is your connected TCP client socket and host is the bytes object holding the server's domain name (server_hostname must be a str, hence the decode). Once this call succeeds, use ssl_socket throughout your code for all the calls to recv() and sendall().
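
Putting the connection and the wrapping together, a minimal sketch might look like the following. The function name open_tls_socket is hypothetical, and the sketch assumes host is a bytes object as in make_http_request.

```python
import socket
import ssl


def open_tls_socket(host, port=443):
    """Connect to host over TCP and return a TLS-wrapped socket.

    host is a bytes object containing the server's domain name.
    (Hypothetical helper, not part of the provided template.)
    """
    host_name = host.decode('ASCII')  # server_hostname must be a str, not bytes
    tcp_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp_socket.connect((host_name, port))
    # The default context verifies the server certificate and host name
    context = ssl.create_default_context()
    return context.wrap_socket(tcp_socket, server_hostname=host_name)
```

The returned socket supports the same recv() and sendall() calls as a plain TCP socket, so the rest of your HTTP code should not need to change.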

That's it! Let us know if you hit any roadblocks while trying this!

Submission Instructions for Dr. Yoder

For Dr. Yoder's section, please use the following checklist: