This is an old version of this course, from Fall 2016. A newer version is available here.
Resources
- Lab Checklist — I will provide a paper copy of this.
You can work the entire lab (except the demo) as a prelab. Although this is not required, I recommend completely writing all of your code, integrating it, and starting to work out some of the bugs before lab, so you are able to easily work out the rest of the bugs during the lab period. As previously stated, this lab is one of the hardest labs of the quarter. Please work in teams of two unless approved by the instructor. Please submit only one report per team.
Also note: The Week 6 lab (HTTP server) may be the 2nd hardest lab of the quarter, so getting an early start on that may be helpful, too.
Introduction
The goal of this lab is to write a Python program to request and save a web resource, acting as an HTTP client. You will write code from scratch, sending and receiving bytes over a TCP connection rather than using a prebuilt HTTP library.
You will start from the template httpclient.py.
The program has at least the following functions; you will add others.
main()
This method is provided in its entirety in the template.
The provided main function will perform basic tests. You may add others. No user input is needed.
- This method has no arguments
- This method is invoked with a main() function call at the end of the program.
get_http_resource(url,file_name)
This method is provided in its entirety in the template.
Using HTTP, request a web resource and store the returned data in the specified file.
- Arguments:
- url: string containing URL (including the "http://" protocol declaration and the domain name) for desired resource
- file_name: string containing name of file in which to store response data
- Return value:
- None
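To make the relationship between the url argument here and the host, port, and resource arguments of make_http_request concrete, here is a minimal sketch of one way a URL could be split. It is not the template's code: split_url is a hypothetical name, and the sketch assumes the URL starts with "http://" and that the port defaults to 80 when none is given.

```python
def split_url(url):
    # Hypothetical helper (not from the template): split a URL of the form
    # 'http://host[:port]/path' into (host, port, resource).
    prefix = 'http://'
    if not url.startswith(prefix):
        raise ValueError('this sketch only handles http:// URLs')
    rest = url[len(prefix):]
    # Everything before the first '/' is the host (and optional port).
    host_port, _, path = rest.partition('/')
    # The resource always includes the leading '/'.
    resource = '/' + path
    host, colon, port_text = host_port.partition(':')
    # Assumption: use port 80 when the URL does not name a port.
    port = int(port_text) if colon else 80
    # host and resource are returned as bytes objects holding ASCII text.
    return host.encode('ascii'), port, resource.encode('ascii')
```

For example, split_url('http://seprof.sebern.com/') gives the host b'seprof.sebern.com', the port 80, and the resource b'/'.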
make_http_request(host, port, resource, file_name)
Using HTTP, request a web resource and store the returned payload data in the specified file.
- Arguments:
- host: bytes object with the ASCII domain name or IP address of the server machine (i.e., host) to connect to
- port: port number to connect to on host
- resource: bytes object with the ASCII path/name of resource to get. This is everything in the URL after the domain name, including the first /.
- file_name: string (str) name of file in which to store the retrieved resource
- Return value:
- Integer status code given by server
- Operation:
- Send an HTTP request to the specified host and port to retrieve the specified resource.
- The client should recognize and interpret both Content-Length and chunked responses
- While you need to implement chunking, you do not need to implement the chunk extensions described in the RFC (and the videos)
- If successful, store the response data in a file with the specified file_name.
- Return the status code given by the server
It is not necessary to handle or report errors that the server hypothetically could make by not following the protocol. (Dr. Sebern's website follows the protocol correctly.)
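The request and the chunked case in the operation list above can be sketched as follows. This is only an illustration under assumptions, not the required design: build_get_request, read_line, and read_chunked_body are hypothetical names that are not in the template, and next_byte is assumed to be a zero-argument callable that returns the next byte of the response as a bytes object of length 1 (your Lab3 next_byte() may have a different signature).

```python
def build_get_request(host, resource):
    # Hypothetical helper: build a minimal GET request as bytes.
    # host and resource are bytes objects, as in make_http_request.
    return (b'GET ' + resource + b' HTTP/1.1\r\n' +
            b'Host: ' + host + b'\r\n' +
            b'\r\n')


def read_line(next_byte):
    # Read bytes up to and including the next CRLF.
    # Returns the line without the CRLF.
    line = b''
    while not line.endswith(b'\r\n'):
        line += next_byte()
    return line[:-2]


def read_chunked_body(next_byte):
    # Read a chunked body and return the payload with the chunk framing removed.
    body = b''
    while True:
        size_line = read_line(next_byte)
        # Chunk extensions (anything after ';') are skipped, not interpreted.
        size_text = size_line.split(b';')[0].strip()
        # The chunk size is written in hexadecimal.
        size = int(size_text.decode('ascii'), 16)
        if size == 0:
            break
        for _ in range(size):
            body += next_byte()
        # Each chunk's data is followed by a CRLF.
        read_line(next_byte)
    # After the zero-size chunk come optional trailer lines, then an empty line.
    while read_line(next_byte) != b'':
        pass
    return body
```

Because every read goes through next_byte, the sketch stops exactly at the end of the chunked message and leaves any later bytes unread.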
Procedure
- Download the skeleton Python template: httpclient.py
- Edit the header of the file to include your team members' names.
- Complete the make_http_request method to request, receive, and store the designated resource. You should add other helper methods, but do not change the code provided in the template. As a team, design the methods and the data passed between them. Then divide your efforts, with each team member writing at least one method. Document who writes each method with a comment line starting with Author:
If this base functionality turns out to be too easy, you may experiment with adding additional functions, but be sure the basic requirements are still met.
Divide up the primary responsibility for parts of the program in an equitable way.
Include a multiline #-comment (not docstring) at the start of each method, including a description of the method, a description of the parameters (e.g. what type or value is expected), and a description of the value that is returned.
You should include a comment for every method, but you do not need to document every argument. Instead, document as many arguments as you can document non-trivially. For example, if the only thing you can think of to describe a variable num_lines is "The number of lines" then don't document that variable. But if you think it is useful to say, "num_lines - The number of CRLF-terminated newlines in the entity body" then include the comment. As a ballpark estimate, you should be able to comment at least half of your arguments this way.
As the last line of each method's comment, include the string # Author: Phileas Fogg, replacing Phileas Fogg with the name of the primary author of the method.
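As an illustration of the comment style described above, here is a sketch of a small helper with its #-comment. The helper parse_status_code is hypothetical and not part of the template, and placing the comment just inside the def line is one reading of "at the start of each method"; ask if you are unsure which placement is expected.

```python
def parse_status_code(status_line):
    # Extract the numeric status code from an HTTP status line.
    # status_line - bytes object holding the first line of the response,
    #               without the trailing CRLF, e.g. b'HTTP/1.1 200 OK'
    # Returns the status code as an int, e.g. 200.
    # Author: Phileas Fogg
    fields = status_line.split(b' ')
    # The status code is the second space-separated field.
    return int(fields[1].decode('ascii'))
```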
You may use the next_byte() method from Lab3 if you choose. Whether or not you use next_byte(), your program should use recv() correctly: it should handle the situation when recv() returns fewer bytes than expected during normal transmission. It is usually best to just use next_byte().
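If you choose not to use next_byte(), the short-read case can be handled with a loop like the one sketched below. This is a minimal sketch, not the Lab3 code: receive_exactly is a hypothetical name, and data_socket is assumed to be a connected TCP socket (or any object with a compatible recv method).

```python
def receive_exactly(data_socket, count):
    # Keep calling recv() until exactly count bytes have arrived.
    # recv(n) may return fewer than n bytes even when nothing is wrong,
    # so a single call is not enough.
    data = b''
    while len(data) < count:
        piece = data_socket.recv(count - len(data))
        if piece == b'':
            # An empty result means the other side closed the connection.
            raise ConnectionError('connection closed before all bytes arrived')
        data += piece
    return data
```

Asking recv() for only the bytes still missing, rather than a fixed large buffer, also keeps the function from reading past the amount requested.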
You do not need to implement a persistent connection. The server will send only one response unless you explicitly request a persistent connection. Nevertheless, your code should be extensible to the persistent case. In other words, when you are reading the message, the program should not attempt to read any bytes past the end of the message, to avoid reading bytes from the second response.
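One way to avoid reading past the end of the message is sketched below for the Content-Length case. It is an illustration under assumptions rather than the required structure: read_response_with_length is a hypothetical name, next_byte is assumed to be a zero-argument callable returning one byte as a bytes object, and the response is assumed to include a Content-Length header.

```python
def read_response_with_length(next_byte):
    # Read the status line and headers one byte at a time, stopping
    # immediately after the blank line that ends the header section.
    head = b''
    while not head.endswith(b'\r\n\r\n'):
        head += next_byte()
    lines = head[:-4].split(b'\r\n')
    status_line = lines[0]
    # Store header names in lowercase because they are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b':')
        headers[name.strip().lower()] = value.strip()
    # Read exactly as many body bytes as the server announced, and no more.
    length = int(headers[b'content-length'].decode('ascii'))
    body = b''
    for _ in range(length):
        body += next_byte()
    return status_line, headers, body
```

If a second response followed on the same connection, its first byte would still be unread when this function returns.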
Because a student has asked me to state this explicitly: You should not use HTTP libraries when implementing this lab. The purpose of this lab is for your team to write the library from scratch! (International students: This means, like in cooking, to make something from raw ingredients without any pre-made components.)
To test your files, compare the contents of index.html with what you get by right-clicking on the page seprof.sebern.com and selecting "view page source." Your image is probably correct if it displays correctly in IntelliJ.
Have fun! Ask your instructor if you have any questions!