乐趣区

关于后端:谈谈COMP3310

COMP3310 2021 – Assignment 2: An annoying web-proxy
Background:
• This assignment is worth 15% of the final mark
• It is due by 23:59 Sunday 9 May AEST – note: CANBERRA TIME (gmt+10)
• Late submissions will not be accepted, except in special circumstances
o Extensions must be requested well before the due date, via the course convenor,
with appropriate evidence.
This is a coding assignment, to enhance and check your network programming skills. The main focus
is on native socket programming, and your ability to understand and implement the key elements of
an application protocol from its RFC specification.
Assignment 2 outline
A web-proxy is a simple web-client and web-server wrapped in a single application. It receives
requests from one or more clients (web-browsers) for particular content URLs, and forwards them
on to the intended server, then returns the result to your web-browser – in some form. How is this
useful?
• It can cache content, so the second and later clients to make the same request get a more
rapid response, and free up network capacity.
• It can filter content, to ensure that content coming back is‘safe’, e.g. for children or your
home, or for staff/their computers inside an organisation.
• It can filter requests, to ensure that people don’t access things they shouldn’t, for whatever
policy reasons one might have.
• It can listen to requests/responses and learn things, i.e. snoop on the traffic. Getting people
to use your proxy though is a different challenge…
o And of course it can listen to and modify requests/responses, for fun or profit.
For this assignment, you need to write a web proxy in C or Java, without the use of any external
web/http-related libraries (html-parsing support is ok). ENGN students with limited C/Java
backgrounds should talk with their tutors as we have other options there, though the requirements
will be the same and more closely considered. As most networking server code is written in C, with
other languages a distant second, it is worth learning it.
Your code MUST open sockets in the standard socket() API way, as per the tutorial exercises. Your
code MUST make appropriate and correctly-formed HTTP/1.0 (RFC1945) or HTTP/1.1 enhanced
requests (to a web-server, as a client) and responses (to a web-browser, as a server) on its own, and
capture/interpret the results on its own in both directions. You will be handcrafting HTTP packets, so
you’ll need to understand the structures of requests/responses and HTTP headers.
Wireshark will be helpful for debugging purposes, compare it to a direct web-browser transaction.
The most common trap is not getting your line-ending‘\n\n’right on requests, and this is rather OS
and language-specific. Remember to be conservative in what you send and reasonably liberal in what
you accept.
Page 2 of 3
What your successful and highly-rated proxy will need to do:

  1. Act as a proxy against a website we name. You must allow that name to be specified either
    as a command line argument or read from a file.
  2. Rewrite (simple) html links that originally pointed to the website to now point to your
    proxy, so all subsequent requests also go via your proxy.
    a. Sometimes links are not written in pure style, e.g. they are
    calculated within javascript, and we will accept those breaking, after checking.
  3. Modifies the content, by replacing displayed Australian capital city names with random city
    names of your choosing, but be consistent. Be careful not to break the website access (e.g.
    where a page is called Sydney.html, it still has to work, don’t rename that link – only modify
    the displayed text).
    a. You can do more, e.g. rotating/replacing images. More fun but no extra marks here.
  4. Logs (prints to STDOUT):
    a. Timestamp of each request
    b. Each request that comes into your proxy, as received (‘GET / HTTP/1.0’, etc.)
    i. Don’t log the other headers.
    c. Each status response that comes back (200 OK, 404 Not found, etc.)
    i. Don’t log the other headers
    d. A count of the modifications made to the page by your proxy, counting text changes
    and link rewrites separately (i.e. return two labelled numbers)
    We will test this against the Bureau of Meteorology (BoM) website, by opening our web browser or
    telnet, making a top-level (‘/’) page request to your running proxy as if it were a server and we
    should get back the BoM homepage, modified suitably. Any (simple) links we click on that page
    should take us back to your proxy and again through to the BoM site for that next page, and so on.
    We’re not going to go too deep, there are some overly complex pages on the sites, but we will pick
    5-10 pages. There will be only one client at a time running against your client.
    The reason for being flexible about the website to run against is that you can also daisy-chain
    proxies, i.e. to connect one proxies’output to another’s input. This is one way of testing new
    protocol developments before they are accepted as IETF RFC’s, to see that everyone agrees with the
    protocol syntax. You can test this with classmates in tutorials or outside. It’s also used to federate a
    hierarchy of caches, so that the most popular content for a given network radius is more likely to be
    cached closer to the consumers, on potentially smaller caches.
    Submission and Assessment
    You need to submit your source code, and an executable (where appropriate). If it needs instructions
    to run, please provide those in a README file. Your submission must be a zip file, packaging
    everything as needed, and submitted through the appropriate link on wattle.
    There are many existing web-proxying tools and libraries out there, many of them with source. While
    perhaps educational for you, the assessors know they exist and they will be checking your code
    against them, and against other submissions from this class.
    Page 3 of 3
    Your code will be assessed on
  5. Output correctness (the http queries it sends, the modified BoM pages, the log of requests),
  6. Performance (a great proxy should be perfectly transparent, not causing any delays),
  7. Code correctness, clarity, and style, and
  8. Documentation (i.e. comments and any README – how easily can somebody new pick this
    up and modify it).
    Marks are allocated roughly 50% for 1-2 and 50% for 3-4.
    You should be able to test your code against any HTTP-based website you like, although a lot of sites
    use HTTPS now, or have complex html/js pages that can make parsing harder.
退出移动版