Project 2: Simple Bridge

Implement a simple network bridge

Description

You will implement a simple bridge that is able to establish a spanning tree and forward frames between its various ports. Since running your bridge on Northeastern’s network would likely impact real traffic, we will provide you with a simulation environment that emulates LANs and end hosts. Your bridges must construct a spanning tree (and disable any ports that are not part of the tree), must forward frames between these ports, learn the locations (ports) of hosts, and handle both bridge failures and new bridge arrivals (e.g., by automatically reconfiguring the spanning tree).

There are several educational goals of this project:

  • To give you a sense of how data-link-layer devices work, and how they build a simple yet surprisingly scalable and efficient network using straightforward algorithms
  • To give you experience implementing a distributed system, where none of your code has a complete global view and events happen “asynchronously”.
  • To introduce the select() API for managing multiple sockets in a single program.

Your bridge code will be tested for both correctness and performance. Part of your grade will come from the overhead your network has (i.e., lower overhead will receive a higher score) and the fraction of packets that are successfully delivered between end hosts. Your results will be compared to your classmates' during the grading process.

Requirements

To simplify the project, instead of using real packet formats, we will be sending our data across the wire in JSON (many languages have utilities to encode and decode JSON, and you are strongly encouraged to use these libraries). Your bridge program must meet the following requirements:

  • Form a spanning tree in order to prevent packet loops
  • Handle the failure of bridges and the introduction of new bridges and LANs over time
  • Learn the locations (port numbers) of end hosts
  • Deliver end host packets to the destination
  • Handle the mobility of end hosts between LANs
  • Print out specific debugging messages to STDOUT

You should implement a simplified version of the standard bridge spanning tree protocol that we discussed in class. Note that more sophisticated and properly tuned algorithms (i.e., those which perform better) will be given higher credit. For example, some desired properties include (but are not limited to):

  • Fast convergence: Require little time to form or update a spanning tree.
  • Low overhead: Reduce packet flooding and unnecessary transmissions when possible.

Regardless, correctness matters most; performance is a secondary concern. We will test your code and measure these two performance metrics; better performance will result in higher credit. We will test your code by introducing a variety of different errors and failures; you should handle these errors gracefully, recover, and never crash.

Your program

For this project, you will submit one program: a program 3700bridge that implements a bridge. You may use any language of your choice, and we will give you basic starter code for python. You may not use any bridging libraries in your project; you must implement all of the bridge logic yourself.

If you use C or any other compiled language, your executable should be named 3700bridge. If you use an interpreted language or a virtual machine-based language, you must write a brief Bash shell script, named 3700bridge, that conforms to the input syntax above and then launches your program with whichever incantations are necessary. For example, if you write your solution in Java, your Bash script might resemble

#!/usr/bin/env perl

$args = join(' ', @ARGV);
print `java -jar 3700bridge.jar $args`;

or, if you use python, your program should look like

#!/usr/bin/env python3 -u

import foo from bar
...

and should be marked executable (use chmod u+x 3700bridge on UNIX-like OSes to mark it as such).

Important: You will write one 3700bridge program, but our simulator will run multiple copies of your program, and those copies will have different “views” of the network. They will only interact by exchanging messages via the simulator. Each copy will be connected to different LANs, will see different messages from your other bridges and hosts, and they need to work together to allow messages to be delivered to destinations reliably. For many of you, this will be the first time you’ve written distributed code of this nature; it requires a different way of thinking about your program, and takes time to grasp. Everyone who has worked on this sort code has struggled (your instructors included), so please start early and don’t worry if you initially find the project challenging!

Language

You can write your code in whatever language you choose, as long as your code compiles and runs on unmodified Khoury College Linux machines on the command line. Do not use libraries that are not installed by default on the Khoury College Linux machines, or that are disallowed for this project. You may use IDEs (e.g., Eclipse) during development, but do not turn in your IDE project without a Makefile. Make sure you code has no dependencies on your IDE. If you do not have a Khoury account, you should get one ASAP in order to complete the project.

Starter code

Very basic starter code for the assignment in perl is available on the Khoury GitHub server. Provided is a simple implementation of a bridge that simply connects to the LANs and broadcasts a simple dummy BPDU on all ports and the prints out every other message it receives. You may use this code as a basis for your project if you wish. To get started, you should create a copy of the starter code on your local machine (or on the Khoury Linux machines if you wish):

git clone https://github.ccs.neu.edu/cs3700/bridge-starter-code.git

Requirements

The command line syntax for your bridge is given below. The bridge program takes command line arguments representing (a) the ID of the bridge, and (b) the LANs that it should connect to. The bridge must be connected to at least one LAN. The syntax for launching your bridge is therefore:

./3700bridge <id> <LAN> [<LAN> [<LAN> ...]]

The arguments are:

  • id (required) The id of the bridge. For simplicity, all bridge ids are four-digit hexadecimal numbers (e.g., 0aa1 or f29a).
  • LAN (required) The UDP port numbers of the LAN(s) the bridge should connect to. As described below, the LANs are implemented by our simulator via UDP; the port numbers themselves are not meaningful. At least one LAN will be provided; an additional, unlimited number of LANs may also be provided, indicating the number of LANs this bridge should connect to.

For example, if your program is called with

./3700bridge 8a44 29382 20391

that would mean that your program should use bridge ID 8a44, and that it has two ports. Port 0 on your bridge should use localhost UDP port 29382 to send/receive messages with its LAN. Port 1 on your bridge should use localhost UDP port 20391 to send/receive messages with its LAN. Note that your bridge could have multiple ports connected to the same LAN, meaning one port will hear the other port’s messages.

When starting up, your bridge should print to STDOUT

Bridge starting up

BPDU messages

You should configure your bridges to periodically broadcast BPDUs on all points. Using those BPDUs, you should constantly be listening for new roots, new bridges, etc, and should make decisions about which ports are active and inactive upon receiving each BPDU. To aid in grading and debugging, your bridge program should print out messages about the spanning tree calculation to STDOUT. When your bridge selects a new root, it should print to STDOUT

New root: <root> cost <cost>

where <id> is the ID of the local bridge, <root> is the ID of the new root, and <cost> is the cost of the path to the root. When your bridge changes its root port, it should print to STDOUT

Root port: <port_id>

where <port_id> is the port number (0-indexed). Note that your bridge’s port number is distinct from the UDP port number that the bridge uses to communicate with its LAN. In the case where the bridge has no root port, it should print to STDOUT

Root port: none

Note that the bridge can change its root port without changing the bridge it believes is the root, so it is important to separate these messages. Finally, whenever your bridge decides that a port is the designed port for a LAN, it should print to STDOUT

Designated port: <port_id>

and whenever your bridge decides that a port should be disabled, it should print to STDOUT

Disabled port: <port_id>

As an example, if your bridge were only connected to a single LAN, and no other bridges were present on that LAN, your STDOUT log might look like:

username@host$ ./3700bridge 7a2a ...
Bridge starting up
New root: 7a2a cost 0
Root port: none 

As another example, if there were two bridges, each with one port connected to a single LAN, the logs might look like:

username@host$ ./3700bridge 2f44 ...
Bridge starting up
New root: 2f44 cost 0
Root port: none
username@host$ ./3700bridge 9ea1 ...
Bridge starting up
New root: 9ea1 cost 0
Root port: none
New root: 2f44 cost 1
Root port: 0

Sending BPDU messages

Your bridge should be configured to send BPDU messages at least once every 500ms, even if nothing has changed. Think about why this might be necessary, and how you might use this to infer when certain events have happened. You are free to send BPDUs more frequently (or, even better, in response to certain events) if you wish. You should be sure to “forget” old BPDUs after 1000ms.

Hint: You will find that when bridges are connected/disconnected, the bridges often have to reconfigure themselves. During the time until the bridges are completely reconfigured, you can experience routing loops and routing black holes, which lead to packet duplication and drops. Think about how you might reduce the time it takes to complete reconfiguration.

Data messages

Additionally, your bridge should build up a forwarding table for host/bridge addresses as discussed in class. You should be sure to update your forwarding table as you receive messages (recall that hosts can move), and you should also “timeout” forwarding table entries 5 seconds after receiving the last message from that address. When any of your bridge’s ports changes state (designated, root, etc), you should flush your forwarding table, as this indicates a change to the spanning tree.

When forwarding data packets, your bridge program should print out the following messages to STDOUT. When your bridge receives a data message on an active port (i.e., not disabled), it should print out

Received <source>/<msg_id> on port <port_id> to <dest>

where <id> is the ID of the bridge, <source> is the source of the message, <msg_id> is the message ID, <port_id> is the port number on the bridge that the message was received on, and <dest> is the destination of the message. (Note that your bridge should silently ignore all data messages on disabled ports). Once your bridge makes a forwarding decision about the message, it should print out one of three messages:

Forwarding <source>/<msg_id> to port <port_id>

or

Broadcasting <source>/<msg_id> to all active ports

or

Not forwarding <source>/<msg_id>

Thus, every data message your bridge receives on an active port should have one of the above three lines printed out. This will help you to debug why your bridges are misrouting messages (if this should ever occur).

Note: When your bridge decides to forward or broadcast data messages, it should do so without changing any of the fields in the message (including the source address).

Packet format

In order to simplify the development and debugging of this project, we use JSON (JavaScript Objection Notation) to format all messages sent on the wire. Most common programming languages have built-in support for encode and decoding JSON messages, and you should use these when sending and receiving packets (i.e., you should not attempt to create or parse JSON messages yourself). The format of all packets is

{"source": "<source>", 
 "dest": "<destination>", 
 "msg_id": <id>,
 "type": "<type>", 
 "message": {
    ...
 }
}

where <source> and <dest> are strings that contain either bridge or end host addresses. Recall that all addresses are four-byte hexadecimal numbers (e.g., 98a2), and a special broadcast address ffff indicates the packet should be received by all hosts and bridges on a given LAN. The field type is a string that can either be bpdu for BPDUs or data for end-host data packets. Every packet will have a <msg_id>, which should be unique a non-negative integer for each sending host or bridge (e.g., for each <source>, there should only have be a single message with a given <msg_id>). Finally, <message> should be the JSON-encoded message itself.

You can assume that messages are no larger than 1,500 bytes.

BPDU message format

Every BPDU message should the fields <id>, <root>, <cost>, and <port> (and no other fields). The fields <id> and <root> are strings representing the ID of the bridge sending the BPDU and the ID of the bridge it believes is the root, respectively. The fields <cost> and <port> are non-negative integers representing the cost to the root and the port number on the bridge that is sending the BPDU, respectively. Packets containing BPDUs should be sent to the broadcast address.

For example, a BPDU that you send from bridge 92b4 might look like

{"source": "92b4", 
 "dest": "ffff", 
 "msg_id": 27,
 "type": "bpdu", 
 "message": {
    "id": "92b4",
    "root": "02a1",
    "cost": 3,
    "port": 2
 }
}

This message would imply that bridge 92b4 believes the root is 02a1, that its current cost to the root is 3 hops, and that this BPDU came from port 2 on bridge 92b4.

Data message format

Data messages can have any number of other fields (or other emedded JSON content). You code should gracefully process these messages, not care about the actual content of the messages, and not change the content (though the ordering of the JSON fields can be changed if necessary).

For example, a data message from host 28aa to 97bf might look like

{"source": "28aa", 
 "dest": "97bf", 
 "msg_id": 4,
 "type": "data", 
 "message": {
    "favorite_color": "green",
    "best_teacher": "alden",
    "best_courses": ["CS 3700", "CS 3650"]
 }
}

Connecting to the LANs

The run program that we provide will emulate the LANs, and your 3700bridge code will send and receive packets on our emulated LANs by sending UDP datagrams to provided port numbers. Your code should bind to a localhost UDP socket for each LAN that you are connected to; you do not need to be intimately familiar with how UDP sockets work, but they essentially give you a the ability to send and receive messages. Whenever you send, all other end hosts and bridges on the emulated LAN will receive your message. You should constantly be reading from all your UDP sockets to make sure you receive all messages that come to you (they will be buffered if you don’t read immediately).

Note: Our network simulator will add a 10ms delay on the LANs, meaning after your bridge sends a message to a LAN, you should expect that all other hosts and bridges on the LAN receive the message after 10ms.

Exactly how to connect to local UDP socket depends on your programming language. For example, if you were using python to complete your project, your code to create a UDP socket and bind to a local port would look like

import socket
  
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(('localhost', 0))

Once you have done so, you can then send using the socket s by writing:

s.sendto(<data>, ('localhost', <port>))

where <data> was a byte array you wished to send, and <port> was the port number. You can receive messages by writing:

data, addr = s.recvfrom(1500)

where <data> will be the data that comes in, and <addr> will be a tuple (<host>, <port>) that represents the sender of the UDP packet.

Note that you will need to create a separate socket for each LAN your bridge is connected to, meaning your bridge will typically be managing a number of sockets. Thus, when you call recvfrom() on a socket, it will “block” your bridge until a message arrives; this is not what you want to happen. Instead, we strongly encourage you to write your code in an event-driven style using select() on all of the sockets. This will keep your code single-threaded and will make debugging your code significantly easier. Alternatively, you can implement your bridge in a threaded model (with one thread attached to each LAN), but expect it to be much more more difficult to debug.

In python, you would use select as follows:

socket_array = [ ... ]
readable, _, exceptions = select.select(socket_array, [], socket_array)

for s in readable:
  data, addr = s.recvfrom(1500)
  # process data 

for s in exceptions:
  # process socket that had an error

As above, the <addr> is a tuple representing the address the that the message was received from; you can use this (or the socket s in readable) to figure out which port the message came in on.

Hint: Note that you can optionally provide a timeout to select(), which will likely prove useful at some point…

Testing your code

In order for you to test your code in our network simulator, we have included a script that will create the emulated LANs, run your bridge program and connect it to these LANs, start and stop your bridges in a configurable way, and create and record end host traffic. This script is included in the starter code, and you can run it by executing

./run <config-file>

where <config-file> is the path to the configuration file that describes the network you would like to implement. Note that you should not be parsing the configuration files yourself; the run script does this. Instead, you can create custom configs to test your code under different scenarios.

Config file format

The configuration file that you specify describes the LANs, the bridges, and the connections between these. It also contains information about when bridge come up and down, and the end host traffic that should be generated. The file is formatted in JSON and has the following elements

  • lifetime (required) The number of seconds the simulation should run for.
  • bridges (required) An array of bridge elements (described below). At least one bridge must be specified. Each bridge element is an associative array that has the following properties:
    • id (required) The ID of the bridge, a string. It must be exactly 4 characters long.
    • lans (required) An array of the LAN IDs that the bridge is connected to, of length at least one. All LANs are identified by a non-negative integer.
    • start (optional) The start time (in seconds) when the bridge should be turned on. If not specified, the bridge is started at the beginning of the simulation.
    • stop (optional) The stop time (in seconds) when the bridge should be turned off. If not specified, the bridge is stopped at the end of the simulation.
  • hosts (required) The number of hosts to generate (these are randomly attached to LANs).
  • traffic (required) The number of end host packets to randomly generate (these are sent with randomly selected sources and destinations).
  • wait (optional) The number of seconds to wait before sending any data traffic (default of 2 seconds).
  • seed (optional) The random seed to choose. If not specified, a random value is chosen. Setting this value will allow for a reproducible set of hosts and traffic.

For example, a simple network with two LANs connected by a single bridge, with 10 hosts sending a total of 1,000 packets would be:

{"lifetime": 30,
 "bridges": [
     {"id": "84aa", "lans": [1, 2]}
 ],
 "hosts": 10,
 "traffic": 1000
}

and a more complex network may be

{"lifetime": 30,
 "bridges": [
     {"id": "23a1", "lans": [1, 3]},
     {"id": "91ab", "lans": [2, 3], "stop": 7},
     {"id": "7396", "lans": [1, 2], "start": 5, "stop": 9},
     {"id": "22bb", "lans": [2, 4]},
     {"id": "00a4", "lans": [2, 4, 5, 6]}
 ]
 "hosts": 100,
 "traffic": 10000
}

where this network has six LANs (numbered 1-6), 100 hosts sending 10,000 packets, and five bridges (23a1, etc). Note that bridge 91ab dies at the 7-second mark, and bridge 7396 turns on at the 5-second mark and dies at the 9-second mark. Of course, the run script will handle all of this for you (launching the bridges at the approrpiate time, simulating the host packets, etc). You only need to implement a correct 3700bridge program.

Simulator output

The output of the run script includes timestamps and all logging information from your bridges and the emulated end hosts. Note that all data traffic will be delayed for 2 seconds at the beginning of the simulation to allow your bridges to form an initial spanning tree. At the end, the output also includes some statistics about the your bridges' performance:

username@host$ ./run config.json
[  0.0000  Bridge 92ba] Bridge starting up
...
[ 14.9990    Host 7763] Sent message 7763/4 to 41c1
[ 15.0001  Bridge 92ba] Received message 7763/4 on port 0 to 41c1
Simulation finished.

CORRECTNESS:
Packets sent: 50
Packets not delivered: 1 (2.00%, msg_ids ['00a8/1'])
Percentage packets delivered: 98.0000%

PERFORMANCE:
Messages sent on wire: 4117 (308 BPDUs, 3809 data)
Packets delivered multiple times: 680 (10.00%, msg_ids 29dc/1 [x3], e6a3/0 [x255], e06d/1 [x3], 00a8/0 [x7], a3c7/3 [x412])
Effective goodput: 1.1902%

each of the fields is self-explanatory. Ideally, you would like all messages to be delivered (a delivery percentage of 100%) and the number of packets dropped and delivered multiple time to be 0 (a message can be dropped or be delivered multiple times if it the network is being re-configured while it is being trnansmitted). Additionally, you want the number of total messages sent on the wire to be low as well (this includes every broadcast as well as BPDUs, which are overhead). All of these are captured in the “goodput” metric, which is simply the number of packets delivered divided by the total number of messages sent.

Testing script

Additionally, we have included a basic test script that runs your code under a variety of different network configurations and also checks your code’s compatibility with the grading script. If your code fails in the test script we provide, you can be assured that it will fare poorly when run under the grading script. To run the test script, simply type

username@host:$ ./test 
Starter (single bridge) tests (delivery = 100%)
One bridge, one LAN (simple-1.conf)                         [PASS]
One bridge, two LANs (simple-2.conf)                        [PASS]

Basic (no failures, no new bridges) tests (delivery = 100%, goodput >= 5%)
One bridge, three LANs (simple-3.conf)                      [PASS]
Two bridges, two LANs (simple-5.conf)                       [FAIL]
  goodput was .3934%, required 5.0000%
...

This will run your code on a number of configurations, and will output whether your program performs sufficiently. If your program fails to meet the required performance, it will print out why. If you wish to run one of the tests manually, you can do so with

username@host:$ ./run configs/basic-4.conf 

Performance testing

As mentioned in class, 10% of your grade on this project will come from performance. Your project will be graded against the submissions of your peers, and more points will be awarded in your final grade for submissions that perform better. To get an idea of how your submission performs, you can manually run some of the advanced- configurations, which have bridges starting and stopping during the simulation.

Grading

The grading in this project will be broken down as follows:

Item Percentage of Grade
Program correctness 65%
Performance 10%
Style and documentation 15%
Milestone 1 functionality 2%
Milestone 2 functionality 8%

Submitting your project

You will submit your project on the Khoury Linux machines. You will first need to register a team before you can submit for either the milestone or the final project.

Registering your team

You and your partner should first register as a team by running the

/course/cs3700sp22/bin/register

script. You should pick out a team name (no spaces or non-alphanumeric characters, please) and run

/course/cs3700sp22/bin/register project2 <teamname>

where <teamname> is your chosen team name. You and your partner both must run this command, with the same team name, reporting to the course staff who you are working with. This command will either report back success or will give you an error message. If you have trouble registering, please contact the course staff.

Milestone 1

In order to ensure that you are making sufficient progress, you will have an interim milestone deadlines. For the first milestone, your code must pass the simple-2.conf test, which simply requires that it be able to forward messages from one LAN to another. You can pass this milestone by simply rebroadcasting every message you receive on one port on all other ports. For the milestone, you do not need to worry about documenting your code or about the performance tests; we will only grading to make sure that you deliver 100% of packets in the simple-2.conf test.

You should submit your milestone by running the turnin command. Specifically, you should create a directory, and place all of your code in it. Then, run

/course/cs3700sp22/bin/turnin project2-milestone1 <dir>

Where <dir> is the name of the directory with your submission. The script will print out every file that you are submitting; make sure that it prints out all of the files you wish to submit! You should receive an email confirmation of your submission. Only one of you or your partner needs to run this command; you are submitting on behalf of your entire team.

Milestone 2

Note that the first milestone is much easier than the second. For the second milestone, your code must pass the Basic (no failures, no new bridges) tests in the testing script. In other words, your code must form a spanning tree and correctly deliver all packets when the set of bridges is static and does not change over the course of the simulation. For the milestone, you do not need to worry about documenting your code or about the performance tests; we will only be grading to make sure that you deliver 100% of packets for all basic tests.

You should submit your milestone by running the turnin command. Specifically, you should create a directory, and place all of your code in it. Then, run

/course/cs3700sp22/bin/turnin project2-milestone2 <dir>

Where <dir> is the name of the directory with your submission. The script will print out every file that you are submitting; make sure that it prints out all of the files you wish to submit! You should receive an email confirmation of your submission. Only one of you or your partner needs to run this command; you are submitting on behalf of your entire team.

Final submission

For the final submission, you should submit your (thoroughly documented) code along with a plain-text (no Word or PDF) README file. In this file, you should describe your high-level approach, the challenges you faced, a list of properties/features of your design that you think is good, and an overview of how you tested your code.

You should submit your project by running the turnin script. Specifically, you should create a directory, and place all of your code and README files in it. Then, run

/course/cs3700sp22/bin/turnin project2 <dir>

Where <dir> is the name of the directory with your submission (Note: You should use project2 and not project2-milestone for the final submission). Again, the script will print out every file that you are submitting, make sure that it prints out all of the files you wish to submit! You should receive an email confirmation of your submission. Only one of you or your partner needs to run this command; you are submitting on behalf of your entire team.

Academic Integrity

All code that you submit must be the original work of you and your teammates. Copying code from the internet or other individuals is a violation of academic integrity.

All student code will be scanned by plagiarism detection software to ensure that students are not copying code from the Internet or each other. We have a database of all previous submissions to this project in prior semesters, and this database (and other files) will be used as a basis to detect copying of code. Any violations of academic integrity will be handled according to the procedures outlined in the course syllabus, and you will receive a 0 on the project.

Suggested Implementation Approach

When starting work on this project, we recommend implementing the required functionality in the following order.

  • Data structures Remember that your goal is to write a single bridge; our simulator will run multiple copies of it and allow them to communicate via messages. Thus, consider what data structures your bridge will need to keep track of, and what possible states each of those data structures can be in. In essense, make a data definition for the state of a bridge.

  • State changes With your data definition in hand, think about when those states will change. What will cause them to change? What logic will be executed to determine exactly how they will change? Be sure to think through different kinds of events that can happen (both receiving messages and otherwise…).

  • Tests We have provided you with a few tests, but they are far from exhaustive. Create a few additional test configuration files to test the various state changes you identified above. For each one, note what you expect your bridge code to do.

  • Simple implementation For the second milestone, you can make assumptions that make certain events impossible. Focus on the events that can happen during the milestone, and implement those portions of your bridge code. Once you’re successful, your code should pass the Basic tests.

  • Final implementation Implement handling the remaining events that can happen. Think of particularly evil configurations that could exist, and make sure that your bridge properly handles them. Consider the performance of your bridge only once your code passes all of the tests; think about places where you could be smarter about handling certain events and make your code more efficient.

Other Advice

A few pointers that you may find useful while working on this project:

  • Start by getting your code working on simple network configurations. Slowly introduce more complex networks. Get in the habit of using the random seed so that you can reproduce errors when they occur (to aid debugging).
  • Try to reduce the “state” of each bridge to the minimum required to implement the bridge correct. Avoid duplicating state wherever possible; you are not CPU-bound, so recalculate information as needed from the raw state (rather than caching it).
  • Check the Piazza forum for question and clarifications. You should post project-specific questions there first, before emailing the course staff.
  • Finally, get started early and come to the instructor’s office hours and TA lab hours. You are welcome to come to the lab and work, and ask the TA any questions you may have.