Homework 11

Due 4/20/2012 [start of section]

Problem 1

Sun's network file system (NFS) protocol provides reliability via:

Problem 2

Which is the best network on which to implement a remote-memory read that sends a 100-byte packet from machine A to machine B and then sends an 8000-byte packet from machine B back to machine A?
  1. A network with 200 microsecond overhead, 10 Mbyte/s bandwidth, 20 microsecond latency
  2. A network with 20 microsecond overhead, 10 Mbyte/s bandwidth, 200 microsecond latency
  3. A network with 20 microsecond overhead, 1 Mbyte/s bandwidth, 2 microsecond latency
  4. A network with 2 microsecond overhead, 1 Mbyte/s bandwidth, 20 microsecond latency
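
One way to compare the candidate networks is with a simple linear cost model, sketched below in Python. The function names and the model itself are assumptions, not part of the problem statement: each message is charged a fixed overhead, one network latency, and its size divided by the bandwidth, with 1 Mbyte/s taken to mean 10^6 bytes per second. The problem does not say whether overhead is paid once per message or at both the sender and the receiver, so that choice is left as a parameter.

```python
def message_time_us(size_bytes, overhead_us, bandwidth_mbyte_s, latency_us,
                    overhead_ends=1):
    """Estimated one-way message time in microseconds (assumed linear model)."""
    # 1 Mbyte/s equals 1 byte per microsecond when 1 Mbyte = 10**6 bytes.
    transfer_us = size_bytes / bandwidth_mbyte_s
    # overhead_ends = 1 charges overhead once per message;
    # overhead_ends = 2 charges it at both the sender and the receiver.
    return overhead_ends * overhead_us + latency_us + transfer_us


def remote_read_time_us(overhead_us, bandwidth_mbyte_s, latency_us,
                        request_bytes=100, reply_bytes=8000, overhead_ends=1):
    """Request from A to B followed by the reply from B to A."""
    request = message_time_us(request_bytes, overhead_us, bandwidth_mbyte_s,
                              latency_us, overhead_ends)
    reply = message_time_us(reply_bytes, overhead_us, bandwidth_mbyte_s,
                            latency_us, overhead_ends)
    return request + reply
```

Calling `remote_read_time_us(overhead_us, bandwidth_mbyte_s, latency_us)` once per candidate network, with both settings of `overhead_ends`, shows how sensitive the ranking is to the overhead assumption.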

Problem 3

In class, we discussed the fact that, if messages can be lost, it is impossible to devise an algorithm that guarantees that two nodes can agree to do the same thing at the same time (the two generals problem). However, weaker forms of agreement may be possible.

Suppose two nodes, A and B, communicate via messages and that the probability of receiving any message that is sent is P (0 < P < 1). You need not consider any other types of failures.

  1. Is it possible for A and B to agree with certainty to perform some action (but not necessarily perform it at the same time)? If not, explain why not. If so, describe a protocol that provides this guarantee.

  2. Is it possible for both nodes to agree to do the same thing at the same time with >99.99999% certainty (e.g., guarantee that there is less than a 0.00001% risk that one or both will fail to make the appointment)? If not, explain why not. If so, describe a protocol that provides this guarantee.

  3. Suppose that in addition to lost messages, either A or B may crash at any time and, once crashed, recover at some arbitrary time in the future. Is it possible for A and B to agree with certainty to perform some action (but not necessarily perform it at the same time)? If not, explain why not. If so, describe a protocol that provides this guarantee.
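
As background for reasoning about the loss probability, the following Python sketch computes the chance that at least one of several copies of a message arrives. It assumes that losses are independent, which the problem does not state explicitly, and the function names are illustrative. It covers only the delivery of a single message in one direction, not a complete agreement protocol.

```python
def delivery_probability(p, sends):
    """Probability that at least one of `sends` independent copies arrives."""
    # Each copy is lost with probability (1 - p); all copies are lost
    # with probability (1 - p) ** sends under the independence assumption.
    return 1.0 - (1.0 - p) ** sends


def sends_needed(p, target):
    """Smallest number of copies whose delivery probability reaches `target`."""
    if not (0.0 < p < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    sends = 1
    while delivery_probability(p, sends) < target:
        sends += 1
    return sends
```

For any finite number of sends the mathematical value 1 - (1 - P)^n stays strictly below 1, which is why `target` is restricted to values below 1.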

Problem 4

Suppose a server workload consists of network clients sending 128-byte requests to a server, which reads a random 50KB chunk from the server's file system and transmits that 50KB to the client. The server's file system is able to cache all metadata, so that each read consists of a single 50KB sequential read from a random location on disk. The server may have multiple disks and multiple network interfaces.

Each disk rotates at 10000 RPM and takes 5 ms on an average random seek. There are on average 300 sectors per track and each sector is 512 bytes (in actuality, the number of sectors per track will vary, but we'll ignore that. We'll also assume that each request is entirely contained in one track and that each starts at a random sector location on the track.)

To access disk, the CPU overhead is 30 microseconds to set up a disk access. The disk DMAs data directly to memory, so there is no CPU per-byte cost for disk accesses.

Each network interface has a bandwidth of 100 Mbits/s (that's Mbits, not MBytes!) and there is a 4 millisecond one-way network latency between a client and the server. The network interface is full-duplex: it can send and receive at the same time at full bandwidth. The CPU has an overhead of 100 microseconds to send or receive a network packet. Additionally, there is a CPU overhead of 0.01 microseconds per byte sent.

  1. How many requests per second can each disk satisfy?
  2. How many requests per second can each network interface satisfy?
  3. How many requests per second can the CPU satisfy (assuming the system has a sufficient number of disks and network interfaces)?
  4. What is the latency from when a client begins to send the request until it receives and processes the last byte of the reply (ignore any queuing delays)?
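
The per-device costs in this problem can be modeled with a few lines of Python. The sketch below is one reasonable reading of the setup rather than a definitive solution. It assumes that the average rotational delay is half a rotation, that the transfer time is the fraction of a track occupied by the request, that 50KB means 50 * 1024 bytes, and that 1 Mbit means 10^6 bits. The function names are illustrative.

```python
def disk_service_time_ms(request_bytes, rpm, avg_seek_ms,
                         sectors_per_track, sector_bytes):
    """Seek + average rotational delay + transfer time, in milliseconds."""
    rotation_ms = 60_000.0 / rpm                 # time for one full rotation
    rotational_delay_ms = rotation_ms / 2.0      # assumed average: half a turn
    track_bytes = sectors_per_track * sector_bytes
    # The request occupies this fraction of a track, hence of one rotation.
    transfer_ms = rotation_ms * request_bytes / track_bytes
    return avg_seek_ms + rotational_delay_ms + transfer_ms


def network_transfer_time_ms(payload_bytes, bandwidth_mbit_s):
    """Time to push the payload through one interface, in milliseconds."""
    bits = payload_bytes * 8
    return bits / (bandwidth_mbit_s * 1_000_000) * 1000.0
```

A device that needs `t` milliseconds per request can serve at most `1000 / t` requests per second, so each throughput question reduces to computing the matching service time.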

Problem 5

    Consider a distributed system where there is a file server and a number of client machines. To provide concurrency control, the file system includes a lock manager that issues locks to client machines upon request. Locks can be either shared or exclusive. Shared locks are useful only for file reads, while exclusive locks are needed for file updates. The file server issues a lock to a given client with a timed lease, such that when the lease expires, the lock is revoked and the client machine must re-apply to reacquire the lock. Answer the following questions:
    1. Why are leases useful?

    2. Consider the following scenario in accessing a file F.
      Machine   Request time   Request type   Duration until release
      A         00:00          Shared         05
      B         00:05          Shared         10
      C         00:08          Exclusive      02
      D         00:10          Shared         05
      B         00:14          Exclusive      05
      A         00:20          Shared         05

      Assume that a lease is given for 10 time units, that clients cache the files for performance, and that coherence is maintained by an update protocol. Draw figures showing the four machines and the file server as blocks (see example below), identifying at each state transition which client machine holds which lock and the state of the cache at each client. A state transition occurs when the state of the cache changes at one client, when a request is received, when a lock is acquired, or when a lock is released.

      Time: 00:00
      Machine A   Lock: Shared   Cache: File F
      Machine B   Lock: None     Cache: Empty
      Machine C   Lock: None     Cache: Empty
      Machine D   Lock: None     Cache: Empty

    3. If an "invalidate" protocol is used for coherence, would the efficiency of the system increase or decrease? Why?

    4. Same as part 2, but assume that machine C fails 1 time unit after it acquires the lock. Show the state transition diagrams as instructed in part 2. State clearly and precisely what precautions should be taken in writing the code that updates the file at machine C.

Problem 6

    Suppose we run the following program, with the code in the left column running on one machine in a distributed system and the code in the right column running on another machine. The distributed system provides a set of shared files with some consistency model. Initially A and B are both 0.

    write(A, 1);       // Write the value "1" to file A    write(B, 1);       // Write the value "1" to file B
    if(read(B) == 0)   // Read the value from file B       if(read(A) == 0)   // Read the value from file A
      print "A wins";                                        print "B wins";

    (a) What are the possible outputs assuming the system enforces linearizability?

    (b) For the program described in the previous question, what are the possible outputs assuming the system enforces causal consistency?

    (c) What are the possible outputs for the above program assuming the system enforces sequential consistency?
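
For small programs like this one, a brute-force enumeration is a handy way to check reasoning about sequential consistency. The Python sketch below is an illustration with made-up names, not part of the assignment. It assumes that each read and write is atomic and that every execution is equivalent to some single interleaving that preserves each machine's program order. It does not model causal consistency, which permits executions that no single interleaving explains, nor the real-time ordering constraint that linearizability adds.

```python
from itertools import combinations


def sequentially_consistent_outcomes():
    """Return the set of possible outputs, each as a frozenset of printed lines."""
    # Each program is a list of (operation, file) pairs in program order.
    program_1 = [("write", "A"), ("read", "B")]   # prints "A wins" if B == 0
    program_2 = [("write", "B"), ("read", "A")]   # prints "B wins" if A == 0
    total_steps = len(program_1) + len(program_2)
    outcomes = set()

    # Choose which global steps belong to machine 1; the rest go to machine 2.
    for slots in combinations(range(total_steps), len(program_1)):
        files = {"A": 0, "B": 0}
        next_1 = next_2 = 0
        printed = []
        for step in range(total_steps):
            if step in slots:
                operation, name = program_1[next_1]
                next_1 += 1
                message = "A wins"
            else:
                operation, name = program_2[next_2]
                next_2 += 1
                message = "B wins"
            if operation == "write":
                files[name] = 1
            elif files[name] == 0:
                printed.append(message)
        outcomes.add(frozenset(printed))
    return outcomes
```

Each element of the returned set is one possible combination of printed lines, with the empty frozenset meaning that neither machine prints anything.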

Problem 7

    Suppose a distributed file system implements linearizable consistency using callbacks and does not use leases. Suppose that a client c1 that is caching file F becomes disconnected from the network. Which of the following is true?

    • (a) Other clients cannot read file F until the client c1 reconnects
    • (b) Other clients cannot write file F until the client c1 reconnects
    • (c) Other clients can write file F, but once F has been written by any client, no client can read F until c1 reconnects
    • (d) More than one of the above
    • (e) None of the above

Problem 8

    Suppose programs on three machines are reading and writing files using a file system that enforces causal consistency. Machine 1 runs the following code:

    int i1 = 0;
    while(true){
      overwriteFile("/foo", i1);
      i1++;
    }
    

    The function overwriteFile() replaces previous contents of the file with the specified value.

    Machine 2 runs the following code:

    while(true){
      int i2 = readValueFromFile("/foo");
      overwriteFile("/bar", i2);
    }
    

    Machine 3 runs the following code:

    int i2 = readValueFromFile("/bar");
    int i1 = readValueFromFile("/foo");
    

    Suppose that machine 3 reads the value "10" on the first read (of i2). Which of the following is true of the value that machine 3 reads on the second read (of i1)? (If multiple items are true, choose the most precise/restrictive of the options; e.g., i1 < 10 is more precise/restrictive than i1 ≤ 10.)

    • (a) i1 < 10
    • (b) i1 ≤ 10
    • (c) i1 = 10
    • (d) i1 ≥ 10
    • (e) i1 > 10
    • (f) None of the above or more than one of the above (i.e., none of the above choices includes all possible values for i1.)