# Homework 9

Due 3/30/2012 [start of section]

## Problem 1

Suppose I have 10 disks with an advertised failure rate of 1% per 10K hours, and that I arrange these disks into two groups of 5 disks per group. In each group, I use a 4 data + 1 parity RAID arrangement. Assuming independent failures and a mean time to repair of 1 hour, what is my expected mean time to data loss?
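One common way to approach this, sketched here under stated assumptions, is the standard single-parity approximation MTTDL ≈ MTTF² / (N · (G − 1) · MTTR), where N is the total number of disks and G is the group size. It assumes independent failures, a constant failure rate, and MTTR much smaller than MTTF. The helper name `mttdl_single_parity` is hypothetical, and reading "1% per 10K hours" as a constant rate (so MTTF = 10,000 / 0.01 = 10^6 hours) is an assumption.

```c
#include <assert.h>

/* Hypothetical helper (not part of the assignment): approximate mean time
 * to data loss, in hours, for n_disks disks arranged in single-parity
 * groups of group_size disks each.
 *
 * Assumes independent failures at a constant rate and mttr << mttf.
 * Data is lost when a second disk in the same group fails while the
 * first failed disk is still being repaired. */
double mttdl_single_parity(double mttf_hours, int n_disks, int group_size,
                           double mttr_hours)
{
    assert(group_size > 1 && n_disks >= group_size && mttr_hours > 0.0);
    return (mttf_hours * mttf_hours) /
           ((double)n_disks * (double)(group_size - 1) * mttr_hours);
}
```

Under these assumptions, `mttdl_single_parity(1e6, 10, 5, 1.0)` evaluates the formula for the configuration described above.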

## Problem 2

Is it possible to have a MTTR of 1 hour for the 320GB SATA drive described here?
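A rough way to reason about this, as a sketch only: a replacement disk cannot be rebuilt faster than the time needed to write its full capacity at the drive's sustained transfer rate. The helper name `min_rebuild_hours` is hypothetical, and the transfer rate must come from the drive's data sheet; no value is assumed here.

```c
#include <assert.h>

/* Hypothetical helper (not part of the assignment): lower bound on the
 * rebuild time of a disk, in hours, assuming the whole disk must be
 * rewritten sequentially at its sustained transfer rate. Real repair
 * times also include detection and replacement delays, which this
 * ignores. */
double min_rebuild_hours(double capacity_bytes, double sustained_bytes_per_sec)
{
    assert(capacity_bytes >= 0.0 && sustained_bytes_per_sec > 0.0);
    return capacity_bytes / sustained_bytes_per_sec / 3600.0;
}
```

For example, with a purely illustrative rate of 100 MB/s (a placeholder, not a data-sheet figure), `min_rebuild_hours(320e9, 100e6)` is a little under 0.9 hours.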

## Problem 3

Consider a highly simplified version of the log-structured file system (LFS). You are supposed to write some code that models the cleaner. Specifically, go through each update in a segment and figure out whether the update has a *live inode* in it. If you find a live inode, print "LIVE"; otherwise, print "DEAD".

    // The inode map: records (inumber) -> (disk address) mapping.
    // Assume disk address stored here is in bytes.
    unsigned int imap[MAX_INODES];

    typedef struct __inode_t {
        int direct[10];    // just 10 direct pointers
    } inode_t;

    typedef struct __update_t {
        int inumber;       // inode number of the inode in this update
        inode_t inode;     // the inode
        int offset;        // offset of data block in file, from 0 ... 9
        char data[4096];   // the data block
    } update_t;

    typedef struct __segment_t {
        update_t updates[MAX_UPDATES];   // (assume all MAX_UPDATES are used)
    } segment_t;

    segment_t *segment;    // start with this pointer to the segment in question


Write code to process a segment. Assume you are given a pointer to a segment_t (called 'segment'). Then go through each update in the segment, figure out whether the inode referred to in that update is live or not, printing "LIVE" or "DEAD" as you go.
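One possible shape for the cleaner, as a minimal sketch rather than a reference solution: an inode in an update is live exactly when the inode map still points at that inode's on-disk address. The problem does not say where the segment lives on disk, so the `segment_addr` parameter (the byte address of the start of the segment) is an assumption, as are the `updates` field name, the function name `clean_segment`, and the placeholder values of `MAX_INODES` and `MAX_UPDATES`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_INODES  1024   /* placeholder value; not given in the problem */
#define MAX_UPDATES 4      /* placeholder value; not given in the problem */

unsigned int imap[MAX_INODES];   /* inumber -> disk address (bytes) */

typedef struct __inode_t {
    int direct[10];
} inode_t;

typedef struct __update_t {
    int inumber;
    inode_t inode;
    int offset;
    char data[4096];
} update_t;

typedef struct __segment_t {
    update_t updates[MAX_UPDATES];   /* field name is an assumption */
} segment_t;

/* Prints LIVE or DEAD for each update and returns the number of live
 * inodes. segment_addr is the assumed on-disk byte address of the
 * start of the segment. */
int clean_segment(const segment_t *segment, unsigned int segment_addr)
{
    int live = 0;

    assert(segment != NULL);
    for (int i = 0; i < MAX_UPDATES; i++) {
        const update_t *u = &segment->updates[i];

        /* On-disk address of the inode stored inside update i. */
        unsigned int inode_addr = segment_addr
            + (unsigned int)(i * sizeof(update_t))
            + (unsigned int)offsetof(update_t, inode);

        /* Live only if the inode map still refers to this exact copy. */
        if (u->inumber >= 0 && u->inumber < MAX_INODES &&
            imap[u->inumber] == inode_addr) {
            printf("LIVE\n");
            live++;
        } else {
            printf("DEAD\n");
        }
    }
    return live;
}
```

Returning the count is only a convenience for checking the sketch; the problem itself asks only for the printed output.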

## Problem 4

Some RAID code has been lost. You have to write it!

Assume you have a RAID-4 (parity-based RAID + a single parity disk), with a 4KB chunk size, and 5 disks total as follows:

    DISK-0    DISK-1    DISK-2    DISK-3    DISK-4
    block0    block1    block2    block3    parity(0..3)
    block4    block5    block6    block7    parity(4..7)
    ...       ...       ...       ...       ...

• (a) Fill in the routine SMALLWRITE() below (you may assume that a crash does not occur while SMALLWRITE() is in progress):

    // SMALLWRITE()
    //
    // This routine takes a logical block number 'block' and writes
    // the single block of 4KB referred to by 'data' to it.
    //
    // It may have to use these existing underlying primitives:
    //   READ(int disk, int offset, char *data);
    //   WRITE(int disk, int offset, char *data);
    //   XOR(char *source1, char *source2, char *dest);

    void SMALLWRITE(int block, char *data) {

    }

• (b) In the above, you were allowed to assume that a disk crash does not occur during SMALLWRITE. Update your code to work without this assumption.
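For part (a), one possible approach is the usual subtractive parity update: read the old data and old parity, XOR the old data out of the parity and the new data in, then write both blocks back. The sketch below is an illustration under assumptions, not a reference solution. The in-memory `disks` array and the bodies of `READ`, `WRITE`, and `XOR` are stand-ins so the example runs on its own, `NUM_STRIPES` is a placeholder, and the `offset` argument is assumed to be in units of 4KB blocks. It does not address part (b).

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE  4096
#define DATA_DISKS  4
#define PARITY_DISK 4
#define NUM_STRIPES 8    /* placeholder size for the in-memory stand-in */

/* In-memory stand-ins for the primitives named in the problem.
 * Assumption: 'offset' counts 4KB blocks, not bytes. */
static char disks[DATA_DISKS + 1][NUM_STRIPES][BLOCK_SIZE];

void READ(int disk, int offset, char *data)
{
    memcpy(data, disks[disk][offset], BLOCK_SIZE);
}

void WRITE(int disk, int offset, char *data)
{
    memcpy(disks[disk][offset], data, BLOCK_SIZE);
}

void XOR(char *source1, char *source2, char *dest)
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        dest[i] = (char)(source1[i] ^ source2[i]);
}

void SMALLWRITE(int block, char *data)
{
    int disk = block % DATA_DISKS;     /* data disk holding this block */
    int offset = block / DATA_DISKS;   /* stripe (row) on every disk   */
    char old_data[BLOCK_SIZE];
    char parity[BLOCK_SIZE];

    assert(block >= 0 && offset < NUM_STRIPES);

    READ(disk, offset, old_data);
    READ(PARITY_DISK, offset, parity);

    XOR(parity, old_data, parity);     /* remove old data from parity  */
    XOR(parity, data, parity);         /* fold new data into parity    */

    WRITE(disk, offset, data);
    WRITE(PARITY_DISK, offset, parity);
}
```

The update touches only two disks (two reads and two writes) regardless of the stripe width, which is why it is preferred over recomputing parity from every data block for a single-block write.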

## Problem 5

• (a) Assuming disk failures are uncorrelated and that the MTTF of a single disk is 1.3M hours, what is the MTTDL for a system with 100 disks arranged into 20 groups of 5 disks each, with 4 data blocks and 1 parity block stored across each group of 5 disks? Assume a MTTR of 1 day.
• (b) In practice, the value you just calculated is likely to overestimate the actual MTTDL for several reasons. Crisply describe two such reasons.
• (c) Suppose the RAID system discussed in the previous questions includes several *hot spares*: extra disks that can quickly and automatically replace failed disks. Estimate the *best* $MTTR_{disk}$ that can be promised for a Seagate ST31000520AS 1 TB disk drive. (You will need to find this drive's data sheet to answer this question.)
• (d) Suppose instead of *singly-redundant* RAID, you implement *doubly-redundant* RAID. E.g., instead of 1 parity block per group of *G* disks allowing any block to be read even if one disk per group fails, you use 2 redundant blocks per group with an encoding that allows any block to be read even if 2 disks from a group fail. (Note that one needs an encoding a bit more sophisticated than parity, but you don't need to worry about the details here; assume such an encoding is given.)

Derive and write the equation for the mean time to data loss MTTDL_system_doubly_redundant for a doubly-redundant system with *N* total disks arranged in groups of *G* disks where any block can be read from a group as long as no more than 2 disks per group have failed. (It is OK to assume that *N* is evenly divisible by *G* for simplicity.)

MTTDL_system_doubly_redundant =
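
One way the derivation can be sketched, under the usual simplifying assumptions (independent failures at a constant rate, $MTTR_{disk} \ll MTTF_{disk}$, and data loss only when a third disk in the same group fails while two repairs are still in progress), is to multiply the rate of first failures by the probability of a second and then a third overlapping failure in the same group. The symbol $MTTF_{disk}$ is introduced here for illustration; this is one standard approximation, not necessarily the form the course expects.

```latex
\begin{align*}
\text{rate of a first failure}
  &\approx \frac{N}{MTTF_{disk}} \\
P(\text{second failure in the same group during repair})
  &\approx \frac{(G-1)\,MTTR_{disk}}{MTTF_{disk}} \\
P(\text{third failure in the same group during repair})
  &\approx \frac{(G-2)\,MTTR_{disk}}{MTTF_{disk}} \\
MTTDL_{system\_doubly\_redundant}
  &\approx \frac{MTTF_{disk}^{3}}{N\,(G-1)\,(G-2)\,MTTR_{disk}^{2}}
\end{align*}
```

The last line is the reciprocal of the product of the first three, since the mean time to data loss is the inverse of the rate at which triple overlapping failures occur.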