The Process of Normalization

Functional Dependencies

If A and B are sets of attributes in a relation (table) then B is functionally dependent on A (A -> B) if for any two tuples (rows) t1 and t2 in the table if t1(A) = t2(A) then t1(B) = t2(B). A is called the determinant. B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. A functional dependence is partial if there is some attribute that can be removed from A and yet the dependency still holds.

Here are some axioms and rules that govern functional dependency where A, B, and C are subsets of the attributes (columns) of the relation R.

Q. 1: Given the relation R shown below. State whether or not the following functional dependencies are satisfied by the relation.

  1. A -> B
  2. A -> C
  3. AB -> C
  4. C -> A
  5. BC -> A
  6. AC -> B

A B C
1 4 2
3 5 6
3 4 6
7 3 8
9 1 0

Q. 2: Show that the inference rule is if AB -> C and C -> A then C -> B is invalid by giving a counter example of relation R having attributes A, B, C, and D where AB -> C and C -> A but not C -> B.

Q. 3: Given the relation R (A, B, C, D) and the functional dependencies: A -> B and BC -> D, determine which of the dependencies can be derived from the axioms and rules listed above:

  1. AC -> D
  2. B -> D
  3. AD -> B

Normalization

The process of normalization is to produce a set of relations (tables) with the following properties:

Un-Normal Form

An un-normalized table has multiple values for some or all entries. An entry is the intersection of a row and column.

StaffPropertyInspection
propertyNo pAddress iDate iTime comments staffNo sName carReg
PG4 6 Lawrence St, Glasgow 18-Oct-03 10:00 Need to replace crockery SG37 Ann Beech M23 JGR
    22-Apr-04 09:00 In good order SG14 David Ford M53 HDR
    01-Oct-04 12:00 Damp rot in bathroom SG14 David Ford N27 HFR



Project
Proj-ID Proj-Name Proj-Mgr-ID Emp-ID Emp-Name Emp-Dept Emp-Hrly-Rate Total-Hrs
140 Drupal Website 8365 4193 Hawkins IT 80 7
      5726 Roche Tech Support 60 8
      2021 Standfer IT 80 10



First Normal Form (1NF)

To go from an un-normalized table to a table in First Normal Form (1NF) you must have at most a single value at all intersections of the rows and columns. You can do in two ways:

StaffPropertyInspection
propertyNo iDate iTime pAddress comments staffNo sName carReg
PG4 18-Oct-03 10:00 6 Lawrence St, Glasgow Need to replace crockery SG37 Ann Beech M23 JGR
PG4 22-Apr-04 09:00 6 Lawrence St, Glasgow In good order SG14 David Ford M53 HDR
PG4 01-Oct-04 12:00 6 Lawrence St, Glasgow Damp rot in bathroom SG14 David Ford N27 HFR



Project
Proj-ID Proj-Name Proj-Mgr-ID Emp-ID Emp-Name Emp-Dept Emp-Hrly-Rate Total-Hrs
140 Drupal Website 8365 4193 Hawkins IT 80 7
140 Drupal Website 8365 5726 Roche Tech Support 60 8
140 Drupal Website 8365 2021 Standfer IT 80 10



There are several problems relations in 1NF. In the table Project the primary key is composite (Proj-ID, Emp-ID) and these are the anomalies that arise:

  1. Insertion Anomaly: We cannot insert information about any new employee that is going to work for a particular department unless that employee is already assigned to a project.
  2. Deletion Anomaly: If we delete the row that contains employee Roche we also lose information of the Hourly Rate for employees working in the Tech Support department.
  3. Update Anomaly: If an employee moves to another department we have to update every row in the Project table where that employee appears.

Second Normal Form (2NF)

A relation is in Second Normal Form if it is in First Normal Form and no non-prime attribute is partially dependent on any key. For example, in the relation R (A, B, C, D) if AB -> C and AB -> D and A -> D, then this relation is not in 2NF. The solution here is to decompose this relation into two R (A, B, C) and R (A, D).

PropertyInspection
propertyNo iDate iTime comments staffNo sName carReg
PG4 18-Oct-03 10:00 Need to replace crockery SG37 Ann Beech M23 JGR
PG4 22-Apr-04 09:00 In good order SG14 David Ford M53 HDR
PG4 01-Oct-04 12:00 Damp rot in bathroom SG14 David Ford N27 HFR



Property
propertyNo pAddress
PG4 6 Lawrence St, Glasgow


For the Project table we could split that table into three:

These tables are in 2NF but the following problems may arise. To illustrate those problems just consider the Employee table.

Third Normal Form (3NF)

A relation is in Third Normal Form if it is already in 2NF and no nonprime attribute is transitively dependent on the key or no nonprime attribute functionally determines any other nonprime attribute. For example, if in a relation R (A, B, C) we have A -> B, and B -> C, and A -> C then the solution is to split the relation to R (A, B) and R (B, C).

PropertyInspection
propertyNo iDate iTime comments staffNo carReg
PG4 18-Oct-03 10:00 Need to replace crockery SG37 M23 JGR
PG4 22-Apr-04 09:00 In good order SG14 M53 HDR
PG4 01-Oct-04 12:00 Damp rot in bathroom SG14 N27 HFR



Property
propertyNo pAddress
PG4 6 Lawrence St, Glasgow


Staff
staffNo sName
SG14 David Ford
SG37 Ann Beech


The relation Employee is not in 3NF, there is a transitive dependency of a nonprime attribute on the primary key of the relation. The nonprime attribute Emp-Hrly-Rate is transitively dependent on the key through the functional dependency Emp-Dept -> Emp-Hrly-Rate. To transform this relation into a 3NF relation it is necessary to remove any transitive dependency. Two new tables are created:

However, we may still have problems with tables that are in 3NF. Consider the following two tables:

Certification Program
Region Course Section Time Location
East Coast Database Introduction MWF 9:00 - 10:00 AM Boston Learning Center
West Coast Database Intermediate TTH 10:00 - 11:30 AM San Francisco Learning Center
Southwest Database Advanced MWF 1:00 - 2:00 pM Austin Learning Center

The primary key is (Region, Course, Section) for the above table. The table is in 3NF as the two nonprime attributes (Time, Location) determines the other attributes. However, if there is only one instructional facility per city, then we will have Location -> Region. This dependency does not violate the 3NF condition, however if delete the last row of the table we may lose information about the location of that learning center.

Here is another table that is in 3NF.

Manufacturer
ID Name Item-No Quantity
mf-12 Aztec Company KJ97 437
mf-12 Aztec Company HF65 832
mf-45 Capitol Company TY86 574
mf-45 Capitol Company KJ97 291

The Manufacturer relation has two candidate keys (ID, Item-No) and (Name, Item-No) that overlap on the attribute Item-No. The relation Manufacturer has a proble if one of the manufacturers changes its name. The new name has to be updated everywhere it occurs.

Boyce-Codd Normal Form (BCNF)

This is a stricter form of the 3NF condition. A relation is in BCNF if it is already in 3NF and all nonprime attributes must be fully dependent on every key. Or, all prime attributes must be fully dependent on all keys of which they are not part of.

Here is another statement of BCNF: A relation is in BCNF if and only if every determinant is a candidate key.

PropertyInspection
propertyNo iDate iTime comments staffNo
PG4 18-Oct-03 10:00 Need to replace crockery SG37
PG4 22-Apr-04 09:00 In good order SG14
PG4 01-Oct-04 12:00 Damp rot in bathroom SG14



Property
propertyNo pAddress
PG4 6 Lawrence St, Glasgow



Staff
staffNo sName
SG14 David Ford
SG37 Ann Beech



StaffCar
staffNo iDate carReg
SG37 18-Oct-03 M23 JGR
SG14 22-Apr-04 M53 HDR
SG14 01-Oct-04 N27 HFR

To convert the Manufacturer table into BCNF one can decompose that table to two tables as follows

OR

Lossless or Lossy Decomposition

We have to decompose our tables such that when we join the resultant tables we get back the original relation. Here is a situtaion that we have to be careful about

Original Table
X Y Z
x1 y1 z1
x2 y2 z2
x3 y2 z3
x4 y3 z4


>
Decomposed Table 1
X Y
x1 y1
x2 y2
x3 y2
x4 y3


Decomposed Table 2
Y Z
y1 z1
y2 z2
y2 z3
y3 z4


Joined Table
X Y Z
x1 y1 z1
x2 y2 z2
x2 y2 z3
x3 y2 z2
x3 y2 z3
x4 y3 z4