Yeah, its true to say, “If you don’t do normalization on your database, you’re gonna have a bad time”. Let’s take an example to analyze whole normalization concept.

Assume, a video library maintains a database of movies rented out. Without any normalization, all information is stored in one table as shown below.

Let’s see what the problems we will face while performing any operation on these data.

  • Suppose we have to update MOVIES RENTED column for “Robert Phil” then we have to update the same in two different rows or the data will become inconsistent. This is known as update anamoly.
  • Suppose we need to add a record for new member but he has not rented any movie yet. Then we would not be able to insert the data into the table if MOVIES RENTED column doesn’t allow null value. This is known as insert anamoly.
  • Suppose we have to create a large table with huge data in row and column, then there will be high chance of repeated data in some part of the table.

Thus, to overcome above problems, we need a technique that reduces data redundancy, eliminates such characteristics like update and insert anomalies, divides larger tables into smaller tables and links them using relationships. This technique is known as normalization.

First Normal Form (1NF)

  • Each table column should contain a single value.
  • Each record needs to be unique.

The decomposition of above table in 1NF —

Second Normal Form (2NF)

  • The table must be in 1NF
  • Single Column Primary Key. In other words, all non-key attributes are fully functional dependent on the primary key.

The decomposition of above table in 2NF —

We have divided our 1NF table into two tables viz. Table 3 and Table 4. Table 3 contains member information. Table 4 contains information on movies rented.

We have introduced a new column called Membership_id which is the primary key for table 3. Records can be uniquely identified in Table 3 using membership id.

Third Normal Form (3NF)

  • The table must be in 2NF
  • The table should not contain any transitive functional dependency.
  • It is used to reduce the data duplication. It is also used to achieve the data integrity.

A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial function dependency X → Y, where X is a super key, and Y is a prime attribute, i.e., each element of Y is part of some candidate key.

To move our 2NF table into 3NF, we again need to again divide our table.

We have again divided our tables and created a new table which stores Salutations. There are no transitive functional dependencies, and hence our table is in 3NF. In Table 7 Salutation ID is primary key, and in Table 1 Salutation ID is foreign to primary key in Table 7.

Boyce-Codd Normal Form (BCNF)

  • BCNF is the advance version of 3NF. It is stricter than 3NF.
  • A table is in BCNF if every functional dependency X → Y, X is the super key of the table.
  • For BCNF, the table should be in 3NF, and for every FD, LHS is super key.

Even when a database is in 3rd Normal Form, still there would be anomalies resulted if it has more than one candidate Key.