Database normalization


In previous chapter, “Types of keys“, we had a good discussion on keys and their types in SQL Server. In this chapter “Database normalization”, we are going to discuss database normalization process and normal forms. Lets start our discussion with definition of Normalization.

Normalization

Normalization is a process of minimizing redundant data from database by decomposing the columns of a table into multiple tables. This process increases the number of tables in the database but it minimizes the redundant (duplicate), inaccurate and inconsistent data in database. Decomposed tables are connected using relationships (primary keys and foreign keys) to reduce the insert, update and delete anomalies. During normalization process, database designer decides the actual structure of the tables and their relationships. It is easy to find and fix any design problem at this early stage. During normalization process, we have various normal forms with some conditions. A table must satisfy all conditions of that normal form to qualify be in that normal form. Lets discuss what is a normal form in next sub section.

Normal Form

Normal form defines a set of standard which must be followed for a good database design. During normalization process database designer converts the database tables in normal forms and check its functional dependency. Each normal form has some predefined standards which must be followed to qualify that normal form. To qualify each normal form, a table must qualify the previous normal form conditions first i.e. a table in 3NF must satisfy all conditions of 2NF which in turn must satisfy all conditions of 1NF.

Lets try to understand the types of normal form with an example of employee table:

First Normal Form – 1NF

A table can be in First Normal Form (1NF), when it satisfies below conditions:

  1. All columns of a table must have a single atomic value in each row of a table i.e. a cell (intersection of rows and columns) in a table must contain a single atomic value.
  2. Each row of a table should have an unique identifier to uniquely identify the rows of the table i.e. no two rows of a table can be identical in 1NF. A primary key or composite key can be used to uniquely identify records of the table.

To understand the concept of First Normal Form (1NF), consider this employee table for an example:

EmpIdEmpNameProjectIdProjectNameZipCity
100Rohit300,301ERP, Finance Management123456Kolkata
101Ramesh300,302ERP, KPO Management654321New Delhi

To bring the employee Table in First Normal Form; 1) Organize the ProjectId and ProjectName attribute values in single atomic values and then 2) EmpId and ProjectId attributes can uniquely identify each row of the employee table.

Have a look on employee table after bringing it in First Normal Form:

EmpIdProjectIdEmpNameProjectNameZipCity
100300RohitERP123456Kolkata
100301RohitFinance Management123456Kolkata
101300RameshERP654321New Delhi
101302RameshKPO Management654321New Delhi

Note: EmpId and ProjectId uniquely identifies each row of employee table.

Second Normal Form – 2NF

A table can be in Second Normal Form (2NF), if satisfies below conditions:

  1. The table must satisfy all the conditions of 1NF.
  2. All non-key attributes (columns) are dependent on key columns i.e. Each column of the table must be fully functionally dependent on key column (or set of columns). Partial dependencies must be removed from the table in case table has a composite primary key.

Note: A table with a single column primary key is automatically in 2NF and does not need to be tested for partial dependency.

In continuation of employee table as an example, to bring this table in 2NF, we have to find and remove any partial dependency of non key columns (EmpName, ProjectName, Zip and City) on key columns (EmpId and ProjectId). We can see that column ProjectName functionally depends on ProjectId but not on EmpId attribute which shows a partial dependency in the table. To bring this table in Second Normal Form, we need to break this table in two tables, employee table and project table as below:

Employee Table in 2NF:

EmpIdEmpNameZipCity
100Rohit123456Kolkata
101Ramesh654321New Delhi

Project Table in 2NF:

EmpIdProjectIdProjectName
100300ERP
100301Finance Management
101300ERP
101300KPO Management

Third Normal Form – 3NF

A table must satisfy below conditions to be in Third Normal Form (3NF):

  1. The table must satisfy all conditions of 2NF.
  2. Transitive dependency of non-key attributes on key column must be removed i.e. if column A depends on column B and column B depends on column C, column A is transitively depends on column C. Any non-key column must not dependent on another non-key column.

To bring employee table in Third Normal Form (3NF), we have to find and remove any transitive dependency from this table. We can see that attribute City depends on Zip, and attribute Zip depends on EmpId which shows a transitive dependency of city attribute on EmpId. To bring this table in 3NF, split the employee table as below:

Employee table in 3NF:

EmpIdEmpNameZip
100Rohit123456
101Ramesh654321

Zip – City table:

ZipCity
123456Kolkata
654321New Delhi

Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is an extension of 3NF with strict condition. A table must satisfy below conditions to be in BCNF:

  1. Table must be in 3NF.
  2. For any functional dependency X -> Y, X should be a super key.

In employee table, in dependency EmpId -> EmpName, Zip, EmpId is the super key and in Zip – City table, in dependency Zip -> City, Zip is the super key for city.

We might need to test the above BCNF functional dependency in tables which have overlapping composite candidate keys.

Normal forms at a glance

Its time to summarize our reading. We have below image to summarize the reading on normal forms:

Database normalization

Database normalization

Note on normalization

Normalization helps in managing a database efficiently by reducing inconsistent, inaccurate and redundant data from database but over normalization must be avoided during this practice. Over normalization might be occurred as a result of decomposing a table in many small tables in such a way so that a reference table need to refer another table to get a single unit of information. Also under normalization leads to inconsistent, inaccurate and redundant data in database. We must balance database normalization between over normalization and under normalization to get the benefit of fast query processing and efficiency of database management.

 

Till now, we have discussed below database concepts in this tutorial:

  1. DBMS, RDBMS and SQL Server
  2. Table Relationships
  3. Data Integrity
  4. Types of keys
  5. Database normalization

Now we are good to start with T-SQL and its concepts from next chapter.

Rate This
[Total: 2    Average: 5/5]


Gopal Krishna Ranjan

About Gopal Krishna Ranjan

I am Gopal Krishna Ranjan, having 8 years of industry experience in Software development. I have a head down experience in Database, Data Warehouse, Big Data and cloud technologies and have implemented end to end Database, Data Warehouse,  Big Data and Cloud Solutions.
I have extensively worked on SQL Server, Python, Hadoop, Hive, Spark, Azure, Machine Learning, and MSBI (SSAS, SSIS, and SSRS). I also have good experience in windows and web application development using ASP.Net and C#.

Leave a comment

Your email address will not be published. Required fields are marked *