Normalozation in SQL
Normalization in DataBase
Normalization is a process or a technique which is used to reduced or decompose the redundancy(repetition) within the database. In common language you can say that normalization is used to remove repetition of data or record within database.
Normal Forms in a database or the concept of Normalization make a Relation or Table free from insert/update/delete Anomalies and saves space by removing duplicate data.
According to E. F. Codd, the objectives of normalization were stated as follows:
- To free the collection of relations from undesirable insertion, update and deletion dependencies.
- To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs.
- To make the relational model more informative to users.
- To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
Problems Without Normalization
If your table is not properly normalized and have data redundancy then it will take extra memory space and difficult to handle and update the database.
Roll_no | Name | Branch | Head of Department |
---|---|---|---|
100 | Faiz | CSE | Mr. Sanjay Jain |
101 | Rama | CSE | Mr. Sanjay Jain |
102 | Hitesh | CSE | Mr. Sanjay Jain |
103 | Sultan | CSE | Mr. Sanjay Jain |
104 | Gaurav | CSE | Mr. Sanjay Jain |
105 | Manoj | CSE | Mr. Sanjay Jain |
106 | Mahendra | CSE | Mr. Sanjay Jain |
In the table above, we have data of 6 Computer Science students. As we can see, data for the fields branch and Head of Department is repeated for the students who are in the same branch in the college, this is Data Redundancy.
If we have to insert data of 100 students of the same branch, then the branch information will be repeated for all those 100 students. This is called Insertion Anomaly
If Head of Department Mr. Sanjay Jain leaves college, or is it no longer the HOD of the computer science department. In that case, all the student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency. This is called Updation anomaly.
In above Student table, two different pieces of information are kept together, Student information and Branch information. Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information. This is called Deletion Anomaly.
We have Totly Six Types of Normal Forms
- 1 st NF (1st normal form)
- 2 nd NF (2nd normal form)
- 3 rd NF (3rd normal form)
- BCNF (BOYCE CODD NF)
- 4 th NF
- 5 th NF
Dependency
It can be defined as same values in the table are depending on a specific column. There are three type of dependency.
- Full function dependency
- Partial function dependency
- Transitive function dependency
Full function dependency
In this all the non-key attributes fully or functionally depending on only one key attribute column. and the record in the table which is uniquely identify by using key attributed column.
Partial function dependency
In a variable we may have more than one attribute column and some of the column are depending one key attribute column and some other non-key attributes are depending on other key attribute so these table can not follow full function dependency. we can avoid this problem by using 2nd normal form.
Transitive function dependency
Here one non-key attribute function is depending on another non-key attribute table should not maintain transitive function dependency so avoid this by using 3rd normal form.
1st Normal form
In this normal form user need to remove the multi-value attribute on the table. For a table to be in the First Normal Form, it should follow the following 4 rules:
- It should only have single(atomic) valued attributes/columns
- Values stored in a column should be of the same domain
- All the columns in a table should have unique names
- And the order in which data is stored, does not matter
There are no duplicate rows and each row should have a unique identifier (or Primary key). A table should be free from repeating groups.
The values in each column of a table are atomic. Meaning a field value cannot be decomposed into smaller pieces or should not be divided into parts with more than one kind of data in it.Like: A Person's Name column could be further divided into First, Middle, Last Name columns.
2nd Normal form
In 2nd Normal form maintain the 1st Normal form and remove the partial function dependency. For a table to be in the Second Normal Form,
- It should be in the First Normal form
- And, it should not have Partial Dependency
Any Candidate key (K) and any Attribute (A) that is not a constituent of a candidate key, A depends upon whole of K rather than just part of it This means all its non-prime attributes are functionally dependent on the whole of a candidate key.
In Simple terms, any non-key columns must be dependent on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key.
3rd Normal form
In 3rd Normal form maintain the 1st and 2nd normal form and remove the transitive function dependency. A table is said to be in the Third Normal Form when,
- It is in the Second Normal form
- It doesn't have Transitive Dependency
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every candidate/primary key of R.
All columns should depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency).
BCNF
Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.