How to identify unused indexes in SQL Server

We can use the dynamic management view sys.dm_db_index_usage_stats to identify unused indexes in SQL Server.

DMV – sys.dm_db_index_usage_stats

This DMV shows you how many times the index was used for user queries.

SELECT OBJECT_NAME(S.[OBJECT_ID]) AS [OBJECT NAME],
       I.[NAME] AS [INDEX NAME],
       USER_SEEKS,
       USER_SCANS,
       USER_LOOKUPS,
       USER_UPDATES
FROM   SYS.DM_DB_INDEX_USAGE_STATS AS S
       INNER JOIN SYS.INDEXES AS I
         ON I.[OBJECT_ID] = S.[OBJECT_ID]
        AND I.INDEX_ID = S.INDEX_ID
WHERE  OBJECTPROPERTY(S.[OBJECT_ID], 'IsUserTable') = 1
       AND S.database_id = DB_ID();

Here we can see seeks, scans, lookups and updates.

  • The seeks value shows how many times an index seek occurred for that index.  A seek is the fastest way to access the data, so seeks are good.
  • The scans value shows how many times an index scan occurred for that index.  A scan is when multiple rows of data had to be searched to find the data.  Scans are something you want to try to avoid.
  • The lookups value shows how many times the query required data to be pulled from the clustered index or the heap (when the table does not have a clustered index).  Lookups are also something you want to try to avoid.
  • The updates value shows how many times the index was updated due to data changes; these updates are the maintenance cost of keeping the index.


Identifying Unused Indexes

So based on the output above, focus on indexes that show no seeks, scans, or lookups but do show updates: this means SQL Server has not used the index to satisfy any query but still has to maintain it.  Remember that the data from this DMV is reset when SQL Server is restarted, so make sure you have collected data for a long enough period of time to determine which indexes may be good candidates to be dropped.
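As an illustration, here is one way (a sketch, not the only approach) to filter that output down to likely candidates: indexes that have been maintained but never read since the last restart.

SELECT OBJECT_NAME(S.[OBJECT_ID]) AS [OBJECT NAME],
       I.[NAME] AS [INDEX NAME],
       USER_UPDATES
FROM   SYS.DM_DB_INDEX_USAGE_STATS AS S
       INNER JOIN SYS.INDEXES AS I
         ON I.[OBJECT_ID] = S.[OBJECT_ID]
        AND I.INDEX_ID = S.INDEX_ID
WHERE  OBJECTPROPERTY(S.[OBJECT_ID], 'IsUserTable') = 1
       AND S.database_id = DB_ID()
       -- never read, only maintained
       AND USER_SEEKS = 0
       AND USER_SCANS = 0
       AND USER_LOOKUPS = 0
       AND USER_UPDATES > 0;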


How to Enable multiple SQL agent Jobs using single SQL Query

Here is a SQL query to enable multiple SQL Agent jobs at once.

exec sp_configure 'allow updates', 1
go
reconfigure with override
go

update msdb.dbo.sysjobs
set [enabled] = 1
where name like 'Ownership Standardization%'
go

exec sp_configure 'allow updates', 0
go
reconfigure with override
go

To disable multiple jobs, just replace the "set [enabled] = 1" line with "set [enabled] = 0" in the query.

This query can be very useful when you need to enable or disable multiple jobs that have a common string in their names.
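As an alternative that avoids updating the system table directly, you could loop over the matching jobs and call the supported msdb.dbo.sp_update_job procedure for each one. A rough sketch (the job name pattern is just an example):

declare @jobname sysname;

declare job_cursor cursor for
    select name
    from msdb.dbo.sysjobs
    where name like 'Ownership Standardization%';  -- example pattern

open job_cursor;
fetch next from job_cursor into @jobname;

while @@FETCH_STATUS = 0
begin
    -- use @enabled = 0 here to disable the jobs instead
    exec msdb.dbo.sp_update_job @job_name = @jobname, @enabled = 1;
    fetch next from job_cursor into @jobname;
end

close job_cursor;
deallocate job_cursor;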


How to Create Table and define data integrity

In this post I will show you how to create a table and define data integrity on the table's data.

To explain with an example, I will create a new database on my SQL Server instance named sampleDB. Here is the code to create this database.

 

USE master;

IF DB_ID('sampleDB') IS NULL
  CREATE DATABASE sampleDB;

 

 

Here the "USE master" command switches you to the context of the master database. I use the DB_ID function to check whether the sampleDB database already exists, and only if it does not exist do I create the sampleDB database.

 

Creating Tables

The following code creates a table called Employee in the sampleDB database:

USE sampleDB;

IF OBJECT_ID('dbo.Employee', 'U') IS NOT NULL
  DROP TABLE dbo.Employee;

CREATE TABLE dbo.Employee
(
  empid     INT         NOT NULL,
  firstname VARCHAR(30) NOT NULL,
  lastname  VARCHAR(30) NOT NULL,
  hiredate  DATE        NOT NULL,
  mgrid     INT         NULL,
  ssn       VARCHAR(20) NOT NULL,
  salary    MONEY       NOT NULL
);

 

Here I am changing my database context to sampleDB by using the "USE sampleDB" command. Then, using the OBJECT_ID function, I am checking whether the dbo.Employee table already exists (OBJECT_ID returns NULL if the object does not exist), and if the table already exists in the sampleDB database, I am dropping it using the "DROP TABLE" command.

After that I am creating the table using “CREATE TABLE” command. Here you specify the name of the table and, in parentheses, the definition of its attributes (columns). For each column you specify the column name, datatype, and NULLability.

In our Employee table the attributes empid (employee ID) and mgrid (manager ID) are defined as INT (a four-byte integer); firstname, lastname, and ssn (social security number) are defined as VARCHAR (a variable-length character string with the specified maximum number of characters); hiredate is defined as DATE; and salary is defined as MONEY.

In our Employee table all columns are defined as NOT NULL except for the mgrid column. A NULL in the mgrid attribute would represent the fact that the employee has no manager, as in the case of the CEO of the organization.
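For example, the following inserts (illustrative values only, not part of the original post) both succeed, with a NULL mgrid marking the employee who has no manager:

INSERT INTO dbo.Employee (empid, firstname, lastname, hiredate, mgrid, ssn, salary)
VALUES (1, 'Sara', 'Davis', '20090101', NULL, '111-11-1111', 10000.00);  -- CEO, no manager

INSERT INTO dbo.Employee (empid, firstname, lastname, hiredate, mgrid, ssn, salary)
VALUES (2, 'Don', 'Funk', '20090215', 1, '222-22-2222', 8000.00);        -- reports to empid 1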

 

Defining Data Integrity

One of the great benefits in the relational model is that data integrity is an integral part of it. Data integrity that is enforced as part of the model—namely, as part of the table definitions—is considered declarative data integrity. Data integrity that is enforced with code—such as with stored procedures or triggers—is considered procedural data integrity.

Now I will describe examples for declarative constraints, including primary key, unique, foreign key, check, and default constraints. You can define such constraints when creating a table as part of the CREATE TABLE statement, or after the table was already created using an ALTER TABLE statement.

 

Primary Key Constraints

A primary key constraint enforces uniqueness of rows and also disallows NULLs in the constraint attributes. An attempt to define a primary key constraint on a column that allows NULLs will be rejected by the RDBMS. Each table can have only one primary key.

Here’s an example of defining a primary key constraint on the empid column in the Employee table that you created earlier:

ALTER TABLE dbo.Employee
  ADD CONSTRAINT PK_Employee
  PRIMARY KEY(empid);

 

With this primary key in place, you can be assured that all empid values will be unique and known.
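For example, if a row with empid 1 already exists, a second insert that reuses the same empid is rejected (an illustrative sketch):

-- Fails with a PRIMARY KEY violation because empid 1 already exists
INSERT INTO dbo.Employee (empid, firstname, lastname, hiredate, mgrid, ssn, salary)
VALUES (1, 'Judy', 'Lew', '20100301', 1, '333-33-3333', 7500.00);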

 

Unique Constraints

A unique constraint enforces uniqueness of rows. Unlike primary keys, multiple unique constraints can be defined in the same table. Also, a unique constraint is not restricted to columns defined as NOT NULL; in SQL Server, a column with a unique constraint can hold a single NULL value.

The following code defines a unique constraint on the ssn column in the Employee table:

ALTER TABLE dbo.Employee
  ADD CONSTRAINT UNQ_Employee_ssn
  UNIQUE(ssn);

 

 

Foreign Key Constraints

A foreign key enforces referential integrity. This constraint is defined on a set of attributes in what’s called the referencing table, and points to a set of candidate key (primary key or unique constraint) attributes in what’s called the referenced table. Note that the referencing and referenced tables can be one and the same. The foreign key’s purpose is to restrict the domain of values allowed in the foreign key columns to those that exist in the referenced columns.

The following code creates a table called Orders with a primary key defined on the orderid column:

 

IF OBJECT_ID('dbo.Orders', 'U') IS NOT NULL
  DROP TABLE dbo.Orders;

CREATE TABLE dbo.Orders
(
  orderid   INT         NOT NULL,
  empid     INT         NOT NULL,
  custid    VARCHAR(10) NOT NULL,
  orderts   DATETIME    NOT NULL,
  qty       INT         NOT NULL,
  CONSTRAINT PK_Orders
    PRIMARY KEY(orderid)
);

 

Suppose you want to enforce an integrity rule that restricts the domain of values supported by the empid column in the Orders table to the values that exist in the empid column in the Employee table. You can achieve this by defining a foreign key constraint on the empid column in the Orders table pointing to the empid column in the Employee table like so:

 

ALTER TABLE dbo.Orders
  ADD CONSTRAINT FK_Orders_Employee
  FOREIGN KEY(empid)
  REFERENCES dbo.Employee(empid);

 

Note that NULLs are allowed in the foreign key even if there are no NULLs in the referenced candidate key columns.

A foreign key also establishes a rule that any attempt to delete rows from the referenced table, or to update the referenced candidate key attributes, will be rejected if related rows exist in the referencing table. For example, if you try to delete an employee row from the Employee table when there are related orders in the Orders table, the RDBMS will reject the attempt and produce an error. This is the default behavior of a foreign key constraint.

You can change this default behavior of Foreign Key constraint by using some options with Foreign Key and applying actions based on those options. You can define the options ON DELETE and ON UPDATE with actions like CASCADE, SET DEFAULT, and SET NULL as part of the foreign key definition. CASCADE means that the operation (delete or update) will be cascaded to related rows. For example, ON DELETE CASCADE means that when you delete a row from the referenced table, the RDBMS will delete the related rows from the referencing table. SET DEFAULT and SET NULL mean that the compensating action will set the foreign key attributes of the related rows to the column’s default value or NULL respectively. 
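As an illustrative sketch (not part of the original Orders definition), the same foreign key could be re-declared with cascading deletes, so that deleting an employee also deletes that employee's orders:

ALTER TABLE dbo.Orders
  DROP CONSTRAINT FK_Orders_Employee;

ALTER TABLE dbo.Orders
  ADD CONSTRAINT FK_Orders_Employee
  FOREIGN KEY(empid)
  REFERENCES dbo.Employee(empid)
  ON DELETE CASCADE;  -- deleting an employee removes the related orders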

 

Check Constraints

A check constraint allows you to define a predicate that a row must meet to enter the table or to be modified. For example, the following check constraint ensures that the salary column in the Employee table will support only positive values:

 

ALTER TABLE dbo.Employee
  ADD CONSTRAINT CHK_Employee_salary
  CHECK(salary > 0);

 

An attempt to insert or update a row with a nonpositive salary value will be rejected by the RDBMS. Note that a check constraint rejects an attempt to insert or update a row when the predicate evaluates to FALSE. The modification will be accepted when the predicate evaluates to either TRUE or UNKNOWN.

 

Default Constraints

A default constraint is associated with a particular attribute. It is an expression that is used as the default value when an explicit value is not specified for the attribute when you insert a row. For example, the following code defines a default constraint for the orderts attribute (representing the order’s timestamp):

 

ALTER TABLE dbo.Orders
  ADD CONSTRAINT DFT_Orders_orderts
  DEFAULT(CURRENT_TIMESTAMP) FOR orderts;

 

The default expression invokes the CURRENT_TIMESTAMP function, which returns the current date and time value. Once this default expression is defined, whenever you insert a row in the Orders table and do not explicitly specify a value in the orderts attribute, SQL Server will set the attribute value to CURRENT_TIMESTAMP.
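For example (illustrative values, and assuming an employee with empid 1 exists), the following insert omits orderts, so the default fills it in:

INSERT INTO dbo.Orders (orderid, empid, custid, qty)
VALUES (10001, 1, 'C0000001', 12);

-- orderts shows the date and time of the insert
SELECT orderid, orderts FROM dbo.Orders;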



SQL Server Architecture

This section introduces the SQL Server architecture and the entities involved: SQL Server instances, databases, schemas, and database objects.

SQL Server Instances

A SQL Server instance is an installation of a SQL Server database engine/service. You can install multiple instances of SQL Server on the same computer. Each instance is completely independent of the others in terms of security and the data that it manages. At the logical level, two different instances residing on the same computer have no more in common than two instances residing on two separate computers. But they do share the server’s physical resources such as CPU, memory, and disk.


One of the instances on the computer can be set up as the default instance, while all others must be named instances. You determine whether an instance is the default or a named one upon installation; you cannot change this later.

For a client application to connect to a default instance, it needs to specify the computer’s name or IP address. To connect to a named instance, the client needs to specify the computer’s name or IP address, followed by a backslash (\), followed by the instance name (as provided upon installation). For example, say you have two instances of SQL Server installed on a computer called Server1. One of the instances was installed as the default instance, and the other was installed as a named instance called Inst1. To connect to the default instance you need to specify Server1 as the server name; to connect to the named instance you need to specify Server1\Inst1.
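Once connected, an illustrative way to confirm which instance you reached is to query a few server properties (a sketch; the column aliases are my own):

SELECT @@SERVERNAME                   AS server_and_instance,  -- e.g. Server1 or Server1\Inst1
       SERVERPROPERTY('InstanceName') AS instance_name,        -- NULL when connected to the default instance
       SERVERPROPERTY('MachineName')  AS machine_name;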

Databases

You can think of a database as a container of objects such as tables, views, stored procedures, and so on. Each instance of SQL Server can contain multiple databases. When you install SQL Server, the setup program creates several system databases that hold system data and serve internal purposes. After installation you can create your own user databases that will hold application data.


The system databases that the setup program creates include master, Resource, model, tempdb, and msdb. A description of each follows.

master The master database holds instance-wide metadata information, server configuration, information about all databases in the instance, and initialization information.

Resource The Resource database holds all system objects. When you query metadata information in a database, this information appears to be local to the database, but in practice it resides in the Resource database.

model The model database is used as a template for new databases. Every new database that you create is initially created as a copy of model. So if you want certain objects (such as data types) to appear in all new databases that you create, or certain database properties to be configured in a certain way in all new databases, you need to create those objects and configure those properties in the model database. Note that changes you apply to the model database will not impact existing databases—only new databases that you create in the future.

tempdb The tempdb database is where SQL Server stores temporary data such as work tables, sort space, row versioning information, and so on. Note that this database is destroyed and recreated as a copy of the model every time you restart the instance of SQL Server. For this reason, when I need to create objects for test purposes and I don’t want the objects to remain in the database, I usually create them in tempdb.

msdb The msdb database is where a service called SQL Server Agent stores its data. SQL Server Agent is in charge of automation, which includes entities such as jobs, schedules, and alerts. The SQL Server Agent is also the service in charge of replication. The msdb database also holds information related to other SQL Server features such as Database Mail and Service Broker.

You can create as many user databases as you might need within an instance. A user database will hold objects and data for an application.

You can define a property called collation at the database level that will determine language support, case sensitivity, and sort order for character data in that database.

In terms of security, to be able to connect to a SQL Server instance, the DBA must create a login for you. The login can be tied to your Windows credentials, in which case it is called a Windows authenticated login. With a Windows authenticated login, you won’t need to provide login and password information when connecting to SQL Server because you already provided those when you logged on to Windows. The login can also be independent of your Windows credentials, in which case it is called a SQL Server authenticated login. When connecting to SQL Server using a SQL Server authenticated login, you will need to provide both a login name and a password.

The DBA needs to map your login to a database user in each database that you are supposed to have access to. The database user is the entity that will be granted permissions to objects in the database.
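A minimal illustrative sketch of what the DBA might run (all names and the password here are hypothetical):

-- A Windows authenticated login (domain and account are hypothetical)
CREATE LOGIN [MYDOMAIN\Dave] FROM WINDOWS;

-- A SQL Server authenticated login
CREATE LOGIN AppLogin WITH PASSWORD = 'Str0ng!Passw0rd';

-- Map the login to a database user and grant it permissions
USE sampleDB;
CREATE USER AppUser FOR LOGIN AppLogin;
GRANT SELECT ON SCHEMA::dbo TO AppUser;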

Database Layout

The database is made of data and transaction log files. When you create a database, you can define various properties for each file, including the file name, location, initial size, maximum size, and an autogrowth increment. Each database must have at least one data file and at least one log file (the default in SQL Server). The data files hold object data, and the log files hold information that SQL Server needs to maintain transactions.

Although SQL Server can write to multiple data files in parallel, it can only write to one log file at a time, in a sequential manner. Therefore, unlike with data files, having multiple log files does not result in performance benefit. You might need to add log files if the disk drive where the log resides runs out of space.

Data files are organized in logical groups called filegroups. A filegroup is the target for creating an object, such as a table or an index. The object data will be spread across the files that belong to the target filegroup. Filegroups are your way to control the physical locations of your objects. A database must have at least one filegroup called PRIMARY, and it can optionally have other user filegroups as well.

The PRIMARY filegroup contains the primary data file (extension .mdf), and the database’s system catalog. You can optionally add secondary data files (extension .ndf) to PRIMARY. User filegroups contain only secondary data files. You can determine which of the filegroups is marked as the default filegroup. An object is created on the default filegroup when the object creation statement does not explicitly specify the target filegroup.
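A hedged sketch of such a layout, with a PRIMARY filegroup, one user filegroup, and a log file (the database name, file paths, and sizes are just placeholders):

CREATE DATABASE SalesDB
ON PRIMARY
  (NAME = SalesDB_data1, FILENAME = 'C:\SQLData\SalesDB_data1.mdf',
   SIZE = 100MB, FILEGROWTH = 50MB),
FILEGROUP FG_History
  (NAME = SalesDB_hist1, FILENAME = 'C:\SQLData\SalesDB_hist1.ndf',
   SIZE = 100MB, FILEGROWTH = 50MB)
LOG ON
  (NAME = SalesDB_log, FILENAME = 'C:\SQLLog\SalesDB_log.ldf',
   SIZE = 50MB, FILEGROWTH = 25MB);

-- Create a table on a specific filegroup
USE SalesDB;
CREATE TABLE dbo.SalesHistory (saleid INT NOT NULL) ON FG_History;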

Schemas and Objects

A database contains schemas, and schemas contain objects. You can think of a schema as a container of objects such as tables, views, stored procedures, and others.

[Figure: Databases, schemas, and database objects]

 

You can control permissions at the schema level. For example, you can grant a user SELECT permissions on a schema, allowing the user to query data from all objects in the schema. So security is one of the considerations for determining how to arrange objects in schemas.

The schema is also used as a prefix to the object name. For example, say you have a table called Orders in a schema called Sales. The schema-qualified object name (also known as the two-part object name) is Sales.Orders. If you omit the schema name when referring to an object, SQL Server will apply a process to resolve the schema name such as checking whether the object exists in the user’s default schema, and if it doesn’t, checking whether it exists in the dbo schema. It is recommended that when you refer to objects in your code you always use the two-part object names. Some insignificant extra costs are involved in resolving the object name when it’s not explicitly specified. As insignificant as this extra cost might be, why pay it? Also, if multiple objects with the same name exist in different schemas, you might end up getting a different object than the one you wanted.
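An illustrative sketch of these ideas (the Sales schema, the Orders table, and AppUser are hypothetical names):

CREATE SCHEMA Sales;
GO

CREATE TABLE Sales.Orders (orderid INT NOT NULL);

-- Grant SELECT on everything in the schema to a database user
GRANT SELECT ON SCHEMA::Sales TO AppUser;

-- Always refer to the object with its two-part name
SELECT orderid FROM Sales.Orders;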

 

Reference: Microsoft SQL Server 2008 T-SQL Fundamentals by Itzik Ben-Gan.

I hope you will like this post. Please share your thoughts on this topic or write comments in the comment section.

 


SQL Core Concepts (The Data Life Cycle)

In a SQL Server database, data is much like a product on an assembly line, moving from one environment to another and undergoing transformations along the way. Figure-1 illustrates the data life cycle.

[Figure-1: The data life cycle]

OnLine Transactional Processing

Data is entered initially into an OnLine Transactional Processing (OLTP) system. The focus of an OLTP system is data entry and not reporting—transactions mainly insert, update, and delete data. The relational model is targeted mainly at OLTP systems, where a normalized model provides both good performance for data entry and data consistency. In a normalized environment each table represents a single entity and redundancy is at a minimum. When you need to modify a fact you need to modify it in only one place, resulting in optimized performance for modifications and little chance for error.

An OLTP environment is not suitable for reporting purposes because a normalized model usually involves many tables (one for each entity) with complex relationships. Even simple reports require joining many tables, resulting in complex and poorly performing queries.

You can implement an OLTP database in SQL Server and both manage it and query it with T-SQL.

 

Data Warehouse

A data warehouse (DW) is an environment designed for data retrieval/reporting purposes. When serving an entire organization such an environment is called a data warehouse; when serving part of the organization (such as a specific department) it is called a data mart. The model has intentional redundancy, which allows fewer tables and simpler relationships, ultimately resulting in simpler and more efficient queries compared to an OLTP environment.

The simplest design of a data warehouse is called a star schema, which includes several dimension tables and a fact table. Each dimension table represents a subject by which data is analyzed. For example, in a system that deals with orders and sales, you will probably want to analyze data by customers, products, employees, time, and so on. In a star schema each dimension is implemented as a single table with redundant data. For example, a product dimension could be implemented as a single ProductDim table instead of three normalized tables: Products, ProductSubCategories, and ProductCategories. If you normalize a dimension table, resulting in multiple tables representing the dimension, you get what’s known as a snowflake dimension. A schema that contains snowflake dimensions is known as a snowflake schema (as opposed to a star schema).

The fact table holds the facts and measures such as quantity and value for each relevant combination of dimension keys. For example, for each relevant combination of customer, product, employee, and day there will be a row with the quantity and value in the fact table. Note that data in a data warehouse is typically pre-aggregated to a certain level of granularity (such as a day), unlike data in an OLTP environment, which is usually recorded at the transaction level.
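As a rough illustrative sketch (the table and column names are made up, not taken from the book), a tiny star schema might look like this:

CREATE TABLE dbo.DimProduct
(
  productkey  INT         NOT NULL PRIMARY KEY,
  productname VARCHAR(50) NOT NULL,
  subcategory VARCHAR(50) NOT NULL,  -- intentionally denormalized into one dimension table
  category    VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.FactSales
(
  datekey     INT   NOT NULL,
  productkey  INT   NOT NULL REFERENCES dbo.DimProduct(productkey),
  customerkey INT   NOT NULL,
  qty         INT   NOT NULL,   -- measure
  salesamount MONEY NOT NULL    -- measure
);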

You can implement a data warehouse as a SQL Server database and manage and query it with T-SQL. The process that pulls data from source systems (OLTP and others), manipulates it, and loads it into the data warehouse is called Extract Transform and Load, or ETL. SQL Server provides a tool called Microsoft SQL Server Integration Services (SSIS) to handle ETL needs.

 

OnLine Analytical Processing

OnLine Analytical Processing (OLAP) systems support dynamic, online analysis of aggregated data.

Consider a data warehouse that you implemented as a relational database in SQL Server. Whenever a user makes a request for aggregated data, the application submits a query to the database, typically scanning and aggregating large amounts of base data. Even though it is more efficient to handle such requests against a relational data warehouse compared to an OLTP environment, that approach might not be efficient enough. Online dynamic analysis of aggregated data usually involves frequent requests for different levels of aggregations, which require slicing and dicing the data. Each such request might end up being very expensive if it needs to scan and aggregate large amounts of data, and chances are the response time will not be satisfactory.

To handle such needs you can pre-calculate different levels of aggregations. For example, you can pre-calculate yearly, monthly, and daily with the time dimension; category, subcategory, and product with the product dimension, and so on. When you pre-calculate aggregates, requests for aggregated data can be satisfied more quickly.

One option to implement this idea is to calculate and store the different levels of aggregations in the relational data warehouse. This involves writing a sophisticated process to handle the initial processing of the aggregates and the incremental updates. Another option is to use a special product designed for OLAP needs—Microsoft SQL Server Analysis Services (SSAS or AS). Note that SSAS is a separate service/engine from the SQL Server service. SSAS supports calculating different levels of aggregations and storing them in optimized multidimensional structures known as cubes. The source data for SSAS cubes can—and usually is—a relational data warehouse. Besides supporting large volumes of aggregated data, SSAS also provides many rich and sophisticated data analysis capabilities. The language used to manage and query SSAS cubes is called Multidimensional Expressions (MDX).

 

Data Mining

OLAP systems provide the user with answers to all possible questions, but the user’s task is to ask the right questions—to sift anomalies, trends, and other useful information from the sea of data. In the dynamic analysis process the user navigates from one view of aggregates to another—again, slicing and dicing the data—to find useful information.

Data mining (DM) is the next step; instead of letting the user look for useful information in the sea of data, data mining models can do this for the user. That is, data mining algorithms comb the data and sift the useful information from it. Data mining has enormous business value for organizations, helping to identify trends, figure out which products are purchased together, predict customer choices based on given parameters, and so on.

SSAS supports data mining algorithms—including clustering, decision trees, and others—to address such needs. The language used to manage and query data mining models is Data Mining Extensions (DMX).

 

Reference: Microsoft SQL Server 2008 T-SQL Fundamentals by Itzik Ben-Gan.

I hope you will like this post. Please share your thoughts on this topic or write comments in the comment section.


Create Table in SQL

In this example I show you how to create a new table in a database.

For my example I use the tempdb database.

 

 

--use tempDB database context
USE tempdb;

--Drop Employees table if it already exists in tempDB database
IF OBJECT_ID('dbo.Employees', 'U') IS NOT NULL
  DROP TABLE dbo.Employees;

--Create Employees table having 4 columns
CREATE TABLE dbo.Employees
(
  empid     INT         NOT NULL,
  firstname VARCHAR(30) NOT NULL,
  lastname  VARCHAR(30) NOT NULL,
  salary    MONEY       NOT NULL
);
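To quickly verify the table, here is an illustrative insert and select (the values are made up):

INSERT INTO dbo.Employees (empid, firstname, lastname, salary)
VALUES (1, 'John', 'Smith', 5000.00);

SELECT empid, firstname, lastname, salary
FROM dbo.Employees;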


Database Normalization and Normal Forms

Database Normalization:

If you have been doing database work for some time, you have probably heard about normalization before. Normalization is a process applied to databases to ensure minimal redundancy (no duplicate data) and greater consistency (correctness of data) of the data stored in the database.

Normalization is applied and measured by normalization levels. Normalization levels (normal forms) are defined by database experts. A database is said to be in a particular normal form if it satisfies all the criteria defined by that normal form.

Normalization, if applied correctly, can noticeably improve a database. It reduces the space used by redundant data, and it helps prevent update anomalies caused by that redundancy.

Normal Forms:

Normal forms are normalization levels used to measure the redundancy and consistency level of a database. A higher normal form means less data redundancy and more data consistency (more correctness of data). Normalization starts with 1NF (First Normal Form) and continues with 2NF (Second Normal Form), 3NF (Third Normal Form), BCNF (Boyce-Codd Normal Form, or 3.5NF), 4NF (Fourth Normal Form), 5NF, and higher. 5NF and higher normal forms are rarely seen and are sometimes not practically possible, so we will not cover them in this post.

Normal forms are cumulative in nature; to be in a particular normal form, a database has to fulfill all the criteria of all the lower normal forms. For example, to be in 3NF a database has to satisfy all the criteria of 2NF and 1NF.

To explain the normal forms we will normalize the Student table below. This table holds student data such as teachers, classes, and class-room details. Rooms are assigned to teachers on a per-class basis. For example, teacher Richard is assigned room number 21 for classes A, B, and D.

Student Name | Student Address | Teacher Name | Room Number | Class1 | Class2 | Class3
------------ | --------------- | ------------ | ----------- | ------ | ------ | ------
John         | Addr-A          | Richard      | 21          | A      | B      | D
David        | Addr-B          | Fillip       | 32          | E      | F      | C
David        | Addr-B          | George       | 25          | H      | I      | J

Fig.-1: Student Table

 

First Normal Form (1NF):

To be in First Normal Form, a table has to satisfy the following criteria.

1. No duplicate columns in a single table.

2. Each column holds a single value (i.e., no group of values or array in a column).

3. Each row can be identified by a unique key (a primary key).

In our example table, we have three columns named Class1, Class2, and Class3 holding the same kind of data. This violates the first rule of 1NF.

One alternative is to have a single column named Classes holding all class names as a comma-separated list (see Fig.-2). But this violates the second rule of 1NF, that each column should hold a single value.

Student Name | Student Address | Teacher Name | Room Number | Classes
------------ | --------------- | ------------ | ----------- | -------
John         | Addr-A          | Richard      | 21          | A, B, D
David        | Addr-B          | Fillip       | 32          | E, F, C
David        | Addr-B          | George       | 25          | H, I, J

Fig.-2: Column Classes has multiple values of the same type.

Because every student can have multiple classes, we should have one row for each class of every student (See Fig.-3).

Student Name | Student Address | Teacher Name | Room Number | Class
------------ | --------------- | ------------ | ----------- | -----
John         | Addr-A          | Richard      | 21          | A
John         | Addr-A          | Richard      | 21          | B
John         | Addr-A          | Richard      | 21          | D
David        | Addr-B          | Fillip       | 32          | E
David        | Addr-B          | Fillip       | 32          | F
David        | Addr-B          | Fillip       | 32          | C
David        | Addr-B          | George       | 25          | H
David        | Addr-B          | George       | 25          | I
David        | Addr-B          | George       | 25          | J

Fig.-3: Table has multiple rows for each student, one per class

Now this table satisfies the first two rules of 1NF (no duplicate columns and no group of values of the same type in a column). Before going further, let's understand the concepts of candidate key and primary key.

Candidate Key: Any combination of columns (one or more) of a table that can be used to uniquely identify each row of the table is a candidate key for that table. A table can have one or multiple candidate keys.

Primary Key: From the list of candidate keys, the user can choose any one as the primary key, which will be used to uniquely identify rows of that table. A table can have only one primary key.

Now that we know the concept of Primary key and Candidate keys, we can apply it in our example.

We can set the Student Name and Class column combination as the primary key (see Fig.-4).


Fig.4: Student table having Primary Key set on Student Name and Class combination

Now the Student table of Fig.-4 is in First Normal Form (1NF).

 

Second Normal Form (2NF):

To be in Second Normal Form, a table has to satisfy the following criteria.

1. The table should be in First Normal Form.

2. There should be no partial dependency between non-key attributes and the composite key. A partial dependency occurs when a non-key attribute depends on only a part of the (composite) key.

3. There should be no subset of data that repeats across multiple rows of a table. Such data should be placed in a separate table.

4. Relationships among the new tables and the base table should be defined using foreign keys.

Here points 3 and 4 are conceptually the same as point 2. If a table has a partial dependency on any non-key column, it will always have some set of data that repeats across multiple rows. We need to identify those sets of data, place them in separate tables, and define proper relationships among them.

We can see in Fig.-4 that the Student Address column is not fully functionally dependent on the primary key (Student Name, Class). We should keep the Student Name and Student Address columns in the Student table and place all other columns in a separate table named Student-Teacher-Class.

Foreign Key: A foreign key is set on a column that references a primary key (or unique key) in another table. It establishes referential integrity on a table. A foreign key makes sure that no value exists in the referencing table that does not have a corresponding value in the referenced table.

We generated a unique Student ID column in the Student table and set it as the primary key. Then we created a foreign key relationship with the Student-Teacher-Class table using the Student ID column.

We also created a primary key on the Student ID and Class column combination in the Student-Teacher-Class table.


Fig-5: Student and Student-Teacher-Class table with Foreign Key Relationship

At this point the tables are in Second Normal Form.
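As a rough T-SQL sketch of this decomposition (the data types, lengths, and object names are illustrative, not part of the original figures):

CREATE TABLE dbo.Student
(
  StudentID      INT         NOT NULL PRIMARY KEY,
  StudentName    VARCHAR(50) NOT NULL,
  StudentAddress VARCHAR(50) NOT NULL
);

CREATE TABLE dbo.StudentTeacherClass
(
  StudentID   INT         NOT NULL REFERENCES dbo.Student(StudentID),  -- foreign key to Student
  Class       CHAR(1)     NOT NULL,
  TeacherName VARCHAR(50) NOT NULL,
  RoomNumber  INT         NOT NULL,
  CONSTRAINT PK_StudentTeacherClass PRIMARY KEY (StudentID, Class)
);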

Third Normal Form (3NF):

To be in Third Normal Form, a table has to satisfy the following criteria.

1. The table should be in Second Normal Form.

2. No non-key column should depend on another non-key column; in other words, there should be no transitive functional dependencies.

To convert our tables to 3NF, we need to remove from each table any non-key columns that do not depend on the primary key. In our example, the Room Number (non-key) column in the Student-Teacher-Class table is fully functionally dependent on Teacher Name (non-key) and not on the Student ID and Class (primary key) columns.

We need to place the Teacher Name and Room Number columns in a separate table and create a relationship with it (see Fig.-6).


Fig.-6: Student-Teacher-Class table is decomposed and Teacher table created.

In Fig.-6 the Student-Teacher-Class table is decomposed and a Teacher table is created. We created Teacher ID as the primary key in the Teacher table and defined a foreign key relationship between the Teacher table and the Student-Teacher-Class table using the Teacher ID column. We set the Student ID, Class, and Teacher ID combination as the primary key of the Student-Teacher-Class table.

Now the tables in Fig.-6 are in Third Normal Form (3NF).
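Continuing the illustrative sketch above (again, names and types are my own), the 3NF step replaces the earlier Student-Teacher-Class definition with:

CREATE TABLE dbo.Teacher
(
  TeacherID   INT         NOT NULL PRIMARY KEY,
  TeacherName VARCHAR(50) NOT NULL,
  RoomNumber  INT         NOT NULL
);

-- Student-Teacher-Class now references Teacher instead of storing room details
CREATE TABLE dbo.StudentTeacherClass
(
  StudentID INT     NOT NULL REFERENCES dbo.Student(StudentID),
  Class     CHAR(1) NOT NULL,
  TeacherID INT     NOT NULL REFERENCES dbo.Teacher(TeacherID),
  CONSTRAINT PK_StudentTeacherClass PRIMARY KEY (StudentID, Class, TeacherID)
);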

 

Boyce-Codd Normal Form (BCNF or 3.5NF):

To be in Boyce-Codd Normal Form, a table has to satisfy the following criteria.

1. The table should be in Third Normal Form.

2. Every determinant must be a candidate key.

Determinant: a set of columns used to determine the values of other columns in the same row of a table.

In the Student-Teacher-Class table we have two determinants. One is the Class column, which determines Student ID and Teacher ID. The second is the Teacher ID column, which determines Student ID.

Here the Class column is a candidate key because each value of Class uniquely identifies a row. But the Teacher ID column is not a candidate key because it has duplicate values in different rows.

It is easy to see the anomalies in the Student-Teacher-Class table. For example, if teacher Richard (Teacher ID: 1) leaves the school, the table will lose the information that student John (Student ID: 101) is attending classes A, B, and D.

Of course, the solution here is to decompose the table into two tables, using the Class column to relate them (see Fig.-7).


Fig.-7: Student-Teacher-Class table is decomposed into Student-Class table and Teacher-Class table

Now, as we can see in Fig.-7, every determinant in every table is a candidate key, so our tables are in Boyce-Codd Normal Form.

Fourth Normal Form (4NF):

To be in Fourth Normal Form, a table has to satisfy the following criteria.

1. The table should be in Boyce-Codd Normal Form.

2. It should not have any multi-valued dependencies.

Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table.

Examples:

For example, imagine a car company that manufactures many models of car, but always makes both red and blue colors of each model. If you have a table that contains the model name, color and year of each car the company manufactures, there is a multivalued dependency in that table. If there is a row for a certain model name and year in blue, there must also be a similar row corresponding to the red version of that same car.

In our example we don’t need to further normalize our tables.

Do I always need to normalize a database?

Generally, database normalization is good for databases, but it is not always required. In some cases we intentionally violate normalization rules to improve database performance. Sometimes we have a requirement to duplicate the same data in more than one table, so that we can retrieve data easily using fewer joins and WHERE conditions.

For example, in a data warehouse-like environment, where data modification operations are rare and reporting operations are frequent, we intentionally duplicate data in multiple tables so that we can retrieve it easily.

 

I hope you will like this post. Please write your views on this topic in the comment section.
