SQL Core Concepts: Part 1 (RDBMS, SQL, T-SQL, Set Theory and Predicate Logic, Relational Model)

In this post I covered concepts of RDBMS, SQL, T-SQL, Set Theory and Predicate Logic, Relational Model. All these concepts are related to each other and knowledge of these basic concepts is I think very necessary before starting with the database. It will help beginners to understand T-SQL and SQL Server working.

1. RDBMS

RDBMS stands for Relational Database Management System. An RDBMS is a database management system based on the relational model (a semantic model forrepresenting data), which in turn is based on two mathematical branches: set theory and predicate logic. The relational model is language-independent. That is, you can implement the relational model with languages other than SQL, for example, with C# in a class model. For example, Some RDBMSs that support languages other than a dialect of SQL, such as CLR integration in SQL Server.

2. SQL

SQL stands for Structured Query Language. SQL is both an ANSI and ISO standard language based on the relational model, designed for querying and managing data in an RDBMS.

Unlike many other programming languages, SQL requires you to specify what you want to get and not how to get it. The task of the RDBMS is to figure out the how to part of processing your request.

SQL has several categories of statements, including Data Definition Language (DDL), Data Manipulation Language (DML), and Data Control Language (DCL). DDL deals with object definitions and includes statements such as CREATE, ALTER, and DROP. DML allows you to query and modify data and includes statements such as SELECT, INSERT, UPDATE, DELETE, and MERGE. DCL deals with permissions and includes statements such as GRANT and REVOKE.

SQL is based on relational model but it does not follow relational model completely. SQL deviates from the relational model in a number of ways. But still SQL is the de facto industrial language used by all leading RDBMSs.

3. T-SQL

Microsoft provides T-SQL as a dialect of, or extension to, SQL in Microsoft SQL Server, its RDBMS. T-SQL is based on standard SQL and also provides some nonstandard/proprietary extensions.

4. Set Theory

Set theory is one of the mathematical branches on which the relational model is based on. Here is the formal definition of a Set:

A set is any collection of definite, distinct objects of our perception or our thought considered as a whole. The objects are considered the elements or members of the set.

Let’s start with the word whole in the definition. A set should be considered a single entity. Your focus should be the collection of objects as opposed to the individual objects that make up the collection.

The word distinct means that every element of a set must be unique. Just like rows in a table in a database, you can enforce the uniqueness of rows in a table by defining key constraints. Without a key, you won’t be able to uniquely identify rows, and therefore the table won’t qualify as a set. Rather, the table would be a multiset, or bag.

The phrase of our perception or our thought implies that the definition of a set is subjective. Consider a classroom: One person might perceive a set of people, whereas another might perceive a set of students.

As for the word object, the definition of a set is not restricted to physical objects such as cars or employees, but rather is relevant to abstract objects as well, such as prime numbers or lines.

The order in which set elements are listed is not important. Like we have a set A={a, b, c} and because order has no relevance, the same set can be expressed as {b, a, c} or {b, c, a}. The elements of a set are described by their attributes. Each attribute in a set must have unique attribute names. Order of attributes is also not important.

5. Predicate Logic

Predicate logic, is another branch of mathematics that the relational model is based on. A predicate is a property or an expression that either holds or doesn’t hold—in other words, is either true or false. In the relational model, predicates are used to maintain the logical integrity of the data and define its structure. One example of a predicate used to enforce integrity is a constraint defined in a table of Employees that allows only employees with a salary greater than zero in the table. The predicate is “salary greater than zero” (T-SQL expression: salary > 0).

You can also use predicates when filtering data to define subsets, and so on. For example, if you need to query the Employees table and return only rows for employees from the sales department, you would use the predicate “department equals sales” in your query filter (T-SQL expression: department = ‘sales’).

In set theory, you can use predicates to define sets. It’s more convenient to define a set based on a property. As an example of an infinite set defined with a predicate, the set of all prime numbers can be defined with the following predicate: “x is a positive integer greater than 1 that is divisible only by 1 and itself.” The set of all prime numbers is the set of all elements for which the predicate holds true.

6. The Relational Model

The relational model is a semantic model for representing data and is based on set theory and predicate logic. The goal of the relational model is to enable consistent representation of data with minimal or no redundancy and without sacrificing completeness, and to define data integrity (enforcement of data consistency) as part of the model. An RDBMS is supposed to implement the relational model and provide the means to store, manage, enforce the integrity of, and query data.

The relational model involves concepts such as propositions, domains, n-ary relations, n-tuples, ordered pairs, and so on.

  • Propositions, Predicates, and Relations

“Relational” actually pertains to the mathematical term relation. A relation is a representation of a set in set theory. In the relational model, a relation is a set of related information, with the implementation in the database being a table. A key point in the relational model is that a single relation should represent a single set (for example, Customers). It is interesting to note that operations on relations (based on relational algebra) result in a relation (for example, a join between two relations).

When you design a data model for a database, you represent all data with relations (tables). You start by identifying propositions that you will need to represent in your database. A proposition is an assertion or a statement that must be true or false. For example, the statement “employee Ashish Jain was born on February 28, 1983 and belongs to department IT” is a proposition. If this proposition is true, it will manifest itself as a row in a table of Employees. A false proposition simply won’t manifest itself.

The next step is to formalize propositions or creating predicates out of propositions. The heading of a relation comprises a set of attributes. An attribute is identified by an attribute name and a domain (type) name. For example, the heading of an Employees relation might consist of the following attributes (expressed as pairs of attribute name and type name): employeeid integer, firstname character string, lastname character string, birthdate date, departmentid integer. A domain is the set of possible/valid values for an attribute. For example, the domain INT is the set of all integers in the range–2,147,483,648 to 2,147,483,647. Probably the best way to think of a domain is as a class—encapsulated data and the behavior supporting it. An example of a complex domain would be a geometry domain that supports polygons.

  • Missing Values

One aspect of the relational model is the source of many debates which is whether propositions should be restricted to use two-valued predicate logic. That is, using two-valued predicate logic, a proposition is either true or false. If a proposition is not true, it must be false. However, some say that there’s room for three-valued predicate logic, taking into account cases where something is unknown. Take, for example, a cellphone attribute of an Employees relation. Suppose that a certain employee’s cell phone number is missing. How do you represent this fact in the database? In a three-valued logic implementation, the cellphone attribute should allow a special mark for a missing value.

SQL implements three-valued predicate logic by supporting the NULL mark to signify the generic concept of a missing value. Support for NULLs and three-valued predicate logic in SQL is the source of a great deal of confusion and complexity, though one can argue that missing values are part of reality and the alternative—using two-valued predicate logic—is no less problematic.

  • Constraints

One of the greatest benefits of the relational model is having data integrity defined as part of the model. Integrity is achieved through rules, or constraints, that are defined in the data model and enforced by the RDBMS. The simplest methods of enforcing integrity are the attribute type and its NULLability, which enforce domain integrity. Constraints are also enforced through the model itself; for example, the relation Orders(orderid, orderdate, duedate, shipdate) allows three distinct dates per order, while the relations Employees(empid) and EmployeeChildren(empid, childname) allow zero to countable infinity of children per employee.

Other examples of constraints include candidate keys that provide entity integrity and foreign keys that provide referential integrity. A candidate key is a key defined on one or more attributes preventing more than one occurrence of the same tuple (row) in a relation. A predicate based on a candidate key can uniquely identify a row (such as an employee). You can define multiple candidate keys in a relation. For example, in an Employees relation you can define candidate keys on employeeid, on ssn (social security number), and others. One of the candidate keys is arbitrarily chosen as the primary key (say, employeeid in the Employees relation), and is used as the preferred way to identify a row. All other candidate keys are also known as alternate keys.

Foreign keys are used to enforce referential integrity. A foreign key is defined on one or more attributes of a relation (known as the referencing relation) and references a candidate key in another (or possibly the same) relation. This constraint restricts the values in the referencing relation’s foreign key attributes to the values that appear in the referenced relation’s candidate key attributes. For example, say that the Employees relation has a foreign key defined on the attribute departmentid, referencing the primary key attribute departmentid in the Departments relation. This means that the values in Employees.departmentid are restricted to the values that appear in Departments.departmentid.

Reference : Book – Microsoft_SQL_Server_2008_T_SQL_Fundamentals by Itzik Ben-Gan.

I Hope, you will like this post. Please share your thoughts on this topic or write comments in the comment section.

Advertisements

About Ashish Jain

I am Ashish Jain, a software engineer by profession. My goal of creating this blog is to share my knowledge of SQL server with all other SQL enthusiasts and also to learn from them.
This entry was posted in SQL Fundamentals and tagged , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s