SQL Core Concepts: Part 1 (RDBMS, SQL, T-SQL, Set Theory and Predicate Logic, Relational Model)

In this post I covered concepts of RDBMS, SQL, T-SQL, Set Theory and Predicate Logic, Relational Model. All these concepts are related to each other and knowledge of these basic concepts is I think very necessary before starting with the database. It will help beginners to understand T-SQL and SQL Server working.

1. RDBMS

RDBMS stands for Relational Database Management System. An RDBMS is a database management system based on the relational model (a semantic model forrepresenting data), which in turn is based on two mathematical branches: set theory and predicate logic. The relational model is language-independent. That is, you can implement the relational model with languages other than SQL, for example, with C# in a class model. For example, Some RDBMSs that support languages other than a dialect of SQL, such as CLR integration in SQL Server.

2. SQL

SQL stands for Structured Query Language. SQL is both an ANSI and ISO standard language based on the relational model, designed for querying and managing data in an RDBMS.

Unlike many other programming languages, SQL requires you to specify what you want to get and not how to get it. The task of the RDBMS is to figure out the how to part of processing your request.

SQL has several categories of statements, including Data Definition Language (DDL), Data Manipulation Language (DML), and Data Control Language (DCL). DDL deals with object definitions and includes statements such as CREATE, ALTER, and DROP. DML allows you to query and modify data and includes statements such as SELECT, INSERT, UPDATE, DELETE, and MERGE. DCL deals with permissions and includes statements such as GRANT and REVOKE.

SQL is based on relational model but it does not follow relational model completely. SQL deviates from the relational model in a number of ways. But still SQL is the de facto industrial language used by all leading RDBMSs.

3. T-SQL

Microsoft provides T-SQL as a dialect of, or extension to, SQL in Microsoft SQL Server, its RDBMS. T-SQL is based on standard SQL and also provides some nonstandard/proprietary extensions.

4. Set Theory

Set theory is one of the mathematical branches on which the relational model is based on. Here is the formal definition of a Set:

A set is any collection of definite, distinct objects of our perception or our thought considered as a whole. The objects are considered the elements or members of the set.

Let’s start with the word whole in the definition. A set should be considered a single entity. Your focus should be the collection of objects as opposed to the individual objects that make up the collection.

The word distinct means that every element of a set must be unique. Just like rows in a table in a database, you can enforce the uniqueness of rows in a table by defining key constraints. Without a key, you won’t be able to uniquely identify rows, and therefore the table won’t qualify as a set. Rather, the table would be a multiset, or bag.

The phrase of our perception or our thought implies that the definition of a set is subjective. Consider a classroom: One person might perceive a set of people, whereas another might perceive a set of students.

As for the word object, the definition of a set is not restricted to physical objects such as cars or employees, but rather is relevant to abstract objects as well, such as prime numbers or lines.

The order in which set elements are listed is not important. Like we have a set A={a, b, c} and because order has no relevance, the same set can be expressed as {b, a, c} or {b, c, a}. The elements of a set are described by their attributes. Each attribute in a set must have unique attribute names. Order of attributes is also not important.

5. Predicate Logic

Predicate logic, is another branch of mathematics that the relational model is based on. A predicate is a property or an expression that either holds or doesn’t hold—in other words, is either true or false. In the relational model, predicates are used to maintain the logical integrity of the data and define its structure. One example of a predicate used to enforce integrity is a constraint defined in a table of Employees that allows only employees with a salary greater than zero in the table. The predicate is “salary greater than zero” (T-SQL expression: salary > 0).

You can also use predicates when filtering data to define subsets, and so on. For example, if you need to query the Employees table and return only rows for employees from the sales department, you would use the predicate “department equals sales” in your query filter (T-SQL expression: department = ‘sales’).

In set theory, you can use predicates to define sets. It’s more convenient to define a set based on a property. As an example of an infinite set defined with a predicate, the set of all prime numbers can be defined with the following predicate: “x is a positive integer greater than 1 that is divisible only by 1 and itself.” The set of all prime numbers is the set of all elements for which the predicate holds true.

6. The Relational Model

The relational model is a semantic model for representing data and is based on set theory and predicate logic. The goal of the relational model is to enable consistent representation of data with minimal or no redundancy and without sacrificing completeness, and to define data integrity (enforcement of data consistency) as part of the model. An RDBMS is supposed to implement the relational model and provide the means to store, manage, enforce the integrity of, and query data.

The relational model involves concepts such as propositions, domains, n-ary relations, n-tuples, ordered pairs, and so on.

  • Propositions, Predicates, and Relations

“Relational” actually pertains to the mathematical term relation. A relation is a representation of a set in set theory. In the relational model, a relation is a set of related information, with the implementation in the database being a table. A key point in the relational model is that a single relation should represent a single set (for example, Customers). It is interesting to note that operations on relations (based on relational algebra) result in a relation (for example, a join between two relations).

When you design a data model for a database, you represent all data with relations (tables). You start by identifying propositions that you will need to represent in your database. A proposition is an assertion or a statement that must be true or false. For example, the statement “employee Ashish Jain was born on February 28, 1983 and belongs to department IT” is a proposition. If this proposition is true, it will manifest itself as a row in a table of Employees. A false proposition simply won’t manifest itself.

The next step is to formalize propositions or creating predicates out of propositions. The heading of a relation comprises a set of attributes. An attribute is identified by an attribute name and a domain (type) name. For example, the heading of an Employees relation might consist of the following attributes (expressed as pairs of attribute name and type name): employeeid integer, firstname character string, lastname character string, birthdate date, departmentid integer. A domain is the set of possible/valid values for an attribute. For example, the domain INT is the set of all integers in the range–2,147,483,648 to 2,147,483,647. Probably the best way to think of a domain is as a class—encapsulated data and the behavior supporting it. An example of a complex domain would be a geometry domain that supports polygons.

  • Missing Values

One aspect of the relational model is the source of many debates which is whether propositions should be restricted to use two-valued predicate logic. That is, using two-valued predicate logic, a proposition is either true or false. If a proposition is not true, it must be false. However, some say that there’s room for three-valued predicate logic, taking into account cases where something is unknown. Take, for example, a cellphone attribute of an Employees relation. Suppose that a certain employee’s cell phone number is missing. How do you represent this fact in the database? In a three-valued logic implementation, the cellphone attribute should allow a special mark for a missing value.

SQL implements three-valued predicate logic by supporting the NULL mark to signify the generic concept of a missing value. Support for NULLs and three-valued predicate logic in SQL is the source of a great deal of confusion and complexity, though one can argue that missing values are part of reality and the alternative—using two-valued predicate logic—is no less problematic.

  • Constraints

One of the greatest benefits of the relational model is having data integrity defined as part of the model. Integrity is achieved through rules, or constraints, that are defined in the data model and enforced by the RDBMS. The simplest methods of enforcing integrity are the attribute type and its NULLability, which enforce domain integrity. Constraints are also enforced through the model itself; for example, the relation Orders(orderid, orderdate, duedate, shipdate) allows three distinct dates per order, while the relations Employees(empid) and EmployeeChildren(empid, childname) allow zero to countable infinity of children per employee.

Other examples of constraints include candidate keys that provide entity integrity and foreign keys that provide referential integrity. A candidate key is a key defined on one or more attributes preventing more than one occurrence of the same tuple (row) in a relation. A predicate based on a candidate key can uniquely identify a row (such as an employee). You can define multiple candidate keys in a relation. For example, in an Employees relation you can define candidate keys on employeeid, on ssn (social security number), and others. One of the candidate keys is arbitrarily chosen as the primary key (say, employeeid in the Employees relation), and is used as the preferred way to identify a row. All other candidate keys are also known as alternate keys.

Foreign keys are used to enforce referential integrity. A foreign key is defined on one or more attributes of a relation (known as the referencing relation) and references a candidate key in another (or possibly the same) relation. This constraint restricts the values in the referencing relation’s foreign key attributes to the values that appear in the referenced relation’s candidate key attributes. For example, say that the Employees relation has a foreign key defined on the attribute departmentid, referencing the primary key attribute departmentid in the Departments relation. This means that the values in Employees.departmentid are restricted to the values that appear in Departments.departmentid.

Reference : Book – Microsoft_SQL_Server_2008_T_SQL_Fundamentals by Itzik Ben-Gan.

I Hope, you will like this post. Please share your thoughts on this topic or write comments in the comment section.

Posted in SQL Fundamentals | Tagged , , , | Leave a comment

How To: Find all objects, referencing a single object or all objects, referenced in a single object in SQL Server Database

Many times i come across a situation where i have a requirements like this.

1.   Find out all stored procedures, using a table or

2.   Find out all tables used in a stored procedure.

Today I will show how to get this information using sys.sql_expression_dependencies system object.

Before we go further, we need to know about the concept of Referenced and Referencing object. Here i am explaining this concept by an example.

This is the definition of a stored procedure named Up_Sample.

Create Procedure Up_Sample

AS

Select * from Tbl_Example

GO

Referencing Object: Here Up_Sample stored procedure is using Tbl_Example table, hence Up_Sample is referencing object.

Referenced Object: Here Tbl_Example table is being used by Up_Sample stored procedure, hence Tbl_Example referenced object.

Now that you know the concept of Referenced and Referencing object, we can go further.

I am using AdventureWorks2008 database in my example. If you don’t have AdventureWorks2008 database in your SQL Server instance then please read my post to Download and install AdventureWorks2008 database and install database accordingly.

1.      Find all objects, referencing a single object

Run the below query to get All objects using or referencing table JobCandidate.

USE AdventureWorks2008;

GO

SELECT

SCHEMA_NAME(O.SCHEMA_ID) as referencing_schema_name,

O.name as referencing_object_name,

O.type_desc as referencing_object_type_desc,

SED.referenced_schema_name,

SED.referenced_entity_name as referenced_object_name,

O1.type_desc as referenced_object_type_desc

FROM

sys.sql_expression_dependencies SED

INNER JOIN

sys.objects O ON SED.referencing_id = O.[object_id]

LEFT OUTER JOIN

sys.objects O1 ON SED.referenced_id = O1.[object_id]

WHERE SED.referenced_entity_name = ‘JobCandidate’

 

Here is the output of this query. You can see all objects using JobCandidate table in this output.

 

clip_image002

Fig. 1

2.     Find all objects, Referenced in a single object

Run the below query to get all objects referenced or used in view vEmployee.

USE AdventureWorks2008

GO

SELECT

SCHEMA_NAME(O.SCHEMA_ID) as referencing_schema_name,

O.name as referencing_object_name,

O.type_desc as referencing_object_type_desc,

SED.referenced_schema_name,

SED.referenced_entity_name as referenced_object_name,

O1.type_desc as referenced_object_type_desc

FROM

sys.sql_expression_dependencies SED

INNER JOIN

sys.objects O ON SED.referencing_id = O.[object_id]

LEFT OUTER JOIN

sys.objects O1 ON SED.referenced_id = O1.[object_id]

WHERE

O.name = ‘vEmployee’

 

Here is the output of this query. You can see all objects referenced in vEmployee view in this output

 

clip_image004

Fig. 2

If you have any question or if you want to share your thoughts on this topic then please feel free to write in the comment section.

Posted in SQL Administration | Tagged , , , | Leave a comment

How To: Download and install AdventureWorks database for SQL Server 2008

1. Download AdventureWorks 2008 OLTP Script.zip file from below link.

http://msftdbprodsamples.codeplex.com/releases/view/89502

2. Extract zip file into “AdventureWorks 2008 OLTP Script” folder.

3. In the Script folder you will find collection of .csv files and one AdventureWorks DDL script file named “instawdb.sql”.

4. Open instawdb.sql DDL Script File in SQL Server Management Studio.

5. Update the DDL script path variables (See Fig 1).

clip_image002

Fig. 1

Fig. 1 shows instawdb.sql DDL script file in SSMS. This script depends on two path variables SqlSamplesDatabasePath and SqlSamplesSourceDataPath. SqlSamplesDatabasePath will take path of the folder where AdventureWorks database files will be stored after installation while SqlSamplesSourceDataPath should be set with the actual path of “AdventureWorks 2008 OLTP Script” folder where you have all the csv files in your system. Before running this script in SSMS, You should uncomment setvar commands of both path variables and also both path variables should be set with necessary and appropriate values(As specified in Fig. 1 above). An invalid SqlSamplesSourceDataPath path will generate an error that the bulk load steps failed because the .csv files could not be found.

6. Run DDL Script only in SQLCMD mode. On the Query menu click on SQLCMD Mode. You will find SQLCMD mode menu in Query -> SQLCMD Mode in SSMS.

clip_image003

Fig. 2

7. On a SSMS toolbar click on Execute or Press F5 to execute DDL script.

After successful execution of DDL script you will have the AdventureWorks database in your SQL Server 2008 Instance.

 

I will show you other methods of installing AdvantureWorks database in SQL Server 2008, in my future posts.

Please share your thoughts in the comment section.

Enhanced by Zemanta
Posted in SQL Administration | Tagged , , , , , , , | 1 Comment

How to Find all stored procedures referencing a table object in Database

Sometime as a database developer or DBA, we need to find all stored procedures in a database in which a specific table is being used. We can get this important information with the help of sys.objects and syscomments system objects.

Following is the query to retrieve name of all stored procedure using table object of name “Customer” in Selected Database.

IN SQL Server 2000

Select Distinct O.name, O.type_desc

from sys.objects O Join syscomments C

on O.object_id = C.id

where C.text like ‘%Customer%’

 

IN SQL Server 2005

Select O.name, O.type_desc

from sys.objects O

Where OBJECT_DEFINITION(O.object_id) like ‘%Customer%’

Enhanced by Zemanta
Posted in SQL Administration | Tagged , , , , , | Leave a comment

How should DBA plan for SQL Server 2008 Upgrade

Disclosure: Please note that this post is based on a post in mssqlbits which i came across while searching for upgrade process of SQL server. I really like mssqlbits and want to rewrite this post on my blog so that i can have a copy of it always with me and get benefitted anytime i need. I tried to be honest with the original post. I am writing this post as notes for myself and others, who wants to learn the upgrade process of SQL server 2008. It tried to make this post more compact and easy to understand for readers.

Question:

What are some of the key considerations as DBA plan to upgrade?  How much time should be spent on planning?  Who should do the planning?  what are the technical aspects a DBA should research?  What is the best way to manage the project?

Answer:

At many organizations, significant planning is needed before start upgrading to SQL Server 2008. In this post i will cover what planning is needed before starting an upgrade to SQL Server 2008.

As each upgrade is unique and different, so some of the recommendations in this post may not be applicable for an upgrade that you are facing, so particular aspects may not be included. 



Preliminary research

As you start on your preliminary research, be sure to educate yourself on the following items:

  • Inventory of the servers, applications and users

Ø  SQL Server versions and editions

§  Command to get Sever version, level and edition information.

Select SERVERPROPERTY(‘productversion’), SERVERPROPERTY(‘productlevel’), SERVERPROPERTY(‘edition’)

Ø  Application versions and editions

Ø  User location, operating hours, etc.

  • 32 bit versus 64 bit platform
  • Hardware support from your vendors
  • Business application support from your vendors and/or co-workers
  • Upgrade paths
  • Rollback plans
  • Development and test environments
  • Testing
  • Relational engine upgrade
  • DTS to SSIS upgrade
  • Reporting Services upgrade
  • Analysis Services upgrade
  • Opportunities for improvement
  • Post upgrade tasks
  • Adopting new SQL Server technologies

These were the keys aspects for one or more recent upgrade projects, so we wanted to pass them along.  One key item that I want to mention again is figuring out which users are involved in the upgrade.  Be sure to get them involved early and often as you continue the project.



Project Scope

Here is critical step that is missed all too often. Some people call it ‘figuring out how big the bread box is’ while others assume they know it.  In my opinion scoping and managing a project does not need to be a rigid process, but it is important.  At this phase in the project try to understand the following:

  • Why does the business or department want to upgrade to SQL Server 2008?
  • What SQL Server are in scope?
  • What applications access the SQL Servers?
  • What users leverage the applications?
  • What are the user operating hours?
  • Who are the points of contact\management for the users?
  • What automated processes leverage the SQL Servers?
  • What third party applications, controls, plug-ins, etc. are needed?
  • What are the upstream and downstream applications?
  • Who are the key IT team members?
  • When does the project need to be completed?
  • Which management members are supporting the project?
  • Which budgets are paying for the project?
  • Have the budgets been finalized or is management waiting for estimates?

As a DBA, do a quick sanity check here and make sure all looks correct.  If you are not comfortable with the findings, run a tool like Profiler to determine if any users, applications, processes, etc are not being taken into consideration.  One group that always seems to be overlooked are the Reporting Groups, so be sure to include them.



Project Plan

In some circumstances, the preliminary research generated more questions to vendors, partners and developers.  In other circumstances, it lead to a project plan to work through the upgrade.  In my opinion, building a project plan is the next logical step in formally beginning the upgrade process.  Some companies have a formal project planning methodology, others do not and use task lists more so than project plans.  Regardless of your environment, be sure to have a plan that is shared (physically and electronically) with everyone on the team that includes the following items:

  • Definable tasks, with the person responsible as well as the start and end date
  • Budgets, estimates and purchases for hardware, software, FTE time and consulting time
  • Weekly or daily status meetings
  • Project scope and requirements
  • Environment setup and configuration
  • Preliminary analysis
  • Testing including user, unit, load and DBA
  • Upgrade dates including application freeze, upgrade and thaw
  • Go | No Go Meeting
  • Upgrade and post upgrade testing and DBA tasks
  • Lessons learned


Hardware and Software Needs                                                                

Although I believe SQL Server has introduced a number of great features, the one that I believe will be adopted on a more wide spread basis is support for 64 bit.  Although, 64 bit versions of Windows and SQL Server have been available since SQL Server 2000, I have not seen a wide spread adoption across numerous companies.  I hope with SQL Server 2008 upgrades these changes.

As you are upgrading your hardware, be sure to include your development and test servers in that plan.  Depending on the application, it might not make sense to have a 64 bit production platform and a 32 bit development and test platform. As such, the resources spent on hardware may be more than what you would initially expect, so plan accordingly. 

On a related note, if you organization has not been adopting virtual server technologies, this might be the time to research those vendors to determine if they make sense for your development and test needs.  Many companies have adopted virtual server technologies while others have continued to leverage physical servers for each environment.  These technologies have the potential to save a great deal of power, rack space, hardware, etc costs.  Keep an open mind and see if they make sense for your organization.



SQL Server Upgrade Planning Tasks

When it comes to planning for the technical portion of the upgrade, it is wise to break this down some detailed processes into manageable checklists that various team members can use during the process.  At a high level, here are some of those items:

  • Preliminary tasks
  • Hardware\Software setup
    • Environments
      • Development
      • Test
      • Production
    • Windows and SQL Server licensing
    • Third party tools
  • Development\recoding
  • Testing
  • Rollback plan
    • Building a rollback plan is your insurance plan in case a problem arises.  I firmly believe in building a rollback plan and testing it out.  I have seen a few situations where a problem occurs and the team just tries to work through it.  If you do not know how much time your rollback plan will take, you can reach the end of your planned downtime and not have a system for users to support the business.  This is not good.
    • I have seen other situations where a problem occurs and the team hits a wall and does not have a plan to revert back.  At 2:00 AM in the morning when everyone is exhausted and no one can think straight they are frantically trying to get the system back up and running.  Unfortunately, mistakes are made and they can be costly.
    • If you get push back, ask your team if they want to give up their personal insurance policies.  See if it makes them as uncomfortable as it does you, by not building and testing a rollback plan.
  • Upgrade
    • Relational Engine
      • Preliminary tasks
      • In place upgrade (1 machine)
        • This option is fairly straight forward because the installation process will upgrade the SQL Server application as well as the SQL Server databases.  The down side of this is that the rollback plan would require a uninstall of SQL Server 2008 and a reinstall of the previous version of SQL Server in addition to a restore of the databases.  Most likely you would also need to reconfigure SQL Server.
      • Upgrade to a new machine (2 machine upgrade)
        • When upgrading to a new machine the installation of SQL Server on the new machine can take place well in advance.  Some companies even test with the new machines to understand how the system will perform under load, which I think is a good idea.
        • In terms of the upgrade itself, the databases can be either backed up and restored from the previous version of SQL Server to SQL Server 2008 or detached\attached.  In either case, the upgrade process is fairly straight forward.
        • If you have a very slim window when you can upgrade, I would consider a third party tool which compresses the backups to perform the backup and restore operation. These tools can save a significant amount of time and disk space.  This is one technique we used for SQL Server 2005 upgrades and the tools have worked well again for SQL Server 2008.
    • Reporting Services
    • Analysis Services
  • Post Upgrade tasks

Next Steps

Reference URL: http://www.mssqltips.com/sqlservertip/1933/sql-server-2008-upgrade-plan-for-dbas/

I hope you will like this post. As I want to learn more on SQL Server 2008 upgrade process, Please share your knowledge in SQL Server 2008 upgrade planning and your thoughts in the comment section.

Posted in SQL Installation and Upgrade | Tagged , , , , | Leave a comment

Why planning for upgrade to SQL Server 2008

Disclosure: Please note that this post is based on some other posts i came across while searching for upgrade process of SQL server. I tried to be honest with the original posts. I am writing this post as notes for myself and others, who wants to learn the upgrade process of SQL server 2008. It tried to make this post more compact and easy to understand for readers.

Now that SQL Server 2012 has been released. Some companies which still are using SQL Server 2005 or older versions of SQL Server can surly get benefited from SQL Server 2008 because it is proven technology and have some great features which can help companies achieve their goals. The Question is what are the drivers from a business, technology and career perspective one needs to know about when planning for upgrade to SQL Server 2008? What are the decision making factors a DBA should understand and research before talking to management about moving to SQL Server 2008?

It is very difficult sometimes for DBA to make a compelling reason to upgrade to SQL Server 2008 in some organization. The same is true for new development with SQL Server 2008.  It is necessary is to understand where your organization or specific projects fall into this spectrum. Depending on the expectations at this level, It can drive a great deal of the decision to upgrade or perform new development.  If you are faced with a situation where the upgrade has not been a consideration, try to learn about the decision making factors to understand how the technology is perceived by the business.

Below are the drivers from a business, technology and career perspective one needs to know about when planning for upgrade to SQL Server 2008.

1. Business

When making the case to upgrade or perform new development on SQL Server 2008 with your management team, be sure to speak their language.  They might not understand all of the intricacies of SQL Server and they may not have the time to research the items you bring forth, so consider these aspects when speaking to them:

  • Operating costs – Based on how your organization does business, research the SQL Server 2008 features then outline aspects of SQL Server 2008 that will reduce operating costs or improve business\process efficiency.  Here are a few examples:
    • Resource Governor can be used to split the resources across the SQL Server 2008 instance for multiple business, application or processing needs extending the life of the hardware platform
    • Transparent Data Encryption: Enabling encryption of the database, data files, or log files, including Search encrypted data. Data encryption provides the ability to natively protect your sensitive data and put your customers at ease.
    • Change Data Capture (CDC): Auditing is also natively available to ensure your applications are meeting internal, industry and contractual agreements. Change Data Capture (CDC) makes database auditing easier.
    • Performance Data Collection: Reduce Performance tuning and troubleshooting which is a time-consuming task for DBA. A new centralized data repository for storing performance data, and new tools for reporting and monitoring.
    • Policy-Based Management: It could possibly save DBAs a huge amount of time admin.
  • Microsoft support – It is very surprising that support from Microsoft alone has not driven customers to upgrade from SQL Server 2000 to SQL Server 2005 or SQL Server 2008.  Based on the two entries below, it appears as if Microsoft will continue to support customers, but the support is based on a customer support agreement or an extended support agreement, which may be more costly than working through the upgrade or ensuring new systems are developed with SQL Server 2008.  Based on your agreement with Microsoft find out about these costs and determine what the financial impacts are for upgrading or not.  Please reference this information from Microsoft concerning SQL Server mainstream support:
  • Return on investment (ROI) – One of the first items when I think about ROI for the SQL Server 2008 platform is extending the life of the hardware.  With the support for hot pluggable CPUs and SQL Server Resource Governor you have the ability to scale and segment the resources to enable SQL Server to grow with your business.  To me this enables the organization to extend the life of the platform with the many hardware advances from various vendors.
  • Third party application support – Many vendors have been working with Microsoft to ensure their products support as well as leverage the new SQL Server 2008 features.  If your organization is dependent on many off the shelf products then it is necessary to ensure your third party vendors will support your platform moving forward.  Service and support agreements can be expensive, so make sure your platform does not violate any portion of the agreement based on the software versions.  If so, consider upgrading to not have an additional expense for extended support similar to the support needed with Microsoft.
  • Contractual agreements, external partners and relationships – In some organizations, the decision to upgrade or perform development on a new platform is based on an external organization or contractual agreement.  If that is the case in your organization, make sure you understand the ramifications before making a hard sell to your management.  Your management may be on board, but you may need to ask them to work with the other party to get SQL Server 2008 approved as an acceptable database platform.

2. Technology

From a technology perspective, SQL Server 2008 has introduced not only enhancements to existing features, but also delivered new functionality with the release to include:

  • Relational Engine
    • Development
      • Query guide support
      • MERGE statement
      • Change tracking support
      • LINQ
      • XML enhancements
    • DBA
      • Backup stream and data storage compression
      • Performance data collection
      • Native auditing objects
  • Integration Services
    • Many new toolbox features
    • Persistent lookups
    • Improved threading
  • Reporting Services
    • Native Microsoft Word based rendering
    • Eliminate IIS dependency
    • Improved charting and graphing
  • Analysis Services
    • Star JOIN support
    • Excel data mining interface
    • Partition table parallelism
  • External Key Management – This new functionality relates to consolidation of key management and integration with external products
  • Data Auditing – This is one core feature of SQL Server 2008 that will include a number of new features to include:
    • The introduction of first class ‘AUDIT’ objects
    • Auditing DDL (data definition language) commands
    • Support for multiple logging targets
  • DBA productivity and efficiency (manageability) – For sure SQL Server Management Studio (SSMS) has been improvised from 2005 to 2008. Here are what I consider to be some of the most important improvements in SSMS 2008:
      • Activity Monitor
      • Object Explorer Details
      • Object Search
      • Multi-Server Queries
      • Intellisense Added to the Query Editor
      • T-SQL Debugger Added to the Query Editor
  • Increased performance – Yes, There is a pre-requisite to achieve better performance that how your database has been normalized and designed and how queries are extracting. But in SQL Server 2008 Microsoft has also invested lots of effort to improve our data warehousing performance, and the performance of our BI services. SSIS, SSRS, and SSAS.
  • Increased availability – There is a sure enhancement in availability with the added features to clustering and fine tuning the aspects of database mirroring as compared to 2005 version. I would say the better availability feature of Cluster has been redesigned as per Windows 2008 features; again there is a catch here about using Enterprise Edition as you cannot achieve them with lesser editions.

To know more about new features of SQL Server 2008. Please refer to this blog post: http://www.mssqltips.com/sqlservertip/1313/sql-server-2008-features-function-and-value/

3. Career

Some employers expect candidates to be up on the latest and greatest technologies.  This is a key item that should be kept in mind while job hunting.

As DBAs\Developers are very busy and do not have time to research all of the latest and greatest technology when development deadlines are looming or systems are having issues.  I think what is necessary is to strike some sort of balance and not be complacent.  In reality very few people have production exposure to SQL Server 2008 as of the writing of this tip, so you are lucky if you are one of the few.  However, the masses have the opportunity to download, install and learn from the CTP’s, online resources (web casts, online hosting, tips, articles, etc.) or local events (user groups, code camps, etc.).  Just be sure to take the time to get some exposure and pull together a plan for yourself to learn about the latest and greatest features of SQL Server 2008 that are of interest to you and your organization.

Reference URL:

http://www.mssqltips.com/sqlservertip/1462/why-upgrade-to-sql-server-2008/

Hope you like this post, please share your thoughts in the comment section.

Posted in SQL Installation and Upgrade | Tagged , , , | 1 Comment

How query revision can impact query Performance

When we see any query, performing badly. Our first reaction to improve query performance is to check hardware and network resources for any issue. If hardware and network resources are correct. Then we go for index tuning to improve query performance. But one thing that all SQL DBA and Developers must consider, is to check the query itself that if it is written properly, if it can be written any other way possible, so that it can use optimal index and do seek operation instead of full table scan. If after assuring that query is written in a optimal way, only then we should go for index tuning.

Today I will talk about the topic of “How query revision impacts query Performance”. For demonstration purpose I need some sample data. Please run below code, to create sample database with tables and important indexes for our demonstration.

NOTE : This code should be executed only on test servers.

IF DB_ID(‘sqlzealot’) IS NOT NULL

DROP DATABASE sqlzealot;

GO

CREATE DATABASE sqlzealot;

GO

USE sqlzealot;

GO

SET NOCOUNT ON;

IF OBJECT_ID(‘dbo.Nums’, ‘U’) IS NOT NULL

DROP SYNONYM dbo.Nums;

CREATE TABLE dbo.Nums(n INT NOT NULL PRIMARY KEY);

DECLARE @max AS INT, @rc AS INT;

SET @max = 1000000;

SET @rc = 1;

INSERT INTO dbo.Nums(n) VALUES(1);

WHILE @rc * 2 <= @max

BEGIN

INSERT INTO dbo.Nums(n) SELECT n + @rc FROM dbo.Nums;

SET @rc = @rc * 2;

END

INSERT INTO dbo.Nums(n)

SELECT n + @rc FROM dbo.Nums WHERE n + @rc <= @max

GO

IF OBJECT_ID(‘dbo.Orders’, ‘U’) IS NOT NULL

DROP TABLE dbo.Orders;

GO

IF OBJECT_ID(‘dbo.Customers’, ‘U’) IS NOT NULL

DROP TABLE dbo.Customers;

GO

CREATE TABLE dbo.Customers

(

custid Char(1) NOT NULL,

custname VARChar(20) NOT NULL

);

GO

CREATE TABLE dbo.Orders

(

orderid INT NOT NULL,

custid Char(1) NOT NULL,

orderdate DATETIME NOT NULL

);

GO

Declare @customerCount INT = 5;

Declare @OrderCount INT = 1000000;

Declare @numyears AS INT = 4;

Declare @startdate AS DATETIME = ‘20050101’;

INSERT INTO dbo.Customers(custid, custname)

SELECT custid, N’Customer_’ + custid AS custname

FROM (SELECT CHAR(ASCII(‘A’) 2 + 2 * n) AS custid

FROM dbo.Nums

WHERE n <= @customerCount) AS D;

 

INSERT INTO dbo.Orders(orderid, custid, orderdate)

SELECT n AS orderid,

CHAR(ASCII(‘A’) 2

+ 2 * (1 + ABS(CHECKSUM(NEWID())) % @customerCount)) AS custid,

DATEADD(day, n / (@OrderCount / (@numyears * 365.25)), @startdate)

— late arrival with earlier date

CASE WHEN n % 10 = 0

THEN 1 + ABS(CHECKSUM(NEWID())) % 30

ELSE 0

END AS orderdate

FROM dbo.Nums

WHERE n <= @OrderCount ORDER BY CHECKSUM(NEWID());

GO

CREATE CLUSTERED INDEX idx_cl_od ON dbo.Orders(orderdate);

ALTER TABLE dbo.Orders ADD

CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED(orderid);

 

 

In this code I created sqlzealot database. I created dbo.Nums table which is table of 1000000 continuous numbers from 1 to 1000000. I created dbo.Customers and dbo.Orders table. I populated 5 Customers values using ASCII values of alphabets. I populated 1000000 order values in dbo.Orders table using 1000000 values in Nums table. I used ABS(CHECKSUM(NEWID())) to get random custid values for 5 distinct Customers. I populated Orderdate values starting from startdate up-to 4 years. I created Clustered index idx_cl_od on on orderdate column and PRIMARY KEY NONCLUSTERED INDEX on orderid column of dbo.Orders table.

What is your Query ?

The request is to return customers that did Order before year 2004 but did not have any order after 2004. That is, a qualifying customer is one for whom you cannot find an order on or after 2004. You don’t care about customers, who have made no orders at all.

At present in our dbo.Orders table we don’t have any custid entry who has ordered before year 2004. To do some entry, run the following code to add a few customers to the Customers table and a few orders to the Orders table:

INSERT INTO dbo.Customers(custid, custname) VALUES

(‘B’, ‘Customer_B’),

(‘D’, ‘Customer_D’),

(‘F’, ‘Customer_F’),

(‘H’, ‘Customer_H’),

(‘X’, ‘Customer_X’),

(‘Y’, ‘Customer_Y’),

(‘Z’, ‘Customer_Z’);

 

INSERT INTO dbo.Orders(orderid, custid, orderdate) VALUES

(1000001, ‘B’, ‘20030101’),

(1000002, ‘D’, ‘20030101’),

(1000003, ‘F’, ‘20030101’),

(1000004, ‘H’, ‘20030101’);

 

You’re supposed to get the customers IDs B, D, F, and H in the result. These are the only customers that were active at some point but not as of 2004.

Here we want to search for the maximum orderdate value for each custid, so naturally the optimal index would be a nonclustered covering index defined with custid and orderdate as the key columns, in that order:

CREATE NONCLUSTERED INDEX idx_nc_cid_od

ON dbo.Orders(custid, orderdate);

 

The first solution that come to my mind is a natural GROUP BY query that many programmers would come up with:

Solution 1:

SET STATISTICS IO ON;

SET STATISTICS TIME ON;

SELECT custid

FROM dbo.Orders

GROUP BY custid

HAVING MAX (orderdate) < ‘20040101’;

SET STATISTICS TIME OFF;

SET STATISTICS IO OFF;

 

Here I used SET STATISTICS IO and SET STATISTICS TIME for I/O scan details and CPU time and execution time details of a query. Use Ctrl + M to include Actual execution plan before running this query.

This query ran for about half second on my computer. The optimizer produced the execution plan shown in Figure 1 for this query.

clip_image002

Figure 1

The plan shows that our covering index was fully scanned in order. The maximum orderdate was isolated for each custid by the Stream Aggregate operator. Then the filter operator filtered only customers for whom the maximum orderdate was before ‘20040101’.

Here are the vital performance measures I got for this query:

Logical reads : 2483

CPU time : 484 ms

Elapsed time : 475 ms

Of course, this solution can be a big improvement over the cursor-based one in terms of both performance and code readability and maintenance. However, a run time of close to half second for such a query might not be satisfactory. Keep in mind that an Orders table in some production environments can contain far more than one million rows.

Now we need to figure out whether this query has potential for optimization. Remember that in the execution plan for the last query, the leaf level of the index was fully scanned to obtain the latest orderdate for each customer. That scan required 2483 page reads. Our customers table contains 12 customers. In our index, the rows are sorted by custid and orderdate. This means that in some groups of rows—a group for each custid—the last row in each group contains the latest orderdate that you want to inspect.

Of course, if you request the latest orderdate for a particular customer, the optimizer can use a seek directly to the last customer’s row in the index. Such a seek would cost three reads in our case. Then the optimizer can apply a TOP operator going one step backward, returning the desired value—the latest orderdate for the given customer—to a Stream Aggregate operator. The following query demonstrates acquiring the latest orderdate for a particular customer, producing the execution plan shown in Figure 2:

SELECT MAX(orderdate) FROM dbo.Orders WHERE custid =‘A’;

clip_image004

Figure 2

This plan incurs only three logical reads. Now, if you do the math for 12 customers, you will realize that you can potentially obtain the desired result with substantially less I/O than 2483 reads.

Realizing that what you’re after is invoking a seek operation for each customer, you might come up with the following attempt as a step toward the solution (prior to filtering):

SELECT custid,

(SELECT MAX(orderdate)

FROM dbo.Orders AS O

WHERE O.custid = C.custid) AS maxod

FROM dbo.Customers AS C;

 

You query the Customers table, and for each customer, a subquery acquires the latest orderdate value (aliased as maxod).

But strangely enough, you get the plan shown in Figure 3, which looks surprisingly similar to the previous one in the sense that a full ordered scan of the index on the Orders table is used to calculate the MAX aggregate.

clip_image006

Figure 3

You may have expected the optimizer to first scan the 12 customers from the customers table and then use a loop that for each customer applies a seek operation in the index to pull the max orderdate for that customer. It appears that this query fell victim to an attempt the optimizer made to improve the query performance, while in practice it ended up hurting it. The optimizer unnested the correlated subquery, converting it internally to a join. The reason that the optimizer applies such rearrangements is that the join form tends to be optimized better. This query incurred 2483 logical reads against the Orders table and ran for close to one second on my computer. It seems that the optimizer got too sophisticated this time.

Here The optimizer pulls a trick on you; now pull your best trick. One attempt before considering a complete rewrite of the solution is to use a logically equivalent query but with the TOP option instead of MAX. The reasoning behind trying this trick is that from observations of many plans, it appears that the optimizer does not unnest subqueries when you use TOP.

You issue the following query:

SELECT custid,

(SELECT TOP (1) orderdate

FROM dbo.Orders AS O

WHERE O.custid = C.custid

ORDER BY orderdate DESC) AS maxod

FROM dbo.Customers AS C;

 

Now You will see the plan you wished for, as shown in Figure 4.

clip_image008

Figure 4

The customers table is scanned, and for each of the 12 customers, a Nested Loops operator invokes a similar activity to the one you got when invoking a query for a particular customer. This plan incurs only 2 logical reads against customers and 36 logical reads against Orders. The net CPU time is not even measurable with STATISTICS TIME (shows up as 0), and I got about 100 milliseconds of elapsed time. You can now slightly revise the code to have the subquery in the WHERE clause and filter only the customers with a maximum order date that is before 2004, like so (call it set-based solution 2):

Solution 2:

SELECT custid

FROM dbo.Customers AS C

WHERE

(SELECT TOP (1) orderdate

FROM dbo.Orders AS O

WHERE O.custid = C.custid

ORDER BY orderdate DESC) < ‘20040101’;

 

Here are the vital performance measures I got for this query:

Logical reads : 36

CPU time : 0

Elapsed time : 1 ms

The plan is very similar to the one you got prior to filtering, but with an additional filter operator :

 clip_image010

Figure 5

There can be other solutions also by making slight revision (logically meaningless ones, mind you) to the MAX version of the solution:

Solution 3 :

SELECT custid

FROM dbo.Customers AS S

WHERE

(SELECT DISTINCT MAX(orderdate)

FROM dbo.Orders AS O

WHERE O.custid = S.custid) < ‘20040101’;

Solution 4 :

SELECT custid

FROM dbo.Customers AS S

WHERE

(SELECT TOP (1) MAX(orderdate)

FROM dbo.Orders AS O

WHERE O.custid = S.custid) < ‘20040101’;

 

 

In both cases you get the more efficient plan that first scans the 12 customers and in a loop pulls the maximum order date with a seek against the index on the Orders table.

In short, I’d be reluctant to rely on any of the preceding variations just because of the big impact that the slight revisions have on the way the query is optimized. Because all these are unstable solutions which are working fine with current version of SQL Server but we can’t say for sure that they will run as optimally as they are in the future versions. I’d keep looking for alternatives that are more stable.

If you look hard enough, you will find this one (call it set-based solution 5):

Solution 5 :

SELECT custid

FROM dbo.Customers AS S

WHERE NOT EXISTS

(SELECT * FROM dbo.Orders AS O

WHERE O.custid = S.custid

AND O.orderdate >= ‘20040101’)

AND EXISTS

(SELECT * FROM dbo.Orders AS O

WHERE O.custid = S.custid);

 

 

This solution is natural and in fact is quite a literal translation of the English phrasing of the request.

You query the Customers table and filter Customers for whom you cannot find an order on or past ‘20040101’ and for whom you can find at least one order. You get the plan shown in Figure 6.

clip_image012

Figure 6

The Customers table is scanned, yielding 12 rows. For each Customer, a Nested Loops operator invokes a seek against our covering index to check whether an orderdate of ‘20040101’ or later exists for the Customer. If the answer is no, another seek operation is invoked against the index to check whether an order exists at all. The I/O cost against the Orders table is 59 reads—slightly higher than the previous solution. However, in terms of simplicity and naturalness, this solution(Solution 5) wins big time! Therefore, I would go with it.

Now you probably realized that, index tuning alone is not enough; you can do much with the way you write your queries. I hope you will like this post, Please share your thoughts and views in the comment section.

 

Posted in Advanced SQL | Tagged , , , , , | Leave a comment