As an IT professional, it is important to have a lab environment to play around with – whether you’re a developer writing code or a systems administrator building servers – to test ideas and concepts prior to doing it for real. Time and time again, I’ve heard folks say how expensive it is to build a lab environment. That used to be the case, but not anymore. With cloud providers and virtualization options, there is really no excuse not to build an affordable personal lab environment.
A few days ago, one of my customers reached out to me on Yahoo Messenger (yes, it still exists) and asked how to identify what the potential data loss is when DBCC CHECKDB reports corruption of a SQL Server database. My common response is the usual “it depends” in that there are cases when DBCC CHECKDB may recommend using the option REPAIR_ALLOW_DATA_LOSS. And while you may be fine with doing so, it may not be supported. An example of this is a SharePoint database where Microsoft KB 841057 specifically mentions that using this option renders the database in an unsupported configuration. But say you have decided to proceed, how do you know what data potentially gets lost? This blog post walks you thru the process of identifying potential data loss when DBCC CHECKDB reports corruption in your SQL Server database.
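As a starting point for that process, run the consistency check with all error messages enabled so you get the full list of damaged pages and the minimum repair level. This is a minimal sketch; the database name SampleDB is a placeholder for your own database:

DBCC CHECKDB (SampleDB) WITH NO_INFOMSGS, ALL_ERRORMSGS
GO

The error messages include the object IDs and page numbers involved, which you can map back to specific tables to see what data is actually at risk before even considering REPAIR_ALLOW_DATA_LOSS.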
This video is a compilation of different database recovery techniques that SQL Server DBAs should be familiar and comfortable with. We will look at recovering a database to a specific point in time, isolating critical objects or using table partitioning as an HA/DR option (more commonly called online piecemeal restore) and performing page-level restores.
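To give a flavour of one of the techniques covered, here is a minimal sketch of a point-in-time restore using the STOPAT option; the database name, file paths and timestamp are all placeholders:

RESTORE DATABASE SampleDB
FROM DISK = 'D:\Backups\SampleDB_full.bak'
WITH NORECOVERY
GO
RESTORE LOG SampleDB
FROM DISK = 'D:\Backups\SampleDB_log.trn'
WITH STOPAT = '2011-05-01 10:45:00', RECOVERY
GO

The full backup is restored WITH NORECOVERY so that transaction log backups can be applied on top of it, and the final log restore stops at the requested point in time.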
I started doing serious database disaster recovery work in 2006 while working for Fujitsu Asia. As the only hardcore SQL Server guy in a team of Wintel engineers, the buck stopped with me when it came to anything related to SQL Server. And since our main offerings all revolved around high availability and disaster recovery, I had to be very familiar and well versed with all the recovery techniques available in the version of SQL Server we were running. On top of that, I needed to make sure that all our backup strategies met each application platform’s recovery objectives and service level agreements.
“The only way to walk on water is to step out of the boat – and focus on Jesus.”
I made it official last week on my LinkedIn profile. It’s been interesting to see the different comments I got after my LinkedIn status change. But what’s really exciting is the journey and the story behind it.
I have worked for Pythian since November 2008 – I started out as a SQL Server DBA and eventually moved on to become one of their senior principal consultants. They were responsible for helping me and my family migrate from Singapore to Canada. I have had the privilege of working with the smartest and brightest data professionals of my entire career, was exposed to some of the most challenging and complex environments, and worked on different projects not directly related to SQL Server – Hadoop, SharePoint, System Center, Citrix and VMware, to name a few. I can say that I have been on stable ground in my employment since the day I joined. So, you might ask, “Why leave, then?”
I’ve been working on this personal project since early this year. If you’ve been following my blog posts, my articles on MSSQLTips.com or even my presentations at various events, you know that my area of expertise is SQL Server high availability and disaster recovery. I’ve compiled years of experience and exposure with SQL Server and related technologies to prepare this online course, parts of which were delivered at events and conferences worldwide. One of my personal favourites is the topic on Database Recovery Techniques: I vividly recall delivering that presentation at Microsoft TechEd Southeast Asia back in 2007, in a room of about 200 attendees, where my demos failed dramatically. Imagine trying to present on the topic of database disaster recovery when the most important thing that you need to do is the very thing that you forgot to do. It was the basis of a previous blog post on delivering presentations.
But this is more than just an online course. It is my commitment to continuous personal growth. It’s also an expression of faith and taking risks. I’ve experienced a lot of failures in my entire career, one of which is the now defunct BlogcastRepository.com website where I hosted my very first attempt at creating video lessons on SQL Server 2008 back when it was still in CTP. Part of preparing this online course is realizing that it may or may not work, similar to what happened with the video lessons I recorded for BlogcastRepository.com. But I set aside my fears and decided to work on it anyway – skipping holiday weekends and possible movie nights. I even had to put down my digital camera for a while to focus on this project. This online course contains within it several parts of who I am – the risk taking, fearful, committed, and dedicated individual who chose to persist despite his failures.
This is just the beginning. I’m still experimenting and trying out a couple of ideas. But I have an offer to make. If you’re a SQL Server DBA who is serious about taking your skills and career to the next level and willing to help someone else in the process, let me know how I can help.
The premise of this webcast is based on my conversations with customers who wanted to implement SQL Server 2012 Availability Groups. There has been a lot of confusion about the terminologies, technologies and implementation ever since Availability Groups was introduced in SQL Server 2012. This webcast seeks to explain the underlying fundamentals behind Availability Groups.
“During one of my SQL Server Index Internals presentations, an attendee asked me what filtered indexes look like and if they are stored exactly the way traditional indexes are. I was tempted to show the internals of filtered indexes using DBCC PAGE during the presentation but held back because it deserves its own blog post.”
Whenever I do my SQL Server index internals presentation, I show the attendees how to navigate thru the index structure to find the record (or records) retrieved by a query. I start by using the undocumented command DBCC IND to analyze the structure of a particular index – clustered or non-clustered. I then use the page numbers returned by DBCC IND to navigate the index structure with DBCC PAGE. As an example, I’ll use the [Sales].[SalesOrderDetail] table in the AdventureWorks database. I copied the contents of the [Sales].[SalesOrderDetail] table into another database to simplify my queries and eliminate the Sales schema from the name.
The original [Sales].[SalesOrderDetail] table has a clustered index defined on the SalesOrderID and SalesOrderDetailID columns.
USE SampleDB
GO
SELECT name, index_id
FROM sys.indexes
WHERE object_id = (SELECT object_id FROM sys.objects WHERE name = 'SalesOrderDetail')
GO
sp_helpindex 'SalesOrderDetail'
I grab the index id value of the index that I want to analyze and use it in my DBCC IND command. In this case, the clustered index id value is 1 (since you can only have one clustered index per table, rest assured that the index id value of all clustered indexes in all of your tables will be 1.)
DBCC IND (SampleDB, SalesOrderDetail, 1)
The output of DBCC IND displays a lot of information, but what I’m most interested in are these two columns: the level of the page in the index structure and its corresponding page number. The highest level in the index structure is the root page and is the starting point for navigating the index structure. Now, if you look closely at the output of DBCC IND, the rows are ordered by the page numbers in the PagePID column, not by index level. If you want to find the highest level in the index structure, you can either store the results of DBCC IND in a table that you can sort by column or, if you’re like me and like Excel a lot, copy-and-paste the results into Excel and sort by the index level column. Having already done that, I’ll use DBCC PAGE to display information about my root page – in this case, page 1794. I’m using trace flag 3604 to send the output of DBCC PAGE to SQL Server Management Studio.
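If you’d rather sort in T-SQL than in Excel, the DBCC IND output can be captured in a temporary table via INSERT…EXEC. This is a sketch based on the commonly documented (but officially undocumented) column layout of DBCC IND; since the command is undocumented, the column list may vary across versions:

-- table matching the DBCC IND result set
CREATE TABLE #IndexPages (
    PageFID tinyint, PagePID int, IAMFID tinyint, IAMPID int,
    ObjectID int, IndexID int, PartitionNumber tinyint, PartitionID bigint,
    iam_chain_type varchar(30), PageType tinyint, IndexLevel tinyint,
    NextPageFID tinyint, NextPagePID int, PrevPageFID tinyint, PrevPagePID int
)
GO
-- capture the DBCC IND output
INSERT INTO #IndexPages
EXEC ('DBCC IND (SampleDB, SalesOrderDetail, 1)')
GO
-- the root page is the row with the highest index level
SELECT IndexLevel, PagePID
FROM #IndexPages
ORDER BY IndexLevel DESC
GO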
DBCC TRACEON (3604)
GO
DBCC PAGE (SampleDB, 1, 1794, 3) -- page id of the root page
(A more detailed description of using the DBCC PAGE command can be found in this slightly outdated (but still relevant) Microsoft KB article. Paul Randal, CEO of SQLskills.com and former Program Manager on the SQL Server product team, also blogged about using both DBCC IND and DBCC PAGE.)
The page number that I check next depends on the query that I run. If my query looks for a SalesOrderID value of 50105 (for example, SELECT CarrierTrackingNumber, ModifiedDate FROM SalesOrderDetail WHERE SalesOrderID = 50105), I will look at the page number that contains this index key value, since the SalesOrderID column is included in my clustered index definition. From the screenshot, that page number is 1793. The index key value shown is the lower boundary of the values contained in the index page. I’ll use that page number in my next run of the DBCC PAGE command.
DBCC TRACEON (3604)
GO
DBCC PAGE (SampleDB, 1, 1793, 3) -- page id of the page containing SalesOrderID=50105
If you pay close attention to the results of the DBCC PAGE command, the previous run is still at level 1. This means that we are still at the non-leaf level of the index structure. We need to go down another level – level 0 – to retrieve the two other columns that we need – CarrierTrackingNumber and ModifiedDate – since these two columns are not included in the index definition. Using the same process as in the previous run of the DBCC PAGE command, we’ll grab the page number that contains the index key value of SalesOrderID equal to 50105. From the screenshot, that page number is 2311.
DBCC TRACEON (3604)
GO
DBCC PAGE (SampleDB, 1, 2311, 3) -- page id of the page containing SalesOrderID=50105
From the screenshot, you’ll see the contents of the data page. Since a clustered index contains the table itself, the leaf-level of the clustered index contains all of the other columns and their corresponding data. And, since we’re already at the lowest level of the index structure, this page is where the data is stored.
The way I used DBCC IND and DBCC PAGE to navigate thru the index structure mirrors how SQL Server uses indexes to retrieve the records required by your query. Of course, there are other factors that influence query performance, like statistics and index fragmentation. This is just a simplified view of how SQL Server uses the index structure to retrieve records from the data pages.
Now, you might be asking, “Isn’t this blog post about filtered indexes and what they look like? How come I don’t see any discussion about filtered indexes?” Well, you’re absolutely right. That’s because in order to understand what filtered indexes look like, we need to understand what the index structure looks like and how SQL Server uses it to navigate to the records that we retrieve with our queries. I just used clustered indexes as an introduction. Now, what are filtered indexes? Introduced in SQL Server 2008, filtered indexes are a type of non-clustered index that contains a subset of data. Think of it this way: a regular non-clustered index contains a record for each row in your table, while a filtered index is a non-clustered index with a WHERE clause. This means that a filtered index requires less storage space, resulting in improved query performance and reduced index maintenance costs.
As an example, I will create two non-clustered indexes on the SalesOrderDetail table using the ModifiedDate column. The first index will be your traditional non-clustered index while the second one is a filtered index.
USE [SampleDB]
GO
--traditional non-clustered index
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ModifiedDate]
ON [SalesOrderDetail] ( [ModifiedDate] ASC )
GO
-- filtered index: non-clustered index with a WHERE clause
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ModifiedDate_Filtered]
ON [SalesOrderDetail] ( [ModifiedDate] )
WHERE ModifiedDate > '2004-01-01'
GO
In part 2 of this blog post, we will look at what filtered indexes look like from a storage perspective and compare them with traditional non-clustered indexes. Stay tuned.
I have not opened a Microsoft Connect item before so I don’t know how the bug reporting and resolution process works at Microsoft for the SQL Server product. I just happened to receive a phone call from one of my customers asking why they are getting the error message below on one of their SQL Server instances.
Cannot show requested dialog “There is no row at position 0”
He is using SQL Server 2008 R2 Management Studio to connect to several SQL Server instances. One particular scenario was that he was getting this error message when retrieving the database properties of one of the SQL Server instances using the Properties dialog box in Management Studio but not on the rest of the instances. Google did return several results on resolving this issue. In case you hit this particular issue, follow the steps outlined below.
- Query the spt_values table in the master database. If you don’t get any results, it means that the table has been truncated. How it got truncated is an exercise for you to figure out.
- Look for the u_tables.sql file usually found in the <drive>:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL00\MSSQL\Install folder
- Run the u_tables.sql file to re-populate the spt_values table in the master database
After running the SQL script, verify that you can now open up the Properties dialog box from within SQL Server Management Studio.
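A quick way to check whether the fix took effect is to count the rows in spt_values before and after running the script; a healthy master database returns a non-zero count:

USE master
GO
-- zero rows means the table has been truncated and u_tables.sql needs to be run
SELECT COUNT(*) AS spt_values_rows FROM dbo.spt_values
GO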
Now, my curiosity didn’t stop there. I found Microsoft Connect Bug ID 258528, which is marked as Closed and Not Reproducible. I asked the guys from the SQL Server product team why the Connect item was closed and, according to them, it was a terribly written bug report. Initially, I had the feeling that they were just trying to avoid me altogether, but knowing how dedicated and committed they are to addressing issues with SQL Server and delivering a great database platform, I asked even further. It gave me an understanding of how bug fixes are addressed. Whenever a Microsoft Connect item is opened, the teams reviewing it need to identify several pieces of information, including:
- how to reproduce the problem
- the root cause of the issue
- the impact on users
- the frequency with which the problem occurs
- the cost and risk involved in fixing it
- any available workarounds
This simply means that the more detailed the information provided about the issue, the easier it is for Microsoft to triage, fix and then verify that the issue is actually fixed. It’s like telling your mechanic that your car won’t start without mentioning your broken car key or the empty fuel tank. So, if you’re considering opening a Microsoft Connect item for a Microsoft product – whether it’s SQL Server, Microsoft Exchange, SharePoint, etc. – I encourage you to provide as much detailed information as you possibly can to assist the Microsoft product teams in addressing the issue. Effective communication is key to proper problem resolution.
If you’ve heard about PowerPivot for Excel, you may have heard about the self-service business intelligence capabilities of the tool. I guess it’s about time to explore PowerPivot for Excel. Check out this article to get started. I think this is one of the best things that came out of the collaboration between the Excel product team and the Microsoft SQL Server Analysis Services team.
You’re in your favourite bar one Saturday night when, suddenly, you hear your mobile phone ring. You pick up the phone and hear a screaming voice on the other end (no, it’s not your wife telling you to go home and take out the trash). The background noise is preventing you from understanding what is actually being said. You check the phone number registered on the phone – it’s your manager. You step out of the bar to hear clearly what is being said, and you barely catch the last phrase, “the production database has been in a recovering state for more than an hour now…” And then your battery goes dead. Sound familiar?
In a previous blog post, I talked about the different acronyms that come with the term disaster recovery. In this blog post, I’ll talk about key items that we sometimes tend to ignore when creating a disaster recovery strategy – the lion, the “switch” and the wardrobe (I’ve been a fan of the Narnia movie series, which is where I got the idea). And, yes, I did get a phone call like that while driving with my family; I had to pull over and guide the person on the line as they tried to recover the database.
I call them the lions because they represent people with authority and responsibility over the infrastructure that you are preparing the disaster recovery strategy for. You definitely need to include them. If you’re the DBA and you’re designing a disaster recovery strategy for the database server, then that lion is you. However, there are cases where the server administrator is not necessarily the DBA. That other lion is the server administrator. Oh, and isn’t your database server connected to the network? Then you have another lion in the pack – the network administrator. And isn’t that a faulty hard drive that caused the disaster? Do you know who the supplier was for that hardware? Yes, that lion belongs to your pack as well. I can go on and on and include a ton of people in this list – the service provider for your network link, the company that stores your tapes offsite, the junior staff who need to know what to do in case you’re on vacation, your IT manager who needs to make the tough calls in case the need arises. Make sure you know who the lions are in your pack and how to get in touch with them. Document who is responsible for what, because a missing lion in the pack will definitely affect your service level agreement.
This is intentional. The switch (not the witch) represents the other types of “hardware” that affect your service reliability. I’ve had some discussions in the past with one of my former customers who happened to have high availability built into their database servers. They had their SQL Server instances running on top of a Windows Failover Cluster, which they designed after upgrading to Windows Server 2008. However, one of their past outages clearly shows that Windows Failover Clustering is totally meaningless if you do not consider the other components of the hardware stack. While the multiple nodes of the cluster provided high availability, the main culprit for the outage was the shared storage. Their SAN happened to be on a dedicated network that was having routing issues. To make matters worse, the “switch” to which the SAN was connected was shutting down unexpectedly due to power outages that might have been caused by improper wiring on the UPS. They focused so much on the availability of the cluster that they didn’t consider the storage and the network as potential causes of outages.
Also, in my previous life as a data center engineer, we had an incident where a production server suddenly experienced performance issues. We couldn’t figure out why because even our remote access sessions wouldn’t go thru to allow us to perform troubleshooting – until one of the heat sensors in the data center went off. The high CPU utilization was caused by overheating. One of the air conditioners had shut down, causing a drastic temperature increase inside the data center. Those who spend time in data centers know that you need a thick enough coat to keep yourself warm while working. Air conditioners control the temperature and humidity to help prevent equipment overheating and, potentially, disaster. While fixing a faulty air conditioner can take more than a day, designing the data center to allow for such incidents should be part of the disaster recovery strategy (we had to bring in electric fans and portable air conditioners to temporarily keep the temperature from rising while the air conditioners were being fixed). You also need to know the lion in the pack responsible for data center management in case you have your servers co-located somewhere. The bottom line is that you need to consider the other types of hardware that affect your service reliability and include them in your disaster recovery strategy.
The wardrobe represents storage of stuff. And stuff could be anything that affects your service reliability. One of my favourite wardrobes as far as disaster recovery is concerned is the runbook. It stores the information for a particular system that can be used by anybody should the need arise. Not too many DBAs or IT professionals like the idea of documentation but, as I’ve heard from a few, it’s a necessary evil. If you need to rebuild the server because of hardware upgrades or, worse, disaster recovery, the runbook will be your guide to have the server rebuilt just as it was before. If you don’t have one, chances are that you won’t be able to rebuild your server with the exact same configuration as before. With a runbook, you can have junior staff go thru the process themselves by simply following it in written format. You can even include processes for recovering databases based on your backup strategy. A common rule of thumb for runbooks is simply this – write it so that even the most junior staff can figure it out. The challenge is keeping the runbook updated with the changes made on the system. However, runbooks are definitely a must for disaster recovery strategies.
What about backups? Where are you keeping them? Do you have access to your backups? Are the backup tapes labelled intuitively? Are they safe? Are they stored offsite? This type of wardrobe should be documented as well so you will know where to look for your backups when you need them.
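One sanity check worth scripting against those backups – the file path below is a placeholder – is confirming that SQL Server can actually read them and that they contain what the label claims:

-- check that the backup file is complete and readable
RESTORE VERIFYONLY FROM DISK = 'E:\Backups\Production_full.bak'
GO
-- list what the backup file actually contains (database, backup type, dates)
RESTORE HEADERONLY FROM DISK = 'E:\Backups\Production_full.bak'
GO

A backup you have never verified is only a hope, not part of a disaster recovery strategy.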
How about storage media? Yes, the media for installing your operating system, your database server, your patches, service packs, application software, etc. Have you heard about a legacy application that is supported only on Windows 2000 Server, only to find out that the installation media went missing after the server crashed? Create backups of your storage media and document them accordingly so that you can be sure they’re there when you need them.
And have you ever considered yourself a wardrobe? Yes, you’ve got a ton of information in your head that needs to be shared within your team so that you don’t get as many of those emergency phone calls. You can set up a mentoring session with the junior staff, write documentation (or a blog post like this), or do an internal presentation on how to perform a test restore of a backup – anything to make sure you’re not the only person on your team who can do the job. Most people don’t like this idea for fear of losing their job. But this is one thing that will make you even more valuable to the organization. For now, I’ll leave this topic for a professional development blog post.
I’m tempted to dive into the technical details of SQL Server disaster recovery and high availability, but I realized that in order to really appreciate the technology aspects of disaster recovery, we need to understand the other aspects that affect and influence it. And just to give you a heads up, I’ll be in Orlando, FL in May for the first ever PASS SQLRally, where I’ll be talking about Disaster Recovery Techniques for the SQL Server DBA. If you happen to be in the area or are planning to attend, come see me at my presentation.