On a recent project assignment I was tasked to comb through a client database and target a specific field that resided within a single table. This field was powered by a free form text field on a .NET web application. Instead of separating the required bits of data into their own fields respectively everything was lumped into one NVARCHAR field. Definitely not an ideal situation. Nonetheless the task needed to be done. I spoke with the our Business Analyst and obtained the requirements.
So I fired up SSMS 2008R2 and started to formulate the query. That was my first mistake! I regressed back to my rookie days without really thinking it through and instead of really researching in-depth I started to ask for assistance (twitter hash tag: #sqlhelp to the rescue). Don’t get me wrong… asking for help isn’t terrible, it’s just that thinking it trough first really makes a significant difference. To make a long story short let me tell you what steps I took to make this move along smoother.
Step 1: Always review the data first
If I would have carefully inspected the data first I would have noticed the different ways the end users were entering the data. Then I could have easily constructed special conditions to target those specific instances. For example here is a short list of what the data kind of looked like:
- Select on plan/Spec 283
- Side Menu/Spec 118 Sht 30
- Side Menu/Spec 474 Sht C1
- Select on plan/Spec 283 Sheet 17
- Select on Plan/Spec 278 Sht 44 Dtl A
- Select on Plan/Spec 389 Sht 4 Bypass Piping
- Select on Plan/Spec 408.1 Sht M-2
- Site Piping Menu/Spec 408.1 Sht C-1 Plan
- Surge Tank Menu/Spec 408.1 Sht M-4 Dtl B
- Select on plan/Sheet 8 Detail C
- Select from Side Menu/Sheet 7 Detail C
- Menu from Discharge Relief Valve/Sheet 8 Detail F
- Menu for Valve/Sheet 8 Detail A
- Select on plan/Sheet 12 Detail C
- Menu for Pressure Transmitter Cabinet/Sheet 20 Detail B
- Menu for Suction Surge Tank 2/Sheet 15 Detail V
My task was to find the instance of SPEC, SHEET and SHT and grab the following numbers or char. As you can see there is only one instance of each one so that made it a bit easier.
Step 2: Identify the patterns
I took each keyword and began analyzing the data carefully:
For SPEC the following three were always numeric (e.g. Spec 283) and there were a small handful the had three numeric followed by a period and another numeric digit (e.g Spec 408.1) so my options are grab the following four char or six to the right.
For SHEET it was pretty easy. Always trailing SHEET was a numeric digit but the only difference was either a single digit (e.g. Sheet 2) or double digits (e.g. Sheet 12). So that means I can grab the following three char to the right and trim the left and right to satisfy this requirement.
The SHT on the other hand was a bit more difficult since it varied the most. This either had a trailing single numeric digit (e.g. Sht 2), double numeric digit (e.g. Sht 13) or mixed with alpha-numeric chars like: (e.g. Sht M-1 or Sht P8 or Sht P20).
Now that I can clearly see what patterns to look for I can begin to construct my query. Nothing like a little Function action using CHARINDEX mixed with a little dose of PATINDEX to assist in situations likes this. Of course there are many ways to approach this, but this worked for me and satisfied the requirements.
Step 3: Approach each requirement one at a time
This permitted me to concentrate on a specific goal until I was able to correctly return the desired output. Naturally I felt that attacking the easiest ones first would be best, but after reading the book, “Eat that Frog!” I learned to eat the biggest and ugliest frog first as it provides the greatest sense of accomplishment. So I started swinging at SHT. Then once that was crystal clear I moved onto SPEC and SHEET. I must mention that a fellow SQL buff by the name of Aaron Bertrand [ Twitter | Blog ] actually provided a very cool script that tackled the SPEC and covered all the basis. So Cheers to him!
Step 4: Piece everything together and test again
Once everything is working accordingly piece everything together and test it again. I cannot emphasize this enough. Q/A is necessary and must be done until everything comes out the way we need it to. Data is critical to any organization and we as data professionals know it better than anyone else. So test, test and retest. Everything worked out as needed and the results are positive.
Off the subject
If I didn’t mention this before #sqlhelp is extremely useful and there are many people that are extremely talented that are willing to lend you a helping hand. It’s a community thing and I love it! If twitter ain’t your thing that’s fine no biggie; however, if you are looking into it… do it! You won’t be sorry.