Using CHARINDEX to clean up bad data

Recently I had a small import task as part of a larger project. I needed to import several different Excel Spreadsheets with customer and vendor data. Over all the process was pretty straight forward.

As with most imports from “other” sources they don’t give you good data. There are several reasons this happens but it is also outside the scope of this article, perhaps I will add a post about that in the future.

The issues I had with the data were with the first name and middle initial columns. They were in the same field and to make things a bit more complicated some of the first names had middle initials but not all.

I think it is important to note that I could have done this within the import process, but as with most solutions, there is usually more than one way to accomplish the end result. This is how I solved the problem.

<code>
SELECT
CASE WHEN CHARINDEX(‘ ‘, c.TFirstName) = 0 THEN SUBSTRING(c.TFirstName, 1, LEN(c.TFirstName))
ELSE SUBSTRING(c.TFirstName, 1, (CHARINDEX(‘ ‘, c.TFirstName) -1))
END AS FirstName
, CASE
WHEN CHARINDEX(‘ ‘, c.TFirstName) > 0 THEN (SUBSTRING(c.TFirstName, CHARINDEX(‘ ‘, c.TFirstName), CHARINDEX(‘ ‘, c.TFirstName)))
END AS MiddleInitial
, c.TLastName AS LastName
FROM dbo._stagingCustomer c
</code>

Using the CHARINDEX (you can find detailed information on the usage of CHARINDEX here: http://tinyurl.com/yzjsbg6)

Firstname: What I am doing here is looking for the first occurrence of a  “space” in between the first name and middle initial. If it is equal to zero then it means the first name does not have a middle initial. So I am getting the first name by using the SUBSTRING function starting at the first character in the string and going the length of the string by using the LEN function.

If it is not equal to zero then grab the first name from the field, again using the SUBSTRING function starting at the first character but then using the CHARINDEX function to find the location of the first occurrence of the space and subtracting 1 from it to get the first name.

MiddleInitial: Here I am checking to see if the CHARINDEX is greater than zero. If it is greater than zero, using the SUBSTRING function and starting at the first occurrence of the space by using the CHARINDEX function by the CHARINDEX returned.

LastName: I just out put the last name column since there were no issues with this column.

Let me know if you have questions.

Cheers!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s