Solving development problems

Regular Expression Replace in SQL 2005 (via the CLR)

Excellent post for achieving great SQL productivity.

I had to do some data cleanup the other day, and really needed regular expression replacements to do the job.

Since .NET has a great regular expression library in the System.Text.RegularExpressions namespace, and since SQL 2005 allows you to integrate .NET CLR functions in your T-SQL code, I thought I’d go ahead and experiment with creating a RegExReplace() function.

I am not sure I would recommend using a function like this in production (there are plenty of pros and cons to CLR integration in SQL databases), but for data cleaning, quick tasks, or just learning a new feature, it is interesting and easy to do. All you need is a SQL Server 2005 database (Express is fine) and Visual Studio 2005.

Open up Visual Studio 2005 and create a new SQL Server Project. After you give it a name and location, you will be prompted to connect to the SQL Server 2005 database to which you’d like to add your code.

Once the project is created, choose Project->Add User Defined Function, and name the .cs file anything you like, such as “RegExFunction.cs”.

Once the file has been added to your project, open it up and paste in the following code:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

public partial class UserDefinedFunctions
{
    // Exposed to T-SQL as dbo.RegExReplace once the assembly is deployed.
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlString RegExReplace(SqlString expression,
        SqlString pattern, SqlString replace)
    {
        // Follow SQL semantics: any NULL argument yields NULL.
        if (expression.IsNull || pattern.IsNull || replace.IsNull)
            return SqlString.Null;

        // A new Regex is built from the pattern on every call.
        Regex r = new Regex(pattern.Value);

        return new SqlString(r.Replace(expression.Value, replace.Value));
    }
}

It’s really quite simple: within the class definition, just define public static methods that accept and return SqlTypes. If those methods are marked with the SqlFunction attribute, they become available in your database code as T-SQL User-Defined Functions when deployed! Quite cool.
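To show that the same recipe carries over to other functions, here is a minimal sketch of a second method that tests for a match instead of replacing. The name RegExIsMatch is made up for illustration and is not part of the original project; it assumes the same SQL Server Project setup as above.

```csharp
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Hypothetical companion to RegExReplace (not in the original post).
    // SqlBoolean maps to the T-SQL bit type.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean RegExIsMatch(SqlString expression, SqlString pattern)
    {
        // Any NULL argument yields NULL, as with RegExReplace.
        if (expression.IsNull || pattern.IsNull)
            return SqlBoolean.Null;

        return new SqlBoolean(Regex.IsMatch(expression.Value, pattern.Value));
    }
}
```

Because the class is declared partial, a method like this can live in its own .cs file in the same project and gets deployed alongside RegExReplace.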

In this example, our function accepts 3 SqlString parameters, and if any are null, we return null. Otherwise, we construct a Regex object from the pattern passed in, do the replace, and return the result. Note that this will not be especially efficient, since the Regex object is created and destroyed for each call, but it does work and it is interesting at the very least to play around with. You might also want to experiment with the options the Regex class accepts through RegexOptions, such as IgnoreCase or IgnorePatternWhitespace. This particular code is very basic and doesn’t do any error checking (an invalid pattern, for instance, will surface as an error in your query), so you may wish to make improvements or optimizations in your own implementation.
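As one possible direction for those improvements, here is a rough sketch of a variant that takes a case-insensitivity flag and tolerates bad patterns. The name RegExReplaceEx and the ignoreCase parameter are my own inventions for illustration, and returning NULL for an invalid pattern is just one design choice; you may well prefer to let the error surface.

```csharp
using System;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Hypothetical variant of RegExReplace; the name and the ignoreCase
    // parameter are illustrative and not part of the original post.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlString RegExReplaceEx(SqlString expression,
        SqlString pattern, SqlString replace, SqlBoolean ignoreCase)
    {
        if (expression.IsNull || pattern.IsNull || replace.IsNull)
            return SqlString.Null;

        // A NULL flag is treated the same as false (IsTrue is false for NULL).
        RegexOptions options = RegexOptions.CultureInvariant;
        if (ignoreCase.IsTrue)
            options |= RegexOptions.IgnoreCase;

        try
        {
            // The static Regex.Replace overload reuses recently parsed patterns
            // from the framework's internal cache, so the same pattern is not
            // rebuilt from scratch for every row.
            return new SqlString(
                Regex.Replace(expression.Value, pattern.Value, replace.Value, options));
        }
        catch (ArgumentException)
        {
            // Assumption: an invalid pattern yields NULL rather than failing the query.
            return SqlString.Null;
        }
    }
}
```

Once deployed, it would be called with a fourth argument, for example dbo.RegExReplaceEx(MessyColumn, '[a-z]', '', 1) to strip letters regardless of case.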

Now that your code is ready to go, choose Build->Deploy Solution. If all goes well, your assembly and new function have been deployed to your SQL database!

There is one final thing you must do before you can use the function: configure your server to allow CLR code to execute, if it hasn’t been configured already. To do this, execute the following T-SQL statements:

-- Enable CLR functions and procedures
EXEC sp_configure 'clr enabled', 1
GO
RECONFIGURE
GO

Once that is complete, you can use your new function like any other T-SQL User-Defined Function. For example,

select dbo.RegExReplace('Remove1All3Letters7','[a-zA-Z]','')

-------------------------
137

(1 row(s) affected)

Now you can do a standard Regular Expression Replacement within your database directly, for example as an UPDATE:

UPDATE MessyTable
SET MessyColumn = dbo.RegExReplace(MessyColumn, ... , ....)
WHERE ...

Here’s my two cents on using CLR code in a database: If the code is purely a generic function or tool that has nothing specific to do with your data, and it fits and works logically in a database querying language, and there is no way to efficiently implement that code in T-SQL, then it may be worthwhile to implement that function via the CLR. This is a pretty good example. A bad example would be a .NET function that returns a CustomerName when passed a customerID, or something along those lines. That’s just my take on things, for what it’s worth.

So, use wisely and have fun!