How To: Count the occurrence of a character in a string in SQL#
Last night I was trying to cleanup the spammers from the database of devpinoy.org and while I was evaluating the result sets i was able to conclude that aside from using common spam text like 'cheap', 'buy', 'free', 'deal', 'viagra', 'prozac' that 30% of the false emails that spam accounts are using multiple dots on their email address. A good example is a subset below from the list of offenders that I found in the devpinoy db.


Having found that fact I immediately created a sql script that will delete users from the database if they have more than 2 dots in their email address.

Enough with the side note and here is some code.

DECLARE @string2check varchar(50)
DECLARE @character2find char

SET @string2check = 'this is a very long string'
SET @character2find = 'i'

PRINT LEN(@string2check) - LEN(REPLACE(@string2check, @character2find, ''))

What the code above is doing is that it is removing the characters that matched our search key and then subracts the length of that string to the original string to find the total occurrence of the character we are looking for.

Now, if you want to use this as a function you can use this:

CREATE FUNCTION udf_CountCharOccurence
(    @string2check varchar(500)
 ,    @character2find char
)RETURNS INT
BEGIN

RETURN (LEN(@string2check) - LEN(REPLACE(
                                @string2check, 
                                @character2find)
                                        )
                                )

END
GO
The code above works great but there's a catch. If you are concerned with case sensitivity then the code above wont work. The way around it is to use COLLATION which is supported by the SQL function below:

CREATE FUNCTION udf_CountCharOccurenceCaseSensitive
(    @string2check varchar(500)
 ,    @character2find char
)RETURNS INT
BEGIN

RETURN (LEN(@string2check) - LEN(REPLACE(
                                @string2check COLLATE SQL_Latin1_General_Cp1_CS_AS, 
                                @character2find COLLATE SQL_Latin1_General_Cp1_CS_AS, '')
                                        )
                                )

END
GO
In order to use this in your query all you need to do is
PRINT dbo.udf_CountCharOccurenceCaseSensitive('This is a long text','i')
Or if you want to put it to use to meet the criteria that I mentioned about dots on emails you can do it this way:
SELECT * FROM Users
WHERE dbo.udf_CountCharOccurenceCaseSensitive(EmailAddress,'.') > 2

HTH


Wednesday, June 09, 2010 10:35:38 PM (GMT Daylight Time, UTC+01:00) #    Comments [0]  | 

 

All content © 2010, Keith Rull
On this page
This site
Calendar
<June 2010>
SunMonTueWedThuFriSat
303112345
6789101112
13141516171819
20212223242526
27282930123
45678910
Archives
Sitemap
Blogroll OPML
Disclaimer

Powered by: newtelligence dasBlog 2.3.9074.18820

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Send mail to the author(s) E-mail

Theme design by Jelle Druyts


Pick a theme: