
Canonical Strings, or, why I like Python

I needed a quick and easy function to map strings into a canonical form in which punctuation, upper/lower case, and word order don't matter. For example, "!$%!@$!@!This!?! is... a test" and "a test this is" should come out as the same string. Less than a minute later I was good to go with:

import re
re_punctuation = re.compile(
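def GetCanonical(input):
    # Lowercase, swap punctuation for spaces, split into words, then sort
    # the words so that word order no longer matters.
    canonical = sorted(re_punctuation.sub(" ", input.lower()).split())
    return ' '.join(canonical)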

GetCanonical("This is a test") == GetCanonical("a test this is")
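For anyone who wants to try this out, here is a minimal self-contained sketch of the same idea. The compiled pattern in the listing above is cut off, so the regex below is only an assumption (anything that is not a word character or whitespace counts as punctuation), and the names `PUNCTUATION` and `canonicalize` are made up for this example. Sorting the words is the step that makes word order irrelevant.

```python
import re

# Assumption: "punctuation" means any run of characters that are neither
# word characters nor whitespace. The pattern in the post may have differed.
PUNCTUATION = re.compile(r"[^\w\s]+")


def canonicalize(text):
    """Return text lowercased, stripped of punctuation, with words sorted."""
    # Replace punctuation with spaces so that "is...a" still splits in two.
    words = PUNCTUATION.sub(" ", text.lower()).split()
    # Sorting removes any dependence on the original word order.
    return " ".join(sorted(words))


if __name__ == "__main__":
    print(canonicalize("!$%!@$!@!This!?! is... a test"))
    print(canonicalize("a test this is"))
```

Both calls should print the same line. Note that under this assumed pattern an underscore counts as a word character, so it would survive canonicalization.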
