Wednesday, 14 May 2008

Canonical Strings, or, why I like Python

I needed a quick and easy function to map strings into a canonical form in which punctuation, upper/lower case, and word order do not matter. For example, "!$%!@$!@!This!?! is... a test" and "a test this is" should map to the same canonical string. Less than a minute later, I was good to go with:

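import re

# Character class matching the ASCII punctuation to strip.
re_punctuation = re.compile(
    r"[`~!@#\$%\^&\*\(\)\-_\+={\[}\]\\|;:\'\",<\.>/\?]")

def GetCanonical(text):
    # Lowercase, replace punctuation with spaces, then split into words.
    canonical = re_punctuation.sub(" ", text.lower()).split()
    # Sort so that word order does not matter.
    canonical.sort()
    return ' '.join(canonical)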

GetCanonical("This is a test") == GetCanonical("a test this is")
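
For the punctuated example from the text, here is a minimal sketch of the same idea. It is a variation, not the original function: the name get_canonical and the use of string.punctuation in place of the hand-written character class are assumptions made for illustration. As far as I can tell, string.punctuation covers the same ASCII characters.

```python
import re
import string

# Assumption: string.punctuation stands in for the hand-written character class.
_PUNCTUATION = re.compile("[" + re.escape(string.punctuation) + "]")

def get_canonical(text):
    """Lowercase, replace punctuation with spaces, sort the words, and rejoin."""
    words = _PUNCTUATION.sub(" ", text.lower()).split()
    return " ".join(sorted(words))

messy = "!$%!@$!@!This!?! is... a test"
print(get_canonical(messy))                                     # a is test this
print(get_canonical(messy) == get_canonical("a test this is"))  # True
```

Replacing punctuation with a space rather than deleting it keeps words that are separated only by punctuation (such as "is...a") from being fused together before the split.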
