Odd thing: character classes
One of the major advantages of programming, in my eyes at least, is that you get punched in the face on a regular basis, which keeps you humble and hardens you for the occasional shitstorms that come by. I thought I had regular expressions down. It’s an easy concept and an incredibly useful one. In fact, I feel like it has the same relationship with character data as a chainsaw has with wood: if you pay attention, everything works like a charm; if not, you can cut your foot off. So this is how I amputated mine:
WANTED: Regular expression that matches a dict URL without a path.
FIRST DRAFT: dict://[0-9a-zA-Z$-_.+!*'(),]+
All the stuff in the character class is obviously characters allowed in a URL according to the standard. And here I already received my uppercut: the fourth “-” here (the one between $ and _) is intended to mean just the character itself, but what it actually means is every character from $ to _. So this matches crap like dict://]. So now I knew why my head hurt, but not how to cure it. I was convinced you can’t escape it inside a character class the way you would escape a special character outside one (that holds for POSIX bracket expressions, although most Perl-style engines do accept “\-”), so I was pretty clueless. Until it occurred to me that I could put “-” at the end, where it can’t be part of a range - that did the trick:
FINAL SOLUTION: dict://[0-9a-zA-Z$_.+!*'(),-]+
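If you want to see the damage for yourself, here is a minimal sketch in Python. The language is my pick for illustration, and the names FIRST_DRAFT, FINAL, ESCAPED and accepts are made up. The two main patterns are copied from above. ESCAPED is an extra variant that assumes an engine which accepts “\-” inside a character class, as Python's re module does.

```python
import re

# Patterns copied from the post; the constant names are hypothetical.
FIRST_DRAFT = re.compile(r"dict://[0-9a-zA-Z$-_.+!*'(),]+")
FINAL = re.compile(r"dict://[0-9a-zA-Z$_.+!*'(),-]+")
# Assumption: the engine allows a backslash-escaped hyphen inside a class.
ESCAPED = re.compile(r"dict://[0-9a-zA-Z$\-_.+!*'(),]+")


def accepts(pattern, url):
    """Return True only if the whole string matches the pattern."""
    return pattern.fullmatch(url) is not None


# "$-_" is a range from "$" (0x24) to "_" (0x5F), so "]" sneaks in.
print(accepts(FIRST_DRAFT, "dict://]"))
print(accepts(FINAL, "dict://]"))
print(accepts(ESCAPED, "dict://]"))

# Characters the first draft lets through only because of the accidental range.
extra = "".join(
    chr(c)
    for c in range(ord("$"), ord("_") + 1)
    if not accepts(FINAL, "dict://" + chr(c))
)
print(extra)
```

The last line prints the collateral damage: everything between $ and _ that was never meant to be in the class, including “/”, which is exactly what a URL without a path should not contain.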
So now I can get myself a cup of coffee and an aspirin.
P.S.: If you, dear reader, don’t know about regular expressions yet and, because of the tone of this article, think this is something for weirdos with an affinity for writing in a totally inappropriate style, then you are absolutely right. But if you want to avoid the next one-hundred-line code massacre because you have to check some user input for correctness, go get a decent book and save yourself tons of time.