Regular expressions are a powerful tool for solving many text-related problems. Like any good tool they can be misused, but there are times when they are the best solution for a given problem. At those times, the lack of regular expressions in Cocoa on Mac OS X and Cocoa Touch on iPhone OS is a pain in the butt.
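Or are regular expressions really missing? Regular expressions can be used with NSPredicate, a Foundation class that arrived together with Core Data, available since Mac OS X 10.4 and officially announced for iPhone OS 3.0. Its MATCHES operator only tests whether a whole string matches a pattern, though; it cannot find, extract, or replace parts of a string.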
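A minimal sketch of the NSPredicate route, assuming Foundation only; the helper name `StringFullyMatches` is hypothetical. It shows both what MATCHES can do (validate a whole string) and what it cannot (report or extract a match inside a larger text).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the *entire* string matches the pattern.
// MATCHES uses ICU regular expression syntax, but it cannot report where a
// match is or extract substrings, so it is only useful for validation.
static BOOL StringFullyMatches(NSString *string, NSString *pattern) {
    NSPredicate *predicate =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [predicate evaluateWithObject:string];
}
```

For example, `StringFullyMatches(@"Test;12;Y", @"\\w+;\\d+;[YN]")` returns YES, while `StringFullyMatches(@"no: 12", @"\\d+")` returns NO, because the digits are only part of the string.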
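An Ugly Solution
The WebView class in Cocoa and the equivalent UIWebView in Cocoa Touch can both evaluate JavaScript, and JavaScript has regular expressions built in. So you could create an off-screen WebView or UIWebView and delegate execution of regular expressions to it.
The complexity of an off-screen UIWebView could be hidden by a utility class. But the glue code needed to make something useful out of the single method stringByEvaluatingJavaScriptFromString: would be a lot.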
What Apple Recommends
For most problems the official stance is correct: do not use regular expressions. Instead use NSScanner, which is perfect for parsing text sequentially. It is very fast and can replace any regular expression that relies only on:
- Character sets
- Exact string matches
- Numerical matches
- Uniform input text
These conditions hold for roughly 95% of everything regular expressions are ever used for. For the other 5%, Apple leaves you to fend for yourself.
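As a sketch of the kind of parsing NSScanner handles well, the function below pulls the number and the text out of input shaped like `<foo no="12">Name</foo>` using only exact string matches and a numerical match. It assumes Foundation only; `ParseFooElement` is a hypothetical name, and it accepts only that one fixed shape of input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: extracts the number and the text from input shaped
// exactly like <foo no="12">Name</foo>. Returns NO for any other shape.
static BOOL ParseFooElement(NSString *source, int *outNumber, NSString **outText) {
    NSScanner *scanner = [NSScanner scannerWithString:source];
    // By default NSScanner skips whitespace and newlines; keep them significant.
    [scanner setCharactersToBeSkipped:nil];

    int number = 0;
    NSString *text = nil;
    if (![scanner scanString:@"<foo no=\"" intoString:NULL]) return NO;  // exact match
    if (![scanner scanInt:&number]) return NO;                          // numerical match
    if (![scanner scanString:@"\">" intoString:NULL]) return NO;        // exact match
    // scanUpToString: returns NO when it scans no characters, so an empty
    // element yields an empty string instead of a failure.
    if (![scanner scanUpToString:@"</foo>" intoString:&text]) text = @"";
    if (![scanner scanString:@"</foo>" intoString:NULL]) return NO;     // closing tag

    *outNumber = number;
    *outText = text;
    return YES;
}
```

Called on `@"<foo no=\"12\">Name</foo>"`, it stores 12 and `@"Name"` and returns YES; input of any other shape makes it return NO.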
PCRE (Perl Compatible Regular Expressions) compiles perfectly with Cocoa since it is written in C, and being able to use C libraries directly is one of the many advantages of Objective-C. PCRE is very capable and almost a standard, but also very large. In an iPhone application the PCRE implementation could end up as the majority of your executable's file size. If you can live with that, the open source RegexKit framework wraps PCRE in Cocoa- and Cocoa Touch-friendly Objective-C.
Another regular expression framework is OgreKit. Its advantage is full Unicode support, but it has the same disadvantage of size, and its documentation is in Japanese.
A Pretty Solution
It turns out that Mac OS X for years, and iPhone OS since its inception, has shipped with a perfectly good regular expression engine. The engine comes from ICU (International Components for Unicode), so it handles Unicode properly and is on par with PCRE in functionality. The library is simply called ICU Core (libicucore), and it has a C interface. For Cocoa programmers the C interface is not pleasant to use, but thankfully John Engelhart has done the wrapping for us with RegexKitLite. RegexKitLite is a little brother to RegexKit that wraps ICU Core instead of PCRE.
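For comparison, here is a rough sketch of what pulling one capture group out of a string looks like when you call the ICU C interface directly. It assumes the ICU headers (`unicode/uregex.h`) are on your include path and that you link against ICU (`-licucore` on Mac OS X; a stock ICU install typically uses `-licui18n -licuuc`). The helper `ICUStringByMatching` is hypothetical, not part of ICU or RegexKitLite.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>
#include <unicode/uregex.h>  // assumption: ICU headers are on the include path

// Hypothetical helper: returns capture group `group` of the first match of
// `pattern` in `source`, or nil if there is no match or ICU reports an error.
// Group 0 is the whole match.
static NSString *ICUStringByMatching(NSString *source, const char *pattern,
                                     int32_t group) {
    UErrorCode status = U_ZERO_ERROR;
    UParseError parseError;
    URegularExpression *regex = uregex_openC(pattern, 0, &parseError, &status);
    if (U_FAILURE(status)) {
        return nil;
    }

    // ICU searches UTF-16 buffers, so copy the string's UTF-16 code units out.
    // NSString indices are UTF-16 offsets too, so ICU's offsets map back directly.
    NSUInteger length = [source length];
    UChar *buffer = malloc((length + 1) * sizeof(UChar));
    if (buffer == NULL) {
        uregex_close(regex);
        return nil;
    }
    [source getCharacters:(unichar *)buffer range:NSMakeRange(0, length)];

    NSString *result = nil;
    uregex_setText(regex, buffer, (int32_t)length, &status);
    if (U_SUCCESS(status) && uregex_findNext(regex, &status)) {
        int32_t start = uregex_start(regex, group, &status);
        int32_t end = uregex_end(regex, group, &status);
        // start is -1 when the group did not take part in the match.
        if (U_SUCCESS(status) && start >= 0) {
            result = [source substringWithRange:
                          NSMakeRange((NSUInteger)start, (NSUInteger)(end - start))];
        }
    }

    // ICU does not copy the text, so close the regex before freeing the buffer.
    uregex_close(regex);
    free(buffer);
    return result;
}
```

That is around thirty lines of error codes, buffer copying, and manual cleanup for a single capture group, which is the kind of bookkeeping a wrapper like RegexKitLite hides behind one method call.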
RegexKitLite is published under a BSD license and consists of just two files you add to your project. It is compatible with all available versions of both Mac OS X and iPhone OS. The tricky part is that ICU Core is not a public API officially supported by Apple, even though it has existed unchanged for years. The good news is that using ICU Core is not a show stopper for publishing on the iPhone App Store: applications already on the store use it, both well-known and lesser-known ones.
Setting Up RegexKitLite
- Download the latest version from the SourceForge project page or from SVN.
- Add RegexKitLite.h and RegexKitLite.m to your project.
- Link your project against ICU Core by adding the linker flag -licucore to Other Linker Flags in your project's build settings.
Optionally, you can also add the documentation to Xcode with these easy steps:
- Open Help -> Documentation.
- Press the gear button in the lower left corner and select New Subscription….
This post is not a tutorial on regular expressions, but a tutorial on a particular API for executing them. If you want to learn more about regular expressions themselves, I would recommend Regular-Expressions.info.
RegexKitLite provides its functionality as categories on NSString and NSMutableString. This makes using regular expressions in Cocoa just as easy as normal string manipulation. This is best described with examples.
A simple example that collapses every run of whitespace into a single space, much like an HTML renderer does, which is handy when scraping web pages:
NSString* source = @"One\t Two \nThree ";
NSString* result = [source stringByReplacingOccurrencesOfRegex:@"\\s+" withString:@" "];
// result is @"One Two Three "
Or you can split text, such as semicolon-delimited data:
NSString* source = @"Test;12;Y";
NSArray* columns = [source componentsSeparatedByRegex:@";\\s*"];
// columns contains @"Test", @"12" and @"Y"
And you can extract more complex data using capture groups:
NSString* source = @"<foo no=\"12\">Name</foo>";
NSString* regex = @"<foo no=\"(.+?)\">(.*?)</foo>";
int no = [[source stringByMatching:regex capture:1] intValue];
NSString* data = [source stringByMatching:regex capture:2];
NSLog(@"no: %d data: %@", no, data);
This may look slow, since the same regular expression is matched twice, but it is not. RegexKitLite keeps a cache of compiled regular expressions, so the second call reuses the already compiled pattern instead of building it again.
RegexKitLite is a very capable and very active open source project, with version 3.0 available as a release candidate in SVN. Use it, and use it well.