Regular Expressions and Cocoa

Regular expressions are a powerful tool for solving many text-related problems. Like any good tool they can be misused, but there are moments when they are the best solution for a given problem. At those moments the lack of regular expressions for Cocoa on Mac OS X and Cocoa Touch on iPhone OS is a pain in the butt.

Or are regular expressions really missing? Regular expressions can be used with NSPredicate, which arrived alongside Core Data in Mac OS X 10.4 and has been officially announced for iPhone OS 3.0. Cocoa’s WebView and the equivalent UIWebView in Cocoa Touch both support JavaScript, regular expressions included. So regular expressions sure are available on both platforms, but how do you make them available to your own code?

An Ugly Solution

You can actually get access to the regular expression engine through JavaScript. Unfortunately this requires a round trip through WebKit. On an iPhone this means you have to use an off-screen instance of UIWebView and delegate the execution of regular expressions to it.

The complexity of an off-screen WebView or UIWebView could be hidden by a utility class. But the extra glue code needed to make something useful out of the single method stringByEvaluatingJavaScriptFromString: would be a lot.

What Apple Recommends

For most problems the official stance is correct: do not use regular expressions. Instead use NSScanner, which is perfect for parsing text sequentially. It is very fast and can substitute for any regular expression that relies only on:

  • Character sets
  • Exact string matches
  • Numerical matches
  • Uniform input text

These conditions hold true for 95% of everything regular expressions are ever used for. For the other 5%, Apple leaves you to fend for yourself.
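To make the list above concrete, here is a minimal sketch of NSScanner handling a semicolon-delimited record. The function name ParseRecord, the three-field layout and the Y/N flag are assumptions made up for illustration. Only the NSScanner and NSCharacterSet calls are real Foundation API.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): parses one record such as
// @"Test; 12; Y" and fills the out-parameters. Returns NO if any field is missing.
static BOOL ParseRecord(NSString *line, NSString **name, int *number, BOOL *flag)
{
    NSScanner *scanner = [NSScanner scannerWithString:line];
    NSString *flagText = nil;

    // By default NSScanner skips whitespace and newlines before each scan,
    // which takes care of optional spaces after the semicolons.
    if (![scanner scanUpToString:@";" intoString:name]) return NO;  // text up to an exact string
    if (![scanner scanString:@";" intoString:NULL]) return NO;      // exact string match
    if (![scanner scanInt:number]) return NO;                       // numerical match
    if (![scanner scanString:@";" intoString:NULL]) return NO;

    // Character set match: one or more characters out of "Y" and "N".
    NSCharacterSet *yesNo = [NSCharacterSet characterSetWithCharactersInString:@"YN"];
    if (![scanner scanCharactersFromSet:yesNo intoString:&flagText]) return NO;

    *flag = [flagText isEqualToString:@"Y"];
    return [scanner isAtEnd];  // reject trailing garbage
}
```

As long as the input is uniform like this, the scanner version stays short. It is when fields become optional or need backtracking that the code starts to grow and a regular expression begins to pay off.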

Other Solutions

PCRE compiles perfectly with Cocoa since it is written in C, one of the many advantages of Objective-C. PCRE is very capable, and almost a standard, but also very large. For an iPhone application the PCRE implementation could end up as the majority of your executable's file size. If this is something you can live with, then the open source RegexKit framework wraps PCRE in Cocoa and Cocoa Touch friendly Objective-C.

Another regular expression framework is OgreKit. The advantage of OgreKit is full Unicode support. It shares the disadvantage of size, and on top of that the documentation is in Japanese.

A Pretty Solution

It turns out that Mac OS X for years, and iPhone OS since inception, have shipped with a perfectly good regular expression engine. This engine comes from ICU (International Components for Unicode), so it works perfectly with Unicode and is well on par with PCRE for functionality. The library is simply called ICU Core, and has a C interface. But for a Cocoa programmer the C interface is not nice enough; thankfully John Engelhart has done the wrapping work for us with RegexKitLite. RegexKitLite is a little brother to RegexKit that wraps ICU Core instead of PCRE.

RegexKitLite is published under a BSD license and is simply two files you add to your project, fully compatible with all available versions of both Mac OS X and iPhone OS. The tricky part is that ICU Core is not a public API officially supported by Apple, even though it has existed unchanged for years. The good news is that using ICU Core is not a show stopper for publishing on the iPhone App Store; applications out there already use it, both well known and not so well known.

Setting Up RegexKitLite

  1. Download the latest version from the SourceForge web page, or check it out from SVN.
  2. Add RegexKitLite.h and RegexKitLite.m to your project.
  3. Link your project against ICU Core by adding the linker flag -licucore to Other Linker Flags under your project's build settings.

Optionally you can also add the documentation to Xcode with these easy steps:

  1. Open Help -> Documentation.
  2. Press the Gears button in the lower left corner, and select New Subscription….
  3. Enter feed:// as URL.

Using RegexKitLite

This post is not a tutorial on regular expressions, but a tutorial on a particular API for executing regular expressions. If you want to learn more about regular expressions themselves I would recommend you look at Regular

RegexKitLite provides its functionality as categories on NSString and NSMutableString. This way using regular expressions with Cocoa is just as easy as normal string manipulation. This is best described using examples.

A simple example that collapses every run of white space in a text into a single space, much like an HTML renderer would, which is handy when scraping web pages:

NSString* source = @"One\t Two \nThree ";
NSString* result = [source stringByReplacingOccurrencesOfRegex:@"\\s+"
    withString:@" "];

Or you can split a text, such as semicolon-delimited data:

NSString* source = @"Test;12;Y";
NSArray* columns = [source componentsSeparatedByRegex:@";\\s*"];
NSLog(@"%@", columns);

And you can extract more complex data using capture groups:

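// The markup inside the two string literals was lost from the original post.
// The input and pattern below are stand-ins that fit the two captures used.
NSString* source = @"<td>42</td><td>Name</td>";
NSString* regex = @"<td>(\\d+)</td><td>(.*?)</td>";
int no = [[source stringByMatching:regex capture:1] intValue];
NSString* data = [source stringByMatching:regex capture:2];
NSLog(@"no: %d data: %@", no, data);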

Performing matches on the same regular expression twice may look like it could be slow, but it is not. RegexKitLite is very smart, and will cache your previous matches for very high performance.
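The examples so far only use the NSString category. As a hedged sketch of the NSMutableString side, plus a simple yes/no match, the two helpers below assume RegexKitLite 3.0 has been added to the project as described earlier. The helper names and the record pattern are invented for illustration. Check the method names against the RegexKitLite.h you actually downloaded.

```objective-c
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"  // assumption: RegexKitLite 3.0 is part of the project

// Hypothetical helper: collapses every run of white space into one space, in place.
static void CollapseWhitespace(NSMutableString *text)
{
    // NSMutableString category method; the return value is ignored here.
    [text replaceOccurrencesOfRegex:@"\\s+" withString:@" "];
}

// Hypothetical helper: checks that a line looks like @"Test; 12; Y".
static BOOL LooksLikeRecord(NSString *line)
{
    // A word, a number and a Y/N flag, separated by semicolons and optional spaces.
    return [line isMatchedByRegex:@"^\\w+;\\s*\\d+;\\s*[YN]$"];
}
```

Validating a line with something like LooksLikeRecord before splitting it keeps malformed input out of the rest of the parsing code.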

RegexKitLite is a very capable and very active open source project, with version 3.0 available as a release candidate in SVN. Use it, and use it well.

This Post Has 4 Comments

  1. Hooray, you posted your Cocoaheads presentation! Now we all have something to link to. :)

    As a sidenote, this is what using regexes with NSPredicate looks like:

    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'"];
    BOOL matches = [predicate evaluateWithObject:string];

    Nothing new and certainly nothing as useful as a general purpose regex engine like RegexKit, but I found out that NSPredicate could do regexes just the other day, so I think it’s kinda cool. ;)

    (Bonus points if you can tell what the regex matches.)
