If you have used case classes in Scala, you cannot overlook the power they bring to your applications. They provide a recursive decomposition mechanism via pattern matching.
In this post I go through injections and, mostly, extractors. You will see how extractors can be employed for pattern matching.
Suppose we need to hold first names in an application. We can define a case class for Firstname:
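One way the definition might look, as a minimal sketch. The field name `value` is an assumption; only the class name and the single string argument are implied by the REPL session that follows.

```scala
// Assumption: a single String field; the field name `value` is hypothetical.
case class Firstname(value: String)
```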
We build a value from this case class in the REPL:
scala> val fname = Firstname("Amir")
fname: Firstname = Firstname(Amir)
And now we pattern match on this case class:
scala> fname match {
     |   case Firstname(f) => println(f)
     |   case _ => println("Nothing found")
     | }
Amir
Here fname is matched against its case class through a constructor pattern, and its field is extracted and printed. Very powerful and handy.
What if we need to pattern match on a string? The problem is that String is not a case class.
Scala provides us with a very interesting mechanism called extractors. An extractor is an object that has a method called unapply as one of its members. Let’s clarify this with an example: assume we want to have an extractor object for IP addresses. We define the extractor as follows:
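A sketch of what such an extractor object might look like. The return type of unapply is taken from the REPL output shown later; the validation rule (four dot-separated decimal parts, each in 0..255), the signature of isValid and the helper isOctet are my assumptions.

```scala
object IPAddress {
  // Extraction: split the string on dots and return the four parts,
  // or None when the string does not look like an IP address.
  def unapply(ip: String): Option[(String, String, String, String)] = {
    // limit -1 keeps trailing empty parts, so "1.2.3." yields 4 parts
    val parts = ip.split("\\.", -1)
    if (isValid(parts)) Some((parts(0), parts(1), parts(2), parts(3)))
    else None
  }

  // Assumed rule: exactly four parts, each a decimal number in 0..255.
  def isValid(parts: Array[String]): Boolean =
    parts.length == 4 && parts.forall(isOctet)

  // Hypothetical helper: 1 to 3 ASCII digits with a value of at most 255.
  private def isOctet(p: String): Boolean =
    p.nonEmpty && p.length <= 3 &&
      p.forall(c => c >= '0' && c <= '9') && p.toInt <= 255
}
```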
The unapply method receives a string representing a possible IP address and returns an Option of a tuple of four strings. If the string is not a valid IP address, the method returns None. The isValid method is added for validating IP addresses. Let’s try this in the REPL:
scala> val ip = "127.0.0.1"
ip: java.lang.String = 127.0.0.1

scala> val nonIP = "128.-112.ABC."
nonIP: java.lang.String = 128.-112.ABC.

scala> IPAddress.unapply(ip)
res0: Option[(String, String, String, String)] = Some((127,0,0,1))

scala> IPAddress.unapply(nonIP)
res1: Option[(String, String, String, String)] = None
And if we use it in a pattern match:
scala> ip match {
     |   case IPAddress(_, _, _, a) => println(a)
     |   case _ => println("Invalid ip address")
     | }
1

scala> nonIP match {
     |   case IPAddress(_, _, _, a) => println(a)
     |   case _ => println("Invalid ip address")
     | }
Invalid ip address
So ip is a valid string representation of an IP address and matches IPAddress, while nonIP is not. In the example above we were only interested in the last part of the address and skipped the rest with the _ wildcard. Of course you can extract all four parts if you need them.
In the IPAddress object, unapply returns four values wrapped as a tuple in a Some in the success case. This can be generalized to N values.
It is also possible for an extractor pattern to bind no variables at all. In this case the corresponding unapply method returns a Boolean (true for success and false for failure). As an example, we change the unapply method to the following:
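A sketch of what the Boolean-returning variant might look like. The validation rule (four dot-separated decimal parts, each in 0..255) and the shape of isValid are assumptions made for illustration.

```scala
object IPAddress {
  // No values are extracted: the pattern only succeeds or fails.
  def unapply(ip: String): Boolean = isValid(ip)

  // Assumed rule: four dot-separated decimal parts, each in 0..255.
  def isValid(ip: String): Boolean = {
    val parts = ip.split("\\.", -1) // -1 keeps trailing empty parts
    parts.length == 4 && parts.forall { p =>
      p.nonEmpty && p.length <= 3 &&
        p.forall(c => c >= '0' && c <= '9') && p.toInt <= 255
    }
  }
}
```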
and we try it in the REPL:
scala> "127.0.0.1" match {
     |   case IPAddress() => println("Valid")
     |   case _ => println("Invalid")
     | }
Valid
Remember that although nothing is bound in this case, you still have to put the parentheses after IPAddress.
So far we have seen patterns with a fixed number of element values. But what if the number of element values varies, for example a string that can contain an arbitrary number of IP addresses? To handle this case, Scala allows you to define a different extractor method called unapplySeq. To see how it can be used, assume we want to pattern match on a string containing an arbitrary number of IP addresses:
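A minimal sketch of such an object. The object name IPAddresses and the comma separator come from the REPL session that follows; the decision to split without validating each piece is an assumption.

```scala
object IPAddresses {
  // Split a comma-separated string into its pieces. Each piece can then
  // be matched by a nested pattern, or the tail ignored with _*.
  // Assumption: no validation happens here; every string yields Some(...).
  def unapplySeq(ips: String): Option[Seq[String]] =
    Some(ips.split(",").map(_.trim).toSeq)
}
```

Because the result is a sequence, a pattern can name as many leading elements as it needs and swallow the rest with `_*`.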
This time the method returns an Option of a sequence of strings. Now let’s see what we can do with it:
scala> val ips = "192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4"
ips: java.lang.String = 192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4

scala> ips match {
     |   case IPAddresses(IPAddress(a, _, _, _), IPAddress(b, _, _, _), _*) => println(a + " " + b)
     |   case _ => println("Invalid IP addresses")
     | }
192 192
In this example we used both IPAddress and IPAddresses objects in the pattern matching. There are 4 IP addresses in ips and we are looking for the first byte of the first two IPs in the string. How powerful and clean!
The unapply method is called an extraction because it takes an element of a set and extracts some of its parts. In our example IPAddress takes a string, validates it and extracts all four parts of the address. You can also define an apply method on the object that does the reverse: it takes some arguments and yields an element of the given set. This method is called an injection. We can define an injection for our IPAddress object like this:
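A sketch of how apply and unapply might sit together in one object. The string parameters of apply mirror the tuple that unapply returns; the validation rule and the helper isOctet are assumptions.

```scala
object IPAddress {
  // Injection: build the dotted string from its four parts.
  def apply(a: String, b: String, c: String, d: String): String =
    s"$a.$b.$c.$d"

  // Extraction: the reverse direction, failing on invalid input.
  def unapply(ip: String): Option[(String, String, String, String)] = {
    val parts = ip.split("\\.", -1) // -1 keeps trailing empty parts
    if (parts.length == 4 && parts.forall(isOctet))
      Some((parts(0), parts(1), parts(2), parts(3)))
    else None
  }

  // Hypothetical helper: 1 to 3 ASCII digits with a value of at most 255.
  private def isOctet(p: String): Boolean =
    p.nonEmpty && p.length <= 3 &&
      p.forall(c => c >= '0' && c <= '9') && p.toInt <= 255
}
```

With both methods in place, `IPAddress("10", "0", "0", "1")` builds a string and `case IPAddress(a, b, c, d)` takes it apart again.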
Injections and extractions are often located in the same object because you can then use the object name for both construction and pattern matching. However, it is also possible to have an extraction in an object without a corresponding injection. The object itself is called an extractor regardless of whether or not it has an apply method [1].
Remember that if you include an injection method, it should be the dual of the extraction method. For example:
IPAddress.unapply( IPAddress.apply(a, b, c, d) )
should return:
Some(a,b,c,d)
Extractors vs. Case classes
There is a section in [1] that compares extractors and case classes which I summarize here:
- There is one shortcoming with case classes: they expose the concrete representation of data. This means that if a case succeeds in a match, you know that the selector expression is an instance of that case class.
- Extractors do not expose the concrete representation of data.
- Case classes require less code and are easier to set up. The Scala compiler can also optimize patterns over case classes much better than patterns over extractors.
- If a case class inherits from a sealed base class, the Scala compiler checks pattern matches for exhaustiveness and complains if some combination of possible values is not covered by a pattern. The compiler cannot do the same for extractors.
- If you write code for a closed application, case classes are preferable because of their advantages in conciseness, speed and static checking.
- If you need to expose a type to unknown clients, extractors might be preferable.
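The exhaustiveness point can be illustrated with a small sketch. The hierarchy below (Address, V4, Hostname) and the Describe object are hypothetical names made up for this example.

```scala
// Hypothetical sealed hierarchy, used only to illustrate the check.
sealed trait Address
case class V4(a: Int, b: Int, c: Int, d: Int) extends Address
case class Hostname(name: String) extends Address

object Describe {
  // Both subclasses are covered. Removing either case makes the
  // compiler warn that the match may not be exhaustive.
  def describe(addr: Address): String = addr match {
    case V4(a, b, c, d) => s"$a.$b.$c.$d"
    case Hostname(name) => name
  }
}
```

A match built from extractor patterns such as IPAddress(...) gets no such warning, since the compiler cannot know which inputs an arbitrary unapply method accepts.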
References:
[1]: Martin Odersky, Lex Spoon, Bill Venners, “Programming in Scala”, 2nd Edition