Experimenting with Scala Parallel Collections (contd.)

In the previous post we went through Scala parallel collections and you saw how you can convert a sequential collection into a parallel one by using method par on that collection.

In this post I want to show you how you can write your own parallel collection in Scala. The example I use for this post is taken from Alex Prokopec’s presentation at Scala eXchange 2011.

Consider we want to have a parallel String in our application. If you convert a String into a parallel collection you get a ParArray:

scala> "I Love Scala".par
res2: scala.collection.parallel.ParSeq[Char] = ParArray(I,  , L, o, v, e,  , S, c, a, l, a)

This is good enough to take advantage of your multi-core CPU architecture, but for some reason you want to write you own Parallel String collection!

Scala Parallel collection is implemented by a mechanism that is called split-combine. As the name implies a sequential collection can be splitted into sub-collections and each one can be stolen by a thread from a thread pool. Later the results of each sub-collection can be combined to make the final collection. You can find details about this implementation in [1].

In order to start with our Parallel String we need to extend scala.collection.parallel.immutable.ParSeq trait. There are 4 abstract methods that we need to implement:

seq, length and apply have obvious implementations. Method splitter returns a new instance of class ParallelStringSplitter which we are going to define next. Note that this instance is mixed-in into SignalContextPassingIterator which is responsible for passing the signal context along iterators. We define ParallelStringSplitter class inside ParallelString class:

Note that we need to extend Splitter trait. There are 6 abstract methods to implement. Method dup returns a new instance of ParallelStringSplitter with current values. Method split splits the iterator into a sequence of disjunct views and method psplit splits the splitter into a sequence of disjunct views.

Now we have the splitter. Let’s test this parallel String in REPL:

scala> val pstr = new ParallelString("This is a long string" * 250000)
pstr: ParallelString = ParallelString(T, h, i, s,  , i, s,  , a,  , l, o, n, g,  ...

If you don’t provide a combiner, then the default one is used. Sometimes you want to have control over the combination and you want to take the responsibility for it. Let’s write our own combiner to show how you can do that. We need to mix in another trait into the definition of ParallelString: ParSeqLike

In order to specify our own combiner we need to override method newCombiner:

Now we need to define ParallelStringCombiner class by extending the Combiner trait:

There are 5 abstract methods that need to be implemented. As the names imply method combine combines the content of another builder to the current builder and produces a new builder. Method result produces a collection from added elements and += adds a single element to this builder.

To summarize, to build a parallel collection class A, one should:

  • extend ParSeq trait
  • implement all abstract methods from ParSeq trait
  • define a splitter class that mixes in with Splitter and ParIterator traits and put it in class A
  • extend ParSeqLike trait
  • define a combiner class that extend Combiner trait and implement all abstract methods
  • override newCombiner method in class A and return a new instance of defined combiner

For the complete code of this example please refer to Alex Prokopec’s Github.

References

[1]: Aleksandar Prokopec, Tiark Rompf, Phil Bagwell, Martin Odersky, “A Generic Parallel Collection Framework”, Euro-Par 2011

Leave a Reply