Scala streams – out of memory

I was doing Advent of Code and was working on generating numbers based on a pattern (day 15). Let’s say I wanted to generate 0, 1, 2, ... to simplify this example.

The first thing that came to mind was building a stream. One way to do this in Scala is as follows:

or you can do it like this:

The plan was then to lazily generate 40 million numbers, filter out then ones I wanted and count them. Turns out this didn’t work at all as memory started to run out (OutOfMemoryError) for two reasons which at first seemed counterintuitive.

First, if you look at the source code for Stream in Scala you find a lot warnings such as:

* – One must be cautious of memoization; you can very quickly eat up large
* amounts of memory if you’re not careful. The reason for this is that the
* memoization of the Stream creates a structure much like
* [[scala.collection.immutable.List]]. So long as something is holding on to
* the head, the head holds on to the tail, and so it continues recursively.
* If, on the other hand, there is nothing holding on to the head (e.g. we used
* def to define the Stream) then once it is no longer being used directly,
* it disappears.

The take method has more to say:

Unaware of this, you (as I) may think that code like this should work just fine:

Running the code above with -Xmx10m will throw an java.lang.OutOfMemoryError: GC overhead limit exceeded exception. However, if you change the val to def it will work just fine since the stream’s head reference can be garbage collected.

The next tricky thing is that many methods, such as the common filter method, and unlike dropWhile, isn’t implemented in Stream but in TraversableLike. With filter you have to worry about how many elements that are going to be memoized in between the elements which the predicate returns true for. Running the following code with -Xmx10m will throw an OutOfMemoryError:

While this code will work just fine:

With these two things in mind, you should be able to avoid the memory gotchas in Scala streams. In the end though, I realised that I didn’t need Stream but rather Iterator for my Advent of Code solution.

If you’re not going to take advantage of the memoization, go with Iterator instead!

Leave a Reply