August 20, 2008

Accumulation

It's a pretty common operation to take a list of stuff and accumulate the items. For example, if I have a collection of word frequencies and I want to aggregate the counts by word.

The "Java" imperative way of doing this is to create a HashMap of stuff and populating the HashMap. For example (this is Java-style Scala code):


def accumulate(in: Seq[(String, Int)]): Seq[(String, Int)] = {
val map = new HashMap[String, Int]

for ((str, cnt) <- in) {
if (map.isDefinedAt(str)) map(str) = map(str) + cnt
else map(str) = cnt
}

map.toList
}


What are we doing? We're creating some state and iteratively updating the state.

My preferred way of doing this is thinking about the issue as "transformative". We're transforming the input to the output, without mutating anything.


def acc(in: Seq[(String, Int)]): Seq[(String, Int)] =
in.foldLeft[Map[String, Int]](Map.empty){
case (map, (str, cnt)) => map + (str -> (map.getOrElse(str, 0) + cnt))
}.toList


The transformative approach is shorter and I believe, sweeter.

5 comments:

Nick said...

If you use google-collections, then the java equivalent is:

ImmutableMultiset.copyOf(source).entrySet();

Shorter again, and arguable even more sweet :P.

David R. MacIver said...

@nick And also does the wrong thing. :-)

@david:

The only reason your fold version is shorter is because you've done something stupid in the loop one.


def accumulate(in: Seq[(String, Int)]): Seq[(String, Int)] = {
val map = new mutable.HashMap[String, Int]

for ((str, cnt) <- in) {
map(str) = map.getOrElseUpdate(str, 0) + cnt;
}

map.toList
}

In general, simply replacing a loop over a mutable implementation with a fold over an immutable one doesn't really buy you much unless you want that immutable implementation at the end of the operation. When you're converting to a List like this at the end there's really very little difference.

Hm. Someone should write an article about that. ;-)

Seth Tisue said...

I agree with David M, except for the word "stupid" :-). This isn't a particularly compelling example of what's good about immutable data structures. (I understand that the benefits of avoiding mutation grow in as code becomes more complex.)

In the mutable version of the code, instead of doing toList at the end, I would suggest calling readOnly. You get an immutable answer that way, but with next to no overhead, whereas toList requires copying the entire data structure.

Nick said...

Doh! Yeah, I misread the signature. Either way, the same collection can be used to produce code as succint as the mustable.HashMap example.

The map.getOrElseUpdate becomes:

set.put(str, cnt)

You still have to iterate over the input and you'll have angle-brackets and explicit typing flying all over the place.

ARMINA5 said...

Very interesting information!
Thank U