Monoids
Updated 2019-04-26
Programming languages evolve towards always more abstract structures which help developers to reason about their code. Many structures are close to our natural way of thinking : objects, functions, data collections, etc. Monoids are a bit more abstract but once we recognize them, they become very useful.
What is a monoid?
Mathematically, and keeping it short, a monoid is an algebraic structure that satisfies identity element and associativity.
So, let’s assume we have a type T and a binary operation op: op(T,T) → T
,
then T with op is a monoid if
- op is associative, for a, b, c in T, which means
op(op(a , b) , c) = op(a , op(b , c))
- T has an identity element which we call neutral on op like:
op(neutral , a) = op(a, neutral) = a
We call the identity element neutral, sometimes it's called zero and in some collection methods it's just z.
Sometimes operation is called combine or append, because it's usually the kind of thing operation will do.
In Scala we can define a monoid trait:
Let's take a few examples of monoids.
We can define the addition of integers as a monoid:
Or the concatenation of strings:
Or the multiplication of integers:
In the three cases, the two monoid laws hold and are easy to verify. Associativity is mathematically true for addition and multiplication, their identity elements, respectively 0 and 1, are obvious. For string, it seems even more obvious.
However, we could also define a monoid for subtraction, but that would be wrong:
intSubNotMonoid is not a monoid because the subtraction operation is mathematically not associative: 10 - (6 -3) ≠ (10 - 6) - 3
So calling it a monoid does not make it a monoid. We always have to make sure the implementing operation and the identity element hold the monoid laws. A good test suit should make sure it's all fine.
But what can we do with monoids?
As for any abstract structure, recognizing it, is the first step. Once we start seeing monoids, we figure out they are almost everywhere. They become good friends to clean-up or factorize code. We see them a lot when we are working with collections and they are also very powerful abstractions when it comes to parallel programming.
In collections
The fold method from List which is implemented as a foldLeft in TraversableOnce expects a neutral z element and an associative binary operator... Sounds like a monoid isn't it?
And if we consider the signatures of foldLeft and foldRight and restrict their arguments to the same type, say A, we end-up with
It seems like a good fit for monoids. So now we can do something like that:
Clean code which is guaranteed to always produce the same results as long as the given monoid is indeed a monoid!
We can then create some helper object:
and then simplify the previous calls even more:
For parallel programming
The fact that an operation is associative makes it a good candidate for parallel computation. Let's take our previous list and the multiplication monoid.
Folding means something like that:
mult(mult(mult(mult(1 , 2) , 4) , 8) , 16)
That's sequential. However, as mult is associative, we can also do any combination, for example:
mult(mult(1 , 2) , mult(mult(4 , 8) , 16))
And the result is always the same. So, we could run those calculations in parallel on as many separate threads as there would be available. For that, we can safely use the parallel version of the collection library:
The parallel collection library is very well explained in the scala docs.
Behind the scene, par splits the collection and then combines results automatically. Depending on the collection type, the splitters work differently. The results cannot rely on how the collections are splitted because it's not deterministic, therefore the need of associativity and the importance of monoids.
Building-up monoids
Monoids are also like lego briks, we can compose them easily. From two monoids we can get a tuple of monoids:
Now we can combine two monoid calculations at the same time. We would need to give it the two monoids and would get the tuple back.
Merging two maps
An interesting use case is merging two maps and applying a function to the values when the keys are identical. Let's take two maps, the values could be the occurences of the words in some corpus.
A simple merge version applying a passed function to the values:
Another way to do it is with a Monoid. The code looks a bit more complicated but later on it's going to be easier to use because we can further compose it.
And finally, let's run the code:
produces:
Map(home -> 10, here -> 18, ra -> 21, house -> 6, waouh -> 15, yyy -> 8, hello -> 25, earth -> 3)
Map(home -> 10, here -> 18, ra -> 21, house -> 6, waouh -> 15, yyy -> 8, hello -> 25, earth -> 3)
It's worth mentioning that the scalaz library is implementing many algebraic structures keeping very close to their mathematical counterparts. Thus, in scalaz, Monoid extends Semigroup. Semigroup provides the binary associative operation and Monoid adds the identity element.
Thanks for reading and enjoy your daily Scala programming.