What Is Good About Haskell ?, Hacker News

Posted on October 2, 2019

Beginners to Haskell are often confused as to what’s so great about the language. Much of the proselytizing online focuses on pretty abstract (and often poorly defined) concepts like “purity”, “strong types”, and (god forbid) “monads”. These things are difficult to understand, somewhat controversial, and not obviously beneficial (especially when you’ve only been using the language for a short amount of time).

The real tragedy is that Haskell (and other ML-family languages) arepackedwith simple, decades-old features like pattern matching and algebraic data types which have massive , clear benefits and few (if any) downsides. Some of these ideas are finally filtering in to mainstream languages (like Swift and Rust) where they’re used to great effect, but the vast majority of programmers out there haven’t yet been exposed to them.

This post aims to demonstrate some of these features in a simple (but hopefully not too simple) example. I’m going to write and package up a simple sorting algorithm in both Haskell and Python, and compare the code in each. I’m choosing Python because I like it and beginners like it, but also because it’s missing most of the features I’ll be demonstrating. It’s important to note I’m not comparing Haskell and Python as languages: the Python code is just there as a reference for people less familiar with Haskell. What’s more, the comparison is unfair, as the example deliberately plays to Haskell’s strengths (so I can show off the features I’m interested in): it wouldn’t be difficult to pick an example that makes Python look good and Haskell look poor .

This post is not meant to say “Haskell is great, and your language sucks”! It’s not even really about Haskell: much of what I’m talking about here applies equally well to Ocaml, Rust, etc. I’m really writing this as a response to the notion that functional features are somehow experimental, overly complex, or ultimately compromised. As a result of that idea, I feel like these features are left out of a lot of modern languages which would benefit from them. There exists a small set of simple, battle-tested PL ideas, which have been used for nearly forty years now: this post aims to demonstrate them, and argue for their inclusion in every general-purpose programming language that’s being designed today.

We’ll be using askew heapto sort lists in both languages. The basic idea is to repeatedly insert stuff into the heap, and then repeatedly “pop” the smallest element from the heap until it’s empty. It’s not in-place, but it is $mathcal {O} (n log n)$ , and actually performs pretty well in practice.

A Skew Heap is represented by abinary tree:

classTree:def__ init __(self, is_node, data, lchild, rchild):self._ is_node=is_nodeSelf. _data=dataSelf._lchild=Lchildself._ rchild=Rchilddefleaf ():(return) ****************************** (Tree)False,None,None,None)********************** (def)  node (data, lchild, rchild):return  (Tree)True, data, lchild, rchild)

I want to point out the precision of the Haskell definition: a tree is either a leaf (an empty tree), or a node, with a payload and two children. There are no special cases, and it took us one line to write (spread to 3 here for legibility on smaller screens).

In Python, we have to write a few more lines¹⁾. This representation uses the_ is_nodefield is (False) for an empty tree (a leaf). If it’sTrue, the other fields are filled. We write some helper functions to give us constructors like the leaf and node ones for the Haskell example.

This isn’t the standard definition of a binary tree in Python, in fact it might looks a little weird to most Python people. Let’s run through some alternatives and their issues.

The standard definition:

Instead of having a separate field for “is this a leaf or a node”, the empty tree is simplyNone:

With this approach, if we define anymethodson a tree, they won’t work on the empty tree!
We’ll do inheritance! Python even has a handyabclibrary to help us with some of this:

Methods will now work on an empty tree, but we’re faced with 2 problems: first, this is very verbose, and pretty complex. Secondly, we can’t write a mutating method which changes a tree from a leaf to a node. In other words, we can’t write aninsertmethod!
We won’t represent a leaf as the wholetreebeingNone, just the data!

This (surprisingly) pops up in a few places. While it solves the problem of methods, and the mutation problem, it has a serious bug. We can’t haveNoneas an element in the tree! In other words, if we ask our eventual algorithm to sort a list which containsNone, it will silently discard some of the list, returning the wrong answer.

There are yet more options (using a wrapper class), none of them ideal. Another thing to point out is that, even with our definition with a tag, we can only represent types with 2 possible states. If there was another type of node in the tree, we couldn’t simply use a boolean tag: we’d have to switch to integers (and remember the meaning of each integer), or strings! Yuck!

What Python is fundamentally missing here isalgebraic data types. This is a way of building up all of your types out of products (“my type has thisandthis”) and sums (“my type is thisorthis ”). Python can do products perfectly well: that’s what classes are. The tree class itself is the product ofBool,data,Tree, and (Tree) . However it’s missing anentire half of the equation! This is why you just can’t express binary trees as cleanly as you can in Swift, Haskell, OCaml, etc. Python, as well as a host of other languages like Go, Java, etc, will let you expressonekind of “sum” type: “or nothing” (the null pointer ). However, it’s clunky and poorly handled in all of those languages (the method problems above demonstrate the issues in Python), and doesn’t work for anything other than that one special case.

Again, there’s nothing about algebraic data types that makes them ill-suited to mainstream or imperative languages. Swift uses them, andpeople love them!

The core operation on skew heaps is theskew merge.

The standout feature here is pattern matching. In Haskell, we’re able to write the function as we might describe it: “in this case, I’ll do this, in this other case, I’ll do this, etc.”. In Python, we are forced to think of the truth tables and sequential testing. What do I mean by truth tables? Consider the following version of the Python function above:

You may even write this version first: it initially seems more natural (because_ is_nodeis used in the positive). Here’s the question, though: does it do the same thing as the previous version? Are yousure? Which else is connected to which if? Does every if have an else? (some linters will suggest youremovesome of the elses above, since the if-clause has areturnstatement in it!)

The fact of the matter is that we are forced to do truth tables of every condition in our minds, rather thansaying what we mean(as we do in the Haskell version).

The other thing we’re saved from in the Haskell version is accessing undefined fields. In the Python function, we know accessinglhs._datais correct since we verified that (LHS) is a node. But the logic to do this verification is complex: we checked if itwasn’ta node, and returned if that was true… so if itis truethatlhsisnta node, we would have returned, but we didn’t, so…

Bear in mind all of these logic checks happened four lines before the actual access: this can get much uglier in practice! Compare this to the Haskell version:we only get to bind variables if we’re sure they exist. The syntax itself prevents us from accessing fields which aren’t defined, in a simple way.

Pattern matching has existed for years in many different forms: even C has switch statements. The added feature of destructuring is available in languages like Swift, Rust, and the whole ML family. Ask for it in your language today!

Now that we have that function, we get to define others in terms of it:

I haven’t mentioned Haskell’s type system so far, as it’s been quite unobtrusive in the examples. And that’s kind of the point: despite more complex examples you’ll see online demonstrating the power of type classes and higher-kinded types, Haskell’s type systemexcelsin these simpler cases.

Without much ceremony, this signature tells us:

The function takes two trees, and returns a third.
Both trees have to be filled with the same types of elements.
Those elements must have an order defined on them.

I feel a lot of people miss the point of this particular feature. Technically speaking, this feature allows us to write fewer type signatures, as Haskell will be able to guess most of them. Coming from something like Java, you might think that that’s an opportunity to shorten up some verbose code. It’s not! You’ll rarely find a Haskell program these days missing top-level type signatures: it’s easier to read a program with explicit type signatures, so people are advised to put them as much as possible.

(Amusingly, I often find older Haskell code snippets which are entirely devoid of type signatures. It seems that programmers were so excited about Hindley-Milner type inference that they would put it to the test as often as they could.)

Type inference in Haskell is actually useful in a different way. First, if I write theimplementationof themergefunction, the compiler will tellmethe signature, which is extremely helpful for more complex examples. Take the following, for instance:

Remembering precisely which numeric typexneeds to be is a little difficult (Floating ?Real?Fractional?), but if I just ask the compiler it will tell me without difficulty.

The second use is kind of the opposite: if I have a hole in my program where I need to fill in some code, Haskell can help me along by telling me thetypeof that hole automatically. This is often enough information to figure out the entire implementation! In fact, there are some programs which will use this capability of the type checker to fill in the hole with valid programs, synthesising your code for you.

So often strong type systems can make you feel like you’re fighting more and more against the compiler. I hope these couple examples show that it doesn’t have to be that way.

The next function is “pop-min”:

At first glance, this function should be right at home in Python. Itmutatesits input, and it has an error case. The code we’ve written here for Python is pretty idiomatic, also: other than the ugly deep copy, we’re basically just mutating the object, and using an exception for the exceptional state (when the tree is empty). Even the exception we use is the same exception as when you try andpop ()from an empty list.

The Haskell code here mainly demonstrates a difference in API style you’ll see between the two languages. If something isn’t found, we just useMaybe. And instead of mutating the original variable, we return the new state in the second part of a tuple. What’s nice about this is that we’re only using simple core features like algebraic data types to emulate pretty complex features like exceptions in Python.

You may have heard that “Haskell uses monads to do mutation and exceptions”. This is not true. Yes, state and exceptions have patterns which technically speaking are “monadic”. But make no mistake: when we want to model “exceptions” in Haskell, we really just return a maybe (or an either). And when we want to do “mutation”, we return a tuple, where the second element is the updated state. You don’t have to understand monads to use them, and you certainly don’t “need” monads to do them. To drive the point home, the above code could actually equivalently have a type which mentions “the state monad” and “the maybe monad”:

But there’s no need to!

The main part of our task is now done: all that is left is to glue the various bits and pieces together. Remember, the overall algorithm builds up the heap from a list, and then tears it down usingpopMin. First, then, to build up the heap.

To my eye, the Haskell code here is significantly more “readable” than the Python. I know that’s a very subjective judgement, butfoldris a function so often used that it’s immediately clear what’s happening in this example.

Why didn’t we use a similar function in Python, then? We actually could have: python does have an equivalent tofoldr, calledreduce(it’sbeen relegated to functoolssince Python 3 (also technically it’s equivalent to (foldl) , notfoldr)). We’re encouragednotto use it, though: the more pythonic code uses a for loop. Also, it wouldn’t work for our use case: theinsertfunction we wrote ismutating, which doesn’t gel well withreduce.

I think this demonstrates another benefit of simple, functional APIs. If you keep things simple, and build things out of functions, they’ll tend to glue togetherwell, without having to write any glue code yourself. The for loop, in my opinion, is “glue code”. The next function,heapToList, illustrates this even more so:

Again, things are kept simple in the Haskell example. We’ve stuck to data types and functions, and these data types and functions mesh well with each other. You might be aware that there’s some deep and interesting mathematics behind thefoldrandUnfoldrfunctions going on, andhow they relate. We don’t need to know any of that here, though: they just work together well.

Again, Python does have a function which is equivalent tounfoldr:iterhas an overload which will repeatedly call a function until it hits a sentinel value. But this doesn’t fit with the rest of the iterator model! Most iterators are terminated with theStopIterationexception; ours (like thepopfunction on lists (is terminated by theIndexErrorexception; and this function excepts a third version, terminated by a sentinel!

Finally, let’s writesort:

This is just driving home the point: programs workwellwhen they’re built out of functions, and youwantyour language to encourage you to build things out of functions. In this case, thesortfunction is built out of two smaller ones: it’s theessenceof function composition.

So I fully admit that laziness is one of the features of Haskell that does have downsides. I don’t think every language should be lazy, but I did want to say a little about it in regards to the sorting example here.

I tend to think that people overstate how hard it makes reasoning about space: it actually follows pretty straightforward rules, which you can generally step through in yourself (compared to, for instance, rewrite rules, which are often black magic! )

In modern programming, people will tend to use laziness it anyway. Python is a great example: theitertoolslibrary is almost entirely lazy. Actually making use of the laziness, though, is difficult and error-prone. Above, for instance, theheapToListfunction is lazy in Haskell, but strict in Python. Converting it to a lazy version is not the most difficult thing in the world:

But now, suddenly, the entire list API won’t work. What’s more, if we try and access thefirstelement of the returned value, we mutate the whole thing: anyone else looking at the output of the generator will have it mutated out from Under them!

Laziness fundamentally makes this more reusable. Take ourpopMinfunction: if we just want to view the smallest element, without reconstructing the rest of the tree, we can actually usepopMinas-is. If we don’t use the second element of the tuple we don’t pay for it. In Python, we need to write a second function.

Testing thesortfunction in Haskell is ridiculously easy. Say we have an example sorting function that we trust, maybe a slow but obvious insertion sort, and we want to make sure that our fast heap sort here does the same thing. This is the test:

In that single line, theQuickChecklibrary will automatically generate random input, run each sort function on it, and compare the two outputs, giving a rich diff if they don’t match.

This post was meant to show a few features like pattern-matching, algebraic data types, and function-based APIs in a good light. These ideas aren’t revolutionary any more, and plenty of languages have them, but unfortunately several languages don’t. Hopefully the example here illustrates a little why these features are good, and pushes back against the idea that algebraic data types are too complex for mainstream languages.