Lambdas and higher-order functions

Learning goal

By the end of this chapter you should understand what lambdas (also called anonymous functions) are, what are they used for, and how they are the basic building block of writing real-world Haskell code. We’ll also revisit all the exercises of the previous chapter and solve them using higher-order functions like map, filter, etc and simplify our life!

You have probably used lambdas before

If you have made any AJAX calls using jQuery, you have already used lambdas:

$.getJSON("http://something.com/file.json", function(data)  {
  console.log("received JSON data", data);
});

If you have written a click handler in jQuery, you have already used lambdas:

$('#myButton').click(function() {
  alert("You clicked the button!");
});

If you have ever printed each line of a CSV file in Ruby, you have already used lambdas:

CSV.foreach("file.csv") do |fields|
  puts fields.inspect
end

If you have ever filtered a list in Python, you have already used lambdas:

>>> print filter(lambda x: x % 3 == 0, [2, 18, 9, 22, 17, 24, 8, 12, 27])
 [18, 9, 24, 12, 27]

Now that we have almost established that you have already used lambdas before let’s start by building intuition of why they are needed in the first place.

Why are lambdas needed

Let’s take a step back. Why are functions needed? You probably know intuitively why functions are needed and when you would write your own function, but let’s articulate it nevertheless. We will build upon it to demonstrate why lambdas are needed.

Functions are used to define common piece of logic/computation/steps that need to be used at multiple places. Is it the exact same logic/computation/steps? No. That would be very limiting. The logic/computation is parameterized somehow, so that every time it is needed, the behaviour can be slightly altered. This is achieved by passing different arguments to various function parameters.

For example the sqrt function, depicted with the awesome ASCII art below, is fairly simple. It does one thing and does it well. It give us one parameter with to control its output - the number we want to find the square-root of.

              +------+
4 (input) --> | sqrt | --> 2 (output)
              +------+

               +------+
16 (input) --> | sqrt | --> 4 (output)
               +------+

Similarly, an exponentiation functions gives us two parameters - the base and the power.

Now, let’s take this up a notch. What if we were to look at another very common task in real-world apps: open a file => read data line by line => do something with each line => close the file. If you have written code in Java, there is a very high chance that you have written the following code over and over and over again:

BufferedReader br = null;
try {

  // open the file...
  br = new BufferedReader(new FileReader("somefile.txt"));

  // read each line till you reach EOF (end-of-file)
  String line = br.readLine();
  while (line != null) {

    // ... finally do something with each line

  }
} finally {

  // Let's close the file...
  //
  if (br!=null) {
    br.close()
  }
}

Processing a file line by line is a series of steps that need to be peformed at multiple places in your app. Yet we find ourselves writing similarly “shaped” code over and over again all over our app. The only difference would be the action that is being done with each line once it is read from the file.

Can we use regular functions, like sqrt, to reduce this code duplication? What parameters will we pass our hypothetical processFileLineByLine function? One parameter will obviously be for the filename that we want to process, but what about what we want to do with each line after it is read?

To elaborate:

  1. open file
  2. read a line
  3. do something with the line <== This is what we want to change in different places in our app
  4. Go back to step #2 if we haven’t reached the end of the file
  5. Close the file

At one place we might need to print each line to the terminal. In another place we might want to print only those lines which contain the term ERROR (say we are filtering a log file). In yet another place, we might want to write each line to a separate file masking out any email addresses to something@redacted.com (say, for data anonymization purposes).

What we need is the ability to pass this hypothetical processFileLineByLine function a step (not a value, but a step). A step which contains the logic/computation of what to do with each line after it has been read.

This is exactly what lambdas (and higher-order functions) allow us to do. They allow us to write functions, where we can pass-in a step, a piece of logic, or a computation that changes the behaviour of the original function.

And, how can we write a step, a computation, or a piece of logic in any language? Why, functions, of course! So, we end-up with functions that can accept other functions as arguments. Thus, they are called “higher order” functions.

Examples: CSV processing in Ruby

Let’s consider the Ruby example from earlier. This is very similar to the Java example of reading files. Here are the steps that CSV.foreach must be performing internally:

  • Opens file.csv
  • Reads each row from the file taking care of detecting the newlines (there is a difference in how Unix represents newlines and how Windows represents newlines)
  • For each row it separates the data into fields by splitting at the commas , taking care that commas may be present within a fields value itself!
  • Finally, it has access to all the data in the CSV neatly split into rows & fields.
  • Now what? What needs to be done with the fields in each row? the CSV library doesn’t really know about that. That’s what you, the application developer, has to write.
CSV.foreach("file.csv") do |fields|
  puts row.inspect
end

This is where the lambda comes in:

do |fields|
  puts fields.inspect
end

You are telling the CSV library, to run this piece of code for each row in the CSV file. The fields in each row are passed to your lambda (anonymous function) neatly in the form of an array. You have, thankfully, been isolated from all the boilerplate of reading and parsing a CSV file – you are concentrating only on the core application logic of what to do with each row in the CSV.

Example: AJAX call in Javascript

Let’s look at the example of making an AJAX call using jQuery:

$.getJSON("http://something.com/file.json", function(data)  {
  console.log("received JSON data", data);
});

Here, the lambda is the following anonymous function:

function(data)  {
  console.log("received JSON data", data);
}

Notice, that this function doesn’t have a name! It’s just passed as any other argument to the getJSON function. In fact even the following is valid Javascript and will work:

var dataProcessingFunction = function(data)  {
  console.log("received JSON data", data);
};

$.getJSON("http://something.com/file.json", dataProcessingFunction);

Why is the lambda really being used here? Why can’t this be written in the following manner:

var data = $.getJSON("http://something.com/file.json");
console.log("received JSON data", data);

It’s written using a lambda because AJAX calls in Javascript are asynchronous. That is, if you write the following code the second console.log will be executed before the first!

$.getJSON("http://something.com/file.json", function(data)  {
  console.log("received JSON data", data);
});

// Note the following line
console.log("This will be printed before JSON data is received");

Usually, whenever an AJAX call is made in Javascript, the system doesn’t wait for the AJAX call to complete. It makes the request in a parallel thread and continues executing the next line of code. Now, when the AJAX call completes, the getJSON function needs a way to do something with that data. That something is the lambda. The lambda is executed and the JSON data received from the URL is passed to it.

This is known as an “event handler” or a “callback”. Any asynchronous callback or event-handler will usually be using some form of higher-order functions / lambdas to achieve this.

Example: Filtering a list in Python

In the examples we discussed till now, we had lambdas that represented very “real” steps. The lambda was “doing” something, eg. processing a row from a file, processing fields from a CSV, processing JSON data, etc. Now, we’ll discuss an example where the lambda doesn’t represent a “step” in a series of steps, but just a piece of abstract logic

Consider the process of filtering a list based on some condition:

  1. Start at the top of the list: i=0
  2. Initialise an empty list: result=[]
  3. Check the condition on the i-th element of the list <== this is what we want to change
  4. If the condition holds true, add the eleemnt to the result
  5. Increment the loop index: i=i+1
  6. Go back to step 3 if we haven’t reach the end of the list
  7. Return the filtered list: return result;

Now, this may seem like a very trivial thing to do and may be something that you have already committed to your muscle memory. In any real-world app there would be dozens of places where you would be copy-pasting the same procedure over, and over, and over again. Each time simply changing step #3 (the condition on which each element needs to be checked against).

Some real-life examples:

  • User gives you a list of email addresses. You need to filter out all blank strings.
  • User gives you a list of email addresses. You need to filter out all invalid emails.
  • You need to filter out all punctuation from user-input before storing in DB.

Let’s use lambdas to simplify our life. Consider the Python example from earlier:

>>> print filter(lambda x: x % 3 == 0, [2, 18, 9, 22, 17, 24, 8, 12, 27])
 [18, 9, 24, 12, 27]

Here, filter is a higher-order function that has “abstracted away” step #1-#7 and given you the ability to pass-in step #3. Note, that this “step” is really just a boolean condition. It is passed an element of the list and it needs to evaluate to a boolean true/false. If the result is true, the element is included in the final list, else it is omitted.

lambda x: x % 3 == 0

Here’s how you would filter out all blank strings from a list of email addresses:

>>> print filter(lambda x: x != "", ["saurabh@vacationlabs.com", "", "support@vacationlabs.com"])
 ["saurabh@vacationlabs.com", "support@vacationlabs.com"]

We just passed a different lambda with a different condition. That’s it.

Lambdas in Haskell

Now, moving on to what lambdas look like in Haskell. Here’s a lambda which accepts a single Char argument and checks if it is equal to 'a'. Remember, lambdas are just anonymous functions. So, the type of the following lambda, Char -> Bool is the same as any other function that accepts a single Char argument and returns a Bool

\x -> x == 'a'

-- GHCi> :t (\x -> x == 'a')
-- (\x -> x == 'a') :: Char -> Bool

Here’s a lambda which accepts two arguments and checks if the first is a multiple of the second. Notice that the two parameters of the lambda x & y are separated by spaces, not commas - just like regular functions.

\x y -> (mod x y == 0)

The three main differences between lambdas and regular named functions are:

  • The parameter list of lambdas start with a backslash: \

  • The parameter list of lambdas is separated from the function definition by a stabby arrow -> instead of an =, i.e.

    \x -> (mod x 2) == 0
    
    -- vs
    
    checkEven x = (mod x 2) == 0 -- this is not a lambda, we need to give it a name!
    
  • Lambdas, by default, do not have a type-signature. However, towards the end of this chapter, we will demonstrate a technique using which you can add type-signatures to lambdas for the purpose of clarity and debugging.

Commonly used higher-order functions in Haskell

In this section, let’s revisit all the mental gymnastics that we did with tail-recursion in the previous chapter, and make our lives simpler by using higher-order functions.

map

This higher-order function abstracts out the procedure of going through every element in a list and transforming it to a new value. That’s it. You cannot change the order of elements. You cannot skip elements. You cannot add elements. The lambda that you pass, will be run on every single eleemnt of the list and the output values will be collected in a list and returned. Which means, that the length of the input list and output list will always be the same. Example use-cases:

  • Convert all characters in a String to uppercase (remember, a String is just [Char]):

    GHCi> import Data.Char (toUpper)
    GHCi> map (\x -> toUpper x) "abc"
    "ABC"
    
  • Double every element in an Int list:

    GHCi> map (\i -> i * 2) [1, 2, 3]
    [2, 4, 6]
    
  • Convert every character in a string to an Int:

    GHCi> import Data.Char (digitToInt)
    GHCi> map (\c -> digitToInt c) "3345"
    [3, 3, 4, 5]
    

Detour: Understanding the type-signature of “map”

Take a look at the signature of map in GHCi and compare it with, say, Data.Char.toUpper

GHCi> :t map
map :: (a -> b) -> [a] -> [b]

GHCi> :t toUpper
toUpper :: Char -> Char

toUpper seems to have types that you recognize. It takes a Char and gives you back a Char. On the other hand, map doesn’t seem to have any recognizable type at all - just a and b! What are these a and b?

Note

Legend has it, that if you can understand how to read such type signatures you don’t need API docs at all! Just the type signatures will tell you how to use the function :)

If you notice in the examples above, once we passed a Char list to map (conversion to uppercase AND converting all characters to integers). The other time we passed an Int list to map (doubling every number in a list). And all three times it worked! Same function, different types.

That is, once we passed a lambda of type Char -> Char, another time we passed an Int -> Int, and yet another time we passed an Char -> Int. Again, it worked without a compile error.

Do you notice a pattern here?

(Char -> Char) -> [Char] -> [Char]
(Int  -> Int)  -> [Int]  -> [Int]
(Char -> Int)  -> [Char] -> [Int]
(a    -> b)    -> [a]    -> [b]

This is what is called a polymorphic function in Haskell. Any type that is represented by lowercase identifiers in the type-signature, like a, b, server, file, instead of Server or File, will work if you pass values of any type to it. However, these “type-variables” will specify a relationship between the various arguments of the function. In the case of map the type-signature is telling us that the lambda (first argument) should accept elements of the same type that are passed in the second argument, because one is represented by a and the other by [a]. It’s also telling us that the map function will return a list containing elements of the same type as the type returned by the lambda (one is b and the other is [b]).

Now, try violating this relationship between the types of the arguments that you are passing to map. Try calling map with arguments that do not follow this pattern, and stare at the big-fat compile error that it gives (note, we are using intToDigit, the reverse of digitToInt in the following example):

GHCi> import Data.Char (intToDigit)
GHCi> :t intToDigit
intToDigit :: Int -> Char

GHCi> map (\x -> intToDigit x) "123"

-- <interactive>:81:26: error:
--   • Couldn't match type ‘Char’ with ‘Int’
--    Expected type: [Int]
--      Actual type: [Char]
--  • In the second argument of ‘map’, namely ‘"123"’
--    In the expression: map (\ x -> intToDigit x) "123"
--    In an equation for ‘it’: it = map (\ x -> intToDigit x) "123"

You intuitively know why this wouldn’t work. You want to transform a list of Char, but the lambda that you have supplied is accepting an Int. You can’t pass a Char to a function that is expecting an Int.

Note

Even though the lambda does not have an explicit type-signature, Haskell intelligently guessed that the type of x in the lambda must be an Int because x is being passed to intToDigit, which accepts only Int arguments.

But, linking this back to the type-signature of the map function, here’s why this is giving a compile error:

(a   -> b)    -> [a]    -> [b]
(Int -> Char) -> [Char] -> [Int]

For this particular function call, if a is Char (because the second argument is a [Char]), then it cannot simultaneously be an Int (becuase that’s what the type-signature of the lambda says). In dynamically typed languages, like Ruby, Python, or Javascript, the language would have happily accepted this erroneous code, only to blow-up once you run it.

Important

A note about parentheses in type-signatures

Understand the different between the following type signatures:

(a -> b) -> [a] -> [b]

-- vs

a -> b -> [a] -> [b]

The first is a function which accepts two arguments. The first argument is a lambda/function of the type a -> b and the second argument is a list of type [a].

The second is a function which accepts three arguments. The first is a, the second is b and the third is [b]

Whenever you see parentheses in a type-signature, remember that you need to pass a function/lambda of that shape/type to that argument.

filter

Filter allows us to select elements, which satisy a given condition, from a list. It is abstracts away the repetitive procedure of going through every element in an input list, checking each element against a condition, and collecting all elements which pass that condition into a new list.

Note

The filter function does not change the list that you pass it. The original list is untouched. Instead a new list is returned. In fact, this is a common theme across Haskell. Functions do NOT modify the values that you pass them. They operate on the values and return a new value. This is why there is no “pass by value” or “pass by reference” in Haskell (like it is in C / C++).

Let’s take a look at the type of filter:

GHCi> :t filter
filter :: (a -> Bool) -> [a] -> [a]

Here’s what the type tells us:

  • Because of the parentheses in the signature, the first argument needs to be a lambda/function that accepts a value of a type a and return a Bool
  • The second argument must be a list containing elements of the same type a.
  • Finally, the function will return a list of type [a] (the same type that we passed). Note, the type of the list is same, the values will be different.

Let’s see it in action. Filter out only even numbers from a list:

GHCi> filter (\x -> (mod x 2) == 0) [1, 2, 3, 4, 5, 6]
[2, 4, 6]

Filter out only alphabets from a given string (i.e. strip out all punctuation characters):

GHCi> import Data.Char (isAlpha)
GHCi> filter (\x -> isAlpha x) "Hello world!"
"Helloworld"

Remove blank strings from a list of strings:

GHCi> filter (\x -> x /= "") ["hello", "", "world"]
["hello", "world"]

foldl’

Warning

There is another function called foldl (notice the missing ') at the end. While foldl and foldl' are very similar, there is a very subtle difference, which makes foldl unfit for most use-cases. It still exists in the standard library due to historical reasons and backward compatibility. Do not use foldl. Use foldl' instead.

The strangely named foldl' is an extremely important higher-order function. In some senses, it is the “mother function” of all higher-order functions that deal with processing lists. Using foldl' you can implement, map, filter, and every other such function that visits each element and do something with it.

To understand why foldl' is so important, consider the procedure that map abstracts away for you:

  1. Initialize an empty list and the loop index: result=[], i=0
  2. Process the i-th element of the list <== This is what we change with the lambda
  3. Append the output of this processing to the result list
  4. Increment the loop index: i=i+1
  5. Repeat from step #2, is we haven’t reached the end of the list
  6. Return the result list

Next, consider the procedure that filter abstracts away for you:

  1. Initialize an empty list and the loop index: result=[], i=0
  2. Check the condition for the i-th element of the list <== This is what we change with the lambda
  3. If the condition is true, append the output of this processing to the result list, else do not change the result list
  4. Increment the loop index: i=i+1
  5. Repeat from step #2, if we haven’t reached the end of the list
  6. Return the result list

Notice, how your lambda (step #2), has no idea of the result being accumulated in both the cases? What if your filtering logic depends on the elements that have already been filtered? For example, try writing the following filtering logic using only the filter function: Given a list of integers, select at most one occurence of each integer, i.e. if the input is [1, 2, 3, 3, 2, 4] the output should be [1, 2, 3, 4].

You will be unable to write this function using only the filter function. This is because the condition in step 2, the lambda that you are passing, needs access to the result of each step.

This is exactly what foldl' gives you. The lambda that you pass to foldl' is of the shape b -> a -> b, where the first argument is the accumulated result at the end of each iteration, the second argument is the current element being processed, and the lambda should return the new value of the accumulated result (this will be passed again to the lambda in the next iteration).

Here’s the simplified type-signature of foldl'. Please note, if you check the type of foldl' in GHCi, you will see something else. Ingore that for now. We will revisit the actual type-signature once we cover typeclasses in a later chapter.

foldl' :: (b -> a -> b) -> b -> [a] -> b

Let’s look at foldl' in action. Summing up a list of integers:

GHCi> foldl' (\total x -> total + x) 0 [1, 2, 3, 4]
10

Notice that we are passing the second argument as 0. This is the initial value of the “result” that is being accumulated at every step. This is the value that will be passed to your lambda along with the first element of the list.

Tip

Please take a pen & paper and work out the mechanics of how foldl' is working using the following table:

+----------------+---------------------+-------------+
| total          | x (current element) | new total   |
+----------------+---------------------+-------------+
| 0              | 1                   | 1           |
| 1              | 2                   | 3           |
| ...            | ...                 | ...         |
+----------------+---------------------+-------------+

Another example: Filtering out unique elements in an integer list. Note the following:

  • We are using the elem function to check if an element exists in a list
  • The accumulated value itself is a list, in this case.
  • We are using if-then-else in a single line to define the lambda in a terse manner. This is allowed by the many rules of indentation in Haskell.

Please use a pen & paper and draw a table to understand how this example is working

GHCi> foldl' (\result x -> if (x `elem` result) then result else (x:result)) [] [1, 2, 3, 3, 2, 3, 1]
[3, 2, 1]

Another example: Let us defend the initial claim that foldl' is the “mother function”. Let us implement map in terms of foldl':

-- Doubling all elements of a list...

GHCi> foldl' (\result x -> result ++ [x * 2]) [] [1, 2, 3]
[2, 4, 6]

-- ... is _functionally_ equivalent to...

GHCi> map (\x -> x * 2) [1, 2, 3]
[2, 4, 6]

foldr

foldl' processes the input list from left-to-right (hence the l in the function), and foldr processes the list from right-to-left. Here’s the simplified type signature:

(a -> b -> b) -> b -> [a] -> b

Notice how the order of arguments in the lambda are swapped (compared to foldl'

Detour: Tuples

In the previous chapter, we skipped a basic data-type of Haskell - tuples. Now that you have some understanding of how to read polymorphic type signatures, i.e. (a -> b) -> [a] -> [b], let’s discuss tuples and how they are frequently used in higher-order functions.

  • A tuple is a collection of fixed number of items (as opposed to a list, which can contain any number of items - or even zero items).
  • Items in a tuple may be of different types. (as opposed to a list, where every item in the list must be of the same type).
  • A tuple with 2 elements is a completely different type from a tuple with 3 elements. We will elaborate on this below.

What does a tuple look like? Tuples are written within parentheses () and the elements of a tuple are separated by commas. For example:

GHCi> :t ('a', 'b')
('a', 'b') :: (Char, Char)

GHCi> :t ('a', True)
('a', True) :: (Char, Bool)

GHCi> :t ('a', 'b', True, False)
('a', 'b', True, False) :: (Char, Char, Bool, Bool)

If you go back to the Defining functions section in the previous chapter, and see the code-snippet where we were erroneously passing (function, arguments, in, parentheses) you will now be able to appreciate the compiler’s error message:

Couldn't match expected type ‘Int’ with actual type ‘(Integer, Integer)’

The compiler is wondering why we are passing an (Integer, Integer) tuple for an Int argument!

Now, let’s see when you would reach out for a tuple in real code…

Returning multiple values from functions

Whenever you want to return multiple values from a function where the type and number of return-values is known at the time of writing the code (compile time, as opposed to run-time). For example, take a look at the divMod function, which divides two integers, and returns the quotient and the remainder. The return type is a 2-tuple of type (Int, Int). Again, here’s the simplified tyep-signature of divmod (what you’ll see in GHCi will be different):

divMod :: Int -> Int -> (Int, Int)

-- GHCi> divMod 10 2
-- (5, 0)

-- GHCi> divMod 10 3
-- (3, 1)

How do we use the return value of this function? Let’s understand by using divMod in a real-world example of time arithemetic. Say, you want to write a function that calculates the end-time of an event, given the start-time and the duration of the event (in minutes):

calculateEndTime :: Int -> Int -> Int -> (Int, Int)
calculateEndTime hr mn durationInMins =
  let (addHour, finalMins) = divMod (mn + durationInMins) 60
  in (hr + addHour, finalMins)

-- GHCi> calculateEndTime 10 30 45
-- (11, 15)

-- GHCi> calculateEndTime 10 30 115
-- (12, 25)

Note

Quick note about let..in

Note: We suddenly introduced a new let...in syntax in this function. We will discuss this in greater details in later chatpers, but for now you can intuitively understand what it means. The structure of a let..in expression is the following:

let binding1 = expression1
    binding2 = expression2
    binding3 = expression3
in <some expression, or function call, that uses binding1, binding2, and binding3>

The “return value” of a let...in expression is the whatever the expression after in evaluates to.

So, in our example, the let is creating two bindings – addHour and finalMins, corresponding to the values of the 2-tuple returned by divMod. These are then used in building another tuple (hr + addHour, finalMins), which is the “return value” of the let...in expression, which, in turn, is the “return value” of the calculateEndTime function.

At the time of writing the code, we know that this function will always return exactly two values - the quotient and the remainder. Now, could the divMod function have returned a [Int]? Yes it could have. In fact, that’s exactly how the divmod function in Ruby’s standard library is written.

Why is, then, a tuple preferred over a list in this case? Because of correctness. When you call a function that returns a list, you can never be sure of how many elements it has. It can have two, three, 100, or even zero. If at the time of writing the code (compile-time) you know that a particular function will always return two values, you can encode that guarantee by picking the right type, i.e. a 2-tuple. This way every user of the function can stop worrying about the edge-case of getting a list which does not contain exactly two values. In the future, if you have to evolve your function and it now returns a 3-tuple, the compiler is going to help you by pointing out all the places where you were expecting the function to return a 2-tuple, but it now returning a 3-tuple instead. There would be a very good reason why the function is returning 3 values intead of 2 now, right? All the places where you have used the function now also need to deal with the third value. If you had used a list instead, the compiler wouldn’t be able to help you, and you would have to actually run the code and give it various inputs to see where it is breaking. This is the power of type-safety.

PS: In fact, Python’s divmod function also returns a 2-tuple instead of a list.

Ad-hoc way to “associate” elements

A lot of times you need a way to temporarily “associate” one piece of data with another piece to achieve the larger goal of the app. For example, you are writing a phone-book (or contact-list) app and have to build a UI which looks like the following:

-- A -------------------
      Adam
      Anuarg
      Ashton
-- B -------------------
      Bhushan
      Bimal
      Banksy
      Beth
-- C -------------------
      Charlie
      Chaman
      Chilgoza

(and so on)

In all probability you will query the “Contacts API” on the phone, which will return a flat list of all the contacts:

-- The list may not necessarily be sorted alphabetically
--
[ "Adam", "Bhushan", "Bimal", "Chilgoza", "Beth", "Anurag"
, "Ashton", "Chaman", "Charlie", "Banksky",
-- and more
]

Now, on the UI, if you notice there are two lists being displayed. One list is the alphabets - A, B, C, D… - and the other list is the contacts “associated” with that alphabet.

How will you process the flat list into a simpler data-structure that your UI can easily loop over and display the final output?

If you have spent some time with dynamically typed languages (like Javascript, Ruby, Python), you may be tempted to reach out for something that looks like a nested list, i.e.:

[ "a", ["Adam", "Anurag", "Ashtong"]
, "b", ["Bhushan", "Bimal", "Banksky", "Beth"
, "c", ["Charlie", "Chaman", "Chilgoza"]
]

While, this would be an acceptable solution in many dynamic languages (though still not the best), it won’t work in Haskell at all. Remember, that, in Haskell all elements in a list should be of the same type. However, in the data-structure given above, the list elements are alternating between String and [String].

Aha, let’s fix this problem, by making it a [[String]]. What about the following:

[ ["a", "Adam", "Anurag", "Ashtong"]
, ["b", "Bhushan", "Bimal", "Banksky", "Beth"
, ["c", "Charlie", "Chaman", "Chilgoza"]
]

Now this is a valid list in Haskell of the type [[String]], but this has an implicit assumption that the first element of every inner list is going to the alphabet, and the other elements are going to be the names. This is the same problem as the divmod example given above. At the time of writing the code (compile time), we know that the first element must be an alphabet and the others must be names. However, our data-structure is not enforcing this. This is bound to cause bugs – if not today, then three weeks later when someone else makes some changes to the code (say, the UI needs to change to display the name and the phone).

Using Haskell’s type-system to our advantage, we can make this implicit assumption completely explicit and use the compiler to help us prevent bugs. A better data structure for something like this would be of the type: [(Char, [String]] – a list of 2-tuples. For example:

[ ('a', ["Adam", "Anurag", "Ashtong"])
, ('b', ["Bhushan", "Bimal", "Banksky", "Beth"])
, ('c', ["Charlie", "Chaman", "Chilgoza"])
]

Notice, that because we are now using a tuple, we get the added advantage of using a Char to represent the alphabet. This makes another implicit assumption absolutely explicit – it will no longer permit us to push a multi-letter string in place of what should have been a single alphabet. This will actually help the UI as well – while displaying each header in the contact list, the UI can stop worrying about the edge-case that it may get a mulit-letter string instead of a single alphabet – the types wont allow it!

Passing multiple values to a lambda which accept only a single parameter

Now, connected to the previous example (contact list / phone book), how do we actually transform our flat list of contacts to a list of tuples? That is, we want to build the following data transformation:

Transform this =>
[ "Adam", "Bhushan", "Bimal", "Chilgoza", "Beth", "Anurag", ...]

To this =>
[ ('a', ["Adam", "Anurag", ...]
, ('b', ["Beth", "Bhushan", ...]
, ('c', ["Chaman, "Charlie", ...]
]

This is where we will use tuples to pass multiple values to a lambda which accepts only a single value.

Let’s elaborate this with an example of foldl':

foldl' :: (b -> a -> b) -> b -> [a] -> b

Here, foldl' is giving you access to a single accumulator variable, of any type b. What if you want to write logic where you need access to two accumulator variables (which seems to be what we need in our case)? Tuples to the rescue!

Remember, that b cany be any type. foldl' doesn’t care. All it says is that, the lambda’s first argument, the lambda’s return value, and the initial value passed to the foldl' (which is technically the second argument) should be of the same type. Well, in that case b can very well be an n-tuple, if we need it to be!

Let’s see…

module ContactList where

import Data.Char (toLower)
import Data.List (foldl')

groupNamesByAlphabet :: [String] -> (Char, [String])
groupNamesByAlphabet names =
  foldl'

    -- the lambda
    (\ (alphabet, collectedNames) name ->
        if (length name > 0) && (toLower (head name) == (toLower alphabet))
        then (alphabet, collectedNames ++ [name])
        else (alphabet, collectedNames)
    )

    -- the initial value
    ('a', [])

    -- the list
    names

-- GHCi> :l ContactList
-- GHCi> groupNamesByAlphabet [ "Adam", "Bhushan", "Bimal", "Chilgoza", "Beth", "Anurag", "Ashton", "Chaman", "Charlie", "Banksky"]
  • We’ve written a function that gives us a (Char, [String]) back, but only for the names that start with a
  • Notice how we have split the arguments to foldl' across multiple lines for better readability. This is allowed in Haskell’s many indentation rules.
  • Notice, how the lambda itself is split across multiple line - again allowed by Haskell’s indentation rules.
  • The lambda itself is defined using a new-ish syntax, called “pattern matching”, discussed below.
  • The alphabet that we want names for, is actually being passed as the first element of the tuple to the lambda along with the names collected so far (second element of the tuple).

Note

Quick note about pattern matching

If you notice, in the example we discussed above, the lambda was defined using a new syntax:

\ (alphabet, collectedNames) name -> ...

This is an example of “pattern matching” which we will cover in greater detail in the next chapter. For now, you can intuitively understand what it is doing. The two parameters of the function are separated by spaces, as usual (Note: The lambda is still accepting just two parameters, don’t let the 2-tuple confuse you). On top of that, we are further destructuring the first parameter, which is a 2-tuple, into alphabet and collectedNames. Try changing that to a 4-tuple, like (x, y, z, w) or a 3-tuple, liek (x, y, z) and notice how the compiler won’t allow you to do that. Again, type-safety.

However, we still haven’t built the complete data transformation that we originally wanted. Let’s build upon this idea. First, let’s change our function to work for any alphabet that we pass-in:

 groupNamesByAlphabet :: Char -> [String] -> (Char, [String])
 groupNamesByAlphabet alpha names =
   foldl'

   -- the lambda
   (\ (alphabet, collectedNames) name ->
       if (length name > 0) && (toLower (head name) == (toLower alphabet))
       then (alphabet, collectedNames ++ [name])
       else (alphabet, collectedNames)
   )

   -- the initial value
   (alpha, [])

   -- the list
   names

 -- GHCi> :r
 -- GHCi> groupNamesByAlphabet 'b' [ "Adam", "Bhushan", "Bimal", "Chilgoza", "Beth", "Anurag", "Ashton", "Chaman", "Charlie", "Banksky"]
 -- ('b',["Bhushan","Bimal","Beth","Banksky"])
 --
 -- GHCi> groupNamesByAlphabet 'c' [ "Adam", "Bhushan", "Bimal", "Chilgoza", "Beth", "Anurag", "Ashton", "Chaman", "Charlie", "Banksky"]
 -- ('c',["Chilgoza","Chaman","Charlie"])

Finally, let’s wrap this up in a map to make it work for all alphabets:

module ContactList where

import Data.Char (toLower)
import Data.List (foldl')

groupNamesByAlphabet :: Char -> [String] -> (Char, [String])
groupNamesByAlphabet alpha names =
  foldl'

    -- the lambda
    (\ (alphabet, collectedNames) name ->
        if (length name > 0) && (toLower (head name) == (toLower alphabet))
        then (alphabet, collectedNames ++ [name])
        else (alphabet, collectedNames)
    )

    -- the initial value
    (alpha, [])

    -- the list
    names

groupNamesByAllAlphabets :: [String] -> [(Char, [String])]
groupNamesByAllAlphabets names =
  map (\alphabet -> groupNamesByAlphabet alphabet names) "abcdefghijklmnopqwxyz"

-- GHCi> :r
-- GHCi> groupNamesByAllAlphabets  [ "Adam", "Bhushan", "Bimal", "Chilgoza", "Beth", "Anurag", "Ashton", "Chaman", "Charlie", "Banksky"]
-- [('a',["Adam","Anurag","Ashton"]),('b',["Bhushan","Bimal","Beth","Banksky"]),('c',["Chilgoza","Chaman","Charlie"]),
-- ('d',[]),('e',[]),('f',[]),('g',[]),('h',[]),('i',[]),('j',[]),('k',[]),('l',[]),('m',[]),('n',[]),('o',[]),
-- ('p',[]),('q',[]),('w',[]),('x',[]),('y',[]),('z',[])]

Now, what if we dont want alphabets that don’t have any names? Well, it’s just a filter away:

groupNamesByAllAlphabets :: [String] -> [(Char, [String])]
groupNamesByAllAlphabets names =
  filter

    -- the filter condition -- notice how it's destructuring the tuple...
    (\ (alpha, collectedName) -> length (collectedNames) > 0)

    -- the list that we want to filter...
    (map (\alphabet -> groupNamesByAlphabet alphabet names) "abcdefghijklmnopqwxyz")

-- GHCi> :r
-- GHCi> groupNamesByAllAlphabets  [ "Adam", "Bhushan", "Bimal", "Chilgoza", "Beth", "Anurag", "Ashton", "Chaman", "Charlie", "Banksky"]
-- [('a',["Adam","Anurag","Ashton"]),('b',["Bhushan","Bimal","Beth","Banksky"]),('c',["Chilgoza","Chaman","Charlie"])]

Let’s look at another example:

For example, what if we want to use foldl' to compute the first 10 terms of the Fibonacci sequence? To calculate the n-th term we’ll need access to the (n-1)-th term, and the (n-2)-th term at each step. But, foldl' allows us to retain only one variable/value between steps? No problem – let’s use a 3-tuple…

GHCi> foldl' (\ (x, y, fiboList) i -> (y, x + y, fiboList ++ [x+y])) (0, 1, []) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
(55,89,[1,2,3,5,8,13,21,34,55,89])

Let’s align the call to foldl' with it’s type-signature to understand what’s going on:

foldl' :: (   b               -> a   ->  b)                            ->  b        ->  [a]
foldl'    (\ (x, y, fiboList)    i   ->  (y, x + y, fiboList ++ [x+y]))   (0, 1, [])    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  • In this function call, type a is Int and type b is (Int, Int, [Int]). The latter is our 3-tuple saviour!

  • What do the 3 elements in the tuple represent: - (n-1)the term of the Fibo series - n-th term of te Fibo series - Fibo series containing n-terms

  • The lambda evaluates to (y, x + y, fiboList ++ [x+y]), which is of the same type (Int, Int, [Int]). It calculates the n-th element of the Fibo series and also appends it to the 3rd element of the tuple.

  • The initial value being passed to foldl' is (0, 1, []), which is of the same type (Int, Int, [Int]). It’s the first two terms of the Fibo series, along with an empty-list (which indicates that we have not accumulated any terms of the Fibo series, yet).

  • The third argument to foldl' is [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] which is of type [Int]. Each element of this list is passed as the second argument to the lambda (the first argument is our 3-tuple accumulator). But the lambda doesn’t seem to be doing anything with this value! That is correct. The actual numbers in this list don’t make any difference; only that the number of elements should be 10 (because we want the first 10 terms of the Fibo series). Try the following in GHCi, and verify this for yourself:

    GHCi> foldl' (\ (x, y, fiboList) i -> (y, x + y, fiboList ++ [x+y])) (0, 1, []) [14, 19, 2, 8, 8, 9, 20, 12, 22, 29]
    

zip, zipWith, and friends

Now, with the detour about tuples out of the way, let’s look at zip and friends. First let’s understand why they are required.

Using only map, filter and foldl', try solving the following problem. Suppose, for some government certification, required for your business, you had to find out the days on which a government office is going to be functioning AND the governemnt secretary is going to be present. Now, as it is the case with most government departments, these two pieces of data are made available separately by separate people who don’t really talk to each other. Thankfully, they have provided this data in the form of Bool lists:

-- 31 elements to represent 31 days. Each Bool value
-- represents whether the governemnt office is open, or not.
[ True, True, True, True, True, False, False
, True, True, True, True, True, False, False
, True, True, True, True, True, False, False
, True, True, True, True, True, False, False
, True, True, True
]


-- 31 elements to represent 31 days. Each Bool value
-- represents whether the governemnt secretary is working,
-- or, on leave.
[ True, True, True, True, True, False, False
, False, False, False, False, False, False, False
, False, False, False, False, False, False, False
, True, True, True, True, True, False, False
, False, False, False
]

So, basically you need to write a function that takes these lists as an input and returns a “unified” list:

daysOfActualWork :: [Bool] -> [Bool] -> [Bool]
daysOfActualWork officeOpenDays secretaryPresentDays = _todo

Seriously! Don’t read ahead. Try solving the problem only with map, filter or foldl' to really appreciate why zipWith is required.

Here’s a possible solution with foldl' -

Show/hide code
-- Here's a possible, but, ugly, solution

daysOfActualWork :: [Bool] -> [Bool] -> (Int, [Bool])
daysOfActualWork officeOpenDays secretaryPresentDays =
  foldl'
    (\(i, finalList) isOfficeOpen ->
      (i + 1, finalList ++ [isOfficeOpen && (secretaryPresentDays !! i)])
    (0, [])
    officeOpenDays

We had to resort to the !! operator to lookup the secretaryPresentDays list by index AND we had to resort to manually preserving a loop counter. If you really think about it, this isn’t very different from writing a for loop directly. Writing code like this doesn’t really fulfil the reason of why we wanted to use lambdas and higher-order functions in the first place.

Now, look at how you can solve this with zipWith:

zipWith (\isOfficeOpen isSecretaryPresent -> isOfficeOpen && isSecretaryPresent) officeOpenDays secretaryPresentDays

That’s it! Check the type-signature of Data.List.zipWith and see if you can make sense of it. Next, check the type-signature of Data.List.zip and see if you can make sense of it. Make sure you play around with zip and zipWith and understand how they behave in the edge-case that you pass lists of different lengths to them.

There is a also a zip3, zipWith3, zip4, zipWith4, and so on, till zipWith7, if you ever need them. There is a also a series of unzip functions that exist.

Lots & lots of other functions

There are many, many higher-order functions in Haskell libraries. The abstraction being provided by (almost) every higher-order fucntion can be re-implemented by force-fitting a foldl'. But it is not a good idea to do that. Most of these higher-order funcions “abstract away” common procedures. The choice of higher-order function should immediately communicate your intent to the reader (the common procedure that you have abstracted away). Using a foldl' doesn’t always do that. Take a look at the zipWith example above. What kind of code would you rather take responsibility of?

Please browse around the documentation of Data.List. There are a lot of useful higher-order functions given there. To name a few: takeWhile, dropWhile, dropWhileEnd, break, groupBy, partition, sortOn, deleteBy, deleteFirstBy, and many more.

There are also a host of useful functions in the Data.List module, which aren’t necessarily higher-order functions, but you will find yourself reaching out for them every now & then when writing real-world code. For example: reverse, intercalate, replicate, take, drop, splitAt, etc. Remember: since String is just a list of Char, all of these functions can be used with String as well. They can obviously be used with a list of any type.

Exercises

Reimplementing old problems in a more Haskell-y way

We used tail-recursion to solve all the problems of the previous chapter to completely unlearn our muscle-memory of writing for loops and changing values of variables (called “mutation” in the FP world). Now, we’ll solve them again using a more real-world Haskell approach:

(Note: In all the exercises given below, the indicative functions are only the higher-order functions. You are free to use other utiliity functions like mod, toUpper, digitToInt, etc. as required by the solution)

  1. Sum a list of integers using foldl'
  2. Sum a list of integers using: (a) just foldl', and (b) combination of foldl' and filter
  3. Double a list of integers using map
  4. Check if all characters in a String are contained within another String: Use foldl' & elem
  5. Generating a list of all even numbers till “N”: (a) Use foldl' alone, (b) Use filter and the short-hand of generating a list containing all number within a range:[1..n]
  6. Generate a list of first “N” even numbers: (a) Use foldl' alone, (b) Use map and [1..x] short-hand syntax
  7. Sum of first “N” multiples of 3 or 5: (a) Use foldl' alone
  8. Sum square difference: Use foldl'
  9. Even Fibonacci numbers: Modify the solution already given above using filter
  10. ISBN Verifier: (a) Build your solution using only foldl', (b) modify you solution to use zip or zipWith at the appropriate place.
  11. Pangram: (a) write using only foldl', (b) write it using map and filter, (c) write it using any
  12. Reimplement the contact list (phone book) example from above using a higher order function that seems more approriate. (There is no single right answer to this question).

Add line numbers to source code

Write a function that accepts lines-of-code, represented as a list of Strings, and return a new list where each line-of-code is pre-fixed with the line number?

prefixLineNumbers :: [String] -> [String]
prefixLineNumbers = _todo

-- prefixLineNumbers
--  [ "module Main where"
--  , ""
--  , "increment :: Int -> Int"
--  , "increment x = x + 1"
--  ]
--
-- OUTPUT:
-- [ "1: module Main where"
-- , "2: "
-- , "3: increment :: Int -> Int"
-- , "4: increment x = x + 1" ]
  • First write it using only functions and recursion and if-then-else.
  • Write it using map. Is it possible? If not, why?
  • Write it using foldl'
  • Write it using zip
  • Write it using zipWith

Do you still miss your loop indices – i, j or idx?

Generalized Fibonacci selector

Write a function that can select the first N terms of the Fibonacci series that match a given condition. Take note:

  • The result of the function itself should contain N terms, i.e. if the first 3 even terms are requested, the result should be: [2, 8, 34]. Similarly, if the first two multiples-of-3 are requested, the result should be: [3, 21]
  • You should consider writing your own higher-order function (write a function that accepts the condition as a lambda)

Exam-score processing

Say, you have parsed the exam-scores of all students from a CSV, and have it in the following data-structure:

[ ("Saurabh Nanda", [("English", 84), ("Chemistry", 80), ("Physics", 95), ("Geography", 75)])
, ("John Doe", [("Chemistry", 80), ("Physics", 95), ("Geography", 75)])
, ("Jane Doe", [("Chemistry", 66), ("Phsyics", 33), ("Geography", 56)]) -- Note the spelling error in "Physics"
, ("John Doe", [("Chemistry", 90), ("Economics", 45), ("Geography", 56)]) -- Note that "John Doe" is a repeat entry
, ("Bahubali", [("Hindi", 45), ("Biology", -90), ("Geography", -75)]) -- Note the negative marks in "Biology" & "Geography"
, ("Rajnikant", [("Tamil", 110), ("Biology", 100), ("Geography", 100)]) -- Note that marks in "Tamil" are greater than 100
...
]

(Can you work out the type of the data-structure given above without inspecting it in GHCi?)

Further, the list of known subjects is also available to you in the following data-structure:

[ "English"
, "Geography"
, "Physics"
, "Chemistry"
, "Economics"
, "Computer Science"
...
]

Write the following functions:

  1. Calculate the average marks for a given subject taking care that invalid entries are not included in the calculation.
  • All scores that are negative should be considered invalid
  • All scores that are greater than 100 should be considered invalid
  • All scores where the subject name is not known should be considered invalid
  • All entries where the student’s name is duplicated should be considered invalid, i.e. none of the scores for those entries should be considered.
  1. Calculate the standard deviation for a given subject taking care of invalid entries.
  2. List all the students whose names are duplicated
  3. List all students, who have invalid scores in any subject, along with the subjects in which they have invalid scores and the reason why the score is considered invalid. For the sample data given above, the result should be:
[ ("Jane Doe", [("Phsyics", 33, "Unknown subject")]
, ("Bahubali", [("Biology", -90, "Negative score"), ("Geography", -75, "Negative score")])
, ("Rajnikant", [("Tamil", 110, "Score is greater than 100")]
]
  1. Give a list of all subjects and the students who took an exam in that subject, taking care of invalid entries & scores. For example:
[ ("English", ["Saurabh Nanda"])
, ("Geography", ["Saurabh Nanda", "Jane Doe", "Rajnikant"])
, ("Physics", ["Saurabh Nanda"])
, ("Chemistry", ["Saurabh Nanda", "Jane Doe"]
, ("Economics", [])
, ("Computer Science", [])
]
  1. Group together all students that have taken the exact same exams, taking care of invalid entries & scores. Invalid scores can be considered as exams not taken. The result should be of the type [([String], [String])] where the first element of each tuple is the list of exams, and the second element is the list of students who have taken those exams.

Follow-up: Generalize your solution for 1, 2, & 4 by writing your own higher-order function. Generalize your solution for 5 & 6 by writing your own higher-order function.

Stuff that you might struggle with

Thinking “functionally” vs “imperatively”

You will probably have the logic for most of the exercises in your head, but you might struggle with actually writing the logic using only map, fold, or zip. Do not worry - this is expected. There is a bit of re-wiring that your brain needs to undergo, to start thinking functionally. That’s the whole point of this chapter. Do NOT skip the exercises.

Is there a function to do X ?

If you’ve been spending time on the API-documentation (which you should be doing), you would have noticed that map, fold and zip are not the only higher-order functions that are available. There seem to be dozens of variatons of map, fold and zip for common use-cases. Initially, you might end-up re-implementing a higher-order function that is already present in the standard library (most probably using a foldl'). However, as your familiarity with the standard library increases, you will realise that a lot of code can be reduced to a one-liner by using the correct higher-order function. Do not be afraid to go back and refactor your code when you learn of something new. Refactoring is a super-power in Haskell - if you mess-up something while refactoring, the compiler is going to shout at you till you fix it. So, don’t be afraid to refactor.

What the hell is going on? Can I print the value of this variable?

You have been explicitly forbidden to write output to the terminal - GHCi will automtically print the return value of your function. There is a reason for doing this, which we will cover later. However, if, for debugging you need to print values to the terminal use various functions from the Debug.Trace module.

For example, here’s how you can use traceShow to print the values passed to your lambda, while summing a list of itnegers:

GHCi> import Debug.Trace (traceShow)
GHCi> import Data.List (foldl')
GHCi> foldl' (\total x -> traceShow (total, x) (total + x)) 0 [1, 2, 3, 4, 5]

The simplified type-signature of traceShow is:

traceShow :: a -> b -> b

You can pass it two arguments of any type, it will print the first argument to the console, and will return the second argument as it is. Which means that traceShow will always evaluate to the second argument passed to it. So, in our example above (total, x) is being printed, but the function call to traceShow is evaluating to total + x; and, as a result, the entire lambda is evaluating to total + x – thus our summation logic is not changing. In other words, (total, x), which is being printed has absolutely no effect on the “return value” of traceShow. The “return value” of traceShow will always be the second argument you pass to it.