Compiling a custom Haskell runtime in Docker

Now, to introduce an unavoidable moving part to your build system: you can NOT build your bootstrap binary (i.e. your custom runtime) on your local machine (or your CI server), upload it to AWS Lambda, and expect it to work without glitches.

This is because the OS that finally executes your Lambda Function is not your regular Ubuntu/Debian/CentOS/etc. It’s a completely different Linux distribution, called AmazonLinux. It is an extremely minimal OS and does not have a number of native libraries that your Haskell runtime may depend on. Leaving aside additional native libraries such as libpq (Postgres) or libimagic, even the version of the core libc on your build machine may be different from what is available on AmazonLinux. If that happens - which in all probability it will - your bootstrap binary will refuse to start.
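
One way to spot this failure mode before deploying is to run ldd on the binary inside an amazonlinux container and look for entries that could not be resolved. The helper below is a minimal sketch and not part of any tool mentioned in this post: the name missing_libs is made up for illustration, and it assumes the "name => not found" format that glibc’s ldd prints for unresolved libraries.

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints the names of
# the libraries that the dynamic loader could not resolve.
# Assumes glibc's format for unresolved entries: "libfoo.so.1 => not found".
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example (run inside the amazonlinux container, next to your binary):
#   ldd ./bootstrap | missing_libs
```

An empty result only means that every library resolved on the machine where you ran ldd; it says nothing about the machine that will actually run the binary.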

In theory, there are two ways around this problem:

  1. Statically linking your bootstrap binary, so that it does not have any dynamic library dependencies.
  2. Building your bootstrap binary in an AmazonLinux Docker container, so that once it runs on actual AWS Lambda, the available libraries are the same ones that were used at build time.

Static-linking is a waste of time

I have tried the first approach, i.e. statically linking your binary. It is a world of pain. The “correct” way of building a statically linked Haskell program is not well-known. You will waste a lot of time with incomplete/incorrect instructions in various blog posts and Reddit discussions. DO NOT WASTE YOUR TIME WITH THEM.

Complication with Docker builds

The second approach of using a Docker container is also not that simple. Firstly, it introduces a new moving part in your build pipeline. Secondly, you need to deal with the fact that you cannot install native libraries on the underlying machines that AWS Lambda uses. Within the Docker build container, you have root access, and can install any native library your code needs. However, on actual AWS Lambda, your bootstrap binary will run on a bare-bones OS, which will not have the native libraries that you’re expecting.

Thankfully, AWS Lambda adds ${LAMBDA_TASK_ROOT}/lib to the LD_LIBRARY_PATH environment variable. This is a directory that you have control over. Here’s how: the ZIP file containing your code is unzipped and put into LAMBDA_TASK_ROOT; so, if you add native libraries to the lib folder within your ZIP file, they will show up in ${LAMBDA_TASK_ROOT}/lib and your bootstrap binary will be able to access them automatically, thanks to the way LD_LIBRARY_PATH works.
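
To make the ZIP layout concrete, here is a minimal sketch of staging a package directory by hand, with bootstrap at the top level and the native libraries under lib/. The function name stage_package and all the paths in the example are placeholders chosen for illustration.

```shell
# Hypothetical helper: lays out a deployment package directory.
#   $1    = path to the compiled bootstrap binary
#   $2    = staging directory to create
#   $3... = native libraries to bundle under lib/
# The body runs in a subshell, so its variables do not leak to the caller.
stage_package() (
  bootstrap=$1
  staging=$2
  shift 2
  mkdir -p "$staging/lib" || exit 1
  cp "$bootstrap" "$staging/bootstrap" || exit 1
  chmod +x "$staging/bootstrap" || exit 1
  for lib in "$@"; do
    # -L follows symlinks, so the real library file ends up in lib/
    cp -L "$lib" "$staging/lib/" || exit 1
  done
)

# Example (paths are placeholders; assumes the zip utility is installed):
#   stage_package ./bootstrap ./package /usr/lib64/libpq.so.5
#   (cd ./package && zip -r ../function.zip .)
```

Zipping from inside the staging directory keeps bootstrap and lib/ at the root of the archive, which is where they need to be once the archive is unpacked into LAMBDA_TASK_ROOT.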

Here’s what the complete LD_LIBRARY_PATH looked like at the time I was researching/preparing the solution outlined below:

LD_LIBRARY_PATH: /lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
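
The order of the entries matters: directories are searched from left to right, so a library in /lib64 is picked up before one with the same file name in /var/task/lib. The sketch below mimics only that part of the lookup. It is a simplification, since the real dynamic loader also consults other sources (for example any RPATH/RUNPATH baked into the binary and the ld.so cache), and the function name first_dir_with_lib is made up for illustration.

```shell
# Hypothetical helper: prints the first directory in a colon-separated search
# path that contains the given library file name.
#   $1 = search path (e.g. "$LD_LIBRARY_PATH"), $2 = library file name
# The body runs in a subshell, so changing IFS does not affect the caller.
first_dir_with_lib() (
  search_path=$1
  name=$2
  IFS=:
  set -f   # disable globbing while the path is split on ':'
  for dir in $search_path; do
    if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
      printf '%s\n' "$dir"
      exit 0
    fi
  done
  exit 1
)

# Example:
#   first_dir_with_lib "$LD_LIBRARY_PATH" libpq.so.5
```

A non-zero exit status means that none of the listed directories contains the file.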

Automatically packaging custom/non-standard native libraries

Now, this leads to (hopefully) the final problem that we need to solve, i.e. how do we determine which native libraries to add to <zip-file>/lib?

The simpler, but error-prone, way to do this is to carefully analyze all the Haskell packages your Lambda Function depends on, work out which native libraries those packages depend on, and then copy the correct versions of those native libraries to <zip-file>/lib.

Alternatively, you can use the aws-lambda-packager utility to automate this process for you. Here’s how it conceptually works:

  1. Right at the top of the Dockerfile (i.e. with a bare amazonlinux docker image), we capture the default libraries available on the AmazonLinux distribution.
  2. After compiling the bootstrap binary, we determine its dynamic library dependencies by parsing the output of ldd.
  3. Then we figure out the extra libraries that the bootstrap binary needs, i.e. those that aren’t in the default library set (as determined in step #1).
  4. Finally, we copy only these extra/custom libraries to <zip-file>/lib
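
The four steps above can be sketched in a few lines of shell. This is an illustration of the idea only, not the actual implementation of aws-lambda-packager: the function name extra_libs is made up, and it assumes that the default-library file holds one absolute path per line and that ldd prints resolved entries as "name => /path (address)".

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints the resolved
# paths of the libraries that are NOT listed in the default-library file.
#   $1 = file with one default library path per line
extra_libs() {
  # Keep only "name => /absolute/path (address)" entries and print the path,
  # then drop every path that appears verbatim in the default-library file.
  # grep exits non-zero when nothing is left over, hence the "|| true".
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | grep -vxFf "$1" || true
}

# Example (file names are placeholders):
#   ldd ./bootstrap | extra_libs ./default-libraries |
#     while read -r lib; do cp -L "$lib" ./package/lib/; done
```

Entries without a resolved path, such as linux-vdso or the loader itself, are skipped because they do not have an absolute path after "=>".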

Final working solution with Docker

Here’s the Dockerfile that you can use as a starting point for your own builds.

One neat hack is to use du to recursively list all files within a directory along with the complete path. I wasn’t able to find a way to do this using ls, and unfortunately find is not available on the bare-bones AmazonLinux OS. (I found this hack in this StackOverflow answer)
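
In isolation, the du trick looks like the sketch below. The function name list_paths is made up for illustration. Note that du prints directories as well as files, so the resulting list contains both.

```shell
# Hypothetical helper: lists every path (files and directories) under the
# given directories, one per line, using only du and cut.
list_paths() {
  # du -a prints "<size><TAB><path>"; cut drops the size column.
  du -a "$@" | cut -f2-
}

# Example:
#   list_paths /lib64 /usr/lib64 > default-libraries
```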

# NOTE: This Docker image is required ONLY to compile the lambda-haskell
# component that is deployed to AWS Lambda. This is the reason why it is not
# being built on an ubuntu container but on an amazonlinux container. The core
# OS libraries, like glibc, etc. are sufficiently different between the two
# systems, that a binary compiled on Ubuntu 18.04 cannot work when deployed to
# AWS Lambda.
FROM amazonlinux:2018.03.0.20191014.0 

SHELL ["/bin/bash", "--rcfile", "~/.profile", "-c"]

################################
#### Switching to root user
################################

USER root

# Saving default system libraries before doing anything else
RUN du -a /lib64 /usr/lib64 | cut -f2 > /root/default-libraries

# Installing basic dependencies
RUN yum install -y \
    git-core \
    tar \
    sudo \
    xz \
    make \
    gmp-devel \
    postgresql-devel \
    libicu libicu-devel \
    libyaml libyaml-devel

RUN yum groupinstall -y "Development Tools" "Development Libraries"

# Installing Haskell Stack
RUN sudo curl -sSL https://get.haskellstack.org/ | sh


ARG STACK_RESOLVER=lts-12.1
# Setting up GHC
RUN stack setup --resolver=${STACK_RESOLVER}

# Installing common packages so that docker builds are faster
RUN stack install --resolver=${STACK_RESOLVER} aeson text bytestring unliftio async
RUN stack install --resolver=${STACK_RESOLVER} http-client http-client-tls http-types
RUN stack install --resolver=${STACK_RESOLVER} string-conv safe hsyslog postgresql-simple

RUN mkdir /root/lambda-function

COPY stack.yaml lambda-function.cabal /root/lambda-function/
RUN cd /root/lambda-function && \
    stack install --resolver=${STACK_RESOLVER} hsyslog-tcp


ARG PACKAGER_COMMIT_SHA=d3312736a38f7b93f87c313a8fb9c0798938b403
RUN cd /tmp && \
    git clone https://github.com/saurabhnanda/aws-lambda-packager.git && \
    cd /tmp/aws-lambda-packager && \
    git checkout ${PACKAGER_COMMIT_SHA} && \
    stack install --resolver=${STACK_RESOLVER}

# Copying the source code of the lambda function into the Docker container
COPY . /root/lambda-function/

# Building the lambda-function and copying it to the output directory
RUN cd /root/lambda-function && \
    stack build

ARG OUTPUT_DIR=/root/output
RUN mkdir ${OUTPUT_DIR} && \
    mkdir ${OUTPUT_DIR}/lib

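# Run `stack path` inside the project directory so that it reports this project's install root
RUN cd /root/lambda-function && \
    cp "$(stack path --local-install-root)/bin/bootstrap" ${OUTPUT_DIR}/bootstrap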

# Finally, copying over all custom/extra libraries with the help of aws-lambda-packager
RUN /root/.local/bin/aws-lambda-packager copy-custom-libraries \
    -l /root/default-libraries \
    -f ${OUTPUT_DIR}/bootstrap \
    -o ${OUTPUT_DIR}/lib