
Motivation

I am generally not a fan of using AWS unless you truly have scalability concerns. I find AWS to be too expensive [1] and too complicated. I prefer hosting on bare-metal servers and configuring all my infrastructure services (e.g. Postgres, nginx, haproxy) by hand instead.

Having said that, at Vacation Labs, we have one problem that is best solved by off-loading it to a scalable computing platform. We have a particular “computation” (for lack of a better word) that needs to be performed for every single account/customer using our platform, every time we update a particular part of our codebase. This “computation” needs to finish rather quickly. Running it serially for all customers takes 3-4 hours. Running it on an 8-core machine with 16-64 parallel threads brings the execution time down to under an hour, but that is still too long.

So, we were forced to introduce another moving part into our production infra by offloading this “computation” to AWS Lambda for immediate, and massive, parallelism. Unfortunately, when we wrote this code (in 2017-2018), it seemed that the only way to run Haskell on AWS Lambda was via the serverless-haskell platform/toolkit. To make matters worse, the Linux distro that is actually installed on AWS’ servers is not a standard Ubuntu installation (it’s “AmazonLinux” instead). So, not only were we forced to deal with a new tool (serverless-haskell), we were also forced to perform this build in a separate docker container, thus complicating our CI builds (and introducing even more moving parts).

While this solution got the job done, I was never happy with it. It had too many moving parts. Every time this piece of code changed hands, the next person had a nightmare wrapping their head around all these parts, while also struggling to set it up locally for development and testing.

As of 2020, there is a much better way. This repo, and accompanying documentation, is a walk-through of a simpler way to get Haskell running on AWS Lambda. This simpler way involves using AWS Lambda Custom Runtimes.

[1] There are cheaper (and simpler) cloud providers, like Digital Ocean or Hetzner. If you aren’t using auto-scaling, consider hosting on a beefy bare-metal server for a fraction of the cost (compared to a cloud provider).