Custom AWS Lambda Runtime for Haskell

At Vacation Labs, we chose to write a custom runtime, because:

We wanted to log everything to our centralised syslog server, instead of CloudWatch Logs
It is not that hard to write a custom runtime

If you do not have such unique requirements, do not roll your own runtime. Instead, write your Lambda Function in Haskell and use the AWS Lambda Haskell Runtime written by The Agile Monkeys to package and deploy your Haskell code. Read through this section as a fun learning exercise.

Moving parts in an AWS Lambda Custom Runtime

Important: The most up-to-date documentation for Custom Runtimes is available from AWS itself. What follows is a quick summary of these pages from the official AWS documentation: Custom AWS Lambda Runtimes, AWS Lambda Runtime Interface, and Tutorial – Publishing a Custom Runtime.

Your ZIP file must contain an executable file called bootstrap. It may optionally contain other files as well (for example, supporting libraries, other executables, etc.), but the bootstrap executable must be present.
When your Lambda Function is invoked (more precisely, when AWS starts a new instance of your runtime), AWS runs your bootstrap file, but it does not pass any function arguments to it (keep reading to find out why).
Your runtime needs to read two important environment variables: AWS_LAMBDA_RUNTIME_API and _HANDLER.
After you have initialised your custom runtime (e.g. acquired common resources, like log files, DB connections, etc.), and your runtime is ready to “serve” the next function invocation, it has to make an HTTP call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next.
Here’s what happens next (in a simplified manner):
1. Make an HTTP GET call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next
2. Get the Lambda Function arguments as the response body (in stringified JSON format).
3. Take note of the Lambda-Runtime-Aws-Request-Id response header.
4. Based on the value of _HANDLER, call the actual function in your Haskell codebase. You might need to parse the stringified JSON arguments into a Haskell value/record. Remember from earlier that the same ZIP file can contain code for multiple Lambda Functions, distinguished by the “handler” associated with each Lambda Function.
5. Communicate the response of the Lambda Function invocation back to AWS by making an HTTP POST call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/{AwsRequestId}/response. The AwsRequestId must be the same as the one obtained in step #3 above.
6. Repeat from step #1 in an infinite loop. When AWS has had enough of your runtime, it will terminate it automatically.
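To make steps 1 to 6 concrete without any networking, the following sketch (using only `base`) builds the Runtime API URLs from the value of `AWS_LAMBDA_RUNTIME_API` (a host and port such as `127.0.0.1:9001`), and models the loop as a function over an in-memory list of invocations. The `Invocation` type and all helper names are hypothetical illustrations, not part of any AWS library; a real runtime performs actual HTTP calls and never stops looping.

```haskell
-- Version prefix of the Lambda Runtime API paths.
apiVersion :: String
apiVersion = "2018-06-01"

-- `apiBase` is the value of AWS_LAMBDA_RUNTIME_API (host:port, no scheme).
runtimeUrl :: String -> String -> String
runtimeUrl apiBase path =
  "http://" ++ apiBase ++ "/" ++ apiVersion ++ "/runtime/" ++ path

-- Step 1: long-polled GET for the next invocation.
nextInvocationUrl :: String -> String
nextInvocationUrl apiBase = runtimeUrl apiBase "invocation/next"

-- Step 5: POST the result for a given request id.
invocationResponseUrl :: String -> String -> String
invocationResponseUrl apiBase reqId =
  runtimeUrl apiBase ("invocation/" ++ reqId ++ "/response")

-- POST an error for a given request id (used later in this chapter).
invocationErrorUrl :: String -> String -> String
invocationErrorUrl apiBase reqId =
  runtimeUrl apiBase ("invocation/" ++ reqId ++ "/error")

-- POST an error that happened before the first invocation.
initErrorUrl :: String -> String
initErrorUrl apiBase = runtimeUrl apiBase "init/error"

-- Hypothetical stand-in for the step 1 HTTP response: the value of the
-- Lambda-Runtime-Aws-Request-Id header (step 3) and the JSON body (step 2).
data Invocation = Invocation
  { invRequestId :: String
  , invPayload   :: String
  }

-- Steps 2 to 5 for each queued invocation: run the handler on the payload
-- and return the (URL, body) pair a real runtime would POST. Each list
-- element corresponds to one iteration of the infinite loop (step 6).
simulateInvocations :: String -> (String -> String) -> [Invocation] -> [(String, String)]
simulateInvocations apiBase handler = map step
  where
    step (Invocation reqId payload) =
      (invocationResponseUrl apiBase reqId, handler payload)
```

In the runtime below, these URLs are built inline with `parseRequest` and string concatenation; centralising them in helpers like these is just one way to avoid typos in the versioned paths.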

Preliminaries

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Network.HTTP.Client
import System.Environment
import Network.HTTP.Types
import Data.List as DL
import Safe (fromJustNote)
import Data.Aeson as Aeson
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import Control.Monad (forever)
import Data.String.Conv (toS)
import Data.Functor (void)
import Data.Text (Text)
import System.Posix.Syslog.TCP as Syslog
import Data.Maybe (fromMaybe)
import UnliftIO.Exception (throwString, SomeException, catch, displayException, throwIO)
import HandlerTypes
import Database.PostgreSQL.Simple

Credentials

In our custom runtime we are:

logging to a remote syslog server (because that’s one of the core requirements we had at Vacation Labs), and
also connecting to a remote Postgres server, to demonstrate how non-standard library dependencies work when building and packaging for AWS Lambda.

Therefore, you’ll have to set the following variables for this runtime to actually work. You can use the following free services:

Papertrail for a free remote syslog server
Heroku Postgres Add-on for a remotely accessible PG database server

syslogHost :: String
syslogHost = Prelude.error "You forgot to set syslogHost"

syslogPort :: String
syslogPort = Prelude.error "You forgot to set syslogPort"

dbConnString :: BS.ByteString
dbConnString = Prelude.error "You forgot to set dbConnString"
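Hard-coding credentials behind `error` placeholders is fine for a tutorial. An alternative is to read them from the Lambda Function's environment variables at start-up, so that a missing value fails during runtime initialisation. The sketch below uses only `base`; the variable names `SYSLOG_HOST`, `SYSLOG_PORT` and `DB_CONN_STRING`, and the `Credentials` record, are hypothetical, so use whatever names you actually configure on your function.

```haskell
import System.Environment (getEnvironment)

-- Hypothetical record replacing the hard-coded syslogHost/syslogPort/dbConnString.
data Credentials = Credentials
  { credSyslogHost   :: String
  , credSyslogPort   :: String
  , credDbConnString :: String
  } deriving (Eq, Show)

-- Looks up a required variable in an environment listing, with a helpful error.
lookupRequired :: [(String, String)] -> String -> Either String String
lookupRequired env name =
  maybe (Left ("Missing required environment variable " ++ name)) Right (lookup name env)

-- Pure part: builds Credentials, stopping at the first missing variable.
credentialsFrom :: [(String, String)] -> Either String Credentials
credentialsFrom env =
  Credentials
    <$> lookupRequired env "SYSLOG_HOST"
    <*> lookupRequired env "SYSLOG_PORT"
    <*> lookupRequired env "DB_CONN_STRING"

-- IO part: reads the process environment once and throws an IOError
-- (via userError) if anything is missing.
loadCredentials :: IO Credentials
loadCredentials = do
  env <- getEnvironment
  either (ioError . userError) pure (credentialsFrom env)
```

Calling `loadCredentials` inside the `catch` block of `prepareRuntimeEnvironment` would route a missing variable to the init-error endpoint like any other initialisation failure. Since `dbConnString` is a `BS.ByteString` in this chapter, the `String` would need converting (for example with `toS`) before it reaches `connectPostgreSQL`.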

Starting with the runtime loop

The runtime loop pretty much does what is listed in the moving parts section above, i.e. it first initialises the runtime, then repeatedly calls the nextInvocation HTTP API endpoint to get function arguments (JSON payload), and runs the underlying Lambda Function for each incoming JSON payload.

main :: IO ()
main = do
  RuntimeEnv{envNextInvocationReq, envHttpManager, envWrappedFunction, envSysLogger} <- prepareRuntimeEnvironment
  envSysLogger User Info "Runtime initialisation complete"
  forever $ do
    res <- httpLbs envNextInvocationReq envHttpManager
    envWrappedFunction res
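As written, `main` stops if the call to the next-invocation endpoint throws (for example, on a transient connection error), because the exception escapes `forever`. One possible hardening, sketched below with only `base`, is to retry the fetch step a few times before giving up. `fetchWithRetry`, the retry count and the delay are hypothetical choices; the Runtime API does not prescribe them.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, displayException, throwIO, try)

-- Hypothetical helper: runs `fetch`; if it throws, logs the failure, waits
-- `delayMicros` microseconds and tries again, at most `retries` more times.
-- Once the retries are used up, the last exception is re-thrown.
fetchWithRetry :: Int -> Int -> (String -> IO ()) -> IO a -> IO a
fetchWithRetry retries delayMicros logErr fetch = do
  result <- try fetch
  case result of
    Right x -> pure x
    Left (e :: SomeException) -> do
      logErr ("next-invocation call failed: " ++ displayException e)
      if retries > 0
        then do
          threadDelay delayMicros
          fetchWithRetry (retries - 1) delayMicros logErr fetch
        else throwIO e
```

In `main`, the fetch line could become `res <- fetchWithRetry 3 1000000 logFn (httpLbs envNextInvocationReq envHttpManager)`, where `logFn` is a small adapter that forwards to `envSysLogger`. Note that `Control.Exception.try` at `SomeException` also catches asynchronous exceptions; the `try` from `UnliftIO.Exception` (the module this chapter already uses for `catch`) re-throws those instead.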

Environment/invariants throughout the lifetime of this runtime

Now, let’s zoom into runtime initialisation, which in our case means initialising the RuntimeEnv data-type given below. Nothing in the RuntimeEnv record changes as long as a particular instance of the runtime is running. This is where we acquire resources that will be used across multiple invocations. There are two notable things in this function:

the initialisation error handler (which is separate from the invocation error handler), and
envWrappedFunction and invocationWrapper, which are discussed in Lambda Function invocation

data RuntimeEnv = RuntimeEnv
  { envLambdaApiBase :: !String
  , envHandlerName :: !String
  , envHttpManager :: !Manager
  , envSysLogger :: !SyslogFn
  , envNextInvocationReq :: !Request
  , envWrappedFunction :: (Response BSL.ByteString -> IO ())
  , envDbConn :: !Connection
  }

prepareRuntimeEnvironment :: IO RuntimeEnv
prepareRuntimeEnvironment = do
  envLambdaApiBase <- getEnv "AWS_LAMBDA_RUNTIME_API"

  -- Note: we purposely disable the response timeout. The AWS docs advise
  -- against setting a timeout on the "next invocation" GET call: between Lambda
  -- Function invocations the runtime may be kept alive (and frozen), so this
  -- API call can take arbitrarily long to complete.
  envHttpManager <- newManager defaultManagerSettings {managerResponseTimeout = responseTimeoutNone}

  -- Now that we have `envLambdaApiBase` and `envHttpManager`, we can wrap all
  -- the steps below this point in a `catch` block that uses `initErrorHandler`,
  -- defined in the next line
  let initErrorHandler e = do
        -- Notify the appropriate AWS Lambda endpoint about a runtime
        -- initialisation error
        initErrorReq <- parseRequest $
                        "http://" <> envLambdaApiBase <> "/2018-06-01/runtime/init/error"
        void $ (flip httpLbs) envHttpManager initErrorReq
          { requestBody = RequestBodyLBS $ prepareErrorPayload e
          , method = "POST"
          }
        -- Re-throwing the error halts execution of this runtime, because we
        -- assume that we can't proceed if there is an error in acquiring some
        -- important resource during the runtime initialisation process.
        throwIO e

  (flip catch) initErrorHandler $ do

    -- Every invocation uses the same `envNextInvocationReq`, so there is no
    -- point in building it repeatedly. We compute it once and store it in
    -- `RuntimeEnv`
    envNextInvocationReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/next")

    -- As long as this runtime is alive, it will deal with only one _HANDLER. In
    -- other words, even though this particular ZIP file may have code for
    -- multiple Lambda Functions, the current runtime deals with only one of
    -- those functions as long as it is alive (other instances of this runtime
    -- may be dealing with other handlers)
    envHandlerName <- getEnv "_HANDLER"

    -- Acquiring a TCP connection to our remote syslog server. This is one step
    -- that can easily fail (due to network errors, firewall misconfiguration,
    -- etc.)
    (Just syslogConfig) <- Syslog.defaultConfig syslogHost syslogPort
    SyslogConn{_syslogConnSend=envSysLogger} <- initSyslog syslogConfig


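    -- Acquiring a DB connection to our Postgres server. Again, this is one step
    -- that can easily fail (due to network errors, incorrect credentials, etc.)
    -- No extra `catch` is needed here: the outer `catch` already routes any
    -- failure to `initErrorHandler`, and wrapping this call a second time
    -- would report the same error to AWS twice.
    envDbConn <- connectPostgreSQL dbConnString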

    -- NOTE: This is a mind-bending let-binding which depends on Haskell's
    -- laziness. `runtimeEnv` depends on `envWrappedFunction` and vice versa.
    -- This technique is known as "tying the knot".
    let runtimeEnv = RuntimeEnv{..}
        envWrappedFunction = invocationWrapper runtimeEnv

    pure runtimeEnv

-- This function constructs the JSON payload for both the initialisation error
-- and the invocation error. The Runtime API documents `errorMessage` and
-- `errorType` as the fields of this payload.
prepareErrorPayload :: SomeException
                    -> BSL.ByteString
prepareErrorPayload e = Aeson.encode $ Aeson.object
  [ "errorType" Aeson..= ("SomeException" :: String)
  , "errorMessage" Aeson..= (displayException e)
  ]
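The `let runtimeEnv = RuntimeEnv{..}` / `envWrappedFunction = invocationWrapper runtimeEnv` binding in `prepareRuntimeEnvironment` is an instance of "tying the knot": a record holds a function that was built from that very record. The self-contained sketch below shows the same trick in isolation, using hypothetical `Env`, `envName`, `envGreet` and `greetWith` names. It works because `let` bindings are recursive and lazy: `envGreet` is a partial application that only looks inside `env` when it is eventually called, by which time `env` is fully constructed.

```haskell
{-# LANGUAGE RecordWildCards #-}

-- A hypothetical cut-down version of RuntimeEnv: one plain field and one
-- function field that closes over the whole record.
data Env = Env
  { envName  :: String
  , envGreet :: String -> String
  }

-- Plays the role of invocationWrapper: it receives the complete Env.
greetWith :: Env -> String -> String
greetWith Env{..} who = "[" ++ envName ++ "] hello, " ++ who

-- Plays the role of prepareRuntimeEnvironment.
mkEnv :: String -> Env
mkEnv envName =
  -- `env` uses `envGreet` (via RecordWildCards), and `envGreet` uses `env`:
  -- the recursive let ties the knot.
  let env = Env{..}
      envGreet = greetWith env
  in env
```

For example, `envGreet (mkEnv "handler1") "world"` evaluates to `"[handler1] hello, world"`.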

Lambda Function invocation

At the time of preparing RuntimeEnv, envWrappedFunction stores the result of partially applying invocationWrapper to the RuntimeEnv itself (which includes envHandlerName). This ensures that, during the lifetime of this runtime, the _HANDLER environment variable is read just once, and the handler name it contains is reused for every invocation. After that, we can keep calling envWrappedFunction repeatedly, passing it the incoming function arguments/payload. Here’s what this function does:

Logs the start & end of an invocation
Uses the dispatcher to:
- parse the incoming JSON arguments to a Haskell type,
- execute the underlying Lambda Function with the given arguments (which have now been converted to a Haskell type/value)
- convert the Lambda Function’s results back to stringified JSON
POST the function results (as stringified JSON) back to the invocationResponse endpoint, as mentioned in step #5 of the moving parts section above.
Logs any error encountered while running the underlying function to:
- our remote syslog
- the invocationError endpoint exposed by the AWS Custom Runtime interface (this detail is NOT mentioned in the moving parts section)
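The control flow in the list above (log, run the handler, POST its result, and on any failure log and POST to the error endpoint instead) can be expressed independently of HTTP. In the sketch below, `Reporter`, `Outcome` and `wrapInvocation` are hypothetical names; the `reportResponse` and `reportError` callbacks stand in for the POSTs to the `.../response` and `.../error` endpoints, so the logic can be exercised with plain `IO` actions. As in `invocationWrapper`, everything from "started" to "completed", including posting the response, runs inside one exception handler.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, displayException, try)

-- Hypothetical stand-ins for the logger and the two Runtime API POSTs.
data Reporter = Reporter
  { logLine        :: String -> IO ()
  , reportResponse :: String -> String -> IO ()  -- request id, JSON result
  , reportError    :: String -> String -> IO ()  -- request id, error description
  }

-- What happened to one invocation; returned only to make the sketch testable.
data Outcome = Responded String | Failed String
  deriving (Eq, Show)

wrapInvocation :: Reporter -> (String -> IO String) -> String -> String -> IO Outcome
wrapInvocation rep handler reqId payload = do
  -- Any exception thrown by the handler or by the response POST lands in `Left`.
  result <- try $ do
    logLine rep (tag "Execution started")
    body <- handler payload
    reportResponse rep reqId body
    logLine rep (tag "Execution completed")
    pure body
  case result of
    Right body -> pure (Responded body)
    Left (e :: SomeException) -> do
      let msg = displayException e
      logLine rep (tag ("Error: " ++ msg))
      reportError rep reqId msg
      pure (Failed msg)
  where
    -- Prefix every log line with the request id, like `logger` does.
    tag txt = "[" ++ reqId ++ "] " ++ txt
```

The chapter's `invocationWrapper` returns `()`; the `Outcome` value here exists only so that the behaviour can be checked without real endpoints.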

invocationWrapper :: RuntimeEnv                  -- ^ the common RuntimeEnv, which doesn't change during this runtime's lifetime
                  -> (Response BSL.ByteString)   -- ^ the incoming invocation arguments
                  -> IO ()
invocationWrapper RuntimeEnv{..} invocation = (flip catch) logInvocationError $ do
  logger $ "Execution started with payload " <> (toS $ responseBody invocation)
  respBody <- curriedDispatcher pload
  invocationResultReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/" <> toS reqId <> "/response")
  void $ (flip httpLbs) envHttpManager invocationResultReq
    { requestBody=(RequestBodyLBS respBody)
    , method="POST"
    }
  logger "Execution completed"
  where
    mReqId =
      DL.lookup "Lambda-Runtime-Aws-Request-Id" (responseHeaders invocation)
    reqId =
      fromJustNote "Could not find Lambda-Runtime-Aws-Request-Id header" mReqId
    pload =
      responseBody invocation
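    logger txt = do
      -- fall back to a placeholder so that logging never fails on a missing
      -- header (and avoid shadowing `reqId` from the enclosing `where`)
      let logReqId = fromMaybe "no-request-id" mReqId
      envSysLogger User Info $ "[" <> toS logReqId <> "] " <> txt
      void $ execute envDbConn "insert into logs(msg) values(?)" (Only txt)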

    -- the "dispatcher" is discussed in next section
    curriedDispatcher =
      dispatcher envHandlerName

    logInvocationError e = do
      logger $ toS $ "Error: " <> displayException e
      invocationErrorReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/" <> toS reqId <> "/error")
      void $ (flip httpLbs) envHttpManager invocationErrorReq
        { requestBody = RequestBodyLBS $ prepareErrorPayload e  -- prepareErrorPayload is defined right after prepareRuntimeEnvironment
        , method = "POST"
        }
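In http-client, `responseHeaders` returns header names as case-insensitive values (`HeaderName` is `CI ByteString`), which is why `DL.lookup "Lambda-Runtime-Aws-Request-Id"` matches regardless of how the header is capitalised. If you model headers as plain `String` pairs, for instance in unit tests, the comparison has to be made case-insensitive by hand. `lookupHeader` and `requestIdOf` below are hypothetical helpers that do so with `base` only.

```haskell
import Data.Char (toLower)

-- Case-insensitive lookup of an HTTP header in a list of (name, value) pairs.
lookupHeader :: String -> [(String, String)] -> Maybe String
lookupHeader name headers =
  lookup (map toLower name) [ (map toLower k, v) | (k, v) <- headers ]

-- The request id that must be echoed back in .../invocation/{id}/response.
requestIdOf :: [(String, String)] -> Either String String
requestIdOf headers =
  maybe (Left "Could not find Lambda-Runtime-Aws-Request-Id header") Right
        (lookupHeader "Lambda-Runtime-Aws-Request-Id" headers)
```

Returning `Either` instead of calling `fromJustNote` leaves it to the caller to decide how a missing header is reported.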

The “dispatcher”

Let’s discuss the dispatcher and parseArgsAndCallFunction functions that we have been literally forced to write. This boilerplate is required because of the way Haskell’s type-system works. If we were writing this in a dynamically typed language, we might have chosen to use the language’s in-built dynamic dispatch system. Such dynamic dispatch systems are usually able to call a function based on a string containing the function’s name. This is not possible in Haskell, so you will be forced to write this “dispatching boilerplate” and keep it up-to-date as you add more handlers. If you’d like to avoid manual maintenance of this boilerplate, there are two possible ideas:

Use Template Haskell
Or, try to cook up something using existential data-types and type-classes (but seriously, just use Template Haskell)

If you are sure that you will NOT be using multiple handlers, you can choose to ignore the _HANDLER environment variable, and directly call the underlying function, without bothering with the handler and dispatcher.

dispatcher :: String              -- ^ envHandlerName
           -> BSL.ByteString      -- ^ incoming arguments (stringified JSON)
           -> IO BSL.ByteString   -- ^ function results (stringified JSON)
dispatcher handlerName pload =
  case handlerName of
    -- for the purpose of this tutorial, the actual lambda-functions are not very interesting.
    -- both of them are defined at the end of this file.
    "handler1" -> parseArgsAndCallFunction pload "Handler1Req" lambdaFunction1
    "handler2" -> parseArgsAndCallFunction pload "Handler2Req" lambdaFunction2
    -- an unknown _HANDLER is reported with a descriptive message, instead of
    -- a generic non-exhaustive-patterns error
    _ -> throwString $ "Unknown handler: " <> handlerName

parseArgsAndCallFunction :: (FromJSON req, ToJSON res)
                         => BSL.ByteString       -- ^ the incoming function arguments (in stringified JSON format)
                         -> String               -- ^ the human readable name of the argument's Haskell type. This is used for error logging.
                         -> (req -> IO res)      -- ^ the actual function that takes a Haskell value and returns a Haskell value
                         -> IO BSL.ByteString    -- ^ function results (in stringified JSON format)
parseArgsAndCallFunction pload hname hfunction = either
  (\e -> throwString $ "Unable to decode payload to " <> hname <> ": " <> e)
  (\x -> Aeson.encode <$> (hfunction x))
  (Aeson.eitherDecode pload)
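Because `parseArgsAndCallFunction` erases every handler's argument and result types down to `BSL.ByteString -> IO BSL.ByteString`, the wrapped handlers all share one type and can therefore be stored in a lookup table keyed by handler name, with one line per handler and a clear error for unknown names. The sketch below shows this idea using only `base`: `Read`/`Show` stand in for Aeson's `FromJSON`/`ToJSON`, and `Wrapped`, `wrapHandler`, `registry` and `dispatchByName` are hypothetical names. With Aeson, `wrapHandler` would use `eitherDecode`/`encode` instead.

```haskell
import Text.Read (readEither)

-- Every wrapped handler has the same type: serialised input to serialised output.
type Wrapped = String -> IO String

-- Analogue of parseArgsAndCallFunction, with Read/Show instead of Aeson.
wrapHandler :: (Read req, Show res) => String -> (req -> IO res) -> Wrapped
wrapHandler typeName fn payload =
  case readEither payload of
    Left err  -> ioError (userError ("Unable to decode payload to " ++ typeName ++ ": " ++ err))
    Right req -> show <$> fn req

-- One entry per handler; adding a handler means adding one line here.
registry :: [(String, Wrapped)]
registry =
  [ ("handler1", wrapHandler "(Int, Int)" (\(a, b) -> pure (a + b :: Int)))
  , ("handler2", wrapHandler "(String, String)" (\(a, b) -> pure (a ++ b :: String)))
  ]

-- Resolves a handler name to its wrapped function, or explains why it can't.
dispatchByName :: String -> Either String Wrapped
dispatchByName name =
  maybe (Left ("Unknown handler: " ++ name)) Right (lookup name registry)
```

Resolving `dispatchByName envHandlerName` once in `prepareRuntimeEnvironment` would also mean the handler name is matched only once per runtime, rather than on every invocation as the `case` in `dispatcher` does.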

The underlying Lambda Functions - and their arguments

Here are the Lambda Functions that are part of this custom runtime. The functions themselves are uninteresting from the standpoint of this tutorial. Instead, notice that the types of the arguments to these functions, i.e. Handler1Req and Handler2Req, have deliberately been defined in a separate exposed-module (HandlerTypes). This allows the invoking program to reuse these req/res types and increases type-safety slightly.

lambdaFunction1 :: Handler1Req
                -> IO Handler1Res
lambdaFunction1 Handler1Req{int1, int2} = pure $ Handler1Res (int1 + int2)

lambdaFunction2 :: Handler2Req
                -> IO Handler2Res
lambdaFunction2 Handler2Req{str1, str2} = pure $ Handler2Res (str1 ++ str2)
