Custom AWS Lambda Runtime for Haskell

At Vacation Labs, we chose to write a custom runtime, because:

  • We wanted to log everything to our centralised syslog server, instead of CloudWatch
  • It is not that hard to write a custom runtime

If you do not have such unique requirements, do not roll your own runtime. Write your Lambda Function in Haskell and use the AWS Lambda Haskell Runtime, written by The Agile Monkeys, to package and deploy your Haskell code. Read through this section as a fun learning exercise.

Moving parts in an AWS Lambda Custom Runtime

Important: The most up-to-date documentation for Custom Runtimes is available from AWS itself. What follows is a quick summary of these pages from the official AWS documentation: Custom AWS Lambda Runtimes, AWS Lambda Runtime Interface, and Tutorial – Publishing a Custom Runtime.
  • Your ZIP file should contain an executable file called bootstrap. While it may optionally contain other files as well, for example, supporting libraries, other executables, etc, the bootstrap executable must be present.
  • When your Lambda Function is invoked, AWS is going to run your bootstrap file, but you will not receive any function arguments yet (keep reading to find out why)
  • Your runtime will need to take note of two important environment variables - AWS_LAMBDA_RUNTIME_API and _HANDLER
  • After you have initialised your custom runtime (e.g. acquired common resources, like log files, DB connections, etc.), and your runtime is ready to “serve” the next function invocation, it has to make an HTTP call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next.
  • Here’s what happens next (in a simplified manner):
    1. Make an HTTP GET call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next
    2. Receive the Lambda Function arguments in the response body (as stringified JSON).
    3. Take note of the Lambda-Runtime-Aws-Request-Id response header.
    4. Based on the value of _HANDLER, call the actual function in your Haskell codebase. You might need to parse the stringified JSON arguments into a Haskell value/record. Remember that the same ZIP file can contain code for multiple Lambda Functions, distinguished by the “handler” associated with each Lambda Function.
    5. Communicate the response of the Lambda Function invocation back to AWS by making an HTTP POST call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/{AwsRequestId}/response. The AwsRequestId needs to be the same as the one obtained in step #3 above.
    6. Repeat step #1 in an infinite loop. When AWS has had enough of your runtime, it will terminate it automatically.
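
As a minimal sketch, the endpoints mentioned in the steps above can be built from the value of AWS_LAMBDA_RUNTIME_API with a few pure helpers. The function names here are our own (hypothetical, not part of any AWS library); the 2018-06-01 version prefix is the one used by the runtime code later in this post.

```haskell
-- API version prefix, as used by the URLs in the runtime code of this post.
apiVersion :: String
apiVersion = "2018-06-01"

-- | Base URL for all Runtime API calls. The argument is the value of the
-- AWS_LAMBDA_RUNTIME_API environment variable (a host:port pair).
runtimeBase :: String -> String
runtimeBase api = "http://" ++ api ++ "/" ++ apiVersion ++ "/runtime"

-- | Steps #1 and #6: where to GET the next invocation from.
nextInvocationUrl :: String -> String
nextInvocationUrl api = runtimeBase api ++ "/invocation/next"

-- | Step #5: where to POST the result for a given request id.
invocationResponseUrl :: String -> String -> String
invocationResponseUrl api reqId =
  runtimeBase api ++ "/invocation/" ++ reqId ++ "/response"

-- | Where to POST an error raised while handling a given request id.
invocationErrorUrl :: String -> String -> String
invocationErrorUrl api reqId =
  runtimeBase api ++ "/invocation/" ++ reqId ++ "/error"

-- | Where to POST an error raised while initialising the runtime.
initErrorUrl :: String -> String
initErrorUrl api = runtimeBase api ++ "/init/error"
```

Keeping the URL construction pure makes it easy to check in GHCi before any HTTP call is involved.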

Preliminaries

Credentials

In our custom runtime we are:

  1. logging to a remote syslog server (because that’s one of the core requirements we had at Vacation Labs), and
  2. also connecting to a remote Postgres server for the purpose of demonstrating how non-standard library dependencies work when building+packaging for AWS Lambda.

Therefore, you’ll have to set the following variables for this runtime to actually work. You can use the following free services:

Starting with the runtime loop

The runtime loop pretty much does what is listed in the moving parts section above, i.e. it first initialises the runtime, then repeatedly calls the nextInvocation HTTP API endpoint to get function arguments (JSON payload), and runs the underlying Lambda Function for each incoming JSON payload.
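
The shape of such a loop can be sketched without any AWS or HTTP dependency. This is not the actual runtime code: the Runtime record and its three fields are hypothetical stand-ins for "fetch the next invocation", "run the Lambda Function" and "post the response", and mockRuntime exists only so the loop can be tried locally.

```haskell
import Control.Monad (forever, replicateM_)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | Hypothetical abstraction over the three things the loop needs.
data Runtime = Runtime
  { fetchNext    :: IO (String, String)       -- ^ blocks until the next invocation; returns (request id, JSON payload)
  , runFunction  :: String -> IO String       -- ^ the underlying Lambda Function
  , postResponse :: String -> String -> IO () -- ^ request id, then response body
  }

-- | One iteration: fetch an invocation, run the function, report the result.
serveOne :: Runtime -> IO ()
serveOne rt = do
  (reqId, payload) <- fetchNext rt
  result <- runFunction rt payload
  postResponse rt reqId result

-- | The runtime loop proper: serve invocations until the process is killed.
runtimeLoop :: Runtime -> IO a
runtimeLoop = forever . serveOne

-- | In-memory runtime for local experiments: every "invocation" has the id
-- "req-1" and the payload "ping"; posted responses are collected in the IORef.
mockRuntime :: IORef [(String, String)] -> Runtime
mockRuntime sent = Runtime
  { fetchNext    = pure ("req-1", "ping")
  , runFunction  = \payload -> pure (payload ++ "-pong")
  , postResponse = \reqId body -> modifyIORef' sent ((reqId, body) :)
  }

-- | Serves two mock invocations and returns what was "posted".
demo :: IO [(String, String)]
demo = do
  sent <- newIORef []
  replicateM_ 2 (serveOne (mockRuntime sent))
  readIORef sent
```

In a real runtime the initialisation would happen once before runtimeLoop is entered, and the three fields would be backed by HTTP calls to the Runtime API.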

Environment/invariants throughout the lifetime of this runtime

Now, let’s zoom into runtime initialisation, which in our case is the initialisation of the RuntimeEnv data-type given below. Nothing in the RuntimeEnv record changes as long as a particular instance of the runtime is running. This is where we acquire resources that will be used across multiple invocations. There are two notable things in this function:

  • the initialisation error handler (which is separate from the invocation error handler)
  • envWrappedFunction and invocationWrapper, which are discussed in Lambda Function Invocation

data RuntimeEnv = RuntimeEnv
  { envLambdaApiBase :: !String
  , envHandlerName :: !String
  , envHttpManager :: !Manager
  , envSysLogger :: !SyslogFn
  , envNextInvocationReq :: !Request
  , envWrappedFunction :: (Response BSL.ByteString -> IO ())
  , envDbConn :: !Connection
  }

prepareRuntimeEnvironment :: IO RuntimeEnv
prepareRuntimeEnvironment = do
  envLambdaApiBase <- getEnv "AWS_LAMBDA_RUNTIME_API"

  -- Note, we have purposely put a 2 minute timeout because the AWS docs
  -- specifically mention that the runtime may be kept alive between Lambda
  -- Function invocations, so it might take more than a few seconds for this API
  -- call to complete.
  envHttpManager <- newManager defaultManagerSettings {managerResponseTimeout = (responseTimeoutMicro $ 1000000 * 120)}

  -- Now that we have `envLambdaApiBase` and `envHttpManager`, we can wrap
  -- all the steps below this point in a `catch` block and use the
  -- `initErrorHandler` defined in the next line
  let initErrorHandler e = do
        -- Notifying the appropriate AWS Lambda endpoint about a runtime
        -- initialisation error
        initErrorReq <- parseRequest $
                        "http://" <> envLambdaApiBase <> "/2018-06-01/runtime/init/error"
        void $ (flip httpLbs) envHttpManager initErrorReq
          { requestBody = RequestBodyLBS $ prepareErrorPayload e
          , method = "POST"
          }
        -- Re-throwing the error, which will halt execution of this runtime,
        -- because we are assuming that we can't proceed if there is an error in
        -- acquiring some important resources during the runtime initialisation
        -- process.
        throwIO e

  (flip catch) initErrorHandler $ do

    -- In each invocation, we will end up with the same `envNextInvocationReq`,
    -- so there is no point in doing this step repeatedly. Computing this once,
    -- and storing it in `RuntimeEnv`
    envNextInvocationReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/next")

    -- As long as this runtime is alive, it will deal with only one _HANDLER. In
    -- other words, even though this particular ZIP file may have code for
    -- multiple Lambda Functions, the current runtime deals with only one of
    -- those functions as long as it is alive (other instances of this runtime
    -- may be dealing with other handlers)
    envHandlerName <- getEnv "_HANDLER"

    -- Acquiring a TCP connection to our remote syslog server. This is one step
    -- that can easily fail (due to network errors, firewall misconfiguration,
    -- etc.)
    (Just syslogConfig) <- Syslog.defaultConfig syslogHost syslogPort
    SyslogConn{_syslogConnSend=envSysLogger} <- initSyslog syslogConfig


    -- Acquiring a DB connection to our Postgres server. Again, this is one step
    -- that can easily fail (due to network errors, incorrect credentials, etc.)
    -- Exceptions thrown here are handled by the enclosing `catch`, which
    -- reports them to AWS exactly once.
    envDbConn <- connectPostgreSQL dbConnString

    -- NOTE: This is a mind-bending let-binding which depends on Haskell's
    -- laziness. `runtimeEnv` depends on `envWrappedFunction` and vice versa. I
    -- think this is called "tying the knot" or suchlike.
    let runtimeEnv = RuntimeEnv{..}
        envWrappedFunction = invocationWrapper runtimeEnv

    pure runtimeEnv

-- This function is required to construct the JSON payload for the
-- initialisation error, as well as the invocation error. The Runtime API
-- expects the keys `errorType` and `errorMessage`.
prepareErrorPayload :: SomeException
                    -> BSL.ByteString
prepareErrorPayload e = Aeson.encode $ Aeson.object
  [ "errorType" Aeson..= ("SomeException" :: String)
  , "errorMessage" Aeson..= (displayException e)
  ]
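
The recursive let-binding at the end of prepareRuntimeEnvironment can be easier to follow in isolation. The following is a toy sketch, not the runtime's code: Env, wrapper and mkEnv are hypothetical names that mirror RuntimeEnv, invocationWrapper and the let-binding respectively.

```haskell
-- | Toy stand-in for RuntimeEnv: one plain field, and one field holding a
-- function that itself needs the whole record.
data Env = Env
  { envName    :: String
  , envWrapped :: String -> String  -- no strictness annotation, like envWrappedFunction
  }

-- | Toy stand-in for invocationWrapper: needs the whole Env to do its work.
wrapper :: Env -> String -> String
wrapper env payload = "[" ++ envName env ++ "] " ++ payload

-- | `env` is defined in terms of itself: the record holds `wrapper env`, and
-- `wrapper env` refers back to the record. Laziness lets this definition
-- resolve, because nothing inspects `env` while it is being constructed.
mkEnv :: String -> Env
mkEnv name =
  let env = Env { envName = name, envWrapped = wrapper env }
  in env
```

Calling envWrapped (mkEnv "handler1") "hello" goes through the record to reach the wrapper, which in turn reads envName from the same record.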

Lambda Function invocation

At the time of preparing RuntimeEnv, envWrappedFunction stores the result of partially applying invocationWrapper to the RuntimeEnv itself (which includes envHandlerName). This means the handler name is fixed once for the lifetime of this runtime. After that, we can keep calling envWrappedFunction repeatedly, passing it the incoming function arguments/payload. Here’s what this function does:

  • Logs the start & end of an invocation
  • Uses the dispatcher to:
    • parse the incoming JSON arguments to a Haskell type,
    • execute the underlying Lambda Function with the given arguments (which have now been converted to a Haskell type/value)
    • convert the Lambda Function’s results back to stringified JSON
  • POSTs the function results (as stringified JSON) back to the invocationResponse endpoint as mentioned in step #5 of the moving parts section above.
  • Logs any error encountered while running the underlying function to:
    • our remote syslog
    • the invocationError endpoint exposed by the AWS Custom Runtime interface (this detail is NOT mentioned in the moving parts section)
invocationWrapper :: RuntimeEnv                  -- ^ the common RuntimeEnv, which doesn't change during this runtime's lifetime
                  -> (Response BSL.ByteString)   -- ^ the incoming invocation arguments
                  -> IO ()
invocationWrapper RuntimeEnv{..} invocation = (flip catch) logInvocationError $ do
  logger $ "Execution started with payload " <> (toS $ responseBody invocation)
  respBody <- curriedDispatcher pload
  invocationResultReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/" <> toS reqId <> "/response")
  void $ (flip httpLbs) envHttpManager invocationResultReq
    { requestBody=(RequestBodyLBS respBody)
    , method="POST"
    }
  logger "Execution completed"
  where
    mReqId =
      DL.lookup "Lambda-Runtime-Aws-Request-Id" (responseHeaders invocation)
    reqId =
      fromJustNote "Could not find Lambda-Runtime-Aws-Request-Id header" mReqId
    pload =
      responseBody invocation
    logger txt = do
      let reqId = fromMaybe "no-request-id" mReqId
      envSysLogger User Info $ "[" <> toS reqId <> "] " <> txt
      void $ execute envDbConn "insert into logs(msg) values(?)" (Only txt)

    -- the "dispatcher" is discussed in next section
    curriedDispatcher =
      dispatcher envHandlerName

    logInvocationError e = do
      logger $ toS $ "Error: " <> displayException e
      invocationErrorReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/" <> toS reqId <> "/error")
      void $ (flip httpLbs) envHttpManager invocationErrorReq
        { requestBody = RequestBodyLBS $ prepareErrorPayload e  -- prepareErrorPayload is defined at the very end of this file
        , method = "POST"
        }
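
The choice between the two endpoints can also be made explicit as a value. This is an alternative sketch, not what invocationWrapper above does (it uses catch): here try turns a handler's exception into an Outcome, and endpointFor picks the last path segment of the URL to POST to. Outcome, runGuarded and endpointFor are hypothetical names.

```haskell
import Control.Exception (SomeException, displayException, try)

-- | What happened to one invocation.
data Outcome
  = Succeeded String  -- ^ response body to POST to the response endpoint
  | Failed String     -- ^ error description to POST to the error endpoint
  deriving (Eq, Show)

-- | Runs a handler and captures any exception it throws in IO.
runGuarded :: (String -> IO String) -> String -> IO Outcome
runGuarded handler payload = do
  result <- try (handler payload)
  pure $ case result of
    Right body -> Succeeded body
    Left e     -> Failed (displayException (e :: SomeException))

-- | Last path segment of the Runtime API URL for this outcome.
endpointFor :: Outcome -> String
endpointFor (Succeeded _) = "response"
endpointFor (Failed _)    = "error"
```

One caveat with this sketch: try only sees exceptions raised while the IO action runs, so a lazily built response body that throws when it is later serialised would escape it.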

The “dispatcher”

Let’s discuss the dispatcher and parseArgsAndCallFunction functions that we have been forced to write. This boilerplate is required because of the way Haskell’s type-system works. If we were writing this in a dynamically typed language, we might have chosen to use the language’s built-in dynamic dispatch system. Such dynamic dispatch systems are usually able to call a function based on a string containing the function’s name. This is not possible in Haskell. You will be forced to write this “dispatching boilerplate” and keep it up-to-date as you add more handlers. If you’d like to avoid manual maintenance of this boilerplate, there are two possible ideas:

  • Use Template Haskell
  • Or, try to cook up something using existential data-types and type-classes (but seriously, just use Template Haskell)

If you are sure that you will NOT be using multiple handlers, you can choose to ignore the _HANDLER environment variable, and directly call the underlying function, without bothering with the handler and dispatcher.
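
To give a feel for the boilerplate, here is a dependency-free sketch of the two functions. It is not the code from this runtime: it works on String rather than lazy ByteString, uses Read/Show as a stand-in for the JSON decoding and encoding, and the handler names "handler1" and "handler2" and their toy implementations are assumptions made for illustration.

```haskell
import Text.Read (readMaybe)

-- Toy handlers standing in for the real Lambda Functions.
handler1 :: Int -> IO Int
handler1 = pure . (* 2)

handler2 :: String -> IO Int
handler2 = pure . length

-- | Parses the raw arguments into whatever type the handler expects, runs the
-- handler, and serialises its result. Read/Show play the role that a JSON
-- library would play in the real runtime.
parseArgsAndCallFunction :: (Read a, Show b) => (a -> IO b) -> String -> IO String
parseArgsAndCallFunction f raw =
  case readMaybe raw of
    Nothing   -> ioError (userError ("could not parse arguments: " ++ raw))
    Just args -> show <$> f args

-- | Maps the value of _HANDLER to a concrete function. Every new handler
-- needs a new clause here; this is the part that must be kept up to date.
dispatcher :: String -> String -> IO String
dispatcher "handler1" = parseArgsAndCallFunction handler1
dispatcher "handler2" = parseArgsAndCallFunction handler2
dispatcher other      = \_ -> ioError (userError ("unknown handler: " ++ other))
```

Each clause fixes the argument and result types for one handler, which is why the dispatch cannot be driven by the handler name alone.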

The underlying Lambda Functions - and their arguments

Here are the Lambda Functions that are part of this custom runtime. The functions themselves are uninteresting from the standpoint of this tutorial. Instead, notice that the types of the arguments to these functions, i.e. Handler1Req and Handler2Req, have deliberately been defined in a separate exposed-module. This allows the invoking program to reuse these req/res types and increases the type-safety slightly.