Custom AWS Lambda Runtime for Haskell
At Vacation Labs, we chose to write a custom runtime, because:
- We wanted to log everything in our centralised syslog server, instead of CloudWatch
- It is not that hard to write a custom runtime
Moving parts in an AWS Lambda Custom Runtime
- Your ZIP file should contain an executable file called bootstrap. While it may optionally contain other files as well (for example, supporting libraries, other executables, etc.), the bootstrap executable must be present.
- When your Lambda Function is invoked, AWS is going to run your bootstrap file, but you will not receive any function arguments yet (keep reading to find out why).
- Your runtime will need to take note of two important environment variables: AWS_LAMBDA_RUNTIME_API and _HANDLER.
- After you have initialised your custom runtime (e.g. acquired common resources, like log files, DB connections, etc.), and your runtime is ready to “serve” the next function invocation, it has to make an HTTP call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next.
- Here’s what happens next (in a simplified manner):
  1. Make an HTTP GET call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next
  2. Get the Lambda Function arguments as the response body (in stringified JSON format).
  3. Take note of the Lambda-Runtime-Aws-Request-Id response header.
  4. Based on the value of _HANDLER, call the actual function in your Haskell codebase. You might need to parse the stringified JSON arguments into a Haskell value/record. Remember from earlier that the same ZIP file can contain code for multiple Lambda Functions, distinguished by the “handler” associated with each Lambda Function.
  5. Communicate the response of the Lambda Function invocation back to AWS by making an HTTP POST call to http://{AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/{AwsRequestId}/response. The AwsRequestId needs to be the same as obtained from step #3 above.
  6. Repeat from step #1 in an infinite loop. When AWS has had enough of your runtime, it will terminate it automatically.
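The endpoints used in the steps above follow a fixed pattern, so they can be built with a few pure helpers. The sketch below is only illustrative: the helper names (runtimeBase, nextInvocationUrl, invocationResponseUrl, invocationErrorUrl, initErrorUrl) are hypothetical, and the 2018-06-01 version prefix is the one used by the runtime code later in this post.

```haskell
-- Hypothetical helpers (not part of the runtime below). Each one takes the
-- value of the AWS_LAMBDA_RUNTIME_API environment variable as its first
-- argument.
apiVersion :: String
apiVersion = "2018-06-01"

runtimeBase :: String -> String
runtimeBase apiHost = "http://" <> apiHost <> "/" <> apiVersion <> "/runtime"

-- GET: blocks until the next invocation is available
nextInvocationUrl :: String -> String
nextInvocationUrl apiHost = runtimeBase apiHost <> "/invocation/next"

-- POST: the result of a successful invocation
invocationResponseUrl :: String -> String -> String
invocationResponseUrl apiHost reqId =
  runtimeBase apiHost <> "/invocation/" <> reqId <> "/response"

-- POST: an error raised while handling an invocation
invocationErrorUrl :: String -> String -> String
invocationErrorUrl apiHost reqId =
  runtimeBase apiHost <> "/invocation/" <> reqId <> "/error"

-- POST: an error raised while initialising the runtime
initErrorUrl :: String -> String
initErrorUrl apiHost = runtimeBase apiHost <> "/init/error"
```

The request ID passed to the second and third helper is the value of the Lambda-Runtime-Aws-Request-Id header from the preceding GET call.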
Preliminaries
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where
import Network.HTTP.Client
import System.Environment
import Control.Concurrent
import Network.HTTP.Types
import Data.List as DL
import Safe (fromJustNote)
import Data.Aeson as Aeson
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL
import Control.Monad (forever)
import Data.String.Conv (toS)
import Data.Functor (void)
import Data.Text (Text)
import System.Posix.Syslog.TCP as Syslog
import Data.Maybe (fromMaybe)
import Data.Either (either)
import UnliftIO.Exception (throwString, SomeException, catch, displayException, throwIO)
import HandlerTypes
import Control.Monad (forM_)
import Database.PostgreSQL.Simple
Credentials
In our custom runtime we are:
- logging to a remote syslog server (because that’s one of the core requirements we had at Vacation Labs), and
- also connecting to a remote Postgres server for the purpose of demonstrating how non-standard library dependencies work when building+packaging for AWS Lambda.
Therefore, you’ll have to set the following variables for this runtime to actually work. You can use the following free services:
- Papertrail for a free remote syslog server
- Heroku Postgres Add-on for a remotely accessible PG database server
syslogHost :: String
syslogHost = Prelude.error "You forgot to set syslogHost"
syslogPort :: String
syslogPort = Prelude.error "You forgot to set syslogPort"
dbConnString :: BS.ByteString
dbConnString = Prelude.error "You forgot to set dbConnString"
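Hardcoding these values is fine for a tutorial. One possible alternative is to read them from environment variables configured on the Lambda Function. The sketch below assumes hypothetical variable names (SYSLOG_HOST, SYSLOG_PORT, DB_CONN_STRING) and a hypothetical Credentials record; none of these are part of the runtime described in this post.

```haskell
import System.Environment (lookupEnv)

-- | Turns the result of an environment lookup into either a descriptive
-- error or the value. Empty values are treated as missing.
requireSetting :: String -> Maybe String -> Either String String
requireSetting name Nothing   = Left ("You forgot to set " <> name)
requireSetting name (Just "") = Left ("You forgot to set " <> name)
requireSetting _    (Just v)  = Right v

-- | Hypothetical record grouping the three settings used by this runtime.
data Credentials = Credentials
  { credSyslogHost   :: String
  , credSyslogPort   :: String
  , credDbConnString :: String
  } deriving (Show, Eq)

-- | Reads all three settings, reporting the first one that is missing.
loadCredentials :: IO (Either String Credentials)
loadCredentials = do
  host <- requireSetting "SYSLOG_HOST"    <$> lookupEnv "SYSLOG_HOST"
  port <- requireSetting "SYSLOG_PORT"    <$> lookupEnv "SYSLOG_PORT"
  db   <- requireSetting "DB_CONN_STRING" <$> lookupEnv "DB_CONN_STRING"
  pure (Credentials <$> host <*> port <*> db)
```

Keeping requireSetting pure makes the "missing value" behaviour easy to check without touching the real environment.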
Starting with the runtime loop
The runtime loop pretty much does what is listed in the moving parts section above, i.e. it first initialises the runtime, then repeatedly calls the nextInvocation HTTP API endpoint to get function arguments (JSON payload), and runs the underlying Lambda Function for each incoming JSON payload.
main :: IO ()
main = do
  runtimeEnv@RuntimeEnv{envNextInvocationReq, envHttpManager, envWrappedFunction, envSysLogger} <- prepareRuntimeEnvironment
  envSysLogger User Info "Runtime initialisation complete"
  forever $ do
    res <- httpLbs envNextInvocationReq envHttpManager
    envWrappedFunction res
Environment/invariants throughout the lifetime of this runtime
Now, let’s zoom into runtime initialisation, which in our case is initialisation of the RuntimeEnv data-type given below. All the stuff in the RuntimeEnv record does not change as long as a particular instance of the runtime is running. This is where we acquire resources that will be used across multiple invocations. There are two notable things in the prepareRuntimeEnvironment function:
- the initialisation error handler (which is separate from the invocation error handler)
- envWrappedFunction and invocationWrapper, which are discussed in Lambda Function invocation
data RuntimeEnv = RuntimeEnv
  { envLambdaApiBase :: !String
  , envHandlerName :: !String
  , envHttpManager :: !Manager
  , envSysLogger :: !SyslogFn
  , envNextInvocationReq :: !Request
  , envWrappedFunction :: (Response BSL.ByteString -> IO ())
  , envDbConn :: !Connection
  }
prepareRuntimeEnvironment :: IO RuntimeEnv
prepareRuntimeEnvironment = do
  envLambdaApiBase <- getEnv "AWS_LAMBDA_RUNTIME_API"
  -- Note, we have purposely put a 2 minute timeout because the AWS docs
  -- specifically mention that the runtime may be kept alive between Lambda
  -- Function invocations, so it might take more than a few seconds for this API
  -- call to complete.
  envHttpManager <- newManager defaultManagerSettings {managerResponseTimeout = (responseTimeoutMicro $ 1000000 * 120)}
  -- Now that we have `envLambdaApiBase` and `envHttpManager` we can wrap all
  -- the steps below this point in a `catch` block and use `initErrorHandler`
  -- defined in the next line
  let initErrorHandler e = do
        -- Notifying the appropriate AWS Lambda endpoint about a runtime
        -- initialisation error
        initErrorReq <- parseRequest $
          "http://" <> envLambdaApiBase <> "/2018-06-01/runtime/init/error"
        void $ (flip httpLbs) envHttpManager initErrorReq
          { requestBody = RequestBodyLBS $ prepareErrorPayload e
          , method = "POST"
          }
        -- Re-throwing the error, which will halt execution of this runtime,
        -- because we are assuming that we can't proceed if there is an error in
        -- acquiring some important resources during the runtime initialisation
        -- process.
        throwIO e
  (flip catch) initErrorHandler $ do
    -- In each invocation, we will end up with the same `envNextInvocationReq`,
    -- so there is no point doing this step repeatedly. We compute this once,
    -- and store it in `RuntimeEnv`
    envNextInvocationReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/next")
    -- As long as this runtime is alive, it will deal with only one _HANDLER. In
    -- other words, even though this particular ZIP file may have code for
    -- multiple Lambda Functions, the current runtime deals with only one of
    -- those functions as long as it is alive (other instances of this runtime
    -- may be dealing with other handlers)
    envHandlerName <- getEnv "_HANDLER"
    -- Acquiring a TCP connection to our remote syslog server. This is one step
    -- that can easily fail (due to network errors, firewall misconfiguration,
    -- etc)
    (Just syslogConfig) <- Syslog.defaultConfig syslogHost syslogPort
    SyslogConn{_syslogConnSend=envSysLogger} <- initSyslog syslogConfig
    -- Acquiring a DB connection to our Postgres server. Again, this is one step
    -- that can easily fail (due to network errors, incorrect credentials, etc.)
    -- Any failure here is reported by the enclosing `catch` block, so it does
    -- not need its own handler (which would report the same error twice).
    envDbConn <- connectPostgreSQL dbConnString
    -- NOTE: This is a mind-bending let-binding which depends on Haskell's
    -- laziness. `runtimeEnv` depends on `envWrappedFunction` and vice versa. I
    -- think this is called "tying the knot" or suchlike.
    let runtimeEnv = RuntimeEnv{..}
        envWrappedFunction = invocationWrapper runtimeEnv
    pure runtimeEnv
-- This function is required to construct the JSON payload for the
-- initialisation error, as well as the invocation error. The AWS Runtime API
-- expects the keys `errorType` and `errorMessage`.
prepareErrorPayload :: SomeException
                    -> BSL.ByteString
prepareErrorPayload e = Aeson.encode $ Aeson.object
  [ "errorType" Aeson..= ("SomeException" :: String)
  , "errorMessage" Aeson..= (displayException e)
  ]
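The mutually recursive let-binding near the end of prepareRuntimeEnvironment can be seen in isolation in the following sketch. Env, envName, envGreet, greetWith and mkEnv are hypothetical names used only for illustration. As far as we can tell, the trick works because the function field is lazy (it has no strictness annotation), so the record can be referenced before it is fully constructed.

```haskell
-- A record where one field is a function that closes over the record itself.
data Env = Env
  { envName  :: !String           -- strict field, like most of RuntimeEnv
  , envGreet :: String -> String  -- lazy field, like envWrappedFunction
  }

-- Plays the role of invocationWrapper: it needs the whole Env to do its job.
greetWith :: Env -> String -> String
greetWith env greeting = greeting <> ", " <> envName env

-- `env` refers to `greet`, and `greet` refers back to `env`.
mkEnv :: String -> Env
mkEnv name =
  let env   = Env { envName = name, envGreet = greet }
      greet = greetWith env
  in env
```

Calling envGreet (mkEnv "Lambda") "Hello" goes through greetWith with the very record it was stored in. Making envGreet strict would force greet while env is still being built, which is why only that field lacks a bang.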
Lambda Function invocation
At the time of preparing RuntimeEnv, envWrappedFunction stores the result of partially applying invocationWrapper to the RuntimeEnv itself (which includes envHandlerName). This fixes the environment, and with it the handler name, once for the lifetime of this runtime. After that, we can keep calling envWrappedFunction repeatedly, passing it the incoming function arguments/payload. Here’s what this function does:
- Logs the start & end of an invocation
- Uses the dispatcher to:
  - parse the incoming JSON arguments to a Haskell type,
  - execute the underlying Lambda Function with the given arguments (which have now been converted to a Haskell type/value)
  - convert the Lambda Function’s results back to stringified JSON
- POSTs the function results (as stringified JSON) back to the invocationResponse endpoint as mentioned in step #5 of the moving parts section above.
- Logs any error encountered while running the underlying function to:
  - our remote syslog
  - the invocationError endpoint exposed by the AWS Custom Runtime interface (this detail is NOT mentioned in the moving parts section)
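The success and failure paths described above can be modelled without any HTTP or logging involved. This is a simplified sketch, not the runtime's actual code: Outcome, runInvocation and the String-based payloads are assumptions made for illustration, and try from Control.Exception (in base) stands in for the catch used by the runtime.

```haskell
import Control.Exception (SomeException, displayException, try)

-- | Where the result of one invocation should be sent.
data Outcome
  = PostToResponse String  -- body for the invocation response endpoint
  | PostToError String     -- message for the invocation error endpoint
  deriving (Show, Eq)

-- | Runs a handler on a payload and decides which endpoint gets the result.
-- Any exception thrown by the handler is turned into a PostToError.
runInvocation :: (String -> IO String) -> String -> IO Outcome
runInvocation handler payload = do
  result <- try (handler payload)
  pure $ case result of
    Right body -> PostToResponse body
    Left e     -> PostToError (displayException (e :: SomeException))
```

For instance, runInvocation (pure . reverse) "abc" yields a PostToResponse, while a handler that calls ioError yields a PostToError carrying the rendered exception.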
invocationWrapper :: RuntimeEnv -- ^ the common RuntimeEnv, which doesn't change during this runtime's lifetime
                  -> (Response BSL.ByteString) -- ^ the incoming invocation arguments
                  -> IO ()
invocationWrapper RuntimeEnv{..} invocation = (flip catch) logInvocationError $ do
  logger $ "Execution started with payload " <> (toS $ responseBody invocation)
  respBody <- curriedDispatcher pload
  invocationResultReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/" <> toS reqId <> "/response")
  void $ (flip httpLbs) envHttpManager invocationResultReq
    { requestBody=(RequestBodyLBS respBody)
    , method="POST"
    }
  logger "Execution completed"
  where
    mReqId =
      DL.lookup "Lambda-Runtime-Aws-Request-Id" (responseHeaders invocation)
    reqId =
      fromJustNote "Could not find Lambda-Runtime-Aws-Request-Id header" mReqId
    pload =
      responseBody invocation
    -- logs to the remote syslog server as well as to the `logs` table
    logger txt = do
      let reqId = fromMaybe "no-request-id" mReqId
      envSysLogger User Info $ "[" <> toS reqId <> "] " <> txt
      void $ execute envDbConn "insert into logs(msg) values(?)" (Only txt)
    -- the "dispatcher" is discussed in next section
    curriedDispatcher =
      dispatcher envHandlerName
    logInvocationError e = do
      logger $ toS $ "Error: " <> displayException e
      invocationErrorReq <- parseRequest ("http://" <> envLambdaApiBase <> "/2018-06-01/runtime/invocation/" <> toS reqId <> "/error")
      void $ (flip httpLbs) envHttpManager invocationErrorReq
        { requestBody = RequestBodyLBS $ prepareErrorPayload e -- prepareErrorPayload is defined above, next to prepareRuntimeEnvironment
        , method = "POST"
        }
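Both error handlers report every failure with the fixed error type SomeException. If a more specific type name is useful, one option is to read it from the wrapped exception via Typeable, as in this sketch. The names errorTypeName and exampleTypeName are hypothetical, and this is not what the runtime above does.

```haskell
import Control.Exception (SomeException (..), toException)
import Data.Typeable (typeOf)

-- | The name of the concrete exception type hidden inside a SomeException.
-- Every Exception instance is also Typeable, so `typeOf` is available on
-- the unwrapped value.
errorTypeName :: SomeException -> String
errorTypeName (SomeException inner) = show (typeOf inner)

-- | Example: the type name reported for an exception built with `userError`.
exampleTypeName :: String
exampleTypeName = errorTypeName (toException (userError "boom"))
```

A value like this could replace the hardcoded string in the error payload, at the cost of exposing internal type names to whoever reads the error.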
The “dispatcher”
Let’s discuss the dispatcher and parseArgsAndCallFunction functions that we have been literally forced to write. This boilerplate is required because of the way Haskell’s type-system works. If we were writing this in a dynamically typed language, we may have chosen to use the language’s in-built dynamic dispatch system. Such dynamic dispatch systems are usually able to call a function based on a string containing the function’s name. This is not possible in Haskell. You will be forced to write this “dispatching boilerplate” and keep it up-to-date as and when you keep adding more handlers. If you’d like to avoid manual maintenance of this boilerplate, there are two possible ideas:
- Use Template Haskell
- Or, try to cook up something using an existential data-type and type-classes (but seriously, just use Template Haskell)
If you are sure that you will NOT be using multiple handlers, you can choose to ignore the _HANDLER environment variable, and directly call the underlying function, without bothering with the handler and dispatcher.
dispatcher :: String -- ^ envHandlerName
           -> BSL.ByteString -- ^ incoming arguments (stringified JSON)
           -> IO BSL.ByteString -- ^ function results (stringified JSON)
dispatcher handlerName pload =
  case handlerName of
    -- for the purpose of this tutorial, the actual lambda-functions are not very interesting.
    -- both of them are defined at the end of this file.
    "handler1" -> parseArgsAndCallFunction pload "Handler1Req" lambdaFunction1
    "handler2" -> parseArgsAndCallFunction pload "Handler2Req" lambdaFunction2
    -- fail with a readable error for an unknown _HANDLER value, instead of
    -- crashing on a non-exhaustive pattern match
    unknown -> throwString $ "Unknown handler: " <> unknown
parseArgsAndCallFunction :: (FromJSON req, ToJSON res)
                         => BSL.ByteString -- ^ the incoming function arguments (in stringified JSON format)
                         -> String -- ^ the human readable name of the argument's Haskell type. This is used for error logging.
                         -> (req -> IO res) -- ^ the actual function that takes a Haskell value and returns a Haskell value
                         -> IO BSL.ByteString -- ^ function results (in stringified JSON format)
parseArgsAndCallFunction pload hname hfunction = either
  (\e -> throwString $ "Unable to decode payload to " <> hname <> ": " <> e)
  (\x -> Aeson.encode <$> (hfunction x))
  (Aeson.eitherDecode pload)
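For the curious, the existential data-type idea mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions: AnyHandler, handlerTable and dispatchByName are hypothetical names, the two handlers are made up, and Read/Show stand in for FromJSON/ToJSON so that the example needs nothing beyond base.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Text.Read (readMaybe)

-- | A handler of any request/response type, hidden behind an existential.
-- Read and Show stand in for FromJSON and ToJSON in this sketch.
data AnyHandler =
  forall req res. (Read req, Show res) => AnyHandler (req -> IO res)

-- | Adding a handler means adding one row here; the decoding and encoding
-- logic in dispatchByName does not change.
handlerTable :: [(String, AnyHandler)]
handlerTable =
  [ ("add",    AnyHandler (\(a, b) -> pure (a + b :: Int)))
  , ("concat", AnyHandler (\(a, b) -> pure (a ++ b :: String)))
  ]

-- | Looks up the handler by name, decodes the payload to whatever type that
-- handler expects, runs it, and encodes the result.
dispatchByName :: String -> String -> IO (Either String String)
dispatchByName name payload =
  case lookup name handlerTable of
    Nothing -> pure (Left ("Unknown handler: " <> name))
    Just (AnyHandler f) ->
      case readMaybe payload of
        Nothing  -> pure (Left ("Unable to decode payload for " <> name))
        Just req -> Right . show <$> f req
```

The table still has to be maintained by hand, so this mostly moves the boilerplate into a data structure rather than removing it, which is part of why Template Haskell remains the more attractive option.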
The underlying Lambda Functions - and their arguments
Here are the Lambda Functions that are part of this custom runtime. The functions themselves are uninteresting from the standpoint of this tutorial. Instead, notice that the types of the arguments to these functions, i.e. Handler1Req and Handler2Req, have deliberately been defined in a separate exposed-module. This allows the invoking program to reuse these req/res types and increases the type-safety slightly.
lambdaFunction1 :: Handler1Req
                -> IO Handler1Res
lambdaFunction1 Handler1Req{int1, int2} = pure $ Handler1Res (int1 + int2)
lambdaFunction2 :: Handler2Req
                -> IO Handler2Res
lambdaFunction2 Handler2Req{str1, str2} = pure $ Handler2Res (str1 ++ str2)