Opaleye Implementation Guide

Learning Goal

If you are considering using Opaleye in a mid-to-large scale project, there are a number of decisions that you will need to take as an Opaleye user. This chapter guides you through all such decisions. This is battle-tested (but opinionated) advice based on our experience of using Opaleye at Vacation Labs, where our product has more than 150 DB tables and more than 70,000 lines of Haskell code. [1]

  • At the very beginning you should decide whether or not you like Opaleye's Query DSL (ref: Opaleye Query Tutorial). See if the tradeoffs discussed in Opaleye's Query DSL - Pros & Cons are acceptable for your use-case.
  • Next, you will need to decide what kind of data-structures you'd like to use for the DB <=> Haskell bridge, and how type-safe you'd like this bridge to be. This is discussed in Pick your data-structures
  • For sizeable projects that are evolving over time (which ones aren't?), you'll need to contend with the Opaleye boilerplate required to configure various Table writeFields viewFields values. This is discussed in Auto-generate the boilerplate
  • Since Opaleye does not deal with managing database connections, connection pooling, or logging, it is highly recommended that you write your own wrappers on top of a few core Opaleye functions. Finally, you should probably Formalize all of this in a custom monad

Opaleye's Query DSL - Pros & Cons

The two biggest advantages of using Opaleye's DSL, instead of writing raw SELECT queries by hand, are:

  • Composability and correctness
  • Type-safety

The biggest disadvantages of Opaleye's DSL are:

  • Very large queries, possibly resulting in a performance penalty: This, in my opinion, is a direct consequence of how Opaleye achieves composability and correctness. Every time you compose two queries (for example, using a join), or add a restriction to a query (i.e. a WHERE clause), or even add a LIMIT/OFFSET clause, the query is wrapped into another nested level of SELECT. This is done to ensure that no part of the sub-query conflicts with anything else that is added to the query later. The result is very large, deeply nested queries that can potentially cause performance issues, as is currently being discussed in GitHub issues on the Opaleye repository. (You can inspect the generated SQL yourself, as shown in the sketch after this list.)
  • Using ad-hoc joins will force you to make one of the following choices. If you want to understand them in detail, please read Opaleye Query Tutorial.
    • abandon your core data-structures, and revert to tuples
    • have complicated core data-structures that depend on TypeFamilies and other type-level magic
  • Another minor disadvantage is the dependency on the Arrows language extension, which some people still consider to be highly experimental.
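If you want to see the nesting for yourself, Opaleye can print the SQL it generates for any query. Here is a minimal sketch (userTable and UserR are hypothetical placeholders for a Table value and its read-side row type):

import Data.Maybe (fromMaybe)
import Opaleye (Select, limit, selectTable, showSql)

usersQuery :: Select UserR  -- UserR: hypothetical read-side row type
usersQuery = limit 10 (selectTable userTable)

main :: IO ()
main = putStrLn (fromMaybe "<empty query>" (showSql usersQuery))
-- even a query this small is printed with the base SELECT wrapped in
-- one or more sub-SELECTs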

Pick your data-structures

First, you will need to configure a someTable :: Table writeFields viewFields value for each table that you wish to interface with via Opaleye. This will force you to pick concrete types for writeFields and viewFields.

In the example given at Teaching Opaleye your table structure we used tuples for writeFields and viewFields (and both types were the same) to get the job done. While simple to understand, this approach doesn't scale to an application that has many tables (with each table having dozens of columns). In all probability you will want to use record-types instead of tuples.

Now, this gives you the following choices:

  1. Use the same record-type for writeFields and viewFields, as discussed in Using Opaleye with simple Haskell records
  2. Use polymorphic records, as discussed in Using Opaleye with polymorphic Haskell records (a small sketch of this idiom is given below)
  3. Go one step further down the rabbit-hole of type-level magic, and use polymorphic records along with type-families, as discussed in Supercharged polymorphic records with type-families

Danger

Lest you think this problem is unique to Opaleye: it isn't. It is present in any Haskell DB library that tries to provide a reasonably type-safe layer on top of SQL. The underlying reason is what I like to call Haskell's "records problem". You will face it even if you design a type-safe HTTP/JSON API where the response-JSON depends on the request-JSON. If you don't like this tradeoff, learn to live with a lot of Maybe x fields in your records and stick with Using Opaleye with simple Haskell records.
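To give you a taste of choice (2), here is a minimal sketch of the polymorphic-record idiom, using the Template Haskell helper from product-profunctors (the table and field names are hypothetical):

{-# LANGUAGE TemplateHaskell #-}
import Data.Profunctor.Product.TH (makeAdaptorAndInstance)
import Opaleye (Field, SqlInt4, SqlText, Table, table, tableField)

-- One record, polymorphic in every column; Opaleye instantiates it
-- differently for writes and reads.
data UserRow a b = UserRow { userId :: a, userEmail :: b }
$(makeAdaptorAndInstance "pUserRow" ''UserRow)

type UserW = UserRow (Field SqlInt4) (Field SqlText)
type UserR = UserRow (Field SqlInt4) (Field SqlText)

userTable :: Table UserW UserR
userTable = table "users" $ pUserRow UserRow
  { userId    = tableField "id"
  , userEmail = tableField "email"
  }

Here the write and read sides happen to be identical; the polymorphism starts paying off when they diverge (e.g. a Maybe write-side for an autoincrementing id), which is exactly the tradeoff the linked chapters discuss.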

Second, you will need to decide whether you'd like each table's primary key to be a different type or not - in all probability it will be some kind of newtype over Int (or Int64). This will have a ripple effect on using the same newtypes for foreign-keys that reference these primary-keys. Even here you have two choices (a distinct newtype per table, or a shared primitive type), and you'll need to pick one.
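For instance, with the newtype-per-table choice, the types alone stop you from passing an order's id where a user's id is expected. A minimal sketch (all names are hypothetical):

newtype UserId  = UserId  Int deriving (Eq, Ord, Show)
newtype OrderId = OrderId Int deriving (Eq, Ord, Show)

data OrderRow = OrderRow
  { orderId     :: OrderId
  , orderUserId :: UserId  -- foreign key to users; reuses the same newtype
  , orderTotal  :: Int
  }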

Third, you will need to decide whether you'd like to use primitive Haskell types for "special" columns, or whether you'd like to define custom types for them. A few examples of such "special" columns:

  • Enumerations: fulfilment_status, which can only contain one of pending, processing, shipped, or delivered. On the Haskell side, you can use String for it, or something like:

    data FulfilmentStatus = FSPending | FSProcessing | FSShipped | FSDelivered
                            deriving (Eq, Show, Enum, Generic)
  • JSON/JSONB columns: By default, json and jsonb columns are converted to values of type Aeson.Value on the Haskell side. You might want to use custom data-types if your JSON structures are fixed.

  • HSTORE columns: With top-notch support for jsonb columns in Postgres, hstore columns aren't very popular nowadays. However, at Vacation Labs, we started off with Postgres v9, which had very poor support for json and, IIRC, didn't even have jsonb. Therefore, we have quite a few hstore columns in our schema. If you do too, you will need to decide whether you'd like to map them to [(Text, Text)], Map Text Text, or HashMap Text Text, or use a custom type depending upon what you're storing in them (similar to jsonb).

  • Custom Postgres types: If you've created custom Postgres types then Opaleye won't even know which standard/native Haskell type to use for them. You'll need to teach it how to map your custom Postgres type to a custom Haskell type.

All of these scenarios are discussed in Core mechanism for mapping custom Haskell types to PG types.
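To give you a flavour of what such a mapping involves, here is a rough sketch for FulfilmentStatus, assuming it is stored in a text column. statusToText is our own helper, and the DefaultFromField instance relies on the fromPGSFromField helper from Opaleye's internal modules, whose exact name and location have varied across Opaleye versions:

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Data.Text (Text)
import qualified Data.Text as T
import Data.Profunctor.Product.Default (Default (..))
import Database.PostgreSQL.Simple.FromField
  (FromField (..), ResultError (..), returnError)
import Opaleye (DefaultFromField (..), Field, SqlText, ToFields,
                sqlStrictText, toToFields)
import Opaleye.Internal.RunQuery (fromPGSFromField)

statusToText :: FulfilmentStatus -> Text
statusToText FSPending    = "pending"
statusToText FSProcessing = "processing"
statusToText FSShipped    = "shipped"
statusToText FSDelivered  = "delivered"

-- Reading: teach postgresql-simple to parse the column's text value...
instance FromField FulfilmentStatus where
  fromField f mdata = do
    t <- fromField f mdata
    case (t :: Text) of
      "pending"    -> pure FSPending
      "processing" -> pure FSProcessing
      "shipped"    -> pure FSShipped
      "delivered"  -> pure FSDelivered
      other        -> returnError ConversionFailed f (T.unpack other)

-- ...and tell Opaleye to reuse that parser when running queries.
instance DefaultFromField SqlText FulfilmentStatus where
  defaultFromField = fromPGSFromField

-- Writing: turn the Haskell value into a SQL text literal.
instance Default ToFields FulfilmentStatus (Field SqlText) where
  def = toToFields (sqlStrictText . statusToText)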

Auto-generate the boilerplate

Once you've decided what kinds of types/data-structures you want to use on the Haskell side, you'll be staring at the daunting task of writing (and maintaining) tonnes of boilerplate code:

  • Creating someTable :: Table writeFields viewFields - one for each table in your DB schema, and keeping them up-to-date every time your schema changes.
  • Writing, and maintaining, FromFields and ToFields typeclass instances (along with their Default instances) for each custom Haskell type you wish to use with Opaleye.

It is highly recommended to save yourself this pain and auto-generate this boilerplate using one of the following methods:

  • Use opaleye-steroids (TODO: Write a note about when this library will be released!)
  • Use opaleye-gen, which is similar to opaleye-steroids, but smaller and simpler (you can use it as a building block to write your own scripts)
  • If none of the above fit your requirements, consider writing your own script, or TemplateHaskell functions, to auto-generate this boilerplate.

Formalize all of this in a custom monad

It is strongly recommended that you formalize DB access in your application by writing a custom monad, instead of coming up with ad-hoc solutions to the concerns discussed below (connection management, transactions, and logging) at different places in your codebase.

There are multiple approaches to building your custom monad, and they are strongly correlated with how you want to manage your DB connections.
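As a rough illustration, such a monad often ends up being a ReaderT over IO that carries the connection pool (or connection) and the logger. A minimal sketch, with all names being our own:

import Control.Monad.Reader (ReaderT, asks, liftIO)
import Data.Pool (Pool, withResource)
import Database.PostgreSQL.Simple (Connection)

data AppEnv = AppEnv
  { appDbPool :: Pool Connection
  , appLogger :: String -> IO ()  -- stand-in for your real logger
  }

type AppM = ReaderT AppEnv IO

-- Every DB helper in the app goes through this one function, which is
-- where pooling, logging, and transaction policy can be centralised.
withDbConnection :: (Connection -> IO a) -> AppM a
withDbConnection action = do
  pool <- asks appDbPool
  liftIO (withResource pool action)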

Managing your database connections

Opaleye does not deal with the mechanics of maintaining connections to the Postgres database. It expects you to pass a Connection to any function that deals with the DB. Based on what we saw in Instant Gratification, this doesn't seem like too much of a hassle, but heed the warning given below (unless you absolutely know what you are doing).

Danger

Do not deploy a concurrent, multi-threaded application in production without using a connection pool properly. If you do, different threads will end up blocking on each other for use of the Connection, and you will get very poor concurrency and performance.

Tip

A connection pool opens multiple connections to the database when your application starts up. Each thread in your app requests a connection from the pool (say, one connection for each incoming HTTP request). There is no time lag in this, since the connection is already established. No other thread will be allotted that connection as long as the thread is using it. Once the thread formally returns the connection to the pool, it can be assigned to some other thread (upon request). A connection pool does other things as well, but this is the simplified picture.

If you are using Opaleye in a concurrent, multi-threaded application (the most common being a webapp, or an HTTP API backend), you should use a connection pool. The most commonly used resource-pooling library is resource-pool. Even when you are using a connection pool, there are two ways to request a connection from it. These are discussed below:

  • One connection per thread: Consider an RDBMS-backed webapp where every HTTP request-handler is definitely going to read/write to the DB. In such a case, you might want to take a DB connection from the pool, put it in your request-handler's ReaderT environment, and let that thread use that single connection for whatever DB operations it needs to do. This is how some frameworks, like Ruby on Rails, are set up. The monad and code structure that logically follows from this decision is described at Custom monad with one DB connection per thread
  • One connection per logical DB operation: On the other hand, consider a case where every HTTP request-handler does not necessarily interact with the DB, or where the handler does a whole bunch of stuff other than talking to the DB (and talking to the DB is a very small part of what it does). Similarly, consider an ETL application (not a webapp) which wakes up once every few hours, takes a whole lot of data from some external data-sources, inserts it into the DB in one big batch, runs some reports, and then goes back to sleep. In all of these cases it doesn't make sense to hold on to a DB connection (from a connection-pool) for the entire duration of execution. It is better to request a connection, perform the DB operations, and release the connection as quickly as possible. The monad and code structure that logically follows from this decision is described at Custom monad with one DB connection per logical DB operation
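Whichever of the two you pick, the pool itself is created once at application startup. A minimal sketch with resource-pool (the createPool API shown here is from the 0.2.x series; resource-pool 0.3+ replaced it with newPool, and the connection string is a placeholder):

import Data.Pool (Pool, createPool)
import Database.PostgreSQL.Simple (Connection, close, connectPostgreSQL)

mkDbPool :: IO (Pool Connection)
mkDbPool = createPool
  (connectPostgreSQL "host=localhost dbname=mydb")  -- how to open a connection
  close  -- how to close one
  1      -- number of stripes (independent sub-pools)
  60     -- seconds to keep an idle connection open
  10     -- maximum connections per stripe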

Dealing with database transactions

It is highly likely that there are some areas in your application where you want to execute multiple SQL statements inside a transaction block (either all should succeed, or none should). Opaleye has no functions to BEGIN, COMMIT, or ROLLBACK DB transactions. However, postgresql-simple, the underlying library that Opaleye uses, does have such functions in the Database.PostgreSQL.Simple.Transaction module. Here's a simplified example of how you can use the withTransaction function from postgresql-simple along with Opaleye:

Danger

Please refer to the relevant custom monad for a complete code sample (ref: Formalize all of this in a custom monad)

-- We are using a connection pool
transferMoney :: Pool Connection -> AccountId -> AccountId -> Int -> IO ()
transferMoney dbPool fromAccount toAccount amount = 
  Data.Pool.withResource dbPool $ \conn -> do
    withTransaction conn $ do
      -- be careful to use the SAME CONNECTION as you've used in withTransaction
      updateAccountBalance conn fromAccount (negate amount)
      updateAccountBalance conn toAccount amount

updateAccountBalance :: Connection -> AccountId -> Int -> IO ()
updateAccountBalance conn acId delta = void $ runUpdate_ conn Update
  { uTable = ..         -- whatever
  , uUpdateWith = ..    -- whatever (this is where delta gets applied)
  , uWhere = ..         -- whatever (this is where acId gets matched)
  , uReturning = rCount
  }

Danger

Note that transactions cannot be nested in Postgres: calling withTransaction inside another withTransaction issues a second BEGIN (which Postgres ignores with a warning), and the inner COMMIT/ROLLBACK will end the outer transaction. If you need nested-transaction semantics, look at withSavepoint from the same Database.PostgreSQL.Simple.Transaction module, which emulates nesting using Postgres savepoints.

Logging related to DB/Opaleye

In any app where the RDBMS plays an important role (as is the case with most RDBMS-backed webapps), you should probably set up some basic logging around SQL statements for:

  1. investigating performance bottlenecks in production, i.e. slow queries. This can also be done at the Postgres level, but sometimes it is simpler to see query execution times alongside your application logs so that you can correlate the query being fired with the incoming request.
  2. logging complete queries being issued, in case you need them for auditing or low-level debugging
  3. logging only those queries that threw an error

Since Opaleye has no in-built support for logging, this is possible only if you write (and use) wrappers on top of the four core functions provided by Opaleye, i.e. runSelect, runInsert_, runUpdate_, and runDelete_. Here's a short example:

Danger

Please refer to the relevant custom monad for a complete code sample (ref: Formalize all of this in a custom monad)

-- picking a different name so that it doesn't clash with runSelect and 
-- forces us to use qualified imports
dbSelect :: ( Default FromFields fields haskells
            , Default Unpackspec fields fields -- needed by showSql below
            )
         => FastLogger
         -- ^ From the fast-logger library. You may be using a different
         -- logging library and may need to change this.
         -> Connection
         -> Select fields
         -> IO [haskells]
dbSelect logger conn sel = do
  startTime <- getCurrentTime
  let action = logSql startTime
      handler = logSqlError startTime
  catch action handler
  where
    -- assumes Data.Time (getCurrentTime, diffUTCTime), Control.Exception
    -- (catch, throwIO, SomeException), Data.Maybe (fromMaybe), and
    -- Opaleye's showSql are in scope
    logSql startTime = do
      result <- runSelect conn sel
      endTime <- getCurrentTime
      logger $ toLogStr $
        "SELECT [" <> show (diffUTCTime endTime startTime) <> "] " <>
        fromMaybe "<empty select>" (showSql sel) <> "\n"
      pure result
    -- the pattern annotation below needs the ScopedTypeVariables extension
    logSqlError startTime (e :: SomeException) = do
      endTime <- getCurrentTime
      logger $ toLogStr $
        "SQL ERROR [" <> show (diffUTCTime endTime startTime) <> "] " <>
        show e <> "\n"
      throwIO e


  [1] As of September, 2020.