Haskell Job Queues: An Ultimate Guide

Last updated: 30 Apr, 2020

Motivation
A note about storage backends for job-queues
Feature-set of odd-jobs
Discussing alternatives to OddJobs
Breakdown of packages
Footnotes

Motivation

The first version of this document was a text file where I jotted down notes while searching for a job-queue in Haskell, to be deployed at Vacation Labs. I couldn’t find anything that fit our requirements, which were otherwise being fulfilled by delayed_job on the Rails side of things.

This led to the development of odd-jobs, which has been used in production since 2016 at Vacation Labs, and is now in the process of being open-sourced.

Now, this document serves two purposes:

A feature-comparison between odd-jobs and other, similar, libraries
A quick guide for other people searching for similar job-queues

If you are searching for a job-queue, please read the section on storage backends before making a decision.

A note about storage backends for job-queues

Even though Haskell has great support for concurrency, do NOT simply forkIO to create threads for long-running (or retryable) actions/computations. It seems like the simplest thing to do, but it doesn’t address a number of real-life concerns:

What happens when the main Haskell process itself crashes (most likely on some FFI call, or because a lazy accumulation of thunks causes it to run out of memory)?
What happens to all those in-memory threads if you need to deploy a new version of your app?
What happens if your server cannot handle the load and it needs to offload expensive actions/computations to some other machine?

The recommended way is to offload such tasks to a queue with persistent storage, and then fork multiple threads to process each item in the queue. This is precisely what job-queues do.
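To make that shape concrete, here is a minimal sketch of "persistent queue plus a pool of worker threads". It is not how odd-jobs (or any library discussed here) is implemented. The `JobStore` record, its field names, and `inMemoryStore` are all made up for illustration; in a real deployment `claimNext` would be backed by a database table, and the workers would keep polling instead of exiting when the queue is empty.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (modifyMVar, modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Exception (SomeException, finally, try)
import Control.Monad (replicateM_)

-- | Hypothetical interface to whatever durable storage holds the queue.
data JobStore job = JobStore
  { claimNext  :: IO (Maybe job)         -- ^ atomically claim one pending job, if any
  , markDone   :: job -> IO ()           -- ^ record that the job succeeded
  , markFailed :: job -> String -> IO () -- ^ record the failure so it can be retried
  }

-- | One worker: keep claiming jobs until the store has none left.
-- A crashing job is recorded as failed; it does not take the worker down.
-- (Catching SomeException is a simplification for this sketch.)
workerLoop :: JobStore job -> (job -> IO ()) -> IO ()
workerLoop store run = do
  next <- claimNext store
  case next of
    Nothing  -> pure ()
    Just job -> do
      result <- try (run job)
      case result of
        Right () -> markDone store job
        Left err -> markFailed store job (show (err :: SomeException))
      workerLoop store run

-- | Fork n workers and block until every one of them has finished.
runWorkers :: Int -> JobStore job -> (job -> IO ()) -> IO ()
runWorkers n store run = do
  finished <- newEmptyMVar
  replicateM_ n $ forkIO (workerLoop store run `finally` putMVar finished ())
  replicateM_ n (takeMVar finished)

-- | In-memory stand-in for a database table, for demonstration only.
-- It survives none of the failure scenarios listed above.
-- The second component reads back the completed jobs (most recent first).
inMemoryStore :: [job] -> IO (JobStore job, IO [job])
inMemoryStore jobs = do
  pending <- newMVar jobs
  done    <- newMVar []
  let claim = modifyMVar pending $ \js -> pure $ case js of
        []         -> ([], Nothing)
        (j : rest) -> (rest, Just j)
      store = JobStore
        { claimNext  = claim
        , markDone   = \j -> modifyMVar_ done (pure . (j :))
        , markFailed = \_ _ -> pure ()
        }
  pure (store, readMVar done)
```

The point of the record-of-functions is that the worker loop does not care where jobs live. Swapping the in-memory stand-in for a table in your RDBMS is what turns this from "forkIO with extra steps" into something that survives crashes and deploys.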

However, please do not over-engineer your queue storage-backend. One doesn’t need Kafka, RabbitMQ, ActiveMQ, etc. for most use-cases. In all probability your app is already using an RDBMS. It’s perfectly capable of being used as a job-queue.

No, it is not a bad idea to use an RDBMS as a job/message queue. Postgres has been used to run 10,000 jobs per second. By using existing components already in production, you’ll be saving yourself a lot of dev-ops pain (and cost as well).
- This also allows you to enqueue jobs in the same DB transaction as the larger action, thus simplifying error-handling and transaction rollbacks.
Once you start hitting the scalability limits of an RDBMS, consider using something like Redis. If your application is already achieving some reasonable scale, in all probability, you’ve already introduced Redis for caching. Use it for the job-queue as well.
- As a caveat, don’t introduce Redis just for the job-queue – your RDBMS is good enough.
- By introducing Redis you’re giving up the transactional guarantees mentioned earlier.
If your scalability requirements are still not met, only then should you look at the highly (and horizontally) scalable messaging systems.
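The point about enqueueing jobs in the same DB transaction as the larger action is easier to see in code. Below is a rough sketch using postgresql-simple. The `bookings` and `jobs` tables, their columns, and the `SendConfirmation` payload are all hypothetical, invented for this example; this is not the odd-jobs schema or API.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Database.PostgreSQL.Simple (Connection, Only (..), execute, withTransaction)

-- | Hypothetical example: create a booking and enqueue its confirmation
-- email atomically. If either INSERT throws, withTransaction rolls back
-- both, so there is never a booking without its job, or a job without
-- its booking.
createBookingAndEnqueueEmail :: Connection -> Text -> IO ()
createBookingAndEnqueueEmail conn customerEmail =
  withTransaction conn $ do
    -- The "larger action": the business-level write.
    _ <- execute conn
           "INSERT INTO bookings (customer_email) VALUES (?)"
           (Only customerEmail)
    -- The job, stored as JSON in an ordinary (assumed) table.
    _ <- execute conn
           "INSERT INTO jobs (payload, run_at, status) \
           \VALUES (json_build_object('tag', 'SendConfirmation', 'email', ?::text), now(), 'queued')"
           (Only customerEmail)
    pure ()
```

With Redis (or any other external queue) the second write lives outside the transaction, so you have to handle "booking committed but enqueue failed" and "enqueue succeeded but booking rolled back" yourself.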

Feature-set of odd-jobs

Library	odd-jobs
Backend	PostgreSQL
Storage format	JSON
Logging	Structured logging (with default functions for simple text logging)
Execution timeout	Yes
Schedule jobs for execution in the future	Yes
Cron-like job-scheduling	No
Job retries	Yes
Graceful shutdown ¹	Yes
Concurrency control	Yes - via ConcurrencyControl
Multiple queues	WIP
Documentation	Hopefully, good enough :-)
CLI Library	Yes
Web/admin UI	WIP
Used in production?	Used in production at Vacation Labs since 2016
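For readers unsure what the "Execution timeout" and "Job retries" rows mean in practice, here is a small sketch using only base. It is not the odd-jobs implementation; the `Attempt` type and both function names are made up for illustration. A real job-runner would also persist the attempt count and wait before retrying, which this sketch skips.

```haskell
import Control.Exception (SomeException, try)
import System.Timeout (timeout)

-- | Outcome of one attempt at running a job (hypothetical type).
data Attempt = Succeeded | TimedOut | Crashed String
  deriving (Eq, Show)

-- | Run an action, giving up after the given number of seconds.
-- (Catching SomeException is a simplification for this sketch.)
attemptWithTimeout :: Int -> IO () -> IO Attempt
attemptWithTimeout seconds action = do
  -- timeout takes microseconds and returns Nothing if the limit was hit.
  result <- try (timeout (seconds * 1000000) action)
  pure $ case result of
    Right (Just ()) -> Succeeded
    Right Nothing   -> TimedOut
    Left err        -> Crashed (show (err :: SomeException))

-- | Re-run a job until it succeeds or the attempt budget is used up.
-- Returns the outcome of every attempt, oldest first.
runWithRetries :: Int -> Int -> IO () -> IO [Attempt]
runWithRetries maxAttempts seconds action = go 1
  where
    go n = do
      outcome <- attemptWithTimeout seconds action
      if outcome == Succeeded || n >= maxAttempts
        then pure [outcome]
        else (outcome :) <$> go (n + 1)
```

Something like `runWithRetries 5 30 sendInvoiceEmail` (where `sendInvoiceEmail` is whatever `IO ()` action your job performs) would try up to five times, allowing 30 seconds per attempt.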

Discussing alternatives to OddJobs

hworker: A reliable at-least-once job queue built on top of redis

Library	hworker
Backend	Redis
Storage format	JSON
Logging	Plain-text only
Execution timeout	No
Schedule jobs for execution in the future	No
Cron-like job-scheduling	No
Job retries	Yes. Interestingly, via ExceptionBehaviour it allows you to configure what happens when an unhandled exception is thrown.
Graceful shutdown	No
Concurrency control	No
Multiple queues	Yes
Documentation	Adequate
CLI Library	No - you will have to write your own
Web/admin UI	No
Used in production?	Unknown
Other comments	(a) Based on the documentation alone, seems fairly easy to use (b) Using a type-class for the job runner can result in some challenges in setting up the application-specific environment for running the job. (c) The purpose of the state parameter being passed around is not very clear, and neither is it explained in the docs. One possible explanation is that it has been introduced to deal with the challenges that a type-class based approach brings (as mentioned in the previous point).
Recommendation	Seems like a reasonable solution if you want to use Redis as a backend for your job queue.

jobqueue: A job queue library

Library	jobqueue
Backend	Configurable, but it seems Apache ZooKeeper and SQLite are the only ones available.
Storage format	Custom. Your job data-type needs to implement an instance of the Unit type-class.
Logging	Plain-text
Execution timeout	Probably no. Couldn’t find anything related to job-timeouts in the docs.
Schedule jobs for future execution	Yes
Cron-like job-scheduling	No
Job retries	Yes
Graceful shutdown	No
Concurrency control	No - couldn’t find anything in the docs
Multiple queues	Yes
Documentation	Adequate
CLI library	None. You will have to write your own
Web/Admin UI	None
Being used in production?	Unknown
Other comments	(a) The concept of “multi-step” jobs might introduce complexity which is not required for most use-cases. (b) Seems to be thought-through; and the code is well-structured. (c) Has a pluggable backend. Might be possible to write a Postgres backend as well, but, currently, it does not exist.
Recommendation	Reasonable solution if you can write your own backend for queue storage. I would not recommend Apache ZooKeeper or SQLite - the only two backends available out-of-the-box, as of today.

mongodb-queue: message queue using MongoDB

Library	mongodb-queue
Backend	MongoDB
Storage format	Document/BSON (this is the equivalent of JSON in the MongoDB world).
Logging	Plain-text
Execution timeout	Probably not. Could not find anything in the docs.
Cron-like job scheduling	No
Job retries	Probably not. Could not find anything in the docs.
Graceful shutdown	No
Concurrency control	No
Multiple queues	Yes
CLI Library	No - you will have to write your own.
Web/Admin UI	No
Used in production?	Unknown
Other comments	(a) Documentation could be better. (b) Actual usage examples are missing. (c) Initial blog post announcing the launch throws a 404. (d) Based on the documentation it seems easy to set up and use. (e) Seems to be unmaintained (GitHub repo is archived, and announcement blog post is throwing 404).
Recommendation	Would not recommend using it in any serious project.

postgresql-simple-queue: A PostgreSQL backed queue

Library	postgresql-simple-queue
Backend	PostgreSQL
Storage format	JSON
Logging	No in-built support for logging
Execution timeout	No
Schedule jobs for future execution	No
Cron-like job-scheduling	No
Job retries	Yes
Graceful shutdown	No
Concurrency control	No
Multiple queues	No, but using building-blocks from this library, you can write something that supports multiple queues.
CLI library	No - will have to write your own.
Web/Admin UI	No
Used in production?	Unknown
Other comments	(a) Documentation is incomplete and buggy. (b) Documentation is missing examples. (c) However, the library is very simple and you can probably read the source-code and fill-in the missing bits yourself. (d) I would not call it a fully-functioning job-queue, but a simple library to quickly build your own job-queue. (e) Personally, feels too under-powered to me.
Recommendation	Use with caution. Foundations are similar to `odd-jobs` (and delayed_job), but a lot of required features are missing.

redis-job-queue: Simple priority job queue backed by Redis

Library	redis-job-queue
Backend	Redis
Storage format	ByteString, but the example code actually uses JSON.
Logging	No built-in support for logging
Execution timeout	No
Schedule jobs for future execution	No
Cron-like job-scheduling	No
Job retries	No
Graceful shutdown	No
Concurrency control	No
Multiple queues	Yes
CLI library	None. You will have to write your own.
Web/Admin UI	None
Being used in production	Unknown
Other comments	(a) Documentation is non-existent, except for one example. (b) Although to be honest the library is less than 100 lines of code and you can probably read the entire source code itself! (c) Too simple and doesn’t address a number of real-life concerns around job-queues.
Recommendation	Would not recommend – you will end-up writing a lot of stuff from scratch.

yesod-job-queue: Background jobs library for Yesod

Library	yesod-job-queue
Backend	Redis
Storage format	String. It seems that the `Show` instance of your associated JobType is used.
Logging	No. Only text-based logging is available (which is built into the Yesod framework).
Execution timeout	No
Schedule jobs for future execution	Yes
Cron-like job-scheduling	Yes
Job retries	Probably not. Could not find anything in the docs.
Graceful shutdown	No
Concurrency control	Yes - can control the maximum number of concurrent threads via threadNumber
Multiple queues	Probably not. And it seems that setting up multiple queues won’t be easy. The way the YesodJobQueue type-class has been set up, it seems like each Yesod “subsite” can have exactly one job-queue.
CLI library	None. You will have to write your own.
Web/Admin UI	Yes. In fact, the README even has a screenshot of the admin UI.
Being used in production	Unknown
Other comments	(a) Detailed API docs are NOT available. However, the project README has an adequate usage example. (b) Addition of periodic cron-like execution of jobs is a nice feature.
Recommendation	Seems like a good option if you’re already using Yesod (and Redis). Outside of Yesod, it will be too cumbersome to use.

Breakdown of packages

At the time of writing this document, 88 packages showed up if you searched for ‘queue’ on Hackage:

6 packages could be considered as reasonable alternatives to odd-jobs
24 packages can help achieve the same results but are based on unusual, complicated, or proprietary data-stores.
58 packages are irrelevant to the task at hand. These packages are either in-memory queues/data-structures, network streaming libraries, or absolutely irrelevant ²
- One of them is for managing a toilet queue - a very noble endeavour indeed!

Haskell job queues that could be alternatives to OddJobs

I could find 6 packages that offer similar functionality to odd-jobs and could be considered as reasonable alternatives for achieving similar results:

hworker: A reliable at-least-once job queue built on top of redis.
jobqueue: A job queue library
mongodb-queue: message queue using MongoDB
postgresql-simple-queue: A PostgreSQL backed queue
redis-job-queue: Simple priority job queue backed by Redis.
yesod-job-queue: Background jobs library for Yesod.

Job queues (or task queues) that are dependent on an unusual (or proprietary) data-store

24 packages in this category

Note: These are considered unusual data-stores from the point-of-view of a typical mid-scale application. If your application needs massive scalability, or “multiple nines” of uptime, these choices may no longer be unusual. Similarly, if you are already on AWS and don’t mind paying extra for SQS, it may not be unusual for you.

3 packages related to ZeroMQ:
- zeromq-haskell: Bindings to ZeroMQ 2.1.x
- zeromq3-haskell: Bindings to ZeroMQ 3.x
- zeromq4-haskell: Bindings to ZeroMQ 4.x
4 packages related to Stomp. Technically, Stomp seems to be (and I may be wrong here) a messaging protocol, which any message-queue may implement (like RabbitMQ, ActiveMQ, etc). So, technically, even a DB-backed queue may implement the Stomp protocol. However, in my cursory research, I did not find any DB-backed queue that implemented Stomp.
- stomp-conduit: Stompl Conduit Client
- stomp-patterns: Stompl MOM Stomp Patterns
- stomp-queue: Stompl Client Library
- stompl: Stomp Parser and Utilities
1 package for Azure ServiceBus:
- azure-servicebus: Haskell wrapper over Microsoft Azure ServiceBus REST API
2 packages for Amazon/AWS:
- amazon-emailerhworker-ses: A queue daemon for Amazon’s SES with a PostgreSQL table as a queue.
- amazonka-sqs: Amazon Simple Queue Service SDK.
- powerqueue-sqs: A Amazon SQS backend for powerqueue
1 package for AMQP:
- amqp-utils: Generic Haskell AMQP Consumer
1 package for IronMQ:
- iron-mq: Iron.IO message queueing client library
1 package for NATS messaging system:
- nats-queue: Haskell API for NATS messaging system
1 package for Google TaskQueue:
- gogol-taskqueue: Google TaskQueue SDK.
1 package for LevelDB: This is
- powerqueue-levelmem: A high performance in memory and LevelDB backend for powerqueue
1 package for CCTools WorkQueue: This seems like a job-queue used in academic environments for processing very large volumes of data, where the processing job is split across multiple machines (sounds very much like Hadoop to me).
- cctools-workqueue: High-level interface to CCTools’ WorkQueue library
7 packages for various other backends (I was not able to figure out which backend is being used to store the queue):
- tpar: simple, parallel job scheduling
- Pup-Events-PQueue: A networked event handling framework for hooking into other programs.
- faktory: Faktory Worker for Haskell
- batchd: Batch processing toolset for Linux / Unix
- distributed-process-task: Task Framework for The Cloud Haskell Application Platform
- haskell-disque: Client library for the Disque datastore
- parallel-tasks: This library is useful for running a large amount of parallel tasks that run on top of the IO monad, executing them in batches from a work queue.

In-memory queues, or data-structures, or completely irrelevant

59 packages in this category. (Seriously, how many different takes on queues, parallel queues, concurrent queues, etc does the Haskell ecosystem need!)

Spock-worker: Background workers for Spock
- It seems that the library can support multiple backends but I was able to find only a pure/in-memory backend. If that really is the case, then this package is irrelevant with respect to the discussion at hand.
queue: Abstraction typeclasses for queue-like things.
psqueues: Pure priority search queues
deque: Double-ended queues
kazura-queue: Fast concurrent queues much inspired by unagi-chan
lockfree-queue: Michael and Scott lock-free queues.
pqueue: Reliable, persistent, fast priority queues.
priority-queue: Simple implementation of a priority queue.
control-monad-queue: Reusable corecursive queues, via continuations.
unagi-chan: Fast concurrent queues with a Chan-like API, and more
PSQueue: Priority Search Queue
pure-priority-queue: A pure priority queue.
dequeue: A typeclass and an implementation for double-ended queues.
data-concurrent-queue: A Library for directional queues
stm-queue-extras: Extra queue utilities for STM
rolling-queue: Bounded channel for STM that discards old entries when full
HFrequencyQueue: A Queue with a random (weighted) pick function
chaselev-deque: Chase & Lev work-stealing lock-free double-ended queues (deques).
bounded-tchanstm-chans: Bounded Transactional channels (queues)
abstract-deque: Abstract, parameterized interface to mutable Deques.
flush-queue: Concurrent bounded blocking queues optimized for flushing. Both IO and STM implementations.
abstract-deque-tests: A test-suite for any queue or double-ended queue satisfying an interface
bounded-queue: A strict, immutable, thread-safe, single-ended, bounded queue.
type-indexed-queues: Queues with verified and unverified versions.
pure-priority-queue-tests: Tests for the pure-priority-queue package
fingertree-psqueue: Implementation of priority search queues as finger trees.
meldable-heap: Asymptotically optimal, Coq-verified meldable heaps, AKA priority queues
stm-chunked-queues: Chunked Communication Queues
MSQueue: Michael-Scott queue.
min-max-pqueue: Double-ended priority queues.
concurrent-batch: Concurrent batching queue based on STM with timeout.
vpq: Priority queue based on vector
CMQ: cwmwl udp message queue
stable-heap: Purely functional stable heaps (fair priority queues)
write-buffer-stm: A write buffer for STM channels and queues.
heaps: Asymptotically optimal Brodal/Okasaki heaps.
pqueue-mtlqueuelike: Fully encapsulated monad transformers with queuelike functionality.
EdisonAPI: A library of efficient, purely-functional data structures (API)
disposableglazier-react: Allows storing different resource-releasing actions together.
eprocess: Basic Erlang-like process support for Haskell
aivika: A multi-method simulation library
aws-kinesis-client: A producer & consumer client library for AWS Kinesis
AVar: Mutable variables with Exception handling and concurrency support.
queuelike: A library of queuelike data structures, both functional and stateful.
AvlTree: Balanced binary trees using the AVL algorithm.
procrastinating-structure: Pure structures that can be incrementally created in impure code
tagged-binary: Provides tools for serializing data tagged with type information.
network-transport: Network abstraction layer
speculation: A framework for safe, programmable, speculative parallelism
BufferedSocket: A socket wrapper that makes the IO of sockets much cleaner
threads-pool: A library to operate with pool of haskell’s IO threads
network-connection: A wrapper around a generic stream-like connection
timer-wheel: A timer wheel
imap: An efficient IMAP client library, with SSL and streaming
http2: HTTP/2 library
huffman: Pure Haskell implementation of the Huffman encoding algorithm
NumberSieves: Number Theoretic Sieves: primes, factorization, and Euler’s Totient
keera-hails-mvc-model-lightmodel: Rapid Gtk Application Development - Reactive Protected Light Models
toilet: Manage the toilet queue at the IMO

Footnotes

¹ When you shut down your job-runner, what happens to jobs that have already been de-queued and are being executed? Surprisingly, we found that even delayed_job (from Rails) did not handle this very well. It used to kill the job-runners without waiting for them to complete the currently running jobs, resulting in jobs hanging around in “locked” state till they were finally considered “timed-out” and picked up for execution again. Graceful shutdown is handled by odd-jobs.
² Why did they even show up in Hackage search results?