Taming asynchronous calls with node-batchflow

Callback hell

Node's event-driven nature is what makes it run so damn fast, but it can also introduce some real challenges for the developer. If you aren't careful
you can quickly end up in "callback hell" when trying to coordinate multiple asynchronous calls.

Imagine you are working on the administration menu of your shiny new web application. You need to load the current user and the last 10 documents. The
naive way to implement this looks something like this:

User.findOne({id: request.userId}, (err, user) ->
    throw err if err
    Document.findLast10 (err, documents) ->
        throw err if err
        response.render 'menu',
            currentUser: user
            documents: documents
)

Now this doesn't look so bad, but imagine you need to make three more async calls to fetch data, all of which have to be rendered together:
now you really are venturing into callback hell! We could fix this by using synchronous calls, but then we might as well give up on using node,
since we'd lose the non-blocking goodness that makes node perform so well.

Taming hell

Luckily for us there are a number of libraries that help us coordinate asynchronous tasks, such as async and node-batchflow.
I'm going to use the latter, as I found it much more CoffeeScript-friendly.

The basic idea is to batch the work up into separate tasks and run a callback when they have all finished. So we could implement the above like so:

batch = require('batchflow')

tasks =
  currentUser: ->
    User.findOne({id: request.userId})
  documents: ->
    Document.findLast10()

work = batch(tasks).parallel().each (key, task, done) ->
  task().exec (err, results) ->
    throw err if err
    done({id: key, results: results})

work.end (results) ->
  resultsHash = results.reduce (hash, value) ->
    hash[value.id] = value.results
    hash
  , {}
  response.render 'menu',
    currentUser: resultsHash.currentUser
    documents: resultsHash.documents

Admittedly this looks like a bit of a monster now, so let's refactor the heavy lifting into a helper function:

batch = require('batchflow')

batchLoad: (tasks, callback) ->
  work = batch(tasks).parallel().each (key, task, done) ->
    task().exec (err, results) ->
      throw err if err
      done({id: key, results: results})
  work.end (results) ->
    callback results.reduce (hash, value) ->
      hash[value.id] = value.results
      hash
    , {}

This function assumes it is passed a hash of tasks, keyed by id, where each value is a function that returns a database query object we can call exec() on.
Once all the database queries have finished, we reduce the results into a single hash and invoke the callback passed to the helper. Using this helper
function we can implement our menu logic like so:

tasks =
  currentUser: ->
    User.findOne({id: request.userId})
  documents: ->
    Document.findLast10()

@batchLoad(tasks, (results) ->
  response.render 'menu',
    currentUser: results.currentUser
    documents: results.documents
)

This will now scale nicely when we need to add even more data to our menu:

tasks =
  currentUser: ->
    User.findOne({id: request.userId})
  recentUsers: ->
    User.findLast10()
  documents: ->
    Document.findLast10()
  monkeys: ->
    Monkeys.findAnyWithBananas()

@batchLoad(tasks, (results) ->
  response.render 'menu',
    currentUser: results.currentUser
    recentUsers: results.recentUsers
    documents: results.documents
    monkeys: results.monkeys
)

Compare this to the callback hell alternative:

User.findOne({id: request.userId}, (err, user) ->
  throw err if err
  User.findLast10 (err, recentUsers) ->
    throw err if err
    Document.findLast10 (err, documents) ->
      throw err if err
      Monkeys.findAnyWithBananas (err, monkeys) ->
        throw err if err
        response.render 'menu',
          currentUser: user
          recentUsers: recentUsers
          documents: documents
          monkeys: monkeys
)

I think you'll agree this is much harder to read and much more bug-prone.

Command line scripts

node-batchflow is also super useful when writing command-line scripts. For example, if you spawn a lot of command-line calls, you can quickly exceed the
system's maximum number of processes if you run everything in parallel. node-batchflow lets you run your tasks in parallel with a cap on the number of
simultaneous processes, which is super handy.
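
As a sketch of what that might look like, here's a hypothetical script that gzips a batch of log files while capping concurrency, assuming parallel() accepts a limit argument as batchflow's documentation describes (the file list and shell command are made up for illustration):

batch = require('batchflow')
{exec} = require('child_process')

# Hypothetical list of files to compress
files = ['logs/app.1.log', 'logs/app.2.log', 'logs/app.3.log']

# parallel(5) caps the number of tasks running at once, so we
# never have more than 5 gzip child processes alive at a time
work = batch(files).parallel(5).each (i, file, done) ->
  exec "gzip #{file}", (err) ->
    throw err if err
    done(file)

work.end (results) ->
  console.log "Compressed #{results.length} files"

Even with thousands of files in the list, only five gzip processes run at any moment; as each one finishes, batchflow starts the next.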