Node v6.6 and asynchronously handled promise rejections

Node v6.6.0 added a neat warning when it detects an unhandled promise rejection. For example:

Promise.resolve().then(() => { undefinedVariable * 5 })
// (node:57413) UnhandledPromiseRejectionWarning:
//   Unhandled promise rejection (rejection id: 20): ReferenceError: undefinedVariable is not defined

Prior to this warning, the ReferenceError would get swallowed up and you might be left scratching your head. It was possible to capture these unhandled rejections through other means, but it’s great to have it built-in to Node now.

If you’re seeing this warning on its own, then it’s most likely legitimate and you should try to track it down.

There is, however, a counterpart to the above warning, which is:

PromiseRejectionHandledWarning: Promise rejection was handled asynchronously

This will be raised if you actually do handle the rejection, but in a later tick. This pattern isn’t very common in most applications, but you might encounter it in your test suite.

The culprit for me were some Sinon.JS stubs which were doing this:

sinon.stub(Database, 'connect').returns(Promise.reject('nope'))

Although valid - the rejection is raised as part of the test and then handled - it raises the warning because Node has no way to know that it’ll be handled later on in the test. Instead, you’d get the PromiseRejectionHandledWarning warning a few ticks later as an acknowledgement.

The quick fix for me - and something I probably should have done a while ago - was to include the sinon-as-promised module which adds proper resolves() and rejects() functions to stubs. These don’t raise any warnings because the promise rejection is created when the function is actually called.

Aggregate queries across PostgreSQL schemas

Schemas are a useful feature in PostgreSQL which can be used for the design of a multi-tenancy system. Using schemas you could, for example, have one schema for each user of your application. Inside this schema the user has their own copy of all your tables, views etc, and the data within them is completely separate from any other schemas.

Conceptually, this approach is very similar to having a separate database per user, but in fact all schemas live inside the same database and can therefore be queried across within the same session. This also simplifies using them with your application, as all schemas can use the same database connection details.

Null-safety with JSON and PostgreSQL

If you’re using the native JSON support in PostgreSQL you may, from time to time, come across the issue of null safety. For example, if you’re trying to query a nested JSON object which contains null values. Attempting to use one of the JSON operators on a null will result in an error like this:

ERROR:  cannot extract element from a scalar

PostgreSQL’s -> operator is generally pretty good at soaking up null references. You can chain it on a nested object and it’ll always return null if it encounters a key which doesn’t exist. However, if it encounters a key whose value is actually null. you’ll get the above error.

The easiest way to avoid this is to make sure you never set a JSON key to null if you’re ever going to treat that key as an object. If you can’t make that guarantee though, you’ve got a few other options to navigate safely around the null.

Historical records with PostgreSQL, temporal tables and SQL:2011

Sometimes you need to find out what a record looked like at some point in the past. This is known as the Slowly Changing Dimension problem. Most database models - by design - don’t keep the history of a record when it’s updated. But there are plenty of reasons why you might need to do this, such as:

  • audit/security purposes
  • implementing an undo functionality
  • showing a model’s change over time for stats or comparison

There are a few ways to do this in PostgreSQL, but this article is going to focus on the implementation provided by the SQL:2011 standard, which added support for temporal databases. It’s also going to focus on actually querying that historical data, with some real-life use cases.

PostgreSQL doesn’t support these features natively, but this temporal tables approximates them. This requires PostgreSQL 9.2 or higher, as that was the first version with support for a timestamp range data type.

Before you dig in, it’s important to note that this extension does not provide complete support for the 2011 standard. Specifically, there is no support for the new syntaxes provided for querying across historical tables, such as the AS OF SYSTEM TIME keyword.

These are generally conveniences though - the temporal tables extension takes care of the updating, and you’ll at least be adopting a standard used by other databases, as opposed to rolling your own solution or using something application-specific.

How to use webpack with Rails

webpack is a powerful module bundler, primarily designed for front-end development, which can integrate nicely with bower and npm JavaScript modules.

It has quite a few advantages over the typical Rails methods of managing front-end JS, but can still slot in seamlessly with Sprockets and the asset pipeline. Unfortunately, the documentation for webpack is still a bit hard to digest, especially if you’re wanting to integrate it with Rails.

If you’re still not sold on using webpack, here’s some of what it can do for you:

  • manage all your front-end JS (& their dependencies) using NPM or Bower
  • automatically preprocess CoffeeScript, ES6, etc
  • output source maps for absolutely everything, with minimal effort
  • help you separate the JS for different pages into different files, with ‘common’ modules automatically shared across all pages
  • split off large modules into separate files which are only downloaded on demand (via require.ensure)

If that sounds good to you, read on to see how to use all this with either an existing Rails app or a brand new one. By the way, although this is Rails specific, some of what’s here might be of benefit when combining webpack with any Rails-like framework.

Tracking errors with Logstash and Sentry

Logstash is an excellent way to eat up logs and send them elsewhere. In a typical setup you’ll send them to Elasticsearch and the excellent Kibana for viewing and analysis, which works well but is missing a vital part: being alerted when your application throws errors.

There’s a whole bunch of ways you can deal with errors without Logstash, one of which is Sentry. This is a software service which takes your errors, samples and groups them and, crucially, alerts you by email (or many other options). Sentry can plug in to most applications with its different Raven clients, which will allow you to track errors as part of your application and then send them to Sentry directly.

But, well, if you’re already using Logstash to log everything (including errors), wouldn’t it be great to just have Logstash send errors on to Sentry for you? I think so! And, luckily, it’s quick and easy to do!

What can you do with PostgreSQL and JSON?

PostgreSQL 9.2 added a native JSON data type, but didn’t add much else. You’ve got three options if you actually want to do something with it:

  1. Wait for PostgreSQL 9.3 (or use the beta)
  2. Use the plv8 extension. Valid option, but more DIY (you’ll have to define your own functions)
  3. Use the json_enhancements extension, which backports the new JSON functionality in 9.3 to 9.2

I wanted to use this stuff now, and I opted to go with option 3. I wrote a blog post which should help you get going if you want to go this route: adding json_enhancements to PostgreSQL 9.2.

So let’s assume you’re on either 9.3, or 9.2 with json_enhancements. What can you do? Lots! All the new JSON operators and functions are in the 9.3 documentation, so I’m going to run through some of the more fun things you can do along with a real-world use case.

Adding json_enhancements to PostgreSQL 9.2

If you keep up with PostgreSQL developments there’s no way you’ve missed the JSON datatype introduced in 9.2, and the upcoming JSON functions in 9.3. The biggest change here is that in 9.3 it’ll be possible to query your JSON columns without needing to use plv8, which embeds V8 into Postgres.

That said, the JSON functions in 9.3 have been backported to 9.2, and can be used right now with 9.2. Before deciding if this is for you, you’ll probably want to play around with it on your development machine. If so, this guide might help!

Multi-cluster Ganglia on EC2 or unicast networks

Ganglia is a scalable distributed monitoring system, and is excellent for keeping tabs on 10s or 1,000s of servers.

It works best on multicast-capable networks, where its army of gmonds will chat to one another with minimal configuration, and the gmetad federator is able to ask any of them for data, but it’s fairly simple to set up a single cluster implementation on a unicast network too.

However, one of Ganglia’s awesome features - multiple clusters - is considerably more complicated to set up on a unicast network. There are a few options to do it:

  • Dedicate one gmond in each cluster as the receiver, and have all the others in that cluster send to it. This is OK, but if that one server goes down, there’ll be no stats for that cluster.
  • Run one receiving gmond per cluster on a dedicated reporting machine (e.g. same as gmetad is on). Works, but annoying to configure as you’ve got a big stack of gmonds running on one server.
  • Emulate multicast support by having each gmond in each cluster send/receive to all or some of the others. A central gmetad is configured to point at one or more of these gmonds for each data_source (cluster).

None of these are as elegant as the default multicast configuration, but we’ll have to make do. I’ve opted to use the third option as I believe it strikes the best balance between reliability and ease of maintenance, and to avoid having 100s of nodes spamming all the others constantly, you can designate a handful of receiver nodes in each cluster and have all the others report to those.

Note: doing this by hand will not be fun. Combine this guide with your server automation tool of choice - I’ll be using Chef, so you’ll need to translate the instructions yourself if you’re using something else.

Guide to setting up Nagios with Chef

Nagios is an awesome open source tool for monitoring servers and applications, and, being such a mission to install and configure, it’s a perfect use case for Chef. Opscode have a nice Nagios cookbook but it’s still not the most straightforward thing to get running.

To make things easier, I’ll explain here how to use this cookbook to set up a Nagios server and have it monitor multiple client servers and applications such as PostgreSQL, MongoDB and nginx.

Creating and bootstrapping EC2 instances in an Amazon VPC using Chef & Knife

Instances in an Amazon VPC are a little tricker than usual to create using the knife ec2 server create command, because they are, of course, private. However, if you have an SSH gateway you can use that to create and bootstrap them.

There is one small caveat, however: if you create a server in a public subnet, it will not be able to access the Internet (and thus complete the Chef bootstrap process) until you assign it an Elastic IP. The knife ec2 plugin doesn’t let you do this, so you’ll either have to do it manually after the server has been created, or script it using the AWS API.

Managing EBS volumes with Chef

Chef works great for creating and attaching Amazon EBS volumes to instances, and there’s an Opscode chef-aws cookbook which has providers for working with single EBS volumes and also for creating RAID arrays from multiple volumes.

The chef-aws cookbook is fairly straightforward on its own, but requires a little DIY to get it fully working. I’ll explain here how to use this cookbook to effectively create single or RAID EBS volumes and allow these to be customised using attributes in roles or environments (so you could have your staging DB use a single 20GB volume, and your production DB use a 200GB RAID).

Syntax highlighting in nanoc

Syntax highlighting is easy in nanoc using the (built-in) colorize_syntax filter and Pygments. Pygments is an extremely robust Python library with support for many languages. You wouldn’t ordinarily want to call out to a Python application from your Ruby app, but as your nanoc site is compiled this method works great.

Vagrant, Chef and the sudo cookbook

Vagrant is great. Chef is great. Let’s say you’re using Vagrant to test a Chef config for a server. Maybe you want to add the Opscode sudo cookbook to manage your sudo package and sudoers? If so, be careful!

This particular sudo cookbook replaces the /etc/sudoers file to do its thing, wiping out the %sudo ALL=(ALL:ALL) ALL line in the process. It just so happens that, on a typical Vagrant box, the vagrant user is part of this group, and depends on it to be able to do, well, anything.

Blindly installing the sudo book (as I did) will therefore result in your vagrant user being powerless. Side-effects of this include: network failures; an inability to shutdown the VM (vagrant halt) and much more!

If you are dead-set on using the sudo cookbook, make sure you add the vagrant user to the sysadmin group, or add the %sudo ALL=(ALL:ALL) ALL line back in to the sudoers file.

Node.js Cluster and working directory shenanigans

Cluster is a excellent built-in Node.js module which lets you run a master node process which spawns, balances and maintains multiple child processes. One nice advantage of this is you can reload the worker processes independently, so you can get zero-downtime deploys by deploying the new code, and asking the master to reload its workers.

Let’s say we have the following master setup:

  exec: '/app/current/worker.js'

On start, and when asked to reload, the cluster master forks node /app/current/worker.js processes. My problem came about due to my deploy process, with Capistrano. On a deploy, Capistrano adds the application code into a timestamped /app/releases directory, and then creates a symlink from /app/releases/timestamp -> /app/current.

Following the deploy, the cluster master happily reloads its workers. The problem is, the working directory of the cluster master is an old release (specifically, whatever the /app/current symlink pointed at when the master was last started). When Cluster forks a child process, that child process has the same working directory as the master.

What ended up occuring was the node /app/current/worker.js would run, and correctly load the worker.js file from the current symlink of /app/current (so, the latest release). However, anything inside worker.js which references process.cwd(), such as require('./lib/something'), would actually be resolved to an old file.

The special __dirname variable would, however, show the real directly, as __dirname shows the directory the current script is executed from. So, knowing that, the fix was simple - at the top of worker.js, make sure the cwd is the same as __dirname and change it if not.

if (__dirname !== process.cwd()) {