
HTTP content negotiation on AWS CloudFront Part 2

November 21 2018

[Note -- this has been updated to fix a bug in handling Chrome]

My earlier post on HTTP content negotiation in AWS CloudFront covered support for negotiating the response encoding using the request's Accept-Encoding header. This post builds on that by adding support for negotiating Content-Type using the request's Accept header.

As Ilya Grigorik laid out several years ago, content negotiation is tricky to get right, and yet particularly important when serving images. This is even more relevant today, as the browser ecosystem's support for WebP has recently taken a big step forward -- Edge just shipped support and work on Firefox support has resumed after a long hiatus. Furthermore, there continue to be interesting new image formats on the horizon such as AVIF and HEIF.

There are several reasons why Content-Type negotiation is more difficult than Content-Encoding negotiation. First, the media types being negotiated are hierarchical, supporting a type and subtype model, e.g. image/png. In addition, wildcards are supported, e.g. image/*. Finally, unlike Accept-Encoding, where browsers explicitly send all of the encodings that they support, browsers tend not to do this with the Accept header. For example, Firefox sends Accept: */* when requesting images. This gives the HTTP server no indication of what the browser actually supports -- per the RFC, the server would be allowed to return any content at all. As a result, servers are typically either conservative, returning only formats which are highly likely to be supported like image/jpeg, or fall back to heuristics like User-Agent sniffing to detect specific browser builds which support a server-preferred content type.
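
To make the type/subtype and wildcard structure concrete, here is a minimal sketch of parsing an Accept header into media ranges. The helpers parseAccept and rangeMatches are hypothetical names written for illustration; they are not the API of any library used in this post, and they ignore media type parameters other than q.

```javascript
'use strict';

// Hypothetical helper for illustration: split an Accept header into media
// ranges, each with a type, a subtype and a q-value.
function parseAccept(header) {
    return header
        .split(',')
        .map((part) => part.trim())
        .filter((part) => part.length > 0)
        .map((part) => {
            const [range, ...params] = part.split(';').map((s) => s.trim());
            const [type, subtype] = range.toLowerCase().split('/');
            let q = 1; // a media range without a q parameter defaults to q=1
            for (const param of params) {
                const [name, value] = param.split('=').map((s) => s.trim());
                if (name.toLowerCase() === 'q') {
                    q = Number.parseFloat(value);
                }
            }
            return { type, subtype, q };
        });
}

// Does a media range such as image/* cover a concrete type such as image/webp?
function rangeMatches(range, mediaType) {
    const [type, subtype] = mediaType.toLowerCase().split('/');
    return (range.type === '*' || range.type === type) &&
        (range.subtype === '*' || range.subtype === subtype);
}

// A header that lists WebP explicitly, versus a bare wildcard.
const explicit = parseAccept('image/webp,image/apng,image/*,*/*;q=0.8');
const wildcardOnly = parseAccept('*/*');

console.log(explicit);
// The wildcard "matches" WebP, but says nothing about real support.
console.log(wildcardOnly.some((r) => rangeMatches(r, 'image/webp'))); // true
```

The second example is the crux of the problem: a bare */* matches every type the server could offer, so matching alone cannot tell the server which formats are safe to send.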

What is an HTTP server implementer to do? Apache's mod_negotiation has a fairly sophisticated set of heuristics for content negotiation that cover some of this, including working around overly permissive wildcards.

With AWS CloudFront, we can implement something similar to drive this process on Lambda@Edge. The code below is being used to serve this article, and will cause the image of a dog to be returned as WebP if your browser supports it.

'use strict';

const {
    ValueTuple,
    awsPerformEncodingNegotiation,
    awsPerformTypeNegotiation } = require('http_content_negotiation');

const SERVER_ENCODINGS = [
    new ValueTuple('br', new Map([['q', 1]])),
    new ValueTuple('gzip', new Map([['q', 0.9]])),
    new ValueTuple('identity', new Map([['q', 0.1]]))];

const SERVER_IMAGE_TYPES = [
    new ValueTuple('image/webp', new Map([['q', 1]])),
    new ValueTuple('image/jpeg', new Map([['q', 0.5]]))];

const SERVER_IMAGE_WHITELIST = new Set([
    'image/jpeg',
]);

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;

    if (request.uri.endsWith('.jpg')) {
        const type = awsPerformTypeNegotiation(
            request.headers, SERVER_IMAGE_TYPES, SERVER_IMAGE_WHITELIST);
        if (type) {
            // Drop the trailing 'jpg' but keep the dot before it
            const uriWithoutExtension = request.uri.slice(0, -3);
            switch (type.value) {
                case 'image/webp':
                    request.uri = uriWithoutExtension + 'webp';
                    break;

                case 'image/jpeg':
                    // Nothing to do
                    break;
            }
        }
    } else if (!request.uri.startsWith('/gzip/') &&
            !request.uri.startsWith('/br/')) {
        const encoding = awsPerformEncodingNegotiation(request.headers, SERVER_ENCODINGS);
        if (encoding && encoding.value !== 'identity') {
            request.uri = '/' + encoding.value + request.uri;
        }
    }

    callback(null, request);
};

[Image: Dog and purple flowers]

How does this work?

Similar to my previous post, this uses the zero-dependency, MIT-licensed http-content-negotiation-js library to run the content negotiation process. This library implements all of the requisite media range parsing and semantics, as well as some of the heuristics from mod_negotiation. For example, it treats matches against a subtype wildcard as having an implicit q-value of 0.02 if none of the media ranges in the request have an explicit q-value specified.

First, the SERVER_IMAGE_TYPES list of ValueTuple objects is created to represent our content type preferences for images. Note that we indicate a preference for image/webp (q-value 1) over image/jpeg (q-value 0.5), as the former compresses better. We also use SERVER_IMAGE_WHITELIST to track a whitelist of media types that we are willing to allow to match a wildcard. This handles the case where a browser sends Accept: */* but we're not sure whether it really supports WebP, in which case it's best to fall back to something that we know is supported.
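
The wildcard heuristic and the whitelist can be sketched together as follows. This is an illustration under stated assumptions, not the library's implementation: negotiateType, specificity and the input shapes are hypothetical, the 0.02 value for type/* comes from the description above, and the 0.01 value for */* is the one mod_negotiation documents (whether the library applies it the same way is an assumption).

```javascript
'use strict';

// Illustrative sketch only; names and structure are hypothetical and do not
// mirror the internals of http-content-negotiation-js.
const WILDCARD_SUBTYPE_Q = 0.02; // e.g. image/*
const WILDCARD_ALL_Q = 0.01;     // e.g. */* (assumed, per mod_negotiation docs)

// 3 = exact match, 2 = type/* match, 1 = */* match, 0 = no match.
function specificity(range, mediaType) {
    const type = mediaType.split('/')[0];
    if (range === mediaType) return 3;
    if (range === type + '/*') return 2;
    if (range === '*/*') return 1;
    return 0;
}

// clientRanges: [{ range: 'image/*', q: 0.8 }], with q undefined if absent.
// serverTypes:  [{ value: 'image/webp', q: 1 }]
// wildcardWhitelist: Set of server types allowed to satisfy a wildcard.
function negotiateType(clientRanges, serverTypes, wildcardWhitelist) {
    const anyExplicitQ = clientRanges.some((r) => r.q !== undefined);
    let best = null;
    for (const server of serverTypes) {
        // The most specific matching range decides the client's preference.
        let match = null;
        let matchLevel = 0;
        for (const client of clientRanges) {
            const level = specificity(client.range, server.value);
            if (level > matchLevel) {
                match = client;
                matchLevel = level;
            }
        }
        if (match === null) continue;

        const isWildcard = matchLevel < 3;
        if (isWildcard && !wildcardWhitelist.has(server.value)) continue;

        let clientQ = match.q === undefined ? 1 : match.q;
        if (isWildcard && !anyExplicitQ) {
            // No q-values anywhere in the request: discount wildcard matches.
            clientQ = matchLevel === 2 ? WILDCARD_SUBTYPE_Q : WILDCARD_ALL_Q;
        }
        const score = clientQ * server.q;
        if (score > 0 && (best === null || score > best.score)) {
            best = { value: server.value, score };
        }
    }
    return best === null ? null : best.value;
}

const serverTypes = [
    { value: 'image/webp', q: 1 },
    { value: 'image/jpeg', q: 0.5 },
];
const whitelist = new Set(['image/jpeg']);

// Bare wildcard: WebP is not whitelisted, so JPEG is chosen.
console.log(negotiateType([{ range: '*/*' }], serverTypes, whitelist));
// WebP listed explicitly: the exact match wins.
console.log(negotiateType(
    [{ range: 'image/webp' }, { range: 'image/*' }, { range: '*/*', q: 0.8 }],
    serverTypes, whitelist));
```

The design choice worth noting is that the whitelist only restricts wildcard matches; a type the client names explicitly is always eligible.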

Next, the request handler looks for request URLs ending in .jpg and interprets this as a request for an image, performing type negotiation. We then rewrite the URL for the upstream request with a new file extension based on the negotiated content type.
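
The rewrite step can also be expressed as a lookup table, which may be easier to extend if more formats are added later. This is a sketch of an alternative, not the code serving this article: rewriteImageUri, EXTENSION_BY_TYPE and the /images/dog.jpg path are hypothetical, and it assumes the origin stores a .webp object alongside each .jpg.

```javascript
'use strict';

// Hypothetical lookup table; the handler above uses a switch instead.
const EXTENSION_BY_TYPE = new Map([
    ['image/webp', 'webp'],
    ['image/jpeg', 'jpg'],
]);

// Return the origin URI for a .jpg request given the negotiated media type.
// Unknown or missing types leave the URI untouched, so the JPEG is served.
function rewriteImageUri(uri, negotiatedType) {
    const extension = EXTENSION_BY_TYPE.get(negotiatedType);
    if (!uri.endsWith('.jpg') || extension === undefined) {
        return uri;
    }
    // Drop 'jpg' but keep the dot, then append the new extension.
    return uri.slice(0, -3) + extension;
}

console.log(rewriteImageUri('/images/dog.jpg', 'image/webp')); // /images/dog.webp
console.log(rewriteImageUri('/images/dog.jpg', 'image/jpeg')); // /images/dog.jpg
```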

Finally, it's worth noting that while type and encoding negotiation are not mutually exclusive, it is generally not worthwhile spending CPU cycles applying gzip or Brotli to images, since formats such as JPEG and WebP are already compressed. Because of this, we only bother performing encoding negotiation if we're not serving an image.