HTTP content negotiation on AWS CloudFront, Part 2

My earlier post on HTTP content negotiation in AWS CloudFront covered support for negotiating the response encoding using the request's Accept-Encoding header. This post builds on that by adding support for negotiating Content-Type using the request's Accept header.

There are several reasons why Content-Type negotiation is more difficult than Content-Encoding negotiation. First, the media types being negotiated are hierarchical, following a type and subtype model, e.g. image/png. In addition, wildcards are supported, e.g. image/*. Finally, unlike Accept-Encoding, where browsers explicitly list all of the encodings that they support, browsers tend not to do this with the Accept header. For example, Firefox sends Accept: */* when requesting images. This gives the HTTP server no indication of what the browser actually supports; under the RFC the server would be allowed to return any content at all. As a result, servers are typically either conservative, returning only formats that are highly likely to be supported, like image/jpeg, or fall back to heuristics like User-Agent sniffing to detect specific browser builds that support a server-preferred content type.
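To make the type/subtype and wildcard model concrete, here is a minimal sketch of parsing an Accept header into media ranges and testing a concrete type against one. The helper names parseAccept and rangeMatches are hypothetical and are not part of the library used later in this post; the sketch also ignores parameters other than q.

```javascript
'use strict';

// Parse an Accept header into media ranges, e.g.
// 'image/webp, image/*;q=0.8' -> [{type, subtype, q}, ...].
// Hypothetical helper for illustration only.
function parseAccept(header) {
    return header
        .split(',')
        .map((part) => part.trim())
        .filter((part) => part.length > 0)
        .map((part) => {
            const [range, ...params] = part.split(';').map((s) => s.trim());
            const [type, subtype = '*'] = range.toLowerCase().split('/');
            let q = 1; // a range without a q parameter defaults to q=1
            for (const param of params) {
                const [name, value] = param.split('=').map((s) => s.trim());
                if (name.toLowerCase() === 'q') {
                    const parsed = Number.parseFloat(value);
                    if (!Number.isNaN(parsed)) {
                        q = parsed;
                    }
                }
            }
            return { type, subtype, q };
        });
}

// Does a media range cover a concrete media type such as 'image/png'?
function rangeMatches(range, mediaType) {
    const [type, subtype] = mediaType.toLowerCase().split('/');
    return (range.type === '*' || range.type === type) &&
        (range.subtype === '*' || range.subtype === subtype);
}
```

With this, the single range parsed from Firefox's Accept: */* matches image/webp, image/jpeg and text/html alike, which is exactly why it tells the server so little.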

With AWS CloudFront, we can implement something similar to drive this process on Lambda@Edge. The code below is being used to serve this article, and will cause the image of a dog to be returned as WebP if your browser supports it.

'use strict';

const {
    ValueTuple,
    awsPerformEncodingNegotiation,
    awsPerformTypeNegotiation } = require('http_content_negotiation');

const SERVER_ENCODINGS = [
    new ValueTuple('br', new Map([['q', 1]])),
    new ValueTuple('gzip', new Map([['q', 0.9]])),
    new ValueTuple('identity', new Map([['q', 0.1]]))];

const SERVER_IMAGE_TYPES = [
    new ValueTuple('image/webp', new Map([['q', 1]])),
    new ValueTuple('image/jpeg', new Map([['q', 0.5]]))];

// Types that are allowed to satisfy a wildcard such as */* or image/*
const SERVER_IMAGE_WHITELIST = new Set([
    'image/jpeg',
]);

exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;

    if (request.uri.endsWith('.jpg')) {
        const type = awsPerformTypeNegotiation(
            request.headers, SERVER_IMAGE_TYPES, SERVER_IMAGE_WHITELIST);
        if (type) {
            // Strip 'jpg' but keep the trailing dot
            const uriWithoutExtension = request.uri.slice(0, -3);
            switch (type.value) {
                case 'image/webp':
                    request.uri = uriWithoutExtension + 'webp';
                    break;

                case 'image/jpeg':
                    // Nothing to do
                    break;
            }
        }
    } else if (!request.uri.startsWith('/gzip/') &&
            !request.uri.startsWith('/br/')) {
        const encoding = awsPerformEncodingNegotiation(request.headers, SERVER_ENCODINGS);
        if (encoding && encoding.value !== 'identity') {
            request.uri = '/' + encoding.value + request.uri;
        }
    }

    callback(null, request);
};

How does this work?

As in my previous post, this uses the zero-dependency, MIT-licensed http-content-negotiation-js library to run the content negotiation process. This library implements all of the requisite media range parsing and semantics, as well as some of the heuristics from Apache's mod_negotiation. For example, it treats matches against a subtype wildcard (such as image/*) as having an implicit q-value of 0.02 if none of the media ranges in the request have an explicit q-value specified.
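The wildcard heuristic can be sketched roughly as follows. This is an illustration, not the library's code: the function name and the explicitQ flag are hypothetical, and the 0.01 value for a full */* wildcard comes from Apache's documented mod_negotiation behaviour, so treating it as what this library does is an assumption.

```javascript
'use strict';

// ranges: [{type, subtype, q, explicitQ}], where explicitQ records whether
// the client wrote a q parameter for that range (hypothetical shape).
function applyWildcardPenalties(ranges) {
    // If the client stated any preference explicitly, trust the header as-is.
    if (ranges.some((r) => r.explicitQ)) {
        return ranges;
    }
    return ranges.map((r) => {
        if (r.type === '*' && r.subtype === '*') {
            // Full wildcard: Apache's documented value (assumed here)
            return { ...r, q: 0.01 };
        }
        if (r.subtype === '*') {
            // Subtype wildcard such as image/*
            return { ...r, q: 0.02 };
        }
        return r; // concrete types keep their default q of 1
    });
}
```

The effect is that a header like Accept: image/webp, */* is read as "WebP strongly preferred, anything else only as a last resort", rather than as giving every type equal weight.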

First, the SERVER_IMAGE_TYPES list of ValueTuple objects is created to represent our content type preferences for images. Note that we indicate a preference for image/webp (q-value 1) over image/jpeg (q-value 0.5), as the former compresses better. We also use SERVER_IMAGE_WHITELIST to hold the media types that we are willing to allow to match a wildcard. This handles the case where a browser sends Accept: */* but we're not sure whether it really supports WebP, in which case it's best to fall back to something that we know is supported.
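To illustrate how a whitelist changes the outcome, here is a simplified, self-contained sketch of the selection step. It assumes each candidate is scored as the server q-value times the client q-value, and that the most specific matching range decides the client q-value; the library's actual rules may differ, and negotiateType and its plain-object inputs are hypothetical stand-ins for awsPerformTypeNegotiation and ValueTuple.

```javascript
'use strict';

// 0 for */*, 1 for type/*, 2 for a concrete type/subtype
function specificity(range) {
    if (range.type === '*') return 0;
    if (range.subtype === '*') return 1;
    return 2;
}

// clientRanges: [{type, subtype, q}], serverTypes: [{value, q}],
// wildcardWhitelist: Set of server types allowed to match a wildcard.
function negotiateType(clientRanges, serverTypes, wildcardWhitelist) {
    let best = null;
    let bestScore = 0;
    for (const server of serverTypes) {
        const [type, subtype] = server.value.split('/');
        // The most specific matching range decides the client's q-value.
        let match = null;
        for (const range of clientRanges) {
            const matches =
                (range.type === '*' || range.type === type) &&
                (range.subtype === '*' || range.subtype === subtype);
            if (matches && (!match || specificity(range) > specificity(match))) {
                match = range;
            }
        }
        if (!match) continue;
        // A wildcard match only counts for whitelisted server types.
        if (specificity(match) < 2 && !wildcardWhitelist.has(server.value)) {
            continue;
        }
        const score = server.q * match.q;
        if (score > bestScore) {
            best = server.value;
            bestScore = score;
        }
    }
    return best; // null when nothing acceptable was found
}

const serverTypes = [
    { value: 'image/webp', q: 1 },
    { value: 'image/jpeg', q: 0.5 },
];
const wildcardWhitelist = new Set(['image/jpeg']);
```

Under these assumptions a bare */* selects image/jpeg, because image/webp is not allowed to match a wildcard, while a header that names image/webp explicitly selects WebP.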

Next, the request handler looks for request URLs ending in .jpg and interprets this as a request for an image, performing type negotiation. We then rewrite the URL for the upstream request with a new file extension based on the negotiated content type.
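The rewrite step on its own can be sketched as a small pure function, which is handy for testing outside of Lambda. The EXTENSIONS map and rewriteImageUri name are hypothetical, and the sketch assumes the origin stores a .webp file next to every .jpg under the same base name.

```javascript
'use strict';

// Hypothetical mapping from negotiated media type to file extension.
const EXTENSIONS = new Map([
    ['image/webp', 'webp'],
    ['image/jpeg', 'jpg'],
]);

function rewriteImageUri(uri, mediaType) {
    const extension = EXTENSIONS.get(mediaType);
    if (!uri.endsWith('.jpg') || extension === undefined) {
        return uri; // not an image request, or nothing usable was negotiated
    }
    // Drop 'jpg' but keep the trailing dot, then append the new extension.
    return uri.slice(0, -'jpg'.length) + extension;
}
```

For example, rewriteImageUri('/img/dog.jpg', 'image/webp') yields '/img/dog.webp', while a negotiated image/jpeg leaves the URI unchanged.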

Finally, it's worth noting that while type and encoding negotiation are not mutually exclusive, it is generally not worthwhile spending CPU cycles applying gzip or Brotli to images: formats such as JPEG and WebP are already compressed, so a second pass gains very little. Because of this, we only bother performing encoding negotiation if we're not serving an image.