It's no secret that reducing the size of HTTP responses can lead to performance improvements. Surprisingly, this is not a linear relationship; decreasing response size only slightly can dramatically reduce the time required to transfer the data.
This document explains the throughput characteristics of an established TCP connection and how they can shape performance, often in surprising ways.
Note: I'm making some simplifying assumptions here so that things are easier to model: a pre-existing, idle TCP connection and no packet loss. This effectively shows the best-case scenario for how TCP can handle a response.
TCP has several mechanisms that govern how fast the sender can send data.
While a comprehensive understanding of TCP is way, way beyond the scope of this note (and not something the author would claim to possess anyway), the basic flow control mechanisms are not horribly complicated.
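First, a bit of vocabulary:
MSS (maximum segment size): the largest amount of payload data carried in a single TCP segment
cwnd (congestion window): the number of unacknowledged segments the sender is allowed to have in flight (measured here in segments)
rwnd (receive window): the amount of data, in bytes, that the receiver has advertised it is willing to accept
RTT (round-trip time): the time for a segment to reach the receiver plus the time for its ACK to come back
IW (initial window): the initial value of cwnd for new connections; 10 is the standard value and what Facebook uses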
The maximum amount of data in flight from the sender to the receiver is defined as min(MSS * cwnd, rwnd).
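The formula translates directly into code. This is a minimal sketch: `max_in_flight_bytes` is a hypothetical helper name, MSS and rwnd are taken in bytes and cwnd in segments, and the rwnd values in the usage lines are arbitrary illustrative numbers.

```python
def max_in_flight_bytes(mss: int, cwnd: int, rwnd: int) -> int:
    """Upper bound on unacknowledged bytes: min(MSS * cwnd, rwnd).

    mss:  maximum segment size, in bytes
    cwnd: congestion window, in segments
    rwnd: receiver's advertised window, in bytes
    """
    return min(mss * cwnd, rwnd)


# Here the congestion window is the limit: 1300 * 10 = 13,000 bytes.
print(max_in_flight_bytes(1300, 10, 65535))  # 13000
# A small receive window caps the sender regardless of cwnd.
print(max_in_flight_bytes(1300, 10, 8192))   # 8192
```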
Each ACK for a data segment that arrives back at the sender frees up a slot in cwnd. If the sender is unable to send additional data segments because there are already cwnd unacknowledged segments in flight, it can send out new data each time an ACK arrives. In addition, cwnd is incremented by 1 each time an ACK is received, so each ACK lets the sender emit two new segments. This effectively doubles cwnd each time a flight of ACKs arrives for the outstanding data segments.
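The ACK-by-ACK bookkeeping above can be sketched as a tiny simulation. This assumes an idealized sender that always has more data to send, no packet loss, no rwnd limit, and that every ACK for one flight arrives before any ACK for the next; `slow_start_flights` is a hypothetical helper, not part of any real TCP stack.

```python
def slow_start_flights(iw: int, num_flights: int) -> list[int]:
    """Segments sent in each of the first num_flights flights (idealized slow start)."""
    cwnd = iw
    current_flight = cwnd  # at t=0 the sender fills the whole initial window
    flights = []
    for _ in range(num_flights):
        flights.append(current_flight)
        in_flight = current_flight  # only this flight's segments are unacknowledged
        next_flight = 0
        # Process this flight's ACKs one at a time.
        for _ in range(current_flight):
            in_flight -= 1  # the ACKed segment frees one slot in cwnd
            cwnd += 1       # slow start: each ACK also grows cwnd by one segment
            # Fill the window again; these segments form the next flight.
            while in_flight < cwnd:
                in_flight += 1
                next_flight += 1
        current_flight = next_flight
    return flights


print(slow_start_flights(iw=10, num_flights=4))  # [10, 20, 40, 80]
```

Each ACK frees one slot and adds another, so each ACK releases two segments, which is where the per-flight doubling comes from.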
The sender can have up to cwnd segments in flight at a given time. Beyond that, the sender is stuck waiting for ACKs before it can emit additional segments. For large responses, this means that we typically see a pattern where cwnd segments are emitted all at once, an RTT passes, and cwnd ACKs arrive all at once. At this point, the sender can send out another flight of segments. As a result, output tends to be bursty, with a period equal to the RTT.
Recall that each ACK received increments cwnd by 1. For a large response (i.e., the sender wants to send as much as possible at every opportunity), every data flight is twice as large as the one before it.
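The bursty, RTT-periodic pattern can be written out as a send schedule. This is a sketch under the same idealized assumptions (no loss, no rwnd limit, cwnd doubling once per RTT); `burst_schedule` is a hypothetical helper, and the 100 ms RTT in the usage line is an arbitrary example value.

```python
def burst_schedule(response_bytes: int, mss: int, iw: int,
                   rtt_ms: float) -> list[tuple[float, int]]:
    """(send_time_ms, segments) for each burst, relative to the first burst."""
    segments_left = -(-response_bytes // mss)  # ceiling division
    cwnd = iw
    t = 0.0
    schedule = []
    while segments_left > 0:
        burst = min(cwnd, segments_left)  # the last burst may be partial
        schedule.append((t, burst))
        segments_left -= burst
        cwnd *= 2     # the ACKs for this burst double the window
        t += rtt_ms   # nothing more can be sent until those ACKs return
    return schedule


# 100,000 bytes at MSS 1300 is 77 segments, sent in bursts of 10, 20, 40, 7.
print(burst_schedule(100_000, mss=1300, iw=10, rtt_ms=100))
```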
If we're able to fit our response into the first data flight, we will require only a single round-trip to receive the response. The inverse is also true: if our response is only a single byte too large, the full response will not be available to the receiver for an additional RTT.
This illustrates an interesting property of TCP's congestion control algorithm: when investigating latency, it's useful to think of transmission size in terms of the number of data flights required to transmit it, rather than in absolute bytes. That is, a single-byte response will take just as much time to receive as a response of cwnd * MSS bytes.
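Counting flights instead of bytes is easy to sketch. `flights_needed` is a hypothetical helper that assumes idealized slow start (flight k carries iw * 2**k segments, no loss, no rwnd limit); the defaults MSS = 1300 and IW = 10 match the assumptions used later in this note.

```python
def flights_needed(response_bytes: int, mss: int = 1300, iw: int = 10) -> int:
    """Number of data flights (about one RTT each) needed to deliver a response."""
    segments = -(-response_bytes // mss)  # ceiling division: a partial segment still costs a segment
    flights, capacity = 0, iw
    while segments > 0:
        segments -= capacity
        capacity *= 2  # each flight is twice the size of the previous one
        flights += 1
    return flights


print(flights_needed(1))       # 1
print(flights_needed(13_000))  # 1: exactly cwnd * MSS = 10 * 1300 bytes
print(flights_needed(13_001))  # 2: one extra byte costs an extra flight
```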
Here is the amount of time required to transmit various data payloads on typical cell networks around the world. Assumptions: MSS of 1,300 bytes, cwnd of 10 (the IETF-recommended IW), and RTTs as shown for various countries.
The RTT values are hypothetical but realistic for cell network users in the respective countries.
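The size thresholds behind such a table follow from the stated assumptions alone. This sketch does not reproduce the per-country RTT figures; it only computes, under idealized slow start with MSS = 1300 bytes and IW = 10, the largest response that fits in a given number of data flights. `bytes_deliverable` is a hypothetical helper name.

```python
def bytes_deliverable(num_flights: int, mss: int = 1300, iw: int = 10) -> int:
    """Largest response (bytes) that fits in num_flights data flights."""
    # Flights carry iw, 2*iw, 4*iw, ... segments: iw * (2**n - 1) in total.
    total_segments = sum(iw * 2**k for k in range(num_flights))
    return total_segments * mss


for n in range(1, 5):
    print(n, bytes_deliverable(n))
# 1 13000
# 2 39000
# 3 91000
# 4 195000
```

Multiplying the flight count by a given RTT gives the time to receive a response of that size under this model.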
Using the above table, we can see that if we have a response that tends to be around 40k, the effort to reduce it to the 39k threshold or below will save a full round trip: the response then fits in two data flights instead of three, cutting the time to receive it from three RTTs to two. Given that network time often dominates performance, this can be a significant win.
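The 40k versus 39k comparison can be checked numerically. This is a sketch under the same idealized model (one RTT per data flight, MSS = 1300, IW = 10); it reads "k" as 1,000 bytes (reading it as 1,024 does not change the flight counts), and the 300 ms RTT is a hypothetical value, not one taken from the table. `time_to_receive_ms` is a hypothetical helper.

```python
def time_to_receive_ms(response_bytes: int, rtt_ms: float,
                       mss: int = 1300, iw: int = 10) -> float:
    """Idealized time to receive a response: one RTT per data flight."""
    segments = -(-response_bytes // mss)  # ceiling division
    flights, capacity = 0, iw
    while segments > 0:
        segments -= capacity
        capacity *= 2
        flights += 1
    return flights * rtt_ms


RTT_MS = 300  # hypothetical cell-network RTT
before = time_to_receive_ms(40_000, RTT_MS)  # 31 segments -> 3 flights
after = time_to_receive_ms(39_000, RTT_MS)   # 30 segments -> 2 flights
print(before, after)                         # 900 600
print(f"{(before - after) / before:.0%}")    # 33%
```

The absolute saving is always one RTT; as a fraction, going from three flights to two removes a third of the total time under this model.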
If you are running your own server, you could also increase the IW directly, though you really want to be sure you know what you're doing; it's easy to cause performance problems by introducing congestion into the network that would otherwise have been avoided. For kicks, here's a link showing the IW values for major CDN providers.
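To see what raising the IW buys on paper, the flight count for a 40,000-byte response can be compared across initial windows. This is only a sketch: IW values of 20 and 40 are hypothetical comparison points, `flights_needed` is a hypothetical helper, and the model assumes no loss, so it cannot capture the congestion a larger initial burst may cause.

```python
def flights_needed(response_bytes: int, mss: int, iw: int) -> int:
    """Idealized data flights needed: flight k carries iw * 2**k segments."""
    segments = -(-response_bytes // mss)  # ceiling division
    flights, capacity = 0, iw
    while segments > 0:
        segments -= capacity
        capacity *= 2
        flights += 1
    return flights


MSS = 1300
for iw in (10, 20, 40):  # 20 and 40 are hypothetical, for comparison only
    print(f"IW={iw}: {flights_needed(40_000, MSS, iw)} flight(s) for 40,000 bytes")
```

Fewer flights only translate into lower latency if the larger burst does not trigger loss; once it does, retransmissions can easily cost more than the RTT saved.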