What Is Cache-Control? Everything You Need to Know

January 21, 2021

Cache-Control is an HTTP header that lets developers dictate how resources are cached when a user browses the internet. Without it, browser caching and the resulting experience for the user will be sub-optimal.

What is Cache-Control?

When a user browses the internet, the communication follows the Hypertext Transfer Protocol (HTTP), which sets the standard for communication on the web. After HTTP/1.1 was released in 1997, the protocol changed little until HTTP/2 was released in 2015.

In an HTTP transaction, the user’s browser is the client: it sends a request to a web server (say, for a URL) and the web server responds (with a web page). HTTP headers are fields in these messages that carry additional information to make the transaction go smoothly. Both HTTP/1.1 and HTTP/2 include a number of headers intended to make caching work as well as possible.

Cache-Control is an HTTP header that contains a set of directives defining caching behavior in both client requests and server responses. When a client makes a request to the server, the browser can cache, or store copies of, the resources it receives for faster access and lower latency. This means that when the browser needs the same files again, it doesn’t have to make another request to the web server. Cache-Control specifies whether and how a response should be cached and for how long.

What is Browser Caching?

Browser caching is the process by which a web browser saves website resources in order to load them quickly during the next client request. You can see it in action when you load a web page with a background image for example. The first time you load the page, the image gets saved in your browser cache. The next time you visit the page, you will notice that the page loads faster and latency is reduced. This is because the browser is not requesting the image again from the web server. Instead, it is loading the image from your local files.

The browser cache does not store the files for an indefinite period of time though. There is a set time frame, known as Time to Live (TTL), beyond which the cached resource expires. If you load the page after the TTL has expired, the browser will have to place another request to the web server and receive a fresh copy of the resource. The TTL for each resource is specified by the server in the HTTP response headers.
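
The expiry behavior described above can be sketched as a toy in-memory cache. This is only an illustration, not how any real browser is implemented; the class name `TTLCache` and its methods are made up for this example, and the clock is injectable purely so the behavior can be checked without waiting.

```python
import time


class TTLCache:
    """Toy in-memory cache whose entries expire after a time to live (TTL)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so the example can be tested
        self._entries = {}     # url -> (body, expiry time)

    def put(self, url, body, ttl_seconds):
        """Store a copy of the resource together with its expiry time."""
        self._entries[url] = (body, self._clock() + ttl_seconds)

    def get(self, url):
        """Return the cached body, or None if it is missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[url]  # expired: the next load must hit the server
            return None
        return body
```

A caller would `put` the background image after the first page load and call `get` on later visits; a `None` result means the resource has to be requested from the web server again.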

HTTP Headers

HTTP headers are name-value fields that carry additional information about the communication between a client and a server. The World Wide Web operates on the Hypertext Transfer Protocol, which outlines the syntax for all communications between clients and servers.

There are a number of headers for specifying various types of information in the client-server communications.

For requests, the header usually contains information on the resource being requested, the client’s browser and data formats that the client will accept. For responses, the information is usually about whether or not the request was successfully fulfilled and the language and format of any resources in the body.

Broadly speaking, HTTP headers can be categorized into:

General headers

These are HTTP headers which can be used in both request and response messages but do not apply to the content of the message. Cache-Control is one such header. Others include Date, which specifies the date and time of the message, and Connection, which specifies whether the network connection stays open after the transaction.

Request headers

These are headers that are used in the HTTP request. They contain more information about the resource being fetched, or about the client making the request.

Examples include Accept, which advertises the content or media types the client can handle, and Cookie, which contains the stored HTTP cookies previously sent by the server.

Response headers

Response headers include additional information about the HTTP response. Examples include Age, which specifies the time in seconds that the object has been in a proxy cache, and Location, which indicates the URL to redirect a page to.

Entity headers

Unlike the others, entity headers contain information about the content and body of the message. They can be used in HTTP request or HTTP response messages. Examples include Content-Length, which specifies the size of the entity-body in bytes, and Content-Language, which describes the language intended for the audience.
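
To make the four categories concrete, the sketch below sorts the header lines of a response into them. The sample header values and the helper name `group_headers` are invented for illustration, and the lookup table covers only the headers named in this section.

```python
# Category of each header named in the text above.
CATEGORY = {
    "cache-control": "general", "date": "general", "connection": "general",
    "accept": "request", "cookie": "request",
    "age": "response", "location": "response",
    "content-length": "entity", "content-language": "entity",
}

# Illustrative response headers (made-up values).
RAW_HEADERS = (
    "Date: Thu, 21 Jan 2021 10:00:00 GMT\r\n"
    "Cache-Control: max-age=3600\r\n"
    "Age: 120\r\n"
    "Content-Length: 5120\r\n"
    "Content-Language: en\r\n"
    "\r\n"
)


def group_headers(raw):
    """Group header fields by category; unknown names go under 'other'."""
    grouped = {}
    for line in raw.split("\r\n"):
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")  # split at the first colon only
        category = CATEGORY.get(name.strip().lower(), "other")
        grouped.setdefault(category, {})[name.strip()] = value.strip()
    return grouped
```

Header names are compared in lowercase because HTTP header names are case-insensitive.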

Cache-Control Headers/Status Codes Explained

Cache-control headers include information on everything to do with caching: how to cache, when to cache, when not to and more. Like every HTTP header, a cache-control header is a key-value pair separated by a colon. The ‘key’ is what appears to the left of the colon and in this case is always “cache-control”. The value appears to the right of the colon and holds one or more comma-separated directives. For example, “cache-control: max-age=3600” carries one such directive.

Cache-control directives are considered request directives if they are used by the client in an HTTP request and response directives if they are used by the server in an HTTP response.
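
As a rough illustration of that structure, a header value can be split into its directives as follows. The function name `parse_cache_control` is hypothetical, and the sketch deliberately ignores one corner case: a quoted argument that itself contains a comma.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive, so normalize to lowercase.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```

For instance, `parse_cache_control("public, max-age=3600")` yields a dictionary with `public` mapped to `None` and `max-age` mapped to the string `"3600"`.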

Here are some of the most common cache-control directives:

Cache-control: max-age

The max-age directive states how long the browser can reuse a cached copy of the fetched HTTP response, counted from the time the response was generated. It is the maximum amount of time, specified in seconds. For example, max-age=90 means that an HTTP response can be served from the browser cache for the next 90 seconds without contacting the server again. A cached response that is older than max-age is called a stale response. For static files such as images, CSS and JavaScript files, which rarely change, a long max-age (aggressive caching) is possible.
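
The freshness check can be sketched in a few lines. This is a simplification under stated assumptions: the function name `is_fresh` is made up, and the only inputs considered are max-age, the time the copy was stored, and an optional Age value reported by an upstream cache.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(stored_at, max_age, now, initial_age=0):
    """Return True while the cached response is younger than max_age seconds.

    initial_age is the value of the Age header, if an upstream cache sent one.
    """
    current_age = initial_age + (now - stored_at).total_seconds()
    return current_age < max_age
```

With max-age=90, a copy stored at 10:00:00 is still fresh at 10:01:29 and stale from 10:01:30 onward. If it arrived with Age: 60, it becomes stale 30 seconds after it was stored.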

Cache-control: s-maxage

s-maxage is similar to the max-age directive, but the “s” stands for shared, as in shared cache. This is relevant to Content Delivery Networks (CDN) and other intermediary caches. For those shared caches, it overrides the max-age directive and the Expires header field when present.

S-maxage vs max-age

Both s-maxage and max-age are Cache-Control header directives that specify how long a resource can be cached. However, a critical difference between the two directives has implications for security.

The max-age directive specifies the maximum time in seconds that any cache, including browsers and intermediate caches, can cache a resource. On the other hand, the s-maxage directive only applies to shared caches, such as proxy servers, and is used to specify the maximum time in seconds that these shared caches can cache a resource.

By setting a short s-maxage value, the risk of sensitive information being cached for an extended period of time and exposed to unauthorized users is reduced.

In contrast, the max-age directive is more about optimizing web performance by controlling how long resources can be cached by client-side caches. While max-age can still have security implications, it is more commonly used to improve web page loading times and reduce server load.
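
The precedence between the two directives might be sketched like this. The function name `freshness_lifetime` and the dictionary input format are assumptions made for the example; a real cache would also consider the Expires header and heuristics when neither directive is present.

```python
def freshness_lifetime(directives, shared_cache):
    """Pick the lifetime in seconds that applies to this kind of cache.

    directives: parsed header, e.g. {"max-age": "60", "s-maxage": "600"}.
    shared_cache: True for a proxy or CDN, False for a user's browser.
    """
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])  # s-maxage wins, but only for shared caches
    if "max-age" in directives:
        return int(directives["max-age"])
    return None  # no explicit lifetime given
```

For a response carrying `max-age=60, s-maxage=600`, a CDN would keep the resource for 600 seconds while the browser would keep it for 60.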

Cache-control: no-cache

This directive tells caches that a resource is not available for reuse for subsequent requests to the same URL without first checking with the origin server whether the resource has changed. In other words, it is an instruction to the browser that it must revalidate with the server every time before using a cached version of the URL. This is useful to ensure that authentication is respected, among other benefits. Revalidation relies on a validator such as the ETag header field: the browser makes a roundtrip to the server, sending the stored ETag in an If-None-Match request header, to confirm that the response has not changed. If there has been no change, the server answers 304 Not Modified and no download is required.
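
The revalidation roundtrip can be illustrated with a toy origin handler. Both function names are hypothetical, and deriving the ETag from a hash of the body is just one common approach assumed here, not something HTTP requires.

```python
import hashlib


def make_etag(body):
    """Derive a validator from the body (one possible approach)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(current_body, if_none_match=None):
    """Toy origin handler: returns (status, headers, body)."""
    etag = make_etag(current_body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        return 304, headers, b""       # unchanged: no body is sent
    return 200, headers, current_body  # first request or changed: full download
```

On the first request the client receives a 200 response and remembers the ETag. On later requests it passes that ETag as `if_none_match` and gets an empty 304 response for as long as the body stays the same.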

Cache-control: no-store

no-store is similar to no-cache but stricter and simpler. With this directive, the HTTP response cannot be stored in any cache or re-used. Instead, the resource has to be requested and a full response is downloaded from the original server each time. This is especially relevant when dealing with private/personal information or banking data.

Cache-control: no-transform

When resources are stored in the cache server, intermediate proxies can sometimes make modifications to these assets. For example, they could change the format of images and files in order to save space and improve performance. This can cause problems if the asset is to remain identical to the original entity-body. The no-transform directive tells the intermediate caches or proxies not to make any such modifications. For example, they cannot edit the response body, Content-Encoding, Content-Range, or Content-Type.
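
As a sketch of how an intermediary might honor this directive, the hypothetical proxy step below compresses a response body only when no-transform is absent. The function name `proxy_forward` is invented, gzip merely stands in for whatever transformation a real proxy would apply, and header names are assumed to be spelled exactly as shown.

```python
import gzip


def proxy_forward(headers, body):
    """Toy proxy step: gzip the body unless no-transform forbids it."""
    cache_control = headers.get("Cache-Control", "").lower()
    directives = {d.strip() for d in cache_control.split(",")}
    if "no-transform" in directives or "Content-Encoding" in headers:
        return headers, body  # pass the response through untouched
    compressed = gzip.compress(body)
    new_headers = {
        **headers,
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    return new_headers, compressed
```

A response sent with `Cache-Control: max-age=60, no-transform` comes out byte for byte identical, while the same response without no-transform leaves the proxy compressed and with a Content-Encoding header added.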

Private Vs. Public Cache-Control

Public Cache-Control means that the resource can be cached by any intermediary between the server and the client, such as a proxy server. Private Cache-Control means that the resource can only be cached by the user’s browser and no other intermediary.

From a security standpoint, Private Cache-Control is generally better because it reduces the risk of sensitive information being cached by unintended parties. For example, if a web page contains customer account information, setting the Cache-Control header to Private ensures that this information is not cached by a proxy server that other users may use.

However, Public Cache-Control may be necessary for certain resources that intermediaries can safely cache without any security concerns, such as publicly available images or CSS files. In this case, it is crucial to ensure that the resource is not sensitive and that appropriate caching directives are used to prevent unauthorized access to sensitive data.

In summary, the choice of Private vs. Public Cache-Control depends on the nature of the resource being requested and the application’s security requirements. Anyone responsible for security should understand these differences and ensure appropriate caching directives are used to minimize security risks.
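
The storage decision for the private and public directives, together with the no-store directive described earlier, might look roughly like this. The function name `may_store` is made up, and the sketch leaves out other factors a real cache weighs, such as the request method or an Authorization header.

```python
def may_store(directives, shared_cache):
    """Decide whether a cache may keep a copy of the response.

    directives: set of lowercase directive names, e.g. {"private"}.
    shared_cache: True for a proxy or CDN, False for a user's browser.
    """
    if "no-store" in directives:
        return False  # nobody may keep a copy
    if "private" in directives and shared_cache:
        return False  # only the end user's browser may keep it
    return True
```

A page with customer account information marked `private` would be refused by a proxy (`may_store({"private"}, True)` is false) but accepted by the user’s own browser.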

Status Code Configuration

Status code configuration lets the server or CDN specify which response status codes should be cached and for how long. It is a setting of the cache itself rather than a directive inside the Cache-Control header.

For example, a web application may want to cache specific resources for a longer period when the server responds with a 200 OK status code, indicating a successful request, but may prefer not to cache resources when the server responds with a 4xx or 5xx status code, indicating an error or server-side issue.

From a security standpoint, the status code configuration parameter can be used to prevent caching of sensitive information when a request generates an error or is otherwise unsuccessful. For example, suppose a user tries to access a restricted resource and receives a 403 Forbidden status code. In that case, the server may specify a short cache TTL (Time To Live) for that particular resource to prevent it from being cached and potentially exposed to unauthorized parties.

In addition, the status code configuration can help limit the impact of certain types of attacks, such as CSRF (Cross-Site Request Forgery) or XSS (Cross-Site Scripting). With appropriately short cache TTL values for the resources involved in these attacks, older versions of those resources do not linger in caches where they could be used maliciously.

In conclusion, status code configuration is an important complement to the Cache-Control header that can enhance security by controlling caching behavior based on the response status codes and mitigating certain types of attacks.
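
A status code policy of this kind could be expressed as a simple lookup table. The table name, the function name and the TTL values below are all invented for illustration; actual CDNs and servers each have their own configuration syntax.

```python
# Hypothetical policy: status code -> TTL in seconds (0 means do not cache).
STATUS_TTL = {
    200: 3600,   # successful responses: cache for an hour
    301: 86400,  # permanent redirects: cache for a day
    404: 30,     # missing pages: cache briefly to absorb repeated requests
    403: 0,      # forbidden: never cache
}


def ttl_for_status(status, policy=STATUS_TTL):
    """Return the TTL for a status code; unlisted codes are not cached."""
    return policy.get(status, 0)
```

Because unlisted codes default to 0, a 503 from an overloaded origin is never cached under this sample policy.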

Benefits of Using a CDN for Cache-Control

Caching can be thought of as moving resources from a server closer to a local drive for faster access and reduced latency. The same idea applies to Content Delivery Networks (CDNs), which move your website content to proxies for accelerated content distribution and bandwidth optimization. Proxy servers are intermediate servers which cache resources on behalf of many users, instead of storing them all on each website visitor’s local drive.

CDNs provide numerous benefits for Cache-Control:

1. They simplify cache policy management
It can be overwhelming for web developers to manually tag file types, tweak and manage all the different cache headers. CDNs help them simplify cache policy management using user-friendly dashboards. Administrators can override cache header directives as and when needed and at a granular level to control specific files and file types.

2. They augment browser caching with proxies
Browser caching by itself does the job of downloading a website’s resources to your local drive after your first visit. CDNs complement this by caching the same resources on proxies.

This helps bring content closer to the site visitors and makes sure that a single cached copy is served to multiple visitors. It also allows for quick delivery of resources even to first-time visitors whose browsers may not have cached the site content yet.

3. They can help automate caching using machine learning
Some of the more advanced CDNs are capable of automating cache control using machine learning (ML). ML algorithms can track content usage patterns and cache dynamically generated content and resources.

For example, an HTML file that has not changed much over time can be labelled static and classified as cacheable. It can be served directly from the CDN servers for faster page load and responsiveness. The algorithm can continue to track the status of the page and classify it as dynamic as soon as there is a change. This optimizes your storage and caching policies and improves content delivery speed.
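
The granular, per-file-type overrides mentioned under the first benefit could be modeled as an ordered rule list. Everything here is a hypothetical sketch: the rule patterns, the header values and the function name `effective_cache_control` are invented, and a real CDN would expose this through its own dashboard or API.

```python
from fnmatch import fnmatchcase

# Hypothetical override rules, checked in order: (path pattern, Cache-Control value).
RULES = [
    ("/account/*", "private, no-store"),
    ("*.css", "public, max-age=86400"),
    ("*.js", "public, max-age=86400"),
    ("*.png", "public, max-age=604800"),
]


def effective_cache_control(path, origin_value):
    """Return the first matching override, else the origin's own header value."""
    for pattern, value in RULES:
        if fnmatchcase(path, pattern):
            return value
    return origin_value
```

Rule order matters: a stylesheet under `/account/` matches the first rule and is never cached, while `/index.html` matches no rule and keeps whatever header the origin sent.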
