HTML5 Offline Application

by Arpan Dhandhania

HTML5 introduces several new features to the web, such as multi-threaded JavaScript (Web Workers), cross-document messaging and local storage. Today, we will look more closely at its offline application caching feature.

Offline Application Caching

All browsers have some kind of caching mechanism in place but, to be honest, it doesn't always work. You browse through a site on your laptop and then shut the laptop. After a while, you open it again and click the Back button, hoping to see the previous page. However, as you are not connected to the internet and the browser didn't cache the page properly, the page fails to load. You then click the Forward button, thinking that at least that page will load, but it doesn't. You need to reconnect to the internet to view either page.

Until HTML 4, the only workaround was for the user to save each page individually. HTML5, thankfully, provides a smarter solution. While building the site, the developer can specify the files that the browser should cache. In fact, each page can specify which documents should be cached. So, even if you refresh the page while you are offline, it will still load correctly. This sort of caching has several advantages.

  • offline browsing
    As the name indicates, users will be able to browse through the site even when they are offline.
  • speed
    Files that are cached locally will load much faster. Usually style sheets are shared across all pages of a website. The first time you load a page from a website, it will take some time to download the style sheet, but when you click on other pages, the browser won't need to download the file again.
  • reduced load on server
    Once a site has been cached, the browser only checks with the server whether the manifest has changed; if it hasn't, the cached files are not downloaded again. This considerably reduces the load on the server.

How It Works

The mechanism that keeps the website available to users, even when they are not connected, is very simple. You specify the manifest attribute on the html element, for example <html manifest="manifest.cache">. The attribute takes a URI to the manifest, which contains the rules for caching.

This is what the manifest.cache file typically looks like:
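
The sketch below is one minimal manifest that matches the description in this section. Only masthead.png and /api are named in the text; every other file name, and the format of the version comment, is an assumption made for illustration.

```text
CACHE MANIFEST
# version 1 (hypothetical version comment)

CACHE:
# hypothetical files to cache, plus masthead.png from the text
index.html
style.css
masthead.png

NETWORK:
# hypothetical resource that always needs a live connection
login.php
/api

FALLBACK:
# hypothetical dynamic image, followed by its static fallback
images/dynamic.php images/static.png
```

The first line of the file must be CACHE MANIFEST. Each FALLBACK entry lists the online resource first and the resource to use in its place second.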

The cache manifest has three section headers: CACHE, NETWORK and FALLBACK. Each header is written on its own line, followed by a colon.

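Note that the manifest file must be served with the MIME type text/cache-manifest. You might need to map the file extension to that type in Apache (or whatever web server you are running), for instance with a directive such as AddType text/cache-manifest .cache, or send the type yourself, for instance with PHP's header() function: header('Content-Type: text/cache-manifest').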

Files listed under CACHE are cached once they have been downloaded for the first time, while the ones under NETWORK are said to be white-listed: they require a live connection to the server. If the user isn't connected, the browser must not serve a cached copy of them instead.

The FALLBACK section contains entries that provide a backup strategy. If the browser is unable to retrieve the original content, the fallback resource will be used. In the example above, we display a static image in case the dynamic one is unavailable.

The last line in the NETWORK section is the path to a folder. It ensures that requests for resources under /api bypass the cache and are always fetched from the server.

In the manifest, any line starting with # is treated as a comment. Besides making the file more readable, comments have another use. Let us say you have specified that masthead.png should be cached, and you later update the image. Because the cache is refreshed only when the manifest itself changes, the user will continue to see the old, cached image. To force an update you need to change some part of the manifest, and a good way of doing that is to keep a version number in a comment and increment it every time you update a resource.
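
As a sketch of this versioning trick, the manifest below differs from its previous revision only in the comment line, which is enough for the browser to treat the manifest as changed and download masthead.png again. The wording of the comment and the version numbers are assumptions; any change to the file has the same effect.

```text
CACHE MANIFEST
# version 2 (hypothetical; was "version 1" before masthead.png was replaced)

CACHE:
masthead.png
```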