What came before WebSockets?



Introduction

With the arrival of WebSockets we finally have a standardised technology for true realtime bi-directional communication between a server and a web browser (or any other client). When we were creating our What are WebSockets? page we decided to write up a history of the technologies that came before, some of which are still in use today. How did we (developers) achieve realtime browser push before WebSockets, and what shortcomings kept those technologies from ever really becoming mainstream? Here’s that write-up.

The web wasn’t originally built to be all that dynamic. It was conceived as a collection of HyperText Markup Language (HTML) pages linking to one another to form a conceptual web of information. Over time the static resources increased in number and richer items, such as images, became part of the web fabric. Server technologies advanced, allowing for dynamic server pages: pages whose content was generated based on a query.

DHTML

Soon there was a requirement for the client, the web browser, to be more dynamic and to offer a richer experience. This was achieved through browser scripting in the form of VBScript and JavaScript, and was known as Dynamic HTML (DHTML). Whilst this improved the web browser experience to some degree, it still fell short of allowing a true application experience to be built within the web browser: web pages still needed to refresh and reload relatively frequently.

Cross Frame Communication

As web browsers evolved so did the scripting technologies and the techniques employed when using them. Application-style functionality was achieved within websites through tricks such as cross frame communication. These techniques allowed new data to be loaded from the server without the page refreshing: a frame loaded a new page from the server and, in doing so, retrieved new information.

Up until this point all actions within a website, web page or web application had consisted of a user action generating a request from the web browser to a server. But what if the web server had some new, additional information for the user?

HTTP Polling

The first solution to this problem was for the client to poll the server at regular intervals. This solution was, and still is, inefficient: most requests return nothing new, and any update has to wait for the next poll, which leads to stale data being displayed in web pages and applications.
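To put rough numbers on that, here is a minimal TypeScript sketch that simulates a polling client rather than making real requests. The function name, the five-second interval and the update times are all made up for illustration.

```typescript
// Hypothetical simulation of fixed-interval polling; no network involved.
interface PollingStats {
  requests: number;       // total polls sent
  emptyResponses: number; // polls that found nothing new
  worstDelayMs: number;   // longest wait between an update and the poll that saw it
}

function simulatePolling(
  intervalMs: number,
  durationMs: number,
  updateTimesMs: number[], // moments at which the server's data changed
): PollingStats {
  const updates = [...updateTimesMs].sort((a, b) => a - b);
  let next = 0; // index of the first update the client has not yet seen
  let requests = 0;
  let emptyResponses = 0;
  let worstDelayMs = 0;

  for (let t = intervalMs; t <= durationMs; t += intervalMs) {
    requests++;
    let sawNew = false;
    // Collect every update that happened since the previous poll.
    while (next < updates.length) {
      const updateAt = updates[next];
      if (updateAt === undefined || updateAt > t) break;
      worstDelayMs = Math.max(worstDelayMs, t - updateAt);
      next++;
      sawNew = true;
    }
    if (!sawNew) emptyResponses++;
  }
  return { requests, emptyResponses, worstDelayMs };
}

// One minute of polling every 5 seconds, with two server-side updates.
const stats = simulatePolling(5000, 60000, [1000, 31000]);
```

With these assumed numbers the client sends 12 requests in a minute, 10 of which come back empty, and each update waits 4 seconds before the page sees it.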

LiveConnect

The next step in the evolution of the web and web applications was for developers to find a way for the server to push new information to the web browser. This was first achieved through the use of Java applets, which could create a long-held connection with the server and communicate with JavaScript in the web page through a feature known as LiveConnect.

Forever Frame

The LiveConnect technique was relatively quickly superseded, due to Java Virtual Machine inconsistencies, by a native browser technique known as forever frame, where a long-lived HTTP connection is established to the server using a hidden frame. Data, usually

AJAX

Then, thanks to Microsoft’s requirements for their Outlook web application, the XMLHttpRequest object was born, introducing a technology that made possible something we all now know as Asynchronous JavaScript and XML (AJAX). The ability to make a request to the server from JavaScript without the need for any cross frame communication, often referred to as a hack, had been long awaited. Other browser vendors slowly but surely introduced support for XMLHttpRequest and, without ever being an official standard, it became a de facto one.

HTTP Long-Polling and XHR Streaming

Additional techniques arrived including script tag long-polling, htmlfile ActiveX Object, XHR long-polling, XHR multipart-replace and XHR Streaming.

The long-polling techniques work by establishing a connection to the server which is held open. When the server has more data for the client it sends that data through and closes the connection. The client then re-establishes the connection and waits for any new data. The main problem with this technique is that during the reconnection process the data on the page could be out of date and inaccurate.
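The client side of that cycle can be sketched in TypeScript as a simple loop. This is only an illustration: longPoll, HeldRequest and the callback names are hypothetical, and the held request is passed in as a function so the sketch does not depend on any particular transport.

```typescript
// One long-poll request: resolves with data once the server has some,
// or with null if the held request ended with nothing to report.
type HeldRequest = () => Promise<string | null>;

async function longPoll(
  holdRequest: HeldRequest,
  onData: (data: string) => void,
  keepGoing: () => boolean,
): Promise<void> {
  while (keepGoing()) {
    // The server holds this request open until it has something to send.
    const data = await holdRequest();
    if (data !== null) {
      onData(data);
    }
    // The response closed the connection, so loop round and reconnect.
    // Anything the server produces before the next request arrives has to
    // wait, which is the window in which the page can be out of date.
  }
}

// In a browser the held request might look like this (the "/events" URL is
// an assumption, not a real endpoint):
//   longPoll(() => fetch("/events").then((r) => r.text()), render, () => true);
```

A real client would also need error handling and a back-off delay before reconnecting after a failure; both are left out here to keep the loop readable.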

XHR multipart-replace and XHR Streaming are much better HTTP solutions since they maintain a connection between the client and the server. Even so, the long-polling solutions were more popular due to cross browser inconsistencies with XHR multipart-replace and XHR Streaming. XHR multipart-replace, which was potentially the best solution of all, only worked in Gecko-based browsers. XHR Streaming worked in all browsers, but the responseText of the XMLHttpRequest object would continue to grow until the connection was closed, meaning a reconnection eventually had to be forced to clear this buffer.
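The growing responseText buffer is easier to picture with a small TypeScript sketch of what an XHR Streaming client has to do each time more data arrives. It assumes newline-delimited messages, which is just one possible framing, and the names takeNewMessages, shouldReconnect and StreamState are hypothetical.

```typescript
// Tracks how much of the ever-growing responseText has been consumed.
interface StreamState {
  offset: number;
}

// Called whenever more data arrives, with the full responseText so far.
// Returns only the complete messages that have not been seen before.
function takeNewMessages(responseText: string, state: StreamState): string[] {
  const lastNewline = responseText.lastIndexOf("\n");
  if (lastNewline < state.offset) {
    return []; // nothing complete has arrived since the last call
  }
  const fresh = responseText.slice(state.offset, lastNewline);
  state.offset = lastNewline + 1;
  return fresh.split("\n");
}

// The buffer is never trimmed by the browser, so the client decides when it
// has grown too large and forces a reconnect to start from an empty one.
function shouldReconnect(responseText: string, maxBufferChars: number): boolean {
  return responseText.length >= maxBufferChars;
}

const state: StreamState = { offset: 0 };
const first = takeNewMessages("one\ntw", state);            // ["one"]
const second = takeNewMessages("one\ntwo\nthree\n", state); // ["two", "three"]
```

Note that every call receives the whole buffer again, which is why the offset bookkeeping and the eventual forced reconnect are needed at all.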

Bi-directional Communication (and the two connection limit)

One of the problems with all of the realtime browser solutions so far was that bi-directional communication required two HTTP connections. The first HTTP connection is used as the back/data/streaming channel where data is sent from the server to the client. A second connection is required to send commands for things such as logging in, changing subscriptions and publishing events/messages. To begin with, this caused quite a few problems due to a limit of two ‘same domain’ connections enforced by web browsers, which could lead to slow loading pages or to the connection in a second browser window failing to establish at all. All modern browsers now have a higher same domain connection limit, which means this is much less of a problem.

Cross domain support

For a long time we’ve been able to embed a script tag on our site from CDNs to save us bandwidth and benefit from browser caching. One big restriction with scripting, however, is that script running on a web page can only communicate with other scripts executing in the same domain. For example, if a page served from pusher.com contained an iframe serving news.bbc.co.uk then script in these web pages would not be able to communicate with one another, and rightly so. This restriction was also enforced for the XMLHttpRequest object: if script was running in a page on pusher.com, the XMLHttpRequest could only be made to a resource on pusher.com. In an age where web services are everywhere and usage of such services is encouraged for things such as mashups, this became quite a big restriction. It additionally meant that anybody trying to develop an application using a realtime push technology had to host their realtime server on the same domain as their website; in other words, they had to self host. Again, in a time where cloud hosting, cloud services, Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) are so popular, this was a massive restriction.

The need to lift this restriction led to Cross Origin Resource Sharing (CORS) and the introduction of the Access-Control-Allow-Origin header, which allowed the server to indicate to the browser making the XMLHttpRequest whether that request was actually allowed.
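As a rough illustration of the server’s side of that exchange, the TypeScript sketch below decides which response headers to send for a given request origin. The corsHeaders function and the example origins are hypothetical; Access-Control-Allow-Origin is the header described above, and Vary: Origin is included here on the assumption that the response may be cached and differs per origin.

```typescript
// Returns the CORS-related response headers for a request, given the value
// of its Origin header and the set of origins the server chooses to trust.
function corsHeaders(
  requestOrigin: string | undefined,
  allowedOrigins: ReadonlySet<string>,
): Record<string, string> {
  if (requestOrigin !== undefined && allowedOrigins.has(requestOrigin)) {
    return {
      // Echo the origin back so the browser lets the page read the response.
      "Access-Control-Allow-Origin": requestOrigin,
      // Tell caches that the response depends on the Origin request header.
      "Vary": "Origin",
    };
  }
  // No header at all: the browser will refuse to expose the response.
  return {};
}

// Hypothetical set-up: a realtime server that trusts one website.
const trusted = new Set(["https://www.example.com"]);
const allowed = corsHeaders("https://www.example.com", trusted);
const denied = corsHeaders("https://other.example.net", trusted);
```

The browser still sends the request in the simple case; what the header controls is whether script on the calling page is allowed to see the response.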

Even with CORS things weren’t as simple as they should have been. The forever frame v XHR long-polling v XHR streaming v XHR multipart-replace decision still needed to be made, along with the additional quirk of Internet Explorer adding its very own object, XDomainRequest, which had to be used for cross domain web requests.

These inconsistencies between web browsers and the multitude of ‘realtime browser solutions’ meant that the barrier to using realtime push functionality, not to mention bi-directional communication, was still too high.

But now we have WebSockets!