Protocol Relative URLs

Last week, I fixed an interesting issue and want to share it with you.

It started with a bug report: a user found that one of the logos in a PDF was missing. Because we use wicked_pdf to generate PDFs, the first thing I did was open the HTML we pass to wicked_pdf in my browser. And guess what: the logo showed up correctly in the browser. That was strange, so I opened the HTML template, checked the source code, and found something like this:

```html
<img src='//remotesite.com/img/logo' />
```

Notice that the image URL has no protocol. Is it even a valid URL? It turns out it is: this kind of URL is officially called a network-path reference in RFC 3986, section 4.2. Paul Irish wrote a blog post introducing it back in 2010 and gave it a better name: Protocol Relative URL.

As the name suggests, a Protocol Relative URL is requested with the current page’s protocol: if the page was loaded over http, the resource is also requested over http, and if the page was loaded over https, the resource is requested over https.
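Ruby’s standard `URI` library implements this kind of reference resolution, so a quick way to see the rule in action is to join a Protocol Relative URL against different page URLs. This is a minimal sketch: the page URLs are made-up examples and `resolve` is a hypothetical helper name, not part of any library.

```ruby
require "uri"

# Hypothetical helper: resolve a reference relative to the page it appears on,
# the way a browser would, using Ruby's URI.join.
def resolve(page_url, reference)
  URI.join(page_url, reference).to_s
end

logo = "//remotesite.com/img/logo"

# The reference keeps its own host and path but inherits the page's scheme.
puts resolve("http://example.com/invoices/42", logo)
# prints: http://remotesite.com/img/logo
puts resolve("https://example.com/invoices/42", logo)
# prints: https://remotesite.com/img/logo
```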

Since Protocol Relative URLs are so easy to use and work well in most browsers (except IE6, but nothing works well in IE6), Paul Irish used them in html5-boilerplate, and many people adopted them to avoid mixed active content issues, especially when moving their sites from http to https.

Although most modern browsers support Protocol Relative URLs, they don’t work as expected with wicked_pdf. Before the PDF is generated, the HTML is saved to a local file, and wicked_pdf passes that file to wkhtmltopdf. Wkhtmltopdf treats it as a local file, so Protocol Relative URLs inherit the file:/// protocol instead of http or https. This causes problems because the resources aren’t always on the local server: they are often served from a CDN or from online storage such as S3.
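The same resolution rule explains the missing logo. Assuming the saved page is loaded from a file:/// URL, as described above (the temp file path below is a made-up example), resolving the logo against it keeps the `file` scheme, so the image is looked up as a file instead of being fetched over HTTP(S). An explicit `https://` URL is unaffected by the page’s scheme.

```ruby
require "uri"

# Hypothetical location of the HTML file handed to wkhtmltopdf.
page_url = "file:///tmp/wicked_pdf_report.html"

# A Protocol Relative URL inherits the page's "file" scheme.
logo = URI.join(page_url, "//remotesite.com/img/logo")
puts logo.scheme  # prints: file
puts logo.to_s    # prints: file://remotesite.com/img/logo

# An absolute URL is returned as-is, whatever the page's scheme is.
fixed = URI.join(page_url, "https://remotesite.com/img/logo")
puts fixed.to_s   # prints: https://remotesite.com/img/logo
```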

For the same reason, the following scenarios will not work either:

- A user saves an HTML page that contains Protocol Relative URLs and then opens it in the browser
- A user receives an HTML email that contains Protocol Relative URLs and opens it in an email client such as Outlook
- Background jobs generate HTML files containing Protocol Relative URLs

We also found that Protocol Relative URLs cause issues with third-party integrations (such as OAuth via Facebook and Twitter) that expect the protocol. Here is an example from the internet: someone tried to share a page containing images with Protocol Relative URLs, but Facebook didn’t pick up the images on the page. After the protocol was added to the image URLs, Facebook displayed the images correctly.

Qt WebKit hasn’t supported Protocol Relative URLs since version 0.9.6, so any library that relies on Qt WebKit basically won’t work with them.

It looks like they do more harm than good, right? So how did we fix the problem in the end? Simple: we stopped using Protocol Relative URLs and updated the code to always use HTTPS for external assets. All links like this:

```html
<img src='//remotesite.com/img/logo' />
```

are now changed to:

```html
<img src='https://remotesite.com/img/logo' />
```

It should be a fairly trivial change, as most CDNs support HTTPS requests nowadays and SSL is fast enough in most scenarios. HTTPS is also a more secure way to fetch assets, so there is no reason to keep using Protocol Relative URLs.
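The fix described here was made by editing the templates directly. If there are many templates, one hypothetical safety net is to rewrite protocol-relative `src` and `href` attributes to `https://` in the HTML string before it is handed to wicked_pdf. This sketch is regex-based and assumes quoted attributes; it does not parse HTML and ignores CSS `url(//...)` references, so treat it as a stopgap rather than a replacement for fixing the templates. `force_https_assets` is an illustrative name, not part of wicked_pdf.

```ruby
# Matches a quoted src/href attribute whose value starts with "//"
# (a Protocol Relative URL), capturing everything up to and including the quote.
PROTOCOL_RELATIVE_ATTR = /(\s(?:src|href)\s*=\s*["'])\/\/(?=[^\/])/i

# Hypothetical helper: force protocol-relative asset URLs to HTTPS.
# Absolute URLs and root-relative paths ("/img/...") are left untouched.
def force_https_assets(html)
  html.gsub(PROTOCOL_RELATIVE_ATTR) { "#{Regexp.last_match(1)}https://" }
end

puts force_https_assets("<img src='//remotesite.com/img/logo' />")
# prints: <img src='https://remotesite.com/img/logo' />
```

In a Rails controller, this would wrap the string returned by `render_to_string` before it is passed to `WickedPdf.new.pdf_from_string`.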

Finally, let’s end with Paul Irish’s update from the same Protocol Relative blog post:

> Now that SSL is encouraged for everyone and doesn’t have performance concerns, this technique is now an anti-pattern. If the asset you need is available on SSL, then always use the https:// asset.