As has been widely reported in the media, the United States has effectively allowed Internet Service Providers to sell information about people's Internet usage habits, something that was previously going to be banned effective later this year. To combat this, several media reports have suggested using encrypted web sessions (HTTPS rather than HTTP); in fact, major porn sites Pornhub and YouPorn -- whose business depends heavily on users' privacy -- have transitioned over the past few days to utilize encrypted communications by default, with Pornhub's VP Corey Price quoted as saying that "With more than 70 million daily visitors, we wanted to continue our concerted effort to maximize the privacy of our users, ensuring that what they do on our platform remains strictly confidential." Likewise, Brad Burns of YouPorn noted that, "As one of the most viewed websites in the world, it is our duty to ensure the confidentiality and safety of our users."
But, there is a big problem with all of this advice and action:
HTTPS may NOT ensure users' privacy.
Here are several reasons:
1. DNS undermines privacy
Most people use their ISP's servers for Domain Name System (DNS) lookups - that is, for translation of the name of a site to its actual IP address. While making a DNS request to translate the name of a porn site won't tell the ISP exactly how often one visited the site, or what items he or she viewed on the site, it will tell the ISP that he or she looked up the site name -- and information about usage patterns may emerge from repeated lookups after cached DNS data expires. Furthermore, while the fact that someone looked up pornhub.com may not, on its own, tell an ISP much about someone, what about cases where a teenage girl visits plannedparenthood.org, a college student visits aa.org, or a married man visits ashleymadison.com? Plenty of private information can be garnered from DNS lookups.
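To make this concrete, here is a minimal Python sketch of what a traditional, unencrypted DNS query looks like on the wire. It builds the packet by hand rather than sending anything, and the function names (build_dns_query, name_seen_on_wire) are my own illustrative choices, not part of any library. The point is simply that the site name sits in the packet in readable form.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (type A, class IN) as sent over plain UDP."""
    # Header: ID, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def name_seen_on_wire(packet: bytes) -> str:
    """Recover the queried name the way a passive observer could."""
    pos, labels = 12, []  # the question starts right after the 12-byte header
    while packet[pos] != 0:
        length = packet[pos]
        labels.append(packet[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels)


packet = build_dns_query("aa.org")
print(name_seen_on_wire(packet))
```

Nothing here requires breaking any encryption; the observer only has to read the bytes.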
2. HTTPS sessions start unencrypted
In order to establish encrypted communications, modern browsers and web servers (or apps and servers) use something called a Transport Layer Security (TLS) Handshake. At a high level (intentionally oversimplified), that process typically involves the server proving to the browser that it is the party that it claims to be and then the two parties - the server and the browser -- establishing and agreeing upon the algorithm and keys needed for encrypting the session. During the earlier parts of the handshake process, however, communications are not encrypted - and the ISP (or anyone else listening in) can see the name of the server that a browser is accessing, because the browser typically states that name in its opening message.
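For readers who want to see this for themselves, the following Python sketch uses the standard ssl module to generate the first handshake message a client would send, without connecting to anything. The helper name client_hello_bytes is my own, and "example.com" is just a placeholder; the sketch assumes a standard Python build whose TLS library sends the server name unencrypted, which is the usual behavior.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the first bytes a TLS client would send when contacting `hostname`."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Memory buffers stand in for the network, so nothing is actually transmitted
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client now waits for a server reply that never comes
    return outgoing.read()


hello = client_hello_bytes("example.com")
# If the name appears in these bytes, anyone on the network path can read it
print(b"example.com" in hello)
```

The opening message is produced before any keys have been agreed upon, which is why the server name inside it is readable.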
3. Websites, webpages, and media can be fingerprinted and identified even when encrypted
It is possible to determine what websites, or even web pages within sites, are being accessed in an encrypted session even without cracking the encryption. For the sake of simplicity - I am obviously not going to give a cryptanalysis lesson in this column - consider my website, JosephSteinberg.com, and the Inc.com website. Even someone watching encrypted traffic from them - who obviously cannot read any of the traffic's contents - can see that each of the two sites uses a different set of third-party resources in specific orders, and can see that each site sends to the browser a different amount of content in different groupings. Likewise, on my own site, the many blog post pages deliver content in a pattern that differs from the one used by the home page, but which is generally similar between the various blog post pages. If someone loaded 9 blog post pages and then the home page, it might be possible, therefore, for someone monitoring traffic to and from the browser to determine the sequence of events -- that 9 blog posts were loaded and then the home page. With more detailed fingerprinting and analysis, the blog posts may be identifiable down to the exact page. Likewise, it may be possible to determine which video on a website someone watched by monitoring the amount of data in the associated stream. While ISPs may seem unlikely to implement such detailed monitoring, if others are willing to pay top dollar for such data, it is certainly not out of the question. The bottom line is that an ISP that analyzes encrypted traffic could likely "fingerprint" a significant percentage of websites being accessed (especially since it can load sites on its own and establish heuristics against which to compare); it may even be able, in many cases, to determine which page, or which of a group of similar pages, on a site a user accessed.
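The idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a description of any real monitoring product: the page names, the byte counts in KNOWN_PAGES, and the 5 percent tolerance are all made-up assumptions, and a real system would use far richer features (timing, packet counts, third-party connections).

```python
from typing import Dict, List, Optional

# Hypothetical fingerprints: sizes (in bytes) of the encrypted responses a page
# triggers, in load order. An observer could build these by loading pages itself.
KNOWN_PAGES: Dict[str, List[int]] = {
    "home page": [48_000, 12_500, 230_000, 9_800],
    "blog post": [31_000, 12_500, 64_000],
}


def distance(observed: List[int], fingerprint: List[int]) -> float:
    """Average relative size difference; infinite if the shapes do not match."""
    if len(observed) != len(fingerprint):
        return float("inf")
    return sum(abs(o - f) / f for o, f in zip(observed, fingerprint)) / len(fingerprint)


def best_match(
    observed: List[int],
    known: Dict[str, List[int]],
    tolerance: float = 0.05,
) -> Optional[str]:
    """Return the closest known page, or None if nothing is close enough."""
    name, score = min(
        ((page, distance(observed, fp)) for page, fp in known.items()),
        key=lambda item: item[1],
    )
    return name if score <= tolerance else None


# Sizes observed on the wire differ slightly from the stored fingerprint
print(best_match([31_200, 12_480, 63_500], KNOWN_PAGES))
```

Note that the observer never decrypts anything; sizes and ordering alone do the work.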
4. Network traffic analysis undermines privacy
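In many cases, by simply performing a network-level analysis of traffic to and from a user's system (or cable modem), an ISP can determine with what server a user is communicating. HTTPS encrypts the contents of each packet, but not the addressing information on the outside of it: the destination IP address must remain readable so that the network can deliver the packet, and that address often points to a specific site or service.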
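As a rough illustration of why the destination is visible, the Python sketch below builds a bare-bones IPv4 header and then reads the destination address back out of it, just as any router or observer along the path could. The helper names are my own, the addresses come from ranges reserved for documentation, and the checksum is left at zero for simplicity, so this is a teaching aid rather than a valid packet.

```python
import socket
import struct


def build_ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Build a simplified 20-byte IPv4 header (checksum left as zero)."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of five 32-bit words
    total_length = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,
        0, 0,                        # identification, flags/fragment offset
        64, socket.IPPROTO_TCP, 0,   # TTL, protocol, checksum (not computed)
        socket.inet_aton(src), socket.inet_aton(dst),
    )


def destination_of(packet: bytes) -> str:
    """Read the destination address: bytes 16-19 of the IPv4 header."""
    return socket.inet_ntoa(packet[16:20])


# The payload stands in for encrypted HTTPS data the observer cannot read
encrypted_payload = b"\x00" * 100
packet = build_ipv4_header("192.0.2.10", "203.0.113.5", len(encrypted_payload))
packet += encrypted_payload
print(destination_of(packet))
```

However strong the encryption of the payload, the header in front of it is not protected by HTTPS.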
5. Auto-suggest patterns undermine the secrecy of user input
Any site that suggests input to users as they are typing (most people are familiar with this feature from Google searches) may undermine its HTTPS encryption. As with fingerprinting, by monitoring the amount of data returned each time a letter is typed, and comparing it to the results of similar tests that it has run itself, a crafty system may be able to determine each letter that a user types as he or she types it.
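Here is a toy Python sketch of that comparison. Everything in it is hypothetical: the two-letter alphabet, the response sizes in SIZES_BY_PREFIX, and the 5-byte tolerance are invented for illustration, and a real site's responses would be noisier and harder to tell apart.

```python
from typing import Dict, List, Optional

# Hypothetical measurements: size (in bytes) of the encrypted suggestion
# response for each typed prefix, gathered by the observer running its own tests
SIZES_BY_PREFIX: Dict[str, int] = {
    "a": 812, "b": 640,
    "aa": 590, "ab": 705,
    "ba": 733, "bb": 668,
}


def infer_typed_text(
    observed_sizes: List[int],
    sizes_by_prefix: Dict[str, int],
    alphabet: str = "ab",
    tolerance: int = 5,
) -> Optional[str]:
    """Guess the typed text one keystroke at a time; None if a step is unclear."""
    prefix = ""
    for observed in observed_sizes:
        candidates = [
            prefix + ch
            for ch in alphabet
            if prefix + ch in sizes_by_prefix
            and abs(sizes_by_prefix[prefix + ch] - observed) <= tolerance
        ]
        if len(candidates) != 1:
            return None  # no match, or more than one plausible letter
        prefix = candidates[0]
    return prefix


# Two encrypted responses were observed, one per keystroke
print(infer_typed_text([812, 705], SIZES_BY_PREFIX))
```

Because each keystroke narrows the possibilities for the next one, even imprecise size measurements can add up to a confident guess.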
So, is privacy dead?
My hope is that many Internet Service Providers will refrain from collecting and selling information about people's Internet usage habits. Protecting people's privacy can be a valuable selling point and competitive advantage. But we must understand that, for those ISPs that do choose to profit from people's personal information, relatively small investments may prevent HTTPS from being the defense that many people assume it to be. Remember, HTTPS was designed to encrypt data - not to hide the fact that communications took place, or to mask the fact that specific resources were accessed.
So, what should you do?
While there is no perfect solution, using an ISP that promises not to collect Internet usage details is obviously ideal. Furthermore, consider using Tor or a VPN (if you trust yourself to properly configure the VPN and trust the VPN provider to protect your privacy - perhaps these are material for a future article) - these are likely much better options than just relying on HTTPS. "Polluting" your Internet usage history -- ideally by running a tool that randomly searches and browses web pages -- will also reduce the ability of an ISP to understand what you actually do online (but, clever technology can often detect the difference between automated browsing and human browsing). Or, simply use HTTPS - and assume that any browsing that you do from your home may be tracked.