There was a time when we had a reasonable expectation of privacy, a time when air gaps separated computer systems, when aggregated data-sets containing the records of millions of people simply did not exist. As the Ashley Madison hack goes public it is clear that this time is past, and that, regardless of your views on adultery, these expectations are no longer realistic.
The back story here is simple. Ashley Madison appeared in the early 2000s as a social media site targeted squarely at people who wanted to have an affair. There was little subtlety to the company’s approach as they plastered every available piece of online real estate with their tagline, “life is short, have an affair”.
In new media terms the company has been remarkably successful, continuing to make its way as other more traditional dating sites have come under intense pressure from competition. Avid Life Media, which operates the site, claims more than 33 million members in 46 countries, with someone new joining every six seconds. This huge membership has made it Canada’s largest dot-com.
There is already a tremendous amount in the media about the Ashley Madison hack, mostly focused on the ethics of the affair, whether the company adequately protected the members, and what legal implications they now face. I don’t really plan to cover any of this as I believe others have done an excellent job. My interests revolve around the data itself, and what additional data can be inferred from it.
The stolen data is now publicly available outside of the “dark web” in the form of a 13 GB data file that can be downloaded through just about any torrent client. There are so many people sharing this data that if you have a reasonably quick Internet connection it should take no more than an hour to obtain a copy.
I will leave the ethical considerations of whether such data should be downloaded to you, the reader. My purposes are those of a citizen journalist aiming to understand the nature of the breach, its extent, and the potential repercussions for the affected individuals. Since the data is now in the public domain I believe my constitutional rights cover this use.
Having obtained the files, I ran the usual MySQL commands to restore the database dump and ended up with a variety of tables available for query on my MacBook. My immediate attention turned to the login information which appeared to have relatively well protected user passwords, probably salted and hashed. This was an excellent sign that at least the basic principles of securing websites had been followed.
What caught my eye was the geographical data, namely the latitude and longitude identifying each member’s location. Here was a treasure trove indeed, and as I spun up my command prompt and struggled through my memory to recall how to compute distance, I quickly realized that I could locate every member who was within 2 miles of my house.
Approximately 10 seconds later I received a list of 245 people. Assuming that I had done my math incorrectly, I quickly spot-checked the data. Nothing appeared to be wrong, other than my feeling that surely there could not be so many people in my immediate locality. I started going through the email addresses one by one, and stopped when I identified people I knew. The data was certainly plausible.
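For readers wondering how little math a radius search like this requires, here is a minimal sketch in Python. It is not the query I ran against the database; it assumes the haversine great-circle formula, a mean Earth radius of 3,958.8 miles, and invented coordinates. The function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(origin, points, radius_miles=2.0):
    """Return the (lat, lon) points lying within radius_miles of origin."""
    lat0, lon0 = origin
    return [p for p in points
            if haversine_miles(lat0, lon0, p[0], p[1]) <= radius_miles]


# Invented coordinates for illustration only; not taken from any real data.
origin = (40.0, -75.0)
points = [(40.01, -75.0), (40.5, -75.0), (40.0, -75.02)]
nearby = within_radius(origin, points)
```

With these made-up points, the first and third fall inside the 2 mile radius and the second, roughly 35 miles away, does not. The point is how trivial the calculation is once coordinates are stored at this precision.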
As usual, I was amazed by how many people do idiotic things. Approximately 20% of the Ashley Madison users on my list did not bother to cover their tracks and had used their corporate email addresses. Some addresses belonged to educational institutions, and I noticed several government addresses as well. A further 20% of users were using identifying email addresses composed of some variation of their names.
Think about that for a minute. If my estimates hold up, approximately 40% of the 33 million people who have made use of the service, roughly 13 million, can be identified from their email address alone. That is a truly staggering number.
However, it gets worse. Even though the street address fields were uniformly empty, the latitude and longitude fields were not, and these identify a user to within meters of where they registered. A quick experiment confirmed my worst fears. I selected two people I knew, took their geographical data and plugged it into Street View. A moment later I saw their houses, larger than life on the Internet, where anyone in the world could find them.
A cold shiver ran over me as I thought about the repercussions for those people and their spouses.
At this point I stopped. I had no interest in verifying the authenticity of the credit card information, nor was I particularly interested in the sexual peccadilloes of my neighbors. My goal was to determine how identifying the stolen information was. My finding is that it is deeply incriminating, and many people will be having a very difficult conversation with their spouse and, subsequently, with divorce lawyers.
The Ashley Madison affair is unusual in that it would be very hard for any affected individual to claim any other intent than to cheat on their spouse. The geographic data corroborates identities and makes it significantly harder for the non-injured spouse to claim their innocence.
The almost uniform presence of this geographic data is of deep concern, as it was almost certainly collected from users’ web browsers, even in situations where they had elected not to put their address into the main database fields. Was it legitimately harvested? It seems unlikely that most of the users would have been willing to supply this information, and I wonder how many of them knew it was being collected.
To conclude, regardless of their views on adultery, I imagine that most people are concerned about my ability to identify users so easily based on information that they did not knowingly consent to share. It’s important to understand the usage of the word “knowingly” in this context, insomuch as I’m sure each user signed an agreement allowing the company to collect whatever data it required, but perhaps without considering the implications that could arise many years later.
I contend it is time for legislation to control the information that can be collected by private enterprise, or at least to present a summary of that information in an understandable form before requiring the user to assent to each individual use. After all, it is always possible that a small percentage of the 33 million people would not have used the service had they fully considered the future outcome.