Facebook, Google and our Dwindling Privacy: Take My Data Please Edition
I read this article about the privacy trade-off, and this quote from Ron Conway really stood out: "For that value tradeoff, they're willing to provide information." I completely agree with him, and not just because his support of his iconic investments is legendary. But his statement is not really relevant to most of what is going on at the companies giving rise to the concern. We really do not know and cannot imagine what is being done. The government is not allowed to access the same information without a warrant, but the fiction of "consent through EULA" finds permission buried deep inside what consumers call a "click through agreement" and the drafters call a license grant.
Rather than go into a whole new rant, I am just reposting something I wrote about a year and a half ago. Sadly, even though we are becoming more aware thanks to great editorials like this one by Lori Andrews in the New York Times, other than attempts by the powers that be to reframe the argument, not much has been done. Have a look at the video at the top of the page for Mitchell Baker's simple and logical solution, but in the meantime you can read my post from June 2010 to see what set me off. . . .
Sure, E3 is going on and you might click through to this post to read something I had to say about it. Do you really think there is anything left to say? It is back and the whole LA Convention Center is full of unicorns shitting rainbows while puppies dance on their backs. If you cannot make it down there, you may be better off. You do not want to step in a rainbow pile. There is so much E3 news that I went ahead and wrote about something that is bugging me instead. But if you would rather see E3 stuff, go right ahead.
I purchase a bunch of random things through iTunes, and because there is no real correlation between the timing of the purchase and the timing of the confirmation receipt, I often do not even open the purchase confirmation emails. But last week I got a few emails in a row and opened them to find out I purchased:
ViKey - Bộ gõ tiếng Việt - TELEX, VNI, VIQR, v2.0, Seller: Dinh Ba Thanh,
MyFlickr, v1.0, Seller: Do Tuan Anh,
VnExpress 2010, v3.1, Seller: Do Viet Tuy,
VietnamCar 2010, v1.0, Seller: Do Viet Tuy,
DTCK 2010, v1.0, Seller: Do Viet Tuy, and
CafeF (Special Edition), v1.0, Seller: Pham Cao Phuc.
Another email told me I purchased VietStock 2010, v1.0, Seller: Do Viet Tuy. Curiously, I did not remember buying any of these things. I went to call the iTunes store, but I could not; there is no number. I looked online and noticed hundreds of posts on the official Apple discussion boards and across the web from people who had their accounts hacked and found no assistance from Apple. They all said the only recourse was through the credit card company, so I called my credit card company and, without any questions, they voided the charges. They said it happens all the time.
After hanging up I realized I could not upgrade my iPad apps. iPad apps and related upgrades are tied to the user account at the time of purchase. My cunning grasp of the obvious connected the two issues. I called iPad support and devoted the next hour of my life to speaking with a series of very helpful and happy Apple cult members who were very sorry I was having issues. Apparently they can call iTunes help, but consumers may only reach it by email. While they were genuinely kind and helpful, I was somewhat disheartened by their responses. Apple gathers a bunch of data and asks for permission to use it. They tell me their Genius will suggest interesting songs and movies if I let it track what I buy. Apps will be better if they can track location, and if I lose my device, they can even tell me where my iPhone or iPad is if I just give them permission. When my credit card numbers were stolen, Amex was able to identify aberrant usage within one charge. I’ve used the card all over the world with multiple purchases in multiple cities in a week and they never asked a question, but one charge in one grocery store in Los Angeles, and they nailed it. They called and asked if I made the charge, I told them I did not, and I had a new card in my hands within twenty-four hours. So again, applying my highly regarded grasp of the obvious, right around minute 46 of our getting-to-know-you call I asked the very kind Apple person:
“I’ve had the account for about four years. Wouldn’t the store identify a sudden burst of purchases in Vietnamese and at least ask if it was me?”
“Oh no, Keith,” (we are on a first-name basis now; they are nice, they are Apple and they care about me) “that would be an invasion of your privacy and we would not do that. We would never look at what you buy.”
“BULLSHIT,” I wanted to yell, but I didn’t. This NPC is too far gone. Far be it from me to embark on the deprogramming.
Contrary to what my Applebot told me, Apple does take our data. Even though we don’t read the scrolling EULA, which was handed down through generations of very clever, albeit wordy, legal monks in the purest pursuit of full disclosure, we see their recommendations. Unless we believe in the recommendation fairy, the continued improvement points to their watching us. They tell us so. They promise better service by parsing, analyzing and searching for correlations. What we may not know - because no one really reads the EULA - is that Apple’s interaction with us does not end with the purchase. The company better serves us by tracking what we are playing in our library, how often it is played and when we last played it. We would be pissed if a little sister did this to us, even worse if it was a parent, but we let Apple do this and use it. I am not coming down on Apple here; Amazon has been doing this for years, as have TiVo, your credit card company and Google. When it comes to our privacy, this is just one of the many aspects we give up without thought, probably because it is too hard to wrap our heads around the value and amount of metadata flowing from the real data we provide, and because we really do not think anyone will do anything with it. We could not be any more wrong.
In the old days, if you told someone you bought an album or a book, it did not mean anything. But our data no longer exists in a vacuum. Now the cloud around that data seamlessly blends with other clouds of data, growing exponentially with each merger. The cloud grows as the amount of data grows. We only see the data pile; whole new branches of science are looking at the invisible cloud, and this stuff is being used against us.
The press is going nuts over Facebook privacy policies, but the discussion of access to, and spread of, data we never intended to share is much quieter - bordering on nonexistent. In addition to what we disseminate by putting something up on Facebook or purchasing through iTunes or Amazon, we build vast silos of data just by using a browser. We have a personal silo on Facebook full of pictures, thoughts and connections, a web activity silo stored at our ISP, a financial silo held by credit reporting agencies and banks and built through our purchases and credit requests, a personal interest silo built each time we click on an ad, and more we cannot even conceive. It is hard enough to imagine what companies are doing with the data we provide (Facebook can predict future hookups between members with 33% accuracy); we cannot even begin to wrap our heads around what will happen once the silos connect and the network effect kicks in.
The Financial Silo
Almost all of us are comfortable using credit cards. Aside from the risk of the waiter or store clerk stealing your number, we really don’t think about the individual purchase. Some people even feel comfortable enough to register with Blippy.com, making a game of broadcasting everything they buy. Why should we be concerned about individual purchases? Who could possibly care about your buying a 12-pack of Diet Coke and a game at Walmart? No one ever thinks these purchases speak to who we are, but credit card companies and banks build psychological profiles based on what we purchase and where we buy it.
The exploration into cardholders’ minds hit a breakthrough in 2002, when J. P. Martin, a math-loving executive at Canadian Tire, decided to analyze almost every piece of information his company had collected from credit-card transactions the previous year. Canadian Tire’s stores sold electronics, sporting equipment, kitchen supplies and automotive goods and issued a credit card that could be used almost anywhere. Martin could often see precisely what cardholders were purchasing, and he discovered that the brands we buy are the windows into our souls — or at least into our willingness to make good on our debts. His data indicated, for instance, that people who bought cheap, generic automotive oil were much more likely to miss a credit-card payment than someone who got the expensive, name-brand stuff. People who bought carbon-monoxide monitors for their homes or those little felt pads that stop chair legs from scratching the floor almost never missed payments. Anyone who purchased a chrome-skull car accessory or a “Mega Thruster Exhaust System” was pretty likely to miss paying his bill eventually.
Martin’s measurements were so precise that he could tell you the “riskiest” drinking establishment in Canada — Sharx Pool Bar in Montreal, where 47 percent of the patrons who used their Canadian Tire card missed four payments over 12 months. He could also tell you the “safest” products — premium birdseed and a device called a “snow roof rake” that homeowners use to remove high-up snowdrifts so they don’t fall on pedestrians.
These profiles are then used by credit card companies and banks to determine when to offer home loans, lower existing credit lines, or deny new credit. Without even thinking about it, we are building a profile of ourselves which is available to all who review our credit. With the passage of the new federal banking bill, the US Government will also have access to these records.
Web Surfing Silo
While credit card companies and the US Government are building profiles of us, we are building profiles of ourselves. Our surfing habits create a unique “Clickprint” that can empower those reviewing the data to anticipate our behavior. Reams and reams of data are gathered and, despite the statements contained in privacy policies, distributed. In 2006, AOL fired its CTO over the release of stored and anonymized search data. AOL found the supposedly anonymous data could be used to identify the individuals making the searches. Balaji Padmanabhan and Catherine Yang of Wharton and UC Davis, respectively, identified the reason for the concern in their paper “Clickprints on the Web: Are There Signatures in Web Browsing Data?” They found retailers can distinguish between different users in as little as three sessions and behavior can be identified in anywhere from 3 to 16 sessions. Imagine the profile we build when all of our surfing habits are taken into account. Four years later the situation is even worse.
In a more recent paper, Balachander Krishnamurthy and Craig Wills of AT&T Labs and Worcester Polytechnic Institute showed how advertisers can identify users simply by looking at the referring page for the click-through.
A key question that has not been examined to our knowledge is whether Personally Identifiable Information (“PII”) belonging to any user is being leaked to third party servers via Online Social Networks (“OSN”). Such leakage would imply that third parties would not just know the viewing habits of some user but would be able to associate these viewing habits with a specific person.
In this work we have found such leakage to occur and show how it happens via a combination of HTTP header information and cookies being sent to third-party aggregators. We show that most users on OSNs are vulnerable to having their OSN identity information linked with tracking cookies. Unless an OSN user is aware of this leakage and has taken preventive measures, it is currently trivial to access the OSN page using the ID information. The two immediate consequences of such leakage: First, since tracking cookies have been gathered for several years from non-OSN sites as well, it is now possible for third party aggregators to associate identity with those past accesses. Second, since users on OSNs will continue to visit OSN and non-OSN sites, such actions in the future are also liable to be linked with their OSN identity.
Tracking cookies are often opaque strings with hidden semantics known only to the party setting the cookie. As we also discovered, they may include visible identity information and if the same cookie is sent to an aggregator, it would constitute another vector of leakage. Due to the longer lifetime of tracking cookies, if the identity of the person is established even once, then aggregators could internally associate the cookie with the identity. As the same tracking cookie is sent from different Websites to the aggregator, the user’s movements around the Internet can now be tracked not just as an IP address, but as associated with the unique identifier used to store information about users on an OSN. This OSN identifier is a pointer to PII about the user.
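For anyone who wants to see how little it takes, here is a minimal Python sketch of the linkage described above. It is my own illustration, not code from the paper: the domains, the `id` query parameter, the cookie values and the request log are all made up, and a real aggregator would see far messier data. The idea is simply that once a single request carries both a tracking cookie and a referring URL containing a profile ID, every other request with that cookie can be tied to the same person.

```python
from urllib.parse import urlparse, parse_qs


def extract_osn_id(referer):
    """Return the value of an 'id' query parameter in a referring URL, or None.

    The 'id' parameter name is an assumption made for this illustration.
    """
    query = parse_qs(urlparse(referer).query)
    values = query.get("id")
    return values[0] if values else None


def link_cookie_to_identity(requests):
    """Map each tracking cookie to the first profile ID seen alongside it."""
    cookie_to_id = {}
    for req in requests:
        osn_id = extract_osn_id(req.get("referer", ""))
        if osn_id is not None:
            # One sighting is enough: the cookie outlives the page view.
            cookie_to_id.setdefault(req["cookie"], osn_id)
    return cookie_to_id


# Hypothetical requests as a third-party aggregator might log them.
requests = [
    {"cookie": "abc123", "referer": "http://news.example.com/story/42"},
    {"cookie": "abc123", "referer": "http://osn.example.com/profile.php?id=555001"},
    {"cookie": "zzz999", "referer": "http://shop.example.com/cart"},
]

linked = link_cookie_to_identity(requests)
print(linked)
```

In this toy log, the visit to the news site is anonymous on its own, but it shares a cookie with the profile page visit, so both now belong to profile 555001.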
The leakage through sale of data was found not only on Facebook, but also on MySpace, LiveJournal, Hi5, Xanga and Digg, as well as at Google through DoubleClick and Yahoo through Right Media. While this may cause us to shake, there is more to be concerned with than the leaks we can identify and stop. Facebook and LinkedIn have actually created data science teams to analyze data and look for behavioral correlations to clickprints. According to a book critical of Facebook, Mark Zuckerberg used to play with the data to entertain himself.
As the service's engineers built more and more tools that could uncover such insights, Zuckerberg sometimes amused himself by conducting experiments. For instance, he concluded that by examining friend relationships and communications patterns he could determine with about 33 percent accuracy who a user was going to be in a relationship with a week from now. To deduce this he studied who was looking at which profiles, who your friends were friends with, and who was newly single, among other indicators.
The threat is not ephemeral. Just to make sure, the FBI wants your ISP to keep all of your data for two years.
Merging The Data
Ok, so the banking and credit side of the world knows about our financial situation and the retail side of the world may know about our interests and peccadillos, but I am just being overly sensitive. Relative loss of privacy is simply a cost of living in a faster, more fluid world. Right? Not really. What happens when the silos merge? Banks, credit card companies, retailers and others can all merge the silos. Each has access to both silos by virtue of advertising programs and voluntarily provided data. We opt into the financial silo, but no one realizes a click-through on a credit card or refinance offer potentially merges silos. But if I am not doing anything wrong, there is nothing to worry about. Sure, you are not doing anything wrong in the present, but how does it look through behavioral prediction, a science that, by its practitioners' own admission, is inaccurate at best? In a Minority Report kind of way, and erring on the side of caution, companies wanting to protect their investment will reduce your credit, and the TSA may put you on the no-fly list on the basis of information taken completely out of context. Analysis of these vast data and metadata libraries is done by computers, not humans. Computers sift through reams and reams of data, spitting out tinier but still vast reams of data to which algorithms are applied for conversion into measures within an “acceptable” margin of error. Anyone whose credit rating has been dinged by a mistaken attribution knows the hell of being caught in a “guilty until proven innocent” cycle after falling within the margin of error. Imagine what happens when it gets into the hands of the government.
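To show how mechanical the merge is, here is a rough Python sketch. Everything in it is hypothetical: the email addresses, the field names and the choice of email as the shared key are assumptions I made for illustration, not a description of how any bank or ad network actually stores its records. All it takes is one identifier that appears in both silos.

```python
# Two hypothetical silos, each keyed by the same identifier (an email address).
financial_silo = {
    "pat@example.com": {"missed_payments": 2, "buys_generic_oil": True},
    "sam@example.com": {"missed_payments": 0, "buys_generic_oil": False},
}
browsing_silo = {
    "pat@example.com": {"clicked_refinance_ad": True},
    "lee@example.com": {"clicked_refinance_ad": False},
}


def merge_silos(*silos):
    """Combine per-person records from any number of silos into one profile each."""
    merged = {}
    for silo in silos:
        for key, record in silo.items():
            # Fields from later silos are added to whatever is already known.
            merged.setdefault(key, {}).update(record)
    return merged


profiles = merge_silos(financial_silo, browsing_silo)
for person, profile in sorted(profiles.items()):
    print(person, profile)
```

Pat, who shows up in both silos, now has a single profile tying payment history to ad clicks. Nobody asked Pat, and nothing in the code knows or cares whether the combination is fair or even accurate.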
It gets even scarier when we consider that Google has more than just the search data: Google Desktop creates metatags for every file on a computer, Gmail indexes every email and its content, the proposed Google Health service will provide access to medical data, Android phones provide communication and location data, Google Voice transcribes and indexes all voice mails and frequently called numbers, and facial recognition could give access to comings and goings in public places. Google, and many others, will know everything about us, because we told them.
In the old days, when they were not being investigated, these companies would stand up for us. Google actually stood up to the US government and refused to offer certain services in China to avoid the risk of having to disclose data. Pre 9/11 the US Government did not have access to the data; post 9/11, through the Patriot Act and the new rules contained in the recently passed Federal Banking Bill, it gets access to both silos. Even Google is not protecting data. The data accidentally gathered while mapping streets in Europe was recently handed over to authorities in Germany, France and Spain. Google admitted the data was collected in error, but they are handing over data which the governments may or may not actually be entitled to collect. The data ties IP addresses to the sites accessed.
In the even older days, we could live without footprints. When you wanted to see someone you would send a calling card. You could not get into someone’s house unless you were invited. No one knew where you went unless you told them. If a company wanted information about you, it asked for it. If the government was interested in what you were doing, it investigated through formal requests to the courts, and subpoenas were issued after a showing of cause. Today, in the interest of “helping companies to help us,” each one of us has a Great Pacific Garbage Patch of data we never knew we built. It is time to clean up our garbage patches. Each data set we provide, wittingly and unwittingly, is part of a network. Each connection grows the network, and therefore computing power, exponentially, until something much more powerful than us is mixing, matching, dissecting, connecting, analyzing and organizing every piece of data about us. And the thing doing it really doesn't care. The danger lies in what we do not know. The loss of privacy is increasing on an exponential rather than a linear course, and when the last glimmer is extinguished, it will leave with a whimper, not with a shout.