Did Harvard Researchers Breach Students' Privacy?

When data is aggregated from social networks, supposedly anonymized, and then published, is that ethically wrong? Or is it only morally questionable if the data can be de-anonymized and used to identify individuals? Social scientists are caught in the crossfire. Harvard researchers were accused of breaching students' privacy.

When Harvard researchers aggregated Facebook data and supposedly anonymized it, no one should have been able to identify those users. That was not the case, however, and the "privacy meltdown" caught the researchers in a net of ethical turmoil. Whether it is right or wrong to mine social networks for social-science studies is a bone of contention between privacy watchdogs and researchers. While one social scientist called the data that can be collected from social networks a "wet dream," one privacy scholar said that data which can easily be de-anonymized is an "ethical concern." When your data is collected from social networks, allegedly anonymized, and then published for a study, are you an injured party if you can later be identified? Is it a question of ethics when social-science researchers aggregate such data and publish studies based on it?

The Chronicle of Higher Education looks at the "promise and peril" of sociologists studying social network data, focusing on the Harvard researchers who harvested 1,700 Facebook profiles to study "how friendships and interests evolve over time." While scholars all over the world might find such research interesting, the researchers came under heavy fire when part of the supposedly anonymized data was released to the public as "Tastes, Ties, and Time." It quickly turned into a privacy fiasco because it was possible to identify individuals; the Harvard researchers were accused of breaching students' privacy.

In 2006, Jason Kaufman, of Harvard's Berkman Center for Internet & Society, tasked Harvard student research assistants with downloading the 1,700 or so Facebook profiles from an "anonymous" university. But that gave research assistants who were "friends" with other students "access to profiles that students might have set to be visible to Harvard's Facebook network but not to the whole world." In fact, the Facebook data was later identified as belonging to Harvard College's Class of 2009. So much for anonymized, as University of Wisconsin privacy scholar Michael Zimmer pointed out.

Zimmer wrote that "the suggestion that the 'Facebook 100' data has been 'anonymized' is seriously flawed," and that its release "might be putting the information of 1.2 million Facebook users at risk." Zimmer said the "Harvard project should have triggered an ethical concern." He has also covered the ethics of research on Facebook, including the "ethical concerns that must be addressed before embarking on future research in social networking sites, including the nature of consent, properly identifying and respecting expectations of privacy on social network sites, strategies for data anonymization prior to public release, and the relative expertise of institutional review boards when confronted with research projects based on data gleaned from social media."

Kaufman said critics of his research are acting like "academic paparazzi." In 2008, Kaufman addressed the controversy over not informing students of his data gathering. Although he had discussed it with the institutional review board, alerting students risked "frightening people unnecessarily," he said. "We all agreed that it was not necessary, either legally or ethically."

The Chronicle quoted Alex Halavais, an associate professor of communications at Quinnipiac University and soon-to-be president of the Association of Internet Researchers, as saying, "If you had to dream of research content, it would be sending out a diary and having people record their thoughts at the moment. That's like a social scientist's wet dream, right? And here it has kind of fallen on our lap, these ephemeral recordings that we would not have otherwise gotten."

Researchers who aggregate data may not stop to consider that each piece of data corresponds to a living, breathing human. Halavais had conducted a Twitter study on protests around the Group of 20 summit. But "some people were arrested for using Twitter to help demonstrators evade police." That led one participant in the study to delete his Twitter account after his tweets had already been collected for publication.

Because it is nearly impossible to completely anonymize data collected from social networking sites, both Facebook and Twitter have updated their privacy policies. Twitter notified several companies that they were violating Twitter's terms of service by archiving and redistributing tweets. Yet Twitter has allowed the Library of Congress to digitally archive the billions of public tweets posted since Twitter started in 2006, and some of that data is sought for research purposes.

The privacy controversy "tainted Harvard's data." Halavais said, "Once a data set has been clearly de-anonymized, it becomes a little bit like kryptonite. People will touch it, but you're putting your own ethical stance at risk if you do."

Follow me on Twitter @PrivacyFanatic

Copyright © 2011 IDG Communications, Inc.