In The Name Of Research, Will Facebook Stop Allowing Us To Delete Our Data?
Facebook tells its users that deleting their account will remove their data from its servers. The company caveats this by noting that some of your data will not be deleted, but it does not explicitly state what it keeps beyond the private messages you sent to other users. Messages aside, for most users, deleting their account should remove the majority of their information from Facebook’s servers within 90 days. However, buried in the announcement of Facebook’s new academic dataset, released through the Social Science One initiative, is a troubling caveat. It suggests the company may keep our data for research and may make it available to academics around the world even after we have explicitly deleted our accounts, leaving us no way to escape the company’s surveillance machine.
Last month, Social Science One published the documentation for its first research dataset, which academics around the world will be able to mine. It consists of the links that users have shared on Facebook over the past year and a half. Any link shared by at least 20 people, and shared at least once publicly, will be included in the dataset.
Buried in the detailed documentation is a line that raises grave concerns about Facebook’s commitment to allowing its users to fully delete their data from its servers. On the completeness of the dataset, the documentation states that “Data from users who have chosen to delete their accounts are not available due to legal constraints (and availability).”
Facebook’s public documentation on account deletion asserts that the company will delete all copies of a user’s data from its servers within 90 days of receiving a deletion request and that only private messages stored in another user’s account will remain. When asked whether this was still the case, a Facebook spokesperson confirmed that the information was still accurate.
This raises the question: why does Social Science One’s documentation say that deleted user accounts are excluded from the dataset “due to legal constraints” as well as availability, rather than due to availability alone? In other words, why does it not simply say that deleted accounts are removed from Facebook’s servers, so the data no longer exists to mine, regardless of any legal issues?
Such specific language is very unlikely to be an accidental oversight. The rest of the documentation is written with great precision, full of specific, detailed caveats and nuances, and Facebook has historically paid close attention to the precise wording it uses to describe its rights to user data.
It would certainly seem that if Facebook legitimately deleted all traces of a user’s account from its servers upon receiving a deletion request, then Social Science One’s documentation would simply say “Data from users who have chosen to delete their accounts is not available, since Facebook permanently deletes all data associated with those accounts within 90 days.”
When asked whether the language in the dataset documentation was superfluous, or how else to reconcile it with Facebook’s public statements about how it handles deleted account data, a company spokesperson declined to comment beyond pointing to its public account deletion FAQ.
What happens after the Social Science One dataset is released? As users delete their accounts over time, at least some of the links in the dataset will fall below the 20-person and one-public-share threshold, meaning they remain in the dataset even though they would not qualify if the dataset were generated today.
What happens to those links, especially links where every person who shared them subsequently deletes their account? According to Social Science One’s documentation, it cannot legally include deleted user data in its dataset. Yet if all 20 people who shared a link delete their accounts in the months after the dataset is released, deleted user data will remain in the dataset, since the link would never have been included without those accounts.
Differential privacy should ensure that shared links cannot be tied back to an individual account, but it protects only individual privacy. It does not address deleted user data being retained indefinitely and republished by Social Science One, which runs against both Facebook’s public assurances that it permanently wipes deleted accounts from its servers and Social Science One’s own statement that it cannot legally include deleted user data.
The only solution would be for Social Science One to reprocess its entire dataset on a regular basis to determine whether account deletions require removing links from it. This, in turn, would create unique replication challenges, since researchers working from different versions of a steadily shrinking dataset could not necessarily reproduce one another’s results.
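To make the reprocessing problem concrete, here is a minimal sketch in Python. It assumes a hypothetical table of share records (the `Share` type, with a URL, a user ID and a public flag) and applies only the inclusion rule quoted from the documentation: at least 20 distinct people and at least one public share. The function names `eligible_links` and `links_to_remove` are illustrative, not Social Science One’s actual pipeline, and the sketch ignores differential privacy entirely.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Share(NamedTuple):
    """Hypothetical share record: who shared which URL, and whether publicly."""
    url: str
    user_id: str
    is_public: bool


def eligible_links(shares: Iterable[Share], deleted_users: set[str],
                   min_users: int = 20) -> set[str]:
    """URLs shared by at least `min_users` distinct, non-deleted users,
    with at least one public share among those remaining users."""
    users_per_url: dict[str, set[str]] = defaultdict(set)
    has_public_share: dict[str, bool] = defaultdict(bool)
    for share in shares:
        if share.user_id in deleted_users:
            continue  # drop every share made by a deleted account
        users_per_url[share.url].add(share.user_id)
        if share.is_public:
            has_public_share[share.url] = True
    return {url for url, users in users_per_url.items()
            if len(users) >= min_users and has_public_share[url]}


def links_to_remove(released: set[str], shares: Iterable[Share],
                    deleted_users: set[str], min_users: int = 20) -> set[str]:
    """Released links that would no longer qualify once deletions are applied."""
    return released - eligible_links(shares, deleted_users, min_users)


if __name__ == "__main__":
    # Link "a": exactly 20 sharers, only u0 shared it publicly.
    shares = [Share("https://example.com/a", f"u{i}", i == 0) for i in range(20)]
    # Link "b": 25 sharers, all public.
    shares += [Share("https://example.com/b", f"v{i}", True) for i in range(25)]
    released = eligible_links(shares, deleted_users=set())
    print(sorted(released))  # ['https://example.com/a', 'https://example.com/b']
    # One deletion pushes link "a" below the 20-person threshold.
    print(links_to_remove(released, shares, deleted_users={"u5"}))  # {'https://example.com/a'}
```

In this toy example a single deletion is enough to disqualify a link that sat exactly at the threshold, and deleting the only public sharer (`u0`) would have the same effect. Because the output changes every time accounts are deleted, each regeneration produces a different dataset.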
Of course, the alternative scenario is that Facebook views Social Science One’s datasets as being managed by Social Science One rather than by itself, even though they reside on its servers. This would allow Facebook to state accurately that it deletes all data from deleted accounts, even if that data is permanently archived and redistributed to academics through Social Science One. Indeed, the language the company uses in its official description of account deletion would appear to permit such a view, and the company has a history of creatively labeling external entities that have access to its data.
When asked to comment further on this scenario and whether Facebook will require Social Science One to regularly update its datasets to remove data from deleted users, the company declined to comment. When asked whether Facebook would consider it a violation of its policies if Social Science One permanently retained and republished deleted user data, the company again declined to comment. Repeated requests for comment to the Social Science Research Council (SSRC), which helps steward Social Science One, and to its public relations agency went unanswered.
It is noteworthy that neither Facebook nor SSRC was willing to comment on how data from deleted user accounts will be handled, given that they explicitly mention the scenario in their documentation and given how prominently Facebook’s public statements have presented deletion as the sole way to remove one’s personal data from Facebook’s clutches.
It also reflects how little attention Social Science One appears to have paid to such research ethics issues. When I asked the initiative for comment this past April on this very issue of data deletion, and whether it would update its datasets over time to remove deleted data, SSRC responded that all questions regarding the handling of deleted user data were “TBD,” to be figured out down the road. For an organization that has been publishing roadmaps of the data it plans to release and is producing exquisitely detailed documentation of those datasets, it is remarkable that on questions of research ethics and data privacy its response is either “TBD” or radio silence, while Facebook similarly declines to comment.
Putting this all together, Social Science One was heralded as a new era of transparency and access to social media data that would enforce strict ethical controls and provide total visibility into the research process. Instead, we have an organization that treats ethics as “TBD” and does not respond to even the most basic questions about how it will handle user data, while Facebook declines to comment on apparent conflicts between the initiative’s promises and the company’s own policies and public statements. After all, Facebook cannot promise its users that deleting their account will delete all of their data while declining to commit to deleting that data from the research archives it makes available to academics through Social Science One. In the end, one has to ask whether we are taking a step forward in transparency and research ethics, or whether this latest initiative is merely the last nail in the coffin of our quaint and all-too-brief dream of online privacy.