Data dating sites
“Data is already public.” This sentiment is repeated in the accompanying draft paper, “The OKCupid dataset: A very large public dataset of dating site users,” posted to the online peer-review forums of [...]:
“Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.”
As Kirkegaard stated: “Data is already public.” No harm, no ethical foul, right? Many of the basic requirements of research ethics (protecting the privacy of subjects, obtaining informed consent, maintaining the confidentiality of any data collected, minimizing harm) are not sufficiently addressed in this scenario.
Moreover, it remains unclear whether the OkCupid profiles scraped by Kirkegaard’s team really were publicly accessible.
About one in five 18- to 24-year-olds (22%) now report using mobile dating apps; in 2013, only 5% reported doing so.
The “already public” excuse was used in 2008, when Harvard researchers released the first wave of their “Tastes, Ties and Time” dataset, comprising four years’ worth of complete Facebook profile data harvested from the accounts of a cohort of 1,700 college students. And it appeared again in 2010, when Pete Warden, a former Apple engineer, exploited a flaw in Facebook’s architecture to amass a database of names, fan pages, and lists of friends for 215 million public Facebook accounts, and announced plans to make his database of over 100 GB of user data publicly available for further academic research.
Online dating use among 55- to 64-year-olds has also risen substantially since the last Pew Research Center survey on the topic. Today, 12% of 55- to 64-year-olds report ever using an online dating site or mobile dating app, versus only 6% in 2013.
We must reframe the inherent ethical dilemmas in these projects. And we must continue to develop policy guidance focused on the unique challenges of big data studies.