Tinder was ticked off after 40,000 profile photos were scraped to create the People of Tinder dataset. The company accused the person behind the script of violating its terms of service and asked Kaggle to remove the dataset from the platform. Nevertheless, it was downloaded hundreds of times before the takedown; the dataset's page now returns a 404 error.
The People of Tinder dataset was created by Stuart Colianni; it consisted of 40,000 images from Tinder users in the San Francisco Bay Area – half were of women and half were of men. He intends to use the dataset with the Inception model in Google’s TensorFlow to create a neural network capable of distinguishing between male and female images.
Colianni shared TinderFaceScraper on GitHub. He expressed disappointment in other small facial datasets before claiming, “Tinder gives you access to thousands of people within miles of you. Why not leverage Tinder to build a better, larger facial dataset?”
He uploaded the scraped Tinder pictures to Kaggle, a platform for predictive modelling and analytics competitions. Before Tinder asked Kaggle to remove the dataset, TechCrunch checked it out, reporting that the “People of Tinder, consists of six downloadable zip files, with four containing around 10,000 profile photos each and two files with sample sets of around 500 images per gender.”
Some affected Tinder users reportedly were not thrilled to have their sexy selfies, which were intended to induce a swipe right, scraped and shared in a dataset that was downloaded hundreds of times for who-knows-what AI projects. It’s a good reminder: there are no guarantees that photos intended to be semi-private – or seen only by a specific person or people in specific circumstances – will not become public after you post them, whether through a breach, revenge porn or a scraper.
Others were insulted after seeing that the TinderFaceScraper code included the following snippet:
# Iterate through list of subjects
for hoe in hoes:
    # Get the subject ID
    sid = hoe['_id']
    # Gets a list of pictures of the subject
    pictures = hoe['photos']
As for his choice of using “hoe” and “hoes” as variable names in his script, Colianni said it was an “oversight. This syntax was borrowed from a Tinder auto-liker, which I used as a reference when learning to interact with the Tinder API programmatically. I regret this oversight, and the code has been corrected.”
Colianni’s scraped dataset, Tinder claims, violated the prohibited activities section in its terms of service. Colianni updated his GitHub post to include: “I have spoken with representatives at Kaggle, and they have received a request from Tinder to remove the dataset. As such, the facial data set previously hosted on Kaggle has been removed.”
Tinder asserted to TechCrunch that it takes “the security and privacy of our users seriously and have tools and systems in place to uphold the integrity of our platform.” It may care about users' privacy now, but that was questionable in April 2016, when Tinder outraged some users by automatically opting them in to Tinder Social.
In the statement for this go-around, the company tossed in a plug for its free product, then added, “We are always working to improve the Tinder experience and continue to implement measures against the automated use of our API, which includes steps to deter and prevent scraping.”
Yet Colianni pointed out, “The Tinder API Documentation has been available to the public for years, and there are numerous open source projects on GitHub such as Pynder showing how to make Tinder bots and interact with the Tinder API.”
As other outlets have reported, developers have tinkered with the Tinder API over the years, such as creating a catfish machine that tricked guys into thinking they were flirting with women when in fact they were flirting with other guys.