Europe’s GDPR, California’s CCPA, Brazil’s LGPD… in recent years it has become more and more common to read about yet another acronym that stands for new data privacy legislation somewhere in the world.¹ Of course, everyone remembers famous data privacy cases such as the Cambridge Analytica scandal, when the company was able to use private Facebook data from millions of users to create and sell psychological profiles of American voters to political campaigns.² Or maybe you were just terrified by how that ad on Google/Facebook/Instagram/Amazon was about the exact thing you had been discussing with your friends 30 minutes earlier, or by how Medium knew to suggest you read this article.
“But how does Facebook know all these things about me?”
Facebook’s Data: How Companies Get to Know You
When I say “Facebook” here, it’s simply a trick to get your attention, since the company is known for a few data privacy scandals and for numerous rumours about listening through our phones without permission.³ ⁴ What I really mean is any company that sells ads on its platforms, which also includes Google, Amazon, Instagram, and basically any other “free” website that we use on a daily basis.
“… whatever, how do they know me so well?”
Well, like all nice desserts in Silicon Valley, it all starts with cookies.
Cookies Data: no, not that kind of cookie
You know when you enter a site for the first time and you just press “Okay” to this random pop-up that you never really read? It’s asking you to accept their cookies.
Put simply, a cookie is a small piece of data that a website creates when you visit it and that is stored on your computer (more specifically, in the browser you are using). It’s a way for those websites to store some basic data about you and personalize your experience. It is what lets you, for example, browse Medium without having to log in again every time you navigate to a different Medium page.⁶
But cookies can also be used as a tracking mechanism: a way of recording (strictly speaking, the cookie usually just identifies you while the history is kept elsewhere, but the effect is the same) which pages you have visited in that domain, your searches, etc.⁷ For example, say you are going from tab to tab looking for possible travel destinations in Europe and, one week later, you are browsing Google and an ad selling flight tickets to Mallorca shows up. Coincidence? Probably not. Most likely that data was tracked through cookies and then used by Google Ads to show you that ad.
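To make the tracking idea concrete, here is a minimal Python sketch of how an ad server embedded on many sites could recognise the same browser across them. Everything in it (the `AdServer` class, the `uid` cookie name, the example pages) is made up for illustration; real ad networks are far more elaborate, so treat it as a sketch of the mechanism rather than how any specific company does it.

```python
from http.cookies import SimpleCookie
import uuid


class AdServer:
    """Hypothetical ad server whose ads are embedded on many websites."""

    def __init__(self):
        self.visits = {}  # tracking id -> pages on which the ad was loaded

    def handle_request(self, cookie_header, page_url):
        """Log the visit and return the cookie the browser should store."""
        cookie = SimpleCookie(cookie_header)
        if "uid" in cookie:
            uid = cookie["uid"].value   # returning browser: reuse its id
        else:
            uid = uuid.uuid4().hex      # new browser: mint a random id
        self.visits.setdefault(uid, []).append(page_url)
        reply = SimpleCookie()
        reply["uid"] = uid
        return reply["uid"].OutputString()


ads = AdServer()
browser_cookie = ""  # the browser starts with no cookie from the ad server
for page in ["travel-blog.example/mallorca", "flights.example/europe"]:
    # The browser keeps whatever cookie comes back and attaches it to
    # its next request to the same ad server, whatever site it is on.
    browser_cookie = ads.handle_request(browser_cookie, page)

print(ads.visits)  # one id, with both pages listed under it
```

Note that the cookie only carries a random identifier. The browsing history itself lives on the ad server, keyed by that identifier.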
“But Roberto, that can’t be the whole story, how did they know that I like rock music if I’ve never entered any website related to that?!”
Proprietary Data: what you’re willingly giving them
This is by far the most obvious data source, and every one of these players has its own version of it, but somehow many people seem to forget about it. Google knows everything you search for; Facebook knows every group, page like, shared post, and basically every move you’ve made on the social media app; Amazon knows every single purchase (and Alexa conversation⁸) you’ve made on its platform in your entire life; and the list goes on and on.
Is it really that hard for FB to know that you like rock music when you’re part of 2 rock groups, liked 5 different rock band pages, and once shared that video of a tribute to Chester Bennington (😢)? And this is not even taking into account the cookies that I just explained.
And in no way am I judging you or any other person in the world for giving this data for free to all these different providers. Whether the internet was designed to work like this from the start or not, truth is it evolved over the years to be what we see now: we don’t pay to use search, social media and many other tools, but all these companies need to profit as well, that’s just capitalism. Whether this is a good thing, I’ll probably discuss in a future article, but for now, just know that I understand why you do this.
“Look, I swear to you, I’ve never in my life entered any websites that could have created these… biscuits…? Whatever, I never did and I share as little as possible in my social media accounts, there has to be something else!”
Partnership Data: everyone plays the game
You’re reading this article on Medium, right? What’s your login on Medium? Trick question: you most likely don’t have a specific username and password for Medium, and you probably log in with your Facebook or Google account. By this point the browser on your computer and the app on your phone already have those saved, and maybe you have forgotten. It happened to me too.
Just as Medium has the possibility of logging in through all these different partner options, so do Spotify, Fitbit, Ticketmaster, and even Pokémon Go! And many of these platforms also use business tools from those ads companies, which means they are actually sharing even more information.
For example, say you are logging in to Fitbit through Facebook for the first time. There will be a message that goes something like “Fitbit will receive: your public profile, friend list, email address and birthday.” And it works the other way around too: these apps will share your data with Facebook! By integrating their websites or apps with, say, Google Analytics or the Facebook SDK, those partners automatically share your data with Google and Facebook respectively, without asking for your permission to do so.⁹
Let’s walk through a quick exercise to illustrate how deep all these different data points really are. If you are reading this around September 2020, follow these steps: enter the Facebook app → click the three dashes on the bottom right corner → scroll down → Settings & Privacy → Settings → scroll down until “Your Facebook Information” → Off-Facebook Activity → Manage Your Off-Facebook Activity. Stop for a second and read what it says. Mine has an impressive 196 apps and websites that have shared my activity with Facebook. I didn’t even know I had ever accessed Vox.com in my life, but there it is with 14 data points. Disclaimer: all these data points seem to aggregate cookies, proprietary data, partners’ data and more (keep reading!). (This was not very clear to me from the app’s description.)
Now let’s go deeper: choose an app → xxx interactions were received → Download Activity Details → scroll down → choose a date range or the file will be huge → download it once it’s ready → play with the zip file you just downloaded. I eventually came upon ads_and_businesses → your_off-facebook_activity. In there you can find all the data that these partners sent to Facebook, even if you never logged into them with your account. Taking a look at “XP Investimentos”, which is the app I use to invest my money, I see that they send Facebook information every single time I open the app, also containing the date and time I did so.
If you, a regular person with zero computational power, wished to sell me an investment and you had that information, what would you do? Without being creative, you could at least try to do that around the regular time I use the app. Imagine if you had the computational power the tech giants have…
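Even the "not being creative" version fits in a few lines. The sketch below uses made-up timestamps standing in for "app opened" events (they are not my real data, and `usual_hour` is just an illustrative helper) and finds the hour of the day in which the app is opened most often.

```python
from collections import Counter
from datetime import datetime

# Made-up "app opened" events, in the spirit of the downloaded activity file
app_opens = [
    "2020-09-01 08:05", "2020-09-01 19:40", "2020-09-02 08:12",
    "2020-09-03 08:01", "2020-09-03 21:15", "2020-09-04 08:09",
]


def usual_hour(timestamps):
    """Return the hour of day (0-23) in which most events happened."""
    hours = [datetime.strptime(t, "%Y-%m-%d %H:%M").hour for t in timestamps]
    return Counter(hours).most_common(1)[0][0]


print(usual_hour(app_opens))  # 8
```

With this toy data, the best moment to show me an investment ad would be around 8 in the morning.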
“Oh… my… god…”
But wait, it’s not over! There is still more data!
Data Brokers: the data living in the shadow
From what we have seen so far, it sure seems like these ad companies can own a lot of information about you and understand you fairly well. And it’s natural to think of Facebook, Google, Amazon, etc. because these are the players that are in constant contact with us, the consumers. But collecting data about you is not their mission, it’s not their most important capability.
Imagine if there was a company purely dedicated to collecting data about customers like you and me, to later maybe create a profile about us and sell it. Literally all they would be worried about would be getting more and more information about you. Sounds like a stalker, kind of creepy even, right? Well, these guys exist and they are called data brokers!
Data brokers are constantly buying data from a myriad of sources or simply aggregating publicly available information (ahem ahem social media ahem ahem). I know it sounds sketchy, and I’m not going to get into the discussion of whether it is (at least not in this post), but I want to add the disclaimer that some of them do, for example, fraud detection, which a lot of people would argue is acceptable.¹⁰
Some of the sources that data brokers collect data from:
- Public records: property records, court records, Census data, …
- Commercial sources: purchase histories, warranty registrations, …
- Social media: Facebook, Instagram, Twitter, …
- Buying from platforms that collected data from you
- And of course, buying/partnering with other data brokers¹¹
To give you a sense of the power they have, Acxiom, one of the biggest and most important data brokers, states they have “more than 11,000 data attributes in more than 60 countries helping brands connect to 2.5 billion people.”¹² Let me say that again: eleven THOUSAND attributes, 2.5 BILLION people. Let that number sink in for a second…
In fact, they are so powerful, that even the big Facebook (I’m sorry I’m picking on you guys so much) used to partner with them until a couple of years ago. (I’m not really sorry)¹³
“Okay… these guys have a lot of data on me… now I’m involved: how does all of this translate into creepy ads that seem to be hearing everything?”
I’m so happy you asked me that!
Integrating your data
To answer that, I’m going to start by reframing your question with the one that was on my mind for a long time: what is the common piece of data in all of these different bits of information that allows these companies to join everything together?
My name? My last name is too long, every time I use a different part of it…
My email? I have three different email accounts, and I use a different one to sign up for services depending on what they are, so this shouldn’t work either…
My phone number? I’ll avoid inputting that as much as possible…
“So… What is it?”
The Device ID
What is the one thing in common in all of your interactions with all those apps and websites? Even more common than using your Gmail account to log in. The one thing I can argue you surely use more than any app: your device. The average American adult spends more than 10 hours per day in front of a screen.¹⁴ And with every interaction you have when you use an app on your iPhone, the app collects a unique identifier that can always be traced back to that specific phone.¹⁵ And there you go. In the blink of an eye, all these different players can aggregate your data and create a pretty accurate profile of who you are.
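Here is a minimal Python sketch of why a shared identifier is so powerful. The three data sources, their fields and the `build_profiles` helper are all hypothetical; the only point is that records which look unrelated collapse into one profile once they carry the same device ID.

```python
from collections import defaultdict

# Hypothetical records from three unrelated sources, all carrying a device ID
social_app = [{"device_id": "A1", "likes": ["rock music"]}]
fitness_app = [{"device_id": "A1", "runs_per_week": 3}]
data_broker = [
    {"device_id": "A1", "home_owner": True},
    {"device_id": "B2", "home_owner": False},
]


def build_profiles(*sources):
    """Merge every record that shares a device ID into a single profile."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            attributes = {k: v for k, v in record.items() if k != "device_id"}
            profiles[record["device_id"]].update(attributes)
    return dict(profiles)


profiles = build_profiles(social_app, fitness_app, data_broker)
print(profiles["A1"])
```

No name, email or phone number is needed anywhere in this join.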
Disclaimer: a big part of the data matching, maybe as important as the Device ID and one that receives considerable resources from big tech, is record linkage, also known as entity resolution. Put simply, it’s a technique that matches newly acquired data, possibly without a device ID or with a different one (say you use your mom’s iPad to log in to Facebook), with data already present in their database. It is a complementary (but crucial) step in the data aggregation process.¹⁶
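To give a feel for record linkage, here is a toy Python sketch that links a new record to a known one by fuzzy-matching name and city with the standard library’s `difflib`. The records, the `link_record` helper and the 0.8 threshold are assumptions made up for this example; production systems use far more attributes and far more sophisticated models.

```python
from difflib import SequenceMatcher

# Made-up records already sitting in a hypothetical database
known_records = [
    {"id": 1, "name": "Roberto Mello", "city": "Sao Paulo"},
    {"id": 2, "name": "Maria Souza", "city": "Lisbon"},
]

# A new record that arrives without any device ID
new_record = {"name": "Roberto B. Mello", "city": "S. Paulo"}


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_record(record, candidates, threshold=0.8):
    """Return the best-matching candidate, or None if nothing is close enough."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = (similarity(record["name"], candidate["name"])
                 + similarity(record["city"], candidate["city"])) / 2
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


match = link_record(new_record, known_records)
print(match["id"] if match else "no match")
```

Even with a middle initial added and the city abbreviated, the new record is still linked to the existing profile.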
Apple, buddy, what are you doing?
Historically, the iconic fruit-shaped-logo company was already known for being more concerned with its users’ data privacy than most of its competition, but it has now gone further. Just a couple of months ago, Apple announced that iOS 14 (as well as iPadOS 14 and tvOS 14) will require app developers to explicitly ask users for permission to track them!¹⁷ This is a huge move on Apple’s part, and it will definitely make digital advertising much harder for all the players, given the considerable market share the tech giant has in the smartphone space.
Here are some examples of tracking that Apple gives on its website as guidance for developers:
- “Displaying targeted advertisements in your app based on user data collected from apps and websites owned by other companies.
- Sharing device location data or email lists with a data broker.
- Sharing a list of emails, advertising IDs, or other IDs with a third-party advertising network that uses that information to retarget those users in other developers’ apps or to find similar users.
- Placing a third-party SDK in your app that combines user data from your app with user data from other developers’ apps to target advertising or measure advertising efficiency, even if you don’t use the SDK for these purposes. For example, using an analytics SDK that repurposes the data it collects from your app to enable targeted advertising in other developers’ apps.”¹⁷
“Roberto, what do you think?”
First and foremost, no, Facebook is not listening to your conversations. There’s just an incredible amount of data available to advertisers, which enables them to predict things about us with enormous accuracy.
Now, what are my thoughts around all of this? It’s all a big trade-off. I personally like it when Amazon suggests to me just the product I was looking for or when Google News shows me an article comparing the new Fitbit and the new Apple Watch. This is personalized content, which is only possible because these companies have all this data about me. So yes, I love the services that all of this data makes possible.
However, I clearly don’t wanna be influenced by other companies that have maybe created a psychological profile of me and can now nudge me to vote for a different candidate or to buy something I don’t really want.
So what’s the solution? Awareness and empowerment.
First, awareness, which touches one of the biggest problems of our society: getting correct information to everyone, everywhere. People have the right to know how they can be manipulated by something that often seems to be out of their control. I believe in data-driven decisions, but you can’t make them if you don’t have the data! Let’s all bring this topic to Zoom calls, family discussions and bar hangouts (when coronavirus allows us to)!
Finally, empowerment should ideally come from the companies themselves. Apple is setting a beautiful example, as we just saw in the previous section, by giving its customers the power to decide whether they will be tracked. In less ideal circumstances, empowerment comes through regulation: adapting to GDPR caused many problems for companies, but in the end the overall sense is that it’s a big win for consumers’ privacy.
Once the average person is educated about the subject and empowered to make decisions about it, that’s the moment where we are getting closer to our beautiful, cuddly tech utopia.
1 Komnenic, Masha. “Privacy Laws Around the World.” Termly, Mar 14, 2019. https://termly.io/resources/infographics/privacy-laws-around-the-world/
2 Confessore, Nicholas. “Cambridge Analytica and Facebook: The Scandal and the Fallout So Far.” The New York Times, Apr 4, 2018. https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html
3 Lapowsky, Issie. “The 21 (and Counting) Biggest Facebook Scandals of 2018.” Wired, Dec 20, 2018. https://www.wired.com/story/facebook-scandals-2018/
4 Graham, Jefferson. “Is Facebook listening to me? Why those ads appear after you talk about things.” USA Today, Jun 27, 2019. https://www.usatoday.com/story/tech/talkingtech/2019/06/27/does-facebook-listen-to-your-conversations/1478468001/
5 “How Do Cookies Affect Your Cyber Security?” Ophtek, Oct 15, 2019. https://www.ophtek.com/how-do-cookies-affect-your-cyber-security/
6 “HTTP Cookies.” Wikipedia. https://en.wikipedia.org/wiki/HTTP_cookie
7 “All you need to know about Third-Party Cookies.” Cookie Script. https://cookie-script.com/all-you-need-to-know-about-third-party-cookies.html
8 Bandeira de Mello, Roberto. “Why is Alexa EVERYWHERE?” Medium, Aug 12, 2020. https://blog.usejournal.com/why-is-alexa-everywhere-f543d4f521d7
9 “Facebook Platform & the General Data Protection Regulation (GDPR).” Facebook. https://www.facebook.com/business/m/one-sheeters/gdpr-developer-faqs
10 Melendez, Steven and Pasternack, Alex. “Here are the data brokers quietly buying and selling your personal information.” Fast Company, Mar 02, 2019. https://www.fastcompany.com/90310803/here-are-the-data-brokers-quietly-buying-and-selling-your-personal-information
11 Grauer, Yael. “What Are ‘Data Brokers,’ and Why Are They Scooping Up Information About You?” Vice, Mar 27, 2018. https://www.vice.com/en_us/article/bjpx3w/what-are-data-brokers-and-how-to-stop-my-private-data-collection
12 “Acxiom Data.” Acxiom. https://www.acxiom.com/customer-data/
13 Harwell, Drew. “Facebook, longtime friend of data brokers, becomes their stiffest competition.” Washington Post, Mar 29, 2018. https://www.washingtonpost.com/news/the-switch/wp/2018/03/29/facebook-longtime-friend-of-data-brokers-becomes-their-stiffest-competition/
14 Howard, Jacqueline. “Americans devote more than 10 hours a day to screen time, and growing.” CNN, Jul 29, 2016. https://www.cnn.com/2016/06/30/health/americans-screen-time-nielsen/index.html
15 “Identifier for Advertisers (IDFA) | Meaning.” Adjust. https://www.adjust.com/glossary/idfa/
16 Kihn, Martin. “How Cross-Device Identity Matching Works (part 2).” Gartner, Sep 20, 2016. https://blogs.gartner.com/martin-kihn/how-cross-device-identity-matching-works-part-2/
17 “User Privacy and Data Use.” Apple App Store. https://developer.apple.com/app-store/user-privacy-and-data-use/