Q102 Dean Russell MP: I would quite like to explore a little more the data input part and how sophisticated that is. As with any system, with bad data in you get bad data out, or in this instance bad recommendations. I was just interested to understand where all those data sources are coming from for the different platforms, because we often talk about Facebook, YouTube and these channels as very individual, siloed systems, but, as we know, they talk to each other in various ways; you have browsers talking through cookies and all these different things happening. I would just be interested to know from you whether we would be shocked at the scale of the data that is collected about us and used to go into these algorithms to make these recommendations. What sort of sources are out there that are being used currently? One of the conspiracy theories you often see online is that Alexa might be hearing you and listening to conversations. The Chair mentioned earlier Bluetooth proximity to other people. There are all these things. Would you mind explaining your views on that data input piece, please?
Laura Edelson: I know there are absolutely conspiracy theories about Alexa. Those do not hold a lot of water. As always, the truth is more banal. When you use a platform such as Facebook, TikTok or YouTube, as you use that platform you are giving the platform a tremendous amount of data about what you interact and engage with and how you do it. Every move you make on a platform is monitored, because they are trying to optimise for your engagement. They want you to be as engaged as possible, for as long as possible, as often as possible. That is what the platforms we are discussing are built to do. You give them so much information about how you do it. How fast you scroll and how long you watch videos can be monitored. When you are on YouTube, which video are you likely to go to? That will tell them where your eye tracks to. They do not care about where your eyes track to; they just want to get you to watch as many videos as possible for as long as possible. When they show you an ad, they want to know which ads you will click on. As you interact with the platform, that data is tremendously powerful because they are always getting more data. As they make little tweaks, they see how you respond to those little tweaks, and then they can tweak further. This is used in a lot of different ways. This is used to choose which ads they will show you, which version of an ad they will show you, what content they will promote and what groups they will promote. Damian asked earlier about the effects of proximity. That is a piece of it, but the proximity of your social graph and the edge effects are also tremendously powerful. This can mean that if you are in a social group that has been exposed to extremism, it will spread through that whole group because of those edge effects and because of the interactions of other people in your group. Even if they do not like it, if they interact with it and enough people interact with it, you will be shown it as well.
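The "edge effect" described in this answer — that an item spreads to you once enough of your connections have interacted with it — can be sketched as a simple threshold contagion on a social graph. The graph, the threshold value and the function names below are illustrative assumptions for this transcript, not any platform's actual mechanism:

```python
# Toy sketch of spread via social-graph edge effects: a user is shown an
# item once THRESHOLD of their friends have interacted with it.
# (Graph and threshold are illustrative assumptions.)

THRESHOLD = 2

graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol", "dave"],
    "carol": ["alice", "bob", "dave"],
    "dave": ["bob", "carol"],
}

def spread(initially_interacting):
    """Return the set of users eventually shown the item."""
    shown = set(initially_interacting)
    changed = True
    while changed:
        changed = False
        for user, friends in graph.items():
            if user not in shown and sum(f in shown for f in friends) >= THRESHOLD:
                shown.add(user)
                changed = True
    return shown

print(sorted(spread({"alice", "bob"})))  # the item reaches the whole group
```

Note how two initial interactions are enough to pull in carol, whose interaction then pulls in dave — the whole-group spread Edelson describes, even though dave never chose to follow the content.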
Dean Russell MP: Is that just from that platform, or are they collecting similar data from other platforms? If I am on Facebook, it obviously will know everything that I have done on Facebook and social graphs of friends and connections, but is it also pulling that data, or different types of data, from other places as well?
Laura Edelson: To my knowledge, no. They have really strong business incentives to protect their data from other platforms, because it is the core of their business. They want to know as much as possible about their users and how their users interact, and they do not want their competitors to know that.
Dean Russell MP: Is that vice versa then? Will other platforms and browsers and so on not be picked up in Facebook data? If I only used Facebook and nothing else, then of course they would only have my data from Facebook, but if I use Facebook and lots of other websites, browsers and channels, are they factoring any of that into it, or is it always based, in the Facebook example, just on what Facebook has from me on its platform?
Laura Edelson: It is not solely based on Facebook, because you will interact with those platforms together. It is common that people post YouTube links to Facebook. When you do that, Facebook knows about what you are doing on YouTube. That heavily factors into what else they will promote to you on Facebook. Users themselves leak data cross-platform all the time.
Dean Russell MP: That is what I was trying to understand. That is really helpful. I have a quick point on your narrative just then. I must admit that the way you described the way that platforms optimise and optimise sounded a little like the way a drug dealer might make their drugs better to make a better high, constantly refining to give that saccharine effect or that high in a quicker and more impactful way, to get people more and more hooked. Would that be a fair analogy, from your experience?
Laura Edelson: Yes. As Guillaume put it earlier, these are just optimisation functions. They have specific things they are trying to maximise, which usually comes down to engagement, session length and number of sessions per week. They will continually optimise for those things.

Dean Russell MP: In a way, for young people and children, I will not go as far as saying it is rewiring their brains, but there will be a vice-versa effect there, will there not?

Laura Edelson: I am not a psychologist. I have not done that kind of research, but that is certainly a reasonable conclusion to draw.
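The optimisation functions Edelson refers to — ranking content to maximise predicted engagement — can be sketched as a toy scoring model. Every feature name, weight and data value below is an assumption chosen for illustration; no real platform's model is being reproduced:

```python
# Toy engagement-driven ranking: score items by crude proxies for expected
# engagement, then sort the feed by score. All weights are illustrative.

def predicted_engagement(item, user_history):
    score = 0.0
    # Past interactions with this topic weigh heavily.
    score += 2.0 * user_history.get(item["topic"], 0)
    # Longer expected watch time raises the score.
    score += 0.5 * item["avg_watch_seconds"] / 60
    # Social-graph signal: friends' interactions boost the item.
    score += 1.0 * item["friend_interactions"]
    return score

def rank_feed(items, user_history):
    return sorted(items, key=lambda it: predicted_engagement(it, user_history),
                  reverse=True)

user_history = {"fitness": 3, "news": 1}
items = [
    {"topic": "news", "avg_watch_seconds": 30, "friend_interactions": 0},
    {"topic": "fitness", "avg_watch_seconds": 120, "friend_interactions": 2},
]
print(rank_feed(items, user_history)[0]["topic"])  # "fitness" ranks first
```

The "continual optimisation" in the testimony corresponds to the platform repeatedly adjusting such weights based on how users respond, which a static sketch like this cannot show.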
Dean Russell MP: Renée, I have the same question for you. I was interested in your views on the interconnectivity of data and how the platforms are using the data inputs and the sophistication of that, from your experience.
Renée DiResta: Laura described it as I would have also. You mentioned browsing behaviour and open web. Certain websites use particular comment plug-ins, perhaps, from Facebook and other things. There is some visibility into what you are doing elsewhere on the web if that website has chosen to use particular facets that connect with Facebook in some way. Otherwise, it is this sharing across platforms of people posting a YouTube or Rumble link, or another platform link, from one place to another.
Dean Russell MP: I know that nowadays we have far more privacy options. I know that Apple is very much trying to lead on this, with the idea that you know what data is being shared from apps, for example. Does there need to be better protection for children on this? I would imagine most children do not know that they are sharing all this data online. One of the aspects of this Bill is protecting children, and one of those aspects of course is the recommendations, but we can limit those recommendations by limiting the data that platforms are able to use about children. Would that be a fair thing to look at?
Renée DiResta: I believe so. I do not work on harm related to children, so I do not have very much knowledge of the psychological research there, but it seems like the sort of thing that you would be able to do.
Dean Russell MP: Mr Chaslot, on the same theme of the data sources, especially given that you mentioned experience with YouTube and so on earlier, is the data collected from people watching videos, for example, based solely on the length of time they watch them, or does the content from those videos, or even from images, get taken into account? If you watch a video for 10 minutes and the title is harmless—it might be about having fun on the beach or something—but actually the content shown and the language used is harmful, would that also be seen as a data source that could be used, or is it based more on the length of time, the type of video and so on that has been put in by the creator?
Guillaume Chaslot: A lot of data sources are fed into the algorithm, but usually one data source ends up dominating because it carries more powerful information. In the case of YouTube and TikTok, that is the amount of time that you watch. One reason why TikTok became so viral and so addictive with kids is that its videos are shorter. The algorithm gathers more data points on children than YouTube's does, because the duration of a video is much shorter on TikTok. That is why TikTok is learning extremely fast from children. I am really worried about the amount of information that the algorithm on TikTok can get from these watch times. It can be used to infer what type of colour, music or body shape they like. It is very creepy when you think of everything that can be inferred from these interactions, and it is creepy that this data can be kept for so long.
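The arithmetic behind Chaslot's point — shorter videos mean more watch-time signals per hour of viewing, so the algorithm learns faster — can be made concrete. The example durations below are rough illustrative assumptions, not measured platform figures:

```python
# Each video a viewer finishes or abandons yields one watch-time data point,
# so shorter average video length means more signals per viewing hour.
# (The 20-second and 600-second durations are illustrative assumptions.)

def signals_per_hour(avg_video_seconds):
    return 3600 / avg_video_seconds

short_clips = signals_per_hour(20)    # ~20-second clips  -> 180 signals/hour
long_videos = signals_per_hour(600)   # ~10-minute videos -> 6 signals/hour
print(short_clips / long_videos)      # 30x more feedback per viewing hour
```

Under these assumed durations, an hour of short-clip viewing gives the recommender thirty times as many feedback points as an hour of long-form viewing, which is the faster-learning effect described in the testimony.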
Dean Russell MP: We mentioned earlier platforms not using reports. It almost feels like, “See no evil, hear no evil, report no evil and therefore assume that it doesn’t happen”. If you take TikTok as a platform, you hear numerous times, on a weekly basis, about a new dare that is going on, with kids in schools being dared to do something; we have seen awful situations with kids swallowing washing machine capsules and so on. Do you think the take-up of those is directly related to the content that is being gleaned? If somebody has perhaps done a previous dare and that is in the data, and they have shared that content to show that they have done that, say on TikTok, are they more likely to be pushed to do the next big thing that is coming through—the next dare, the next activity—that could be dangerous for their health?
Guillaume Chaslot: Yes. There can be such a rabbit hole, with the kid starting to watch something, leading to them falling into more and more extreme dares.