Platforms vs. PhDs: How tech giants court and crush the people who study them


A legal dispute between NYU researchers and Facebook reveals the increasingly tense relationship between tech companies and academia.
I realized I had played a small part in the panic Laura Edelson was experiencing.

During one of many panicked phone calls this week, Edelson said to me, “If this is real, this is my nightmare.”

I had gotten in touch with Edelson, a Ph.D. candidate at New York University’s Tandon School of Engineering, to ask her about something Facebook had told me regarding why the company had recently served Edelson and her colleagues with a widely reported cease-and-desist notice. Edelson is a co-creator of Ad Observer, a browser extension that gathers information on how political advertisers target Facebook users. Facebook told me that one reason it was asking Edelson to shut down Ad Observer was that the tool violated the company’s rules by collecting data from users who had never consented to it. Facebook said the Ad Observer team was also making that data available for anyone to download.

Edelson was surprised because, as a cybersecurity researcher, she had worked hard to keep sensitive data out of her data set. To her knowledge, the only data Ad Observer gathered and published came from users who had installed the browser extension and willingly shared both the contents of the political ads they saw on Facebook and the reasons they had been targeted with those ads.

Even as she fiercely disputed Facebook’s allegations, the thought that someone else’s sensitive information might be lurking in her data set shook Edelson. She immediately cut off everyone else’s access to download the data while she and her team conducted a last-minute privacy audit.

Edelson’s fear, however, turned out to be unfounded. When Facebook claimed that Ad Observer was gathering information from users without their consent, it wasn’t referring to private user accounts. It was referring to the accounts of advertisers: the names, profile photos, and contents of publicly accessible Pages that run political ads.

Edelson laughed and sighed deeply when I relayed this to her. Of course she was gathering information from advertisers. That data, which is already available to the public on Facebook, is the essence of Ad Observer. “The fundamental tenet of this is that the public needs to know who is attempting to sway their vote and the political discourse,” she said. “That is essentially the project’s goal.”

Facebook, however, views advertisers as users too, and significant ones at that. It is against Facebook policy to scrape advertiser data, even data Facebook makes publicly available, and publish it without the advertiser’s permission. After all, only Facebook gets to make those rules.


Facebook’s ongoing crackdown on Ad Observer is one of the most extreme examples of the increasingly tense relationship between platforms and the people who study them. In recent years, as Silicon Valley has come under growing scrutiny, internet platforms have reached out to the research community, opening formerly inaccessible data sets to academics studying the social effects of tech platforms. Twitter just introduced a free API tier that enables pre-approved academics to access the complete archive of tweets. Facebook, meanwhile, is collaborating with a group of more than a dozen scholars to examine the platform’s influence on the 2020 election and has made a vast amount of Facebook data accessible to researchers through its Facebook Open Research and Transparency project.

But as this work advances, tech corporations are also taking action against academics whose practices violate their policies. As topics like online disinformation, ad targeting, and algorithmic bias have become important areas of study, researchers have used APIs, social analytics tools, dummy accounts, and scraped data to determine, for example, whether online housing ads discriminate against Black people or whether fake news receives more engagement than real news. These techniques frequently run afoul of carefully constructed terms of service that restrict, among other things, data scraping, data sharing, and fake accounts. And tech companies have occasionally wielded those terms of service as a cudgel to shut down even well-intentioned research projects, in the name of safeguarding their users’ privacy and their own reputations.

Ad Observer is one of them. In October, just weeks before the presidential election, as Facebook was inundated with political advertisements, Edelson and her collaborator, NYU associate professor Damon McCoy, were ordered not only to shut down the browser extension by Nov. 30 but also to delete all the data they had gathered, or face “additional enforcement action.” Months later, the NYU team and Facebook have still not reached an agreement.

“The problem wasn’t what NYU was attempting to do. It was how they were attempting to do it,” said Steve Satterfield, director of privacy and public affairs at Facebook. People who use the Ad Observer extension grant the NYU researchers access to anything they can see from their browser, according to Satterfield, and any scraping of that information automatically violates Facebook’s policies. “We’re open to collaboration, but there are some things on which we won’t budge, and this was one of them,” Satterfield said.

Some of these disputes between scholars and Big Tech are philosophical. Some are legal. In fact, the Supreme Court is currently weighing a case that experts worry could make research practices that violate terms of service illegal. All of them, though, are part of the delicate dance between researchers and social media platforms, and they come at a time when the world arguably needs such research the most.

A game of cat and mouse

Edelson and McCoy had already been entangled in Facebook’s battle against data scraping.

Shortly after Facebook introduced its library of political ads, now known as the Ad Library, in 2018, the two researchers built a method to scrape the archive, which let them analyze the data far more quickly. Because searching the Ad Library required keyword searches, it was initially difficult for researchers and journalists to evaluate the data set if they didn’t already know what they were looking for.
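
To see that limitation concretely, here is a minimal sketch of the kind of keyword query the archive supports, assuming Facebook’s publicly documented ads_archive Graph API endpoint; the API version, fields, and search term are illustrative assumptions, not the NYU team’s actual code.

```python
import requests

ACCESS_TOKEN = "..."  # placeholder; real queries require an approved token

# Every Ad Library query starts from a guessed keyword, so there is no
# way to ask for "all political ads" and analyze them from above.
resp = requests.get(
    "https://graph.facebook.com/v18.0/ads_archive",
    params={
        "search_terms": "election",  # you must know what to look for
        "ad_type": "POLITICAL_AND_ISSUE_ADS",
        "ad_reached_countries": "['US']",
        "fields": "page_name,ad_creative_bodies,spend",
        "access_token": ACCESS_TOKEN,
    },
    timeout=30,
)
resp.raise_for_status()
for ad in resp.json().get("data", []):
    print(ad.get("page_name"), ad.get("spend"))
```

A scraper sidesteps that constraint: instead of guessing keywords one query at a time, it can sweep the whole archive.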

Although the archive appeared to hold a wealth of juicy information, McCoy recalled, it seemed to have been hurriedly thrown together and was not very useful.

McCoy brought on Edelson, who had worked at Palantir before starting her Ph.D. program, to build a scraper that would give them a bird’s-eye view of the political ad environment on Facebook. Ahead of the 2018 midterm elections, the scraper surfaced a wealth of significant findings about political advertising, most notably that then-President Trump was spending more on political ads than any other advertiser. In a July 2018 piece in The New York Times, Facebook’s then-director of product management, Rob Leathern, was quoted as saying the NYU research was “exactly how we thought the tool would be used.”

Still, shortly after the article was published, Facebook broke the tool.

The Times article appeared just four months after the Cambridge Analytica scandal, the privacy crisis set off by a single Cambridge University professor who built an app to harvest the data of unwitting Facebook users. Facebook had recently begun its anti-scraping push, and it made technical changes that effectively shut down Edelson and McCoy’s tool. According to Facebook, these anti-scraping initiatives, in place since 2018, haven’t targeted any particular tool. For McCoy and Edelson, though, the effect was the same. It turns into a game of cat and mouse, McCoy said: “We circumvent it, and they build other hurdles.”

Just two months before the Times piece, the General Data Protection Regulation, which introduced new restrictions on data sharing and new consent requirements for data collection, had taken effect in Europe.

Suddenly, tech companies had a host of legal justifications for withholding data from researchers, including researchers who were producing unflattering findings. “It was a bit of a PR decision, but also a policy decision,” said Nu Wexler, who worked in policy communications at Facebook during the Cambridge Analytica scandal and later worked at Google.

At least that conflict with Facebook ended quickly. Shortly after breaking NYU’s scraper, Facebook invited the team to be early testers of its political ad archive API. Using the API, Edelson and McCoy began studying the spread of misinformation and disinformation through political ads. They quickly discovered that the data set was missing one crucial piece of information: details about the audiences the ads were targeting. For instance, the Trump campaign ran an ad last year depicting a post-Biden dystopia in which the world is on fire and no one answers 911 calls because of “defunding of the police department.” Edelson found that the ad had been targeted specifically at married suburban women. That context, Edelson said, is critical to understanding the ad.

Facebook, however, was reluctant to make targeting information public. Doing so might make it too easy to reverse-engineer a person’s interests and other personal information, Satterfield said. If someone likes or comments on an ad, for example, it wouldn’t be hard to look up that ad’s targeting criteria, were they public, and conclude that the person fits them. “If you combine those two data sets, you might be able to learn things about the people who engaged with the ad,” Satterfield said.


That is why Facebook shows ad targeting information only to users who have personally seen a given ad. And it’s why Edelson and McCoy created Ad Observer as a workaround: by installing the browser extension, users could browse Facebook while voluntarily donating their own ad targeting information. In response to privacy concerns, the tool, launched in May 2020, was designed to strip out any data that might contain personally identifiable information. Ad Observer eventually surpassed 15,000 installations.
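
Protocol has not reviewed Ad Observer’s source code, but the privacy design Edelson describes amounts to whitelisting fields and redacting anything that looks personal before a submission ever leaves the browser. A hypothetical sketch of that kind of scrubbing step, with invented field names:

```python
import re

# Illustrative only: these field names and patterns are assumptions,
# not Ad Observer's actual implementation.
ALLOWED_FIELDS = {"ad_text", "advertiser_name", "targeting_reasons"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub_submission(raw: dict) -> dict:
    """Keep only whitelisted fields and redact email-like strings."""
    clean = {}
    for key in ALLOWED_FIELDS:
        value = raw.get(key)
        if isinstance(value, str):
            clean[key] = EMAIL_RE.sub("[redacted]", value)
        elif value is not None:
            clean[key] = value
    return clean

print(scrub_submission({
    "ad_text": "Contact jane@example.com to volunteer!",
    "advertiser_name": "Example PAC",
    "viewer_profile": "never leaves the browser",  # silently dropped
    "targeting_reasons": ["Location: United States", "Age: 35-44"],
}))
```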

The idea that Facebook users might unwittingly hand over other people’s data to outside researchers immediately raised red flags inside Facebook. The company has already sued businesses caught scraping data, and it contends that its handling of the NYU team was simply a continuation of that effort. “That activity, which was encouraging users to install extensions that scrape data, was what led to the action we took,” Facebook’s Satterfield said.

A worldwide ricochet

Facebook’s cease-and-desist letter was aimed at the NYU team, but it ricocheted throughout the research world. Mark Ledwich, an Australian researcher who monitors YouTube, said the conflict has already dampened interest in his and other transparency initiatives.

Ledwich is a co-founder of Transparency.tube, a website that analyzes the most popular English-language YouTube channels and categorizes them by factors like their political leanings or whether they spread conspiracy theories. He is also a co-founder of Pendulum, a new internet forensics firm. Initially, Transparency.tube was powered by YouTube’s API, which imposes limits on data storage and on the number of API calls a developer can make each day. Ledwich said adhering to those conditions would have made his work impractical.
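
For a sense of scale: Google’s documented default quota for the YouTube Data API is 10,000 units per day, and a single search request costs about 100 units, so a cataloging project can exhaust a day’s allowance in roughly 100 searches. A rough sketch (the quota arithmetic reflects Google’s published defaults; the query itself is illustrative):

```python
import requests

API_KEY = "..."  # placeholder; real use requires a Google API key

# Default daily quota (10,000 units) divided by the ~100-unit cost
# of a search.list call: roughly 100 searches per day.
DAILY_QUOTA_UNITS = 10_000
SEARCH_COST_UNITS = 100
print("max searches/day:", DAILY_QUOTA_UNITS // SEARCH_COST_UNITS)

resp = requests.get(
    "https://www.googleapis.com/youtube/v3/search",
    params={
        "part": "snippet",
        "q": "news",        # illustrative query
        "type": "channel",
        "maxResults": 50,
        "key": API_KEY,
    },
    timeout=30,
)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item["snippet"]["channelTitle"])
```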

“There is no way to do research like ours without violating their terms of service,” Ledwich said. To keep collecting data, Ledwich began using multiple accounts to get around those limits, but after he told YouTube about it, he was barred from the API entirely.

Instead, Ledwich began scraping publicly accessible YouTube data. The technique has given him new insight into what’s happening on the platform, he said, but it has also scared off some potential research partners. The Institute for Data, Democracy and Politics at George Washington University, for example, recently declined Ledwich’s request for funding, citing NYU’s troubles with Facebook. The reason, according to Ledwich, was that the institute was concerned about the risk following Facebook’s legal threat against the NYU Ad Observatory.

Rebekah Tromble, the institute’s director, said she believes researchers should have legal protection to scrape data, provided they do it in a way that respects users’ privacy, as she believes the NYU researchers do. She has also pushed for legislation in the U.S. and Europe that would establish guidelines for this kind of data sharing. “However, our commitments to the institution and our own funders mean we have to proceed with caution when it comes to offering financial support for constructing tools based on scraping,” Tromble said. “I’m thrilled to promote Transparency.tube and other data-scraping initiatives, but at this time our institute is unable to support them.”


Unlike Facebook and Twitter, YouTube’s content is predominantly video, which makes it harder for researchers to analyze en masse. None of the researchers or former Google employees Protocol contacted for this story were aware of any transparency projects YouTube has built for outside researchers, and YouTube was also unable to point to any specific research partnerships. YouTube regularly collaborates with scholars around the world on a range of important topics, according to spokesperson Elena Hernandez.

While the search team does not currently have any research projects to share, Google spokesperson Lara Levin noted that “search is quite different from social networks and feed/recommendation-based products, in that it is responding to user queries and it is an open platform that researchers [and] media can (and do) study.”

Given the API’s limitations, some of the most illuminating transparency work on YouTube has come from researchers who scrape publicly accessible data. That is how Guillaume Chaslot, a former Google employee and the creator of algotransparency.org, built his tool for tracking YouTube recommendations.

YouTube forbids scraping, except by search engines or where it has granted prior written consent. But according to Chaslot, YouTube hasn’t yet tried to stop him. “It would be quite unprofessional of them to cut me off. They appear to be really reluctant to have anyone witness what they are doing,” he explained. The company has even invited Chaslot to its offices to discuss how it’s addressing some of the problems he has raised.

That doesn’t mean YouTube has fully embraced his work, though. When Chaslot first shared his findings with The Guardian in 2018, showing among other things that YouTube recommendations had steered viewers toward pro-Trump content before the 2016 election, YouTube tried to discredit his methods. A spokesperson at the time told The Guardian that the sample of 8,000 videos he examined “does not offer a true picture of what videos were recommended on YouTube over a year ago in the run-up to the US presidential election.”

“They claim that because I only gather a small portion of the data, I don’t have a complete picture,” Chaslot said. YouTube offered Protocol similar justifications for why it believes AlgoTransparency’s findings are inaccurate.

As a result of all of this, YouTube remains one of the least studied social media platforms. A recent review of papers presented at last year’s annual meeting of the International Communication Association found that only 28 mentioned Google or YouTube, compared with 41 for Facebook. Twitter outperformed them both, appearing in 51 papers. The reviewers found that Twitter data appeared in more than half of the studies they examined, while YouTube data appeared in just under 9%.

That’s a testament both to Twitter’s inherently open nature and to its deliberate efforts to collaborate with researchers. Even so, that research has captured only part of Twitter’s activity. Until recently, the complete history of tweets was accessible only to corporate developers who paid for API access; most researchers made do with the slice the free version supplied. Earlier this year, Twitter announced it was changing that, launching a free API track that gives researchers access to the full archive.
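
In practice, the academic track exposes a full-archive search endpoint in Twitter’s v2 API. The sketch below shows roughly what a researcher’s query looks like; the query string and fields are illustrative, and real use requires an approved academic account.

```python
import requests

BEARER_TOKEN = "..."  # placeholder for an approved academic credential

# Minimal sketch of a v2 full-archive search; further pages are
# fetched by passing the response's next_token back as a parameter.
resp = requests.get(
    "https://api.twitter.com/2/tweets/search/all",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params={
        "query": "#Election2020 lang:en",     # illustrative query
        "start_time": "2020-10-01T00:00:00Z",
        "end_time": "2020-11-04T00:00:00Z",
        "max_results": 100,                   # per page
        "tweet.fields": "created_at,public_metrics",
    },
    timeout=30,
)
resp.raise_for_status()
for tweet in resp.json().get("data", []):
    print(tweet["created_at"], tweet["public_metrics"]["retweet_count"])
```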

“Access to Twitter data provides researchers with useful insight into what’s occurring around the world, but their triumphs have mostly been in spite of Twitter, not because of us,” said Adam Tornes, staff product manager at Twitter. “We have overlooked their particular circumstances, as well as their various needs and capacities, for far too long. We have also fallen short of recognizing and embracing the importance and influence of scholarly work utilizing Twitter data.”

Even this upgrade involves a trade-off for researchers: they now have access to more of the platform’s history, but Twitter has capped the number of tweets they can collect at 10 million per month. “We have all these initiatives ongoing that are premised on collecting a lot more data than we’re going to be permitted to collect,” said Josh Tucker, another NYU professor and the co-director of a social media research lab there. “Someone at Twitter may, without our knowledge or consent, change the course of our research agenda going forward.”

Twitter said it will offer additional tiers of access for researchers without that data cap and that it is willing to evolve the product to meet their needs.

Making that much historical data available could also raise additional complex ethical problems, according to Casey Fiesler, an assistant professor at the University of Colorado Boulder who specializes in tech research ethics. In a recent survey of Twitter users, Fiesler found that the majority had no idea their tweets were being studied. Many people fundamentally object to that kind of research, she said, no matter its purpose.

For this reason, among others, Fiesler can see why Facebook might be wary of the NYU researchers’ work. The Cambridge Analytica scandal ultimately cost the company $5 billion in fines after the Federal Trade Commission found that Facebook had violated an earlier consent order by failing to secure user data. “I don’t envy Facebook’s situation in this case. Take a look at the numerous privacy scandals they’ve handled,” she said.

Still, Fiesler sympathizes with the NYU researchers’ motives and believes parties other than the tech corporations themselves should be allowed to grade their homework. “The bottom line is that this is hard,” Fiesler said. “These are value conflicts, and there may be no winning in some circumstances.”

“Scraping is not a crime”

For all the researchers’ concerns, Facebook has recently taken significant steps toward transparency. Last year, after nearly two years of difficult negotiations with a separate group of academics, Facebook released a collection of 38 million URLs to pre-approved scholars. Later in the year, the company announced it was working with 17 researchers to study Facebook’s effects on political polarization, voter turnout, trust in democracy, and the spread of misinformation during the 2020 election.

In January, Facebook said it would disclose targeting information for 1.3 million political ads that ran before the 2020 presidential election. The drawback: the data is shared in a secure environment, and researchers cannot export the raw data. “What we’re attempting to do is limit the chances of the information being misused,” Satterfield said.

Over the course of the negotiations, Facebook offered the NYU team access to this targeting data, but Edelson said the privacy constraints would make the kind of research she and McCoy do infeasible. “We collect data, then join it with other data sets. We develop models. None of that would be possible in this system,” she said.

Still, Edelson applauds Facebook’s efforts to make more data accessible to academics and the public through tools like the Ad Library and CrowdTangle, the Facebook-owned social media analytics tool best known for showing the prevalence of far-right content in the United States. CrowdTangle data enabled Edelson and several colleagues to find that before and shortly after the 2020 election, far-right misinformation on Facebook received greater engagement than other types of political news.

“Give credit where it’s due: Facebook has made more data available than anyone else,” Edelson said. She also thinks there are valid reasons why some research projects must be conducted in secure settings. “There is a role for cooperation between researchers and platforms that involves private data that must be kept fairly secret. I believe it’s fantastic that such alliances are forming,” she said.

Tucker, the NYU politics professor who voiced worry about his lab’s Twitter research, is also a lead researcher on Facebook’s 2020 election study. Before joining, Tucker wrote extensively about the trade-offs researchers face when deciding whether to work with a platform or go it alone. He said he entered the Facebook partnership with “eyes wide open.”

Conducting research independently, Tucker said, “has the benefit of total independence, but it subjects you to all sorts of restrictions on access to data, as well as the arbitrary character of the platform.”

But working closely with a company like Facebook, as he is doing now, has drawbacks as well. For one, you have to persuade the company to work with you in the first place, he said. And then, how do you protect the integrity of the research when you’re working with people who are paid employees of that company?

When negotiating the project’s terms, Tucker and his co-leader, Talia Stroud of the University of Texas at Austin, insisted that the external researchers have complete control over any papers that emerge from the project. They also pre-registered their plans, effectively telling the public everything they would eventually share before they collected any data. Facebook brought its own conditions to the table: only Facebook personnel would have access to the raw data, and Facebook would have the right to review papers to make sure they don’t break any laws. None of the researchers would be paid by Facebook; instead, Facebook would cover the cost of data collection and its own staff’s time.

So far, Tucker said, the project “has not gone without hiccups.” In particular, Tucker’s team wants to disclose the underlying data it collects; Facebook doesn’t. Those discussions are still in progress, Tucker said.

Despite such disagreements, he still believes collaborative initiatives like this one are essential, and he hopes it will serve as an example for other companies. This was arguably the most significant election of the post-war era, Tucker noted, and the platforms made major changes in the middle of it. “We have research that evaluates the impact of it.” At the very least, he contends, that is a significant contribution.

Tucker and Edelson agree, however, that this does not do away with the need for research that happens outside Facebook’s guidelines. Tech companies’ self-imposed transparency initiatives have raised concerns about how much data they are actually sharing and about their failure to devise innovative, secure sharing methods. The truth is that tech giants have little incentive to help academics identify the companies’ errors. There aren’t many scholars right now who will publish overwhelmingly positive studies about social media and data, according to Wexler.

In the long run, tech companies may not get to decide which lines researchers can and cannot cross. Increasingly, those questions are being answered by courts and regulators. This spring, a federal court ruled that researchers who use fictitious job profiles to study algorithmic discrimination, in violation of companies’ terms of service, would not be violating the Computer Fraud and Abuse Act, which broadly prohibits unauthorized access to computer systems. Another case raising the same questions about what constitutes unauthorized access, Van Buren v. United States, is now before the Supreme Court, which will rule this summer. Researchers worry that the court’s decision could significantly expand the reach of the CFAA, with far-reaching consequences for people like Edelson.

The day the court heard oral arguments in Van Buren last fall, Edelson attended virtually, wearing a T-shirt that read: “Scraping is not a crime.”

Regulators in Europe, meanwhile, are working to carve out a compromise within the GDPR that would permit some data sharing with researchers. Under Article 31 of the Digital Services Act, which the European Commission proposed last December, pre-screened academic researchers could access data from “very large online platforms,” provided that access does not compromise trade secrets or create material security vulnerabilities.

“We do admit that some of the data that researchers or regulatory agencies are interested in could have a substantial impact on privacy. It may reveal a person’s political or sexual preferences,” said Mathias Vermeulen, public policy head of the U.K. data rights organization AWO, who has pushed for increased data sharing with researchers. “I believe that these issues can be addressed in large part by getting together and outlining some of the respective responsibilities.”

For now, the talks between Facebook and NYU are at a standstill. Facebook declined to say whether it intends to file a lawsuit or take other legal action against Edelson and McCoy. If it does, the entire research community will be watching what happens next. If Facebook were to prevail in such a suit, Ledwich predicted, it would put an end to a lot of institutional funding for social media research that breaks platforms’ rules, including research on YouTube.

Whatever happens next, Edelson said, her work will continue. In fact, it’s growing. She and McCoy recently renamed their research team from the Online Political Transparency Project to Cybersecurity for Democracy, an effort to enlarge its purview. Under that banner, they have continued to operate Ad Observer and, in the past month, have begun using it to gather data from YouTube as well. “As a security researcher, it is my responsibility to check the security of online systems. We are aware that the ad distribution networks are having issues. They are social engineering attack vectors, hence it is my responsibility to research them,” she said. “I won’t quit doing that.”