Facebook dataset combats AI bias by having people self-identify age and gender


Facebook recently released a dataset designed to surface age, gender, and skin tone bias in computer vision and audio machine learning models. The company asserts that the corpus, called Casual Conversations, is the first of its kind to feature paid subjects who disclosed their own age and gender, rather than having this information labelled by third parties or estimated by models.

The data used to train AI systems can contain biases, and models trained on that data can perpetuate or amplify those prejudices, with harmful consequences. Modern image-classifying AI models trained on the popular ImageNet dataset have been shown to learn biases related to race, gender, weight, and other characteristics. Numerous studies have shown that facial recognition is susceptible to bias. Even AI tools used to create art have been shown to harbour biases, which could foster false perceptions of the social, cultural, and political past and obscure awareness of important historical events.

Casual Conversations, which contains over 4,100 videos of 3,000 participants, some of them from the Deepfake Detection Challenge, aims to address this bias by including labels of “apparent” skin tone. According to Facebook, skin tones were assessed using the Fitzpatrick scale, which American physician Thomas B. Fitzpatrick developed in 1975. The scale estimates how different skin types respond to ultraviolet light, ranging from Type I (pale skin that always burns and never tans) to Type VI (deeply pigmented skin that never burns).

Facebook says it hired trained annotators to determine each participant’s skin type for Casual Conversations. The annotators also labelled the videos with their ambient lighting conditions, so that researchers can measure how models perform on people with different skin tones in low light.


A Facebook spokesperson said the company engaged a U.S. vendor to select annotators for the project from “a spectrum of backgrounds, race, and genders.” The participants, who came from Atlanta, Houston, Miami, New Orleans, and Richmond, were paid for their participation.

The researchers note that experts in industry and academia alike are still developing their understanding of fairness and bias in AI. In a blog post, Facebook wrote that Casual Conversations could be an important first step for the AI research community toward standardizing subgroup evaluation and fairness research. “We believe that Casual Conversations would encourage more research in this crucial, developing area,” the post added.


A growing body of research supports Facebook’s argument, showing that computer vision models are susceptible to harmful, pervasive bias. In a paper published last October, researchers at the University of Colorado Boulder showed that while AI from companies like Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women, it misidentified trans men as women 38% of the time. Independent benchmarks of major vendors’ systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) found that facial recognition technology exhibits racial and gender bias. They also found that current facial recognition programmes can be wildly inaccurate, misclassifying people up to 96% of the time.

Beyond facial recognition, features like Zoom’s virtual backgrounds and Twitter’s automatic photo-cropping tool have historically disfavoured people with darker skin. Back in 2015, a software engineer pointed out that Google Photos’ image recognition algorithms were labelling his Black friends as “gorillas.” And the nonprofit AlgorithmWatch showed that Google’s Cloud Vision API at one point automatically labelled a thermometer held by a dark-skinned person as a “gun,” while labelling a thermometer held by a light-skinned person as an “electronic gadget.”

Experts attribute many of these errors to flaws in the datasets used to train the models. A recent MIT-led audit of popular machine learning datasets found an average annotation error rate of 3.4%, including one case in which a photo of a Chihuahua was labelled “feather boa.” An earlier version of ImageNet, a dataset used to train AI systems around the world, was found to contain photos of naked children, porn actors, college parties, and more, all scraped from the web without the subjects’ knowledge. Another machine vision corpus, 80 Million Tiny Images, was found to contain a range of offensive annotations, including over 2,000 images labelled with the N-word and labels like “rape suspect” and “child molester.”

But Casual Conversations is far from a perfect benchmark. Facebook says it did not collect information about where participants are originally from. And when asking participants about their gender, the company offered only the choices “male,” “female,” and “other,” leaving out explicit options for genders such as nonbinary.

The spokesperson added that, for now, Casual Conversations is available only to Facebook teams, and that employees will be encouraged, but not required, to use it for evaluation purposes.

Exposés about Facebook’s approach to fairness haven’t done much to build trust among AI experts. According to a New York University study released in July 2020, Facebook’s machine learning systems make an estimated 300,000 content moderation mistakes every day, and problematic posts continue to slip through Facebook’s filters. In one Facebook group, created last November, that quickly grew to nearly 400,000 members, people calling for a nationwide recount of the 2020 U.S. presidential election traded unfounded accusations about alleged election fraud and state vote counts every few seconds.

For its part, Facebook says that while it views Casual Conversations as a “positive, bold” first step, it will keep working over the next year to develop methods that better capture diversity. “In the next year or so, we intend to explore avenues to expand this data set to be even more inclusive with representations that cover more activities, geographical areas, and gender identities, as well as a larger variety of ages,” the spokesperson said. “Although it’s too soon to comment on potential stakeholder involvement, we’re open to having discussions with researchers, academics, and other stakeholders in the IT sector.”