00:00:00
Hi everybody. I want to tell you a little bit of a story today. When I was a PhD student at Stanford, I was very
00:00:12
excited about predicting things from images; my specialty is in computer vision. We were very excited about using publicly available images to predict certain things like demographic
00:00:26
characteristics, voting patterns, and so on. One of the things we were doing was saying, hey, we could predict crime rates. So on the left you can
00:00:37
see the actual crime rates, and on the right you can see predicted ones. But as time went on, I started seeing some issues with this kind of work. Some of you might guess what I'm
00:00:50
about to say. So here you can see estimated drug use in Oakland on the left, which is pretty spread out, and on the right you can see
00:01:03
reports of drug use; they're not as spread out as on the left-hand side. So when you're doing this kind of prediction from data,
00:01:15
which data are you using? You're not using the estimated drug use data, you're using the reported drug use, which is heavily biased. And in the U.S. you'll see that
00:01:26
the places that are reported to have this kind of drug use are overwhelmingly poorer areas, basically places where mostly
00:01:38
Black and brown people live. And so statisticians like Kristian Lum have talked about what are called runaway feedback loops. Law enforcement does this kind of analysis
00:01:51
trying to predict crime hotspots. So they have ground truth, and what is the ground truth data they're training on? It's who was arrested for a crime, which crimes were reported; it's not the same as who committed a
00:02:03
crime. Then they use that data to figure out which neighborhoods have a lot of crime, and they send more police to those neighborhoods. And clearly, if you have more police in certain neighborhoods, you're going to
00:02:14
arrest more people, and that seems to corroborate your ground truth, and you have this feedback loop that just amplifies societal biases.
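To make that loop concrete, here is a minimal sketch (my illustration, not anything from the talk): two hypothetical neighborhoods with the same underlying crime rate, where recorded arrests scale with police presence and the next allocation of officers follows the arrest counts, so the skewed starting allocation keeps getting "confirmed" by the data.

```python
# A minimal sketch of the runaway feedback loop described above (hypothetical numbers,
# not from the talk). Two neighborhoods have the SAME true crime rate, but one starts
# with more police; arrests scale with police presence, and the next allocation
# follows the arrest counts, so the biased "ground truth" reproduces itself.
import random

random.seed(0)
true_crime_rate = {"A": 0.10, "B": 0.10}   # identical underlying rates (assumption)
police = {"A": 10, "B": 30}                # initial allocation is already skewed

for step in range(5):
    # Recorded arrests are proportional to both the crime rate and policing intensity.
    arrests = {n: sum(random.random() < true_crime_rate[n]
                      for _ in range(police[n] * 10))
               for n in police}
    total = sum(arrests.values()) or 1
    # The next allocation of 40 officers follows the arrest data, not the true rates.
    police = {n: max(1, round(40 * arrests[n] / total)) for n in arrests}
    print(f"step {step}: arrests={arrests} -> police={police}")
```

Nothing in the arrest data ever reveals that the two true rates are equal; the allocation just keeps confirming itself.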
00:02:27
But nevertheless, these kinds of predictive algorithms are being used everywhere. So last year I signed a letter against something called the Extreme Vetting Initiative, and the
00:02:39
Extreme Vetting Initiative is something that was proposed by ICE, Immigration and Customs Enforcement. What they were proposing was to partner with tech companies to analyze people's social
00:02:51
network activity and determine whether someone would be a good immigrant, whether they should be allowed to immigrate or not, whether they're more likely to be a good citizen or not. This really worried
00:03:06
me, because there are two questions to ask here. One is whether or not we should be doing this kind of thing in the first place, and the second is whether the automated tools we use to do this kind of analysis are robust enough.
00:03:18
For me, the answer to the second question is no. Here is one example, where Facebook's translation tool translated a Palestinian man's "good morning" in Arabic to
00:03:32
"attack them," and this person was arrested. He was later let go once people saw what he actually wrote, but our translation tools are not ready to be used in these kinds of high-stakes
00:03:43
scenarios. Given that I'm in computer vision, I want to talk about face recognition. Automated facial analysis tools are also being used in law enforcement, again for high-stakes
00:03:56
scenarios. So there are two questions again. The first one is whether we should be using these kinds of tools to identify criminals or not; and actually, we
00:04:10
don't even know what these tools are actually being used for. There's a report called the Perpetual Line-Up report that shows that one in two American adults is in some sort of face recognition database, and police can use
00:04:23
it, law enforcement officials can use it however they want, whenever they want, and we don't even know what kinds of accuracies these systems have; there are no standards for transparency. So there are two questions: first, should
00:04:34
these types of databases exist, and second, are these tools robust enough to be used in this kind of high-stakes scenario? And my research says no to the second question.
00:04:47
So my colleague Joy Buolamwini and I looked at some automated facial analysis tools; in this case we talked about gender recognition. These tools
00:05:00
look at a photo and try to ascribe your gender, so they want to say you're either a male or a female. There are many issues even with that particular task of gender
00:05:11
recognition: first of all, they don't even acknowledge non-binary identities, and I don't condone gender recognition as a task in itself. But anyway, to see whether there are systematic error rates
00:05:26
or not, we looked at these tools. They look at your photo and say you're either a male or a female, and as someone's skin type gets darker and
00:05:38
darker, for women you can see that the classification approaches random chance; if you flip a coin, that's 50% accuracy. So why is this the
00:05:53
case? We had a hunch. When we were trying to do this kind of analysis, we were looking at existing data sets for face recognition systems, and we found them to be overwhelmingly lighter
00:06:05
skinned and overwhelmingly male. So we first had to come up with our own data set that was more balanced by skin type and by gender in order to do our analysis. And why am I
00:06:19
saying skin type? We used the Fitzpatrick skin type classification system, where you can see it goes from lighter to darker. People have done this kind of analysis by race, but what does race mean? It's
00:06:33
an unstable social construct across time and space, so we used something that was more objective. And when you look at the overall accuracy of
00:06:46
these gender classification systems, they look really high. But then when you break down the accuracy by gender, you can start to see gaps, like the 78 percent there for Face++. When you break
00:06:57
it down by skin type, you can see that accuracy gets lower for darker versus lighter skinned people. And then when you look at the intersectional accuracy, breaking it up by
00:07:09
gender and skin type, that's when you see the highest disparity in error rates: look at Microsoft, with 79 percent accuracy, and 65 percent for IBM.
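As an illustration of that kind of breakdown (my sketch, not the Gender Shades code; the column names and toy rows are made up), an intersectional audit just groups the evaluation results by each attribute and by their combination:

```python
# A toy sketch of an intersectional accuracy audit (hypothetical data). Overall
# accuracy looks fine while one subgroup, darker-skinned women here, fails completely.
import pandas as pd

df = pd.DataFrame({
    "true_gender": ["F", "F", "M", "M", "F", "M", "F", "M"],
    "pred_gender": ["M", "F", "M", "M", "M", "M", "F", "M"],
    "skin_type":   ["darker", "lighter", "darker", "lighter",
                    "darker", "darker", "lighter", "lighter"],
})
df["correct"] = df["true_gender"] == df["pred_gender"]

print("overall:", df["correct"].mean())                             # 0.75
print(df.groupby("true_gender")["correct"].mean())                  # gap by gender
print(df.groupby("skin_type")["correct"].mean())                    # gap by skin type
print(df.groupby(["true_gender", "skin_type"])["correct"].mean())   # largest gap at the intersection
```

The point is that a single overall number hides exactly the subgroup where the system fails most.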
00:07:23
Once this work came out, once this paper came out, there was a whole bunch of press about it, and there were people calling for regulation of face recognition or automated facial
00:07:36
analysis tools, and IBM and Microsoft both came out with new APIs, kind of seeming to fix this issue. But what does fixing this issue mean? So what are the lessons here?
00:07:49
The first one is that we can't ignore social and structural problems. We showed that there is some bias in face recognition tools; if you just fix that, make it accurate, and then release it, that doesn't fix
00:08:02
the problem. You have to understand how face recognition is being used, who it is benefiting and who it is not benefiting. Who are the people who are unfairly targeted by
00:08:16
face recognition tools, let's say, and who are the people selling face recognition tools to law enforcement? Who are the people involved in the creation of AI, versus who are the
00:08:30
people benefiting, and who are the people not benefiting? So here is a picture from a conference. I didn't take this picture, but I use it all the time; I hope they don't find out, because they might not be
00:08:41
happy. But this picture is from a conference called ICLR, one of the leading machine learning conferences, and I don't see anybody who looks like the people who are unfairly targeted by
00:08:55
certain automated facial analysis tools or AI tools in this picture. So we cannot ignore social and structural problems. This is really important,
00:09:09
because one example I want to give you is that this person, Deborah Raji, came up with a follow-up paper that showed that Amazon's Rekognition has similar biases to the ones I just talked to you about,
00:09:23
gender biases, and this really scared Amazon. They wrote blog post after blog post trying to discredit the study, and then there was a letter that we had signed by a whole bunch of computer vision researchers
00:09:36
calling on them to stop selling this software to law enforcement. But Deb, the first author of this paper that showed bias in their products,
00:09:49
almost dropped out of her studies. She didn't want to be in tech anymore; she felt too isolated, she felt too discriminated against, she wanted to leave, until she came to Black in AI, which is an organization I
00:10:04
co-founded with my colleague Rediet Abebe and many other people. And why do I spend so much time on Black in AI? Because people like Deb would have
00:10:16
dropped out of this field if it weren't for Black in AI. So even as a researcher, what I want to say is that we cannot ignore social and structural problems. The second thing I want to say is that there are currently no laws or
00:10:29
standards that govern how to use certain kinds of products, machine learning products or AI products, and for what purpose. There are
00:10:41
no restrictions, so we don't know if these algorithms that are being used by law enforcement are breaking certain laws, and we don't know if algorithms that are being used for hiring are breaking Equal Employment Opportunity laws.
00:10:55
So I advocate for standards in documentation, and other industries have been there. My prior background was in electronics, and in electronics every single component,
00:11:09
something as simple as a resistor to something as complicated as a CPU, comes with something called a datasheet. I'm sure a lot of people know about data
00:11:22
sheets; a datasheet tells you the non-idealities of a particular component. So I think we should have datasheets for data sets and pre-trained models. If you have a gender
00:11:34
classification system: what can I use this for, what should I not use this for (maybe I shouldn't use it for high-stakes scenarios), what kind of data set was it trained on, did you train it on people aged 20 to 25
00:11:49
in a certain location, or did you train it on a wider distribution? So I've been spending a lot of time this last year thinking about what kinds of standardization we should have, and if
00:12:02
you are interested in this concept, you can read these two papers, Datasheets for Datasets and Model Cards for Model Reporting; one talks about what this looks like for data sets and one talks about what it looks like for models.
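To make the idea concrete, here is a minimal sketch of what such documentation could look like as a structured record, loosely inspired by those two papers; the field names and example values are my own hypothetical illustration, not the papers' actual schema.

```python
# A minimal sketch of documentation that travels with a model, loosely inspired by the
# Datasheets for Datasets / Model Cards papers. Fields and values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    name: str
    intended_uses: list[str]
    out_of_scope_uses: list[str]       # e.g. high-stakes scenarios
    training_data: str                 # what the model was trained on, and on whom
    evaluation: dict[str, float]       # accuracy broken down by subgroup, not just overall
    caveats: list[str] = field(default_factory=list)

card = ModelCard(
    name="example-gender-classifier-v1",
    intended_uses=["aggregate, consented demographic research"],
    out_of_scope_uses=["law enforcement", "surveillance", "identifying individuals"],
    training_data="faces of people aged 20 to 25 from one location (narrow distribution)",
    evaluation={"overall": 0.93, "darker-skinned women": 0.65, "lighter-skinned men": 0.99},
    caveats=["treats gender as binary; does not acknowledge non-binary identities"],
)
print(card)
```

The design point is the same as an electronics datasheet: intended uses, out-of-scope uses, training distribution, and disaggregated evaluation are shipped alongside the model rather than left implicit.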
00:12:15
And again, we're not the first ones to be here, so it's really interesting to look at parallels in other industries sometimes. If you look at cars, when cars first came
00:12:28
on the road there were no stop signs, there were no driver's licenses, there were no seatbelts, and it took a very long time to even start to legislate seatbelts; I believe they weren't legislated until 1967. And even
00:12:42
after they legislated it, they had to have campaigns for people to wear seatbelts. And then if you look at clinical trials, it's a similar thing: there were a lot of unethical things that were done, experimenting on
00:12:55
vulnerable populations, and also clinical trials were not mandated to include women, for example. And so you find that
00:13:10
seven out of ten, I believe eight out of ten, drugs that were pulled out of the market; I almost said "unfairly targeted women," because unfair targeting of populations is always what I think about, but
00:13:23
eight out of the ten drugs that were pulled out of the market had negative consequences for women, because things were not being tested on women. So it took many years to have standards in
00:13:36
many other industries, and we should learn from these other industries. One thing I want to say is that when we talk about societal biases, there are many places where these biases enter: when we
00:13:49
formulate what problems are important, when we collect training data, when we architect our models, and when we analyze how our models are used. But for some reason (well, I know why), in the research community at least there's a
00:14:02
huge focus on that one part, which is architecting our models and the loss functions we use, even though there are all of these other things to think about. And we know from surveys that people have
00:14:15
done that a lot of people in industry are looking for guidelines on the data side, on what to do on the data side. I'm not sure if you agree with this; there are a lot of ethical
00:14:29
concerns around data, but it's not as glamorous, I guess, as working on model architectures, so I think a lot of us, at least in my field, should be more focused
00:14:40
on the data side. One thing I want to conclude with is that we want to end this concept of parachute science. A lot of times, when people in my field talk about fairness, ethics,
00:14:53
etc., sometimes we use the pain of certain marginalized communities in our research, and we get famous, we give talks (like me), we
00:15:05
have academic careers, we're getting paid, and we're not really alleviating the concerns of that marginalized group, we're not making space for people in that marginalized group and lifting their voices. And this
00:15:18
has happened in many other fields, and I really hope that our field, AI, moves away from that and towards collaborating with people in marginalized communities, giving them a voice and
00:15:29
lifting their voices. Thank you. [Applause]
End of transcript