Hi everybody. I want to tell you a little bit of a story today. When I was a PhD student at Stanford, I was very excited about predicting things from images; my specialty is computer vision. We were very excited about using publicly available images to predict things like demographic characteristics, voting patterns, et cetera. One of the things we were doing was saying, hey, we can predict crime rates: on the left you can see the actual crime rates, and on the right the predicted ones. But as time went on I started seeing some issues with this kind of work, and some of you might guess what I'm about to say. Here you can see estimated drug use in Oakland on the left, which is pretty spread out, and on the right you can see reports of drug use, which are not nearly as spread out. So when you're doing this kind of prediction, which data are you using? You're not using the estimated drug use; you're using the reported drug use, which is heavily, heavily biased. In the U.S., the places reported to have this kind of drug use are overwhelmingly poorer areas, places where mostly Black and brown people live.

Statisticians like Kristian Lum have talked about what's called runaway feedback loops. Law enforcement does this kind of analysis to predict crime hotspots. And what's the ground truth data they're training on? It's who was arrested and which crimes were reported, which is not the same as who committed a crime. They use that data to figure out which neighborhoods have a lot of crime, and then they send more police to those neighborhoods. Clearly, if you have more police in certain neighborhoods, you're going to arrest more people there, and that seems to corroborate your ground truth. So you get a feedback loop that just amplifies societal biases.
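This runaway dynamic is easy to see in a toy simulation. Everything below, the neighborhood names, crime rates, and patrol counts, is an illustrative assumption I'm making for the sketch, not real data.

```python
# Toy simulation of a runaway feedback loop in predictive policing.
# All numbers are illustrative assumptions, not real data.

true_crime_rate = {"A": 0.10, "B": 0.10}  # both neighborhoods are identical
patrols = {"A": 55, "B": 45}              # but A starts with slightly more patrols

for year in range(10):
    # Arrests scale with patrol presence, not just with underlying crime.
    arrests = {n: patrols[n] * true_crime_rate[n] for n in patrols}
    # The "ground truth" for next year's allocation is the arrest record,
    # so 5 patrols shift toward wherever more arrests were recorded.
    hot = max(arrests, key=arrests.get)
    cold = "B" if hot == "A" else "A"
    shift = min(5, patrols[cold])
    patrols[hot] += shift
    patrols[cold] -= shift

print(patrols)  # prints {'A': 100, 'B': 0}
```

Even though the two neighborhoods have identical true crime rates, the small initial imbalance gets confirmed and amplified every year until all patrols end up in one place.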
Nevertheless, these kinds of predictive algorithms are being used everywhere. Last year I signed a letter against something called the Extreme Vetting Initiative, proposed by ICE, Immigration and Customs Enforcement. They proposed to partner with tech companies to analyze people's social network activity and determine whether someone would be a good immigrant, whether they should be allowed to immigrate, whether they're more likely to be a good citizen or a good refugee. This really worried me, because there are two questions to ask here: first, whether we should be doing this kind of thing in the first place, and second, whether the automated tools we use for this kind of analysis are robust enough. For me, the answer to the second question is no. Here is one example: Facebook's machine translation rendered a Palestinian man's "good morning" in Arabic as "attack them," and he was arrested. He was later let go once people saw what he actually wrote, but our translation tools are not ready to be used in these kinds of high-stakes scenarios.

Given that I'm in computer vision, I want to talk about face recognition. Automated facial analysis tools are also being used in law enforcement, in high-stakes scenarios. So there are two questions again. The first is whether we should be using these kinds of tools to identify criminals at all; actually, we don't even know what these tools are being used for. There's a report called the Perpetual Line-Up that shows that one in two American adults is in some sort of face recognition database, and law enforcement officials can use it however they want, whenever they want. We don't even know what kinds of accuracies these systems have, and there are no standards for transparency. So again: first, should these databases even exist? And second, are these tools robust enough to be used in such high-stakes scenarios? My research says no to the second question.
My colleague Joy Buolamwini and I showed that some automated facial analysis tools, in this case gender recognition tools, have systematic error rates. These tools look at a photo and try to ascribe your gender: they want to say you're either male or female. There are many issues even with that particular task. First of all, it doesn't acknowledge non-binary identities, and I don't condone gender recognition as a task in itself. But to see whether there are systematic error rates, we looked at these tools, and as someone's skin type gets darker and darker, for women, the classification approaches random chance: like flipping a coin, about 50 percent accuracy. Why is this the case? We had a hunch. When we were doing this analysis we looked at the existing datasets for face recognition systems, and we found them to be overwhelmingly lighter-skinned and overwhelmingly male. So we first had to build our own dataset that was more balanced by skin type and by gender. Why am I saying skin type? We used the Fitzpatrick skin type classification system, which goes from lighter to darker. People have done this kind of analysis by race, but what does race even mean? It's an unstable social construct across time and space: whether someone is considered Black can depend on where and when you ask. So we used something more objective. And when you look at the overall accuracy of
these gender classification systems, they look really high. But when you break down the accuracy by gender, you start to see gaps, like the 78 percent there for Face++. When you break it down by skin type, you see that accuracy gets lower for darker-skinned versus lighter-skinned people. And when you look at the intersectional accuracy, breaking it down by gender and skin type together, that's when you see the biggest disparity in error rates: look at Microsoft at 79 percent for darker-skinned women, and about 65 percent for IBM. Once this paper came out there was a whole bunch of press about it, and people were calling for regulation of face recognition and automated facial analysis tools. IBM and Microsoft both came out with new APIs, seeming to fix this issue. But what does fixing this issue mean? What are the lessons here?
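As an aside, the kind of disaggregated evaluation I just described can be sketched in a few lines. The toy records below are invented for illustration and do not reproduce our actual Gender Shades numbers.

```python
from collections import defaultdict

# Invented toy predictions: (true_label, predicted_label, gender, skin_type).
# These rows are illustrative only, not the Gender Shades data.
records = [
    ("F", "F", "F", "darker"), ("F", "M", "F", "darker"),
    ("F", "M", "F", "darker"), ("F", "F", "F", "lighter"),
    ("M", "M", "M", "darker"), ("M", "M", "M", "lighter"),
    ("M", "M", "M", "lighter"), ("F", "F", "F", "lighter"),
]

def accuracy(rows):
    return sum(t == p for t, p, *_ in rows) / len(rows)

overall = accuracy(records)  # 6/8 = 0.75: looks tolerable in aggregate

# Disaggregate by the intersection of gender and skin type.
groups = defaultdict(list)
for row in records:
    groups[(row[2], row[3])].append(row)
by_group = {g: accuracy(rows) for g, rows in groups.items()}

# Darker-skinned women: 1/3; every other subgroup: 1.0.
```

One aggregate number hides exactly the failure mode that matters: all of the errors land on a single subgroup.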
The first lesson is that we can't ignore social and structural problems. We showed that there is bias in face recognition tools, but just fixing that bias, making the tool accurate, and releasing it doesn't fix the problem. You have to understand how face recognition is being used: who is it benefiting and who is it not benefiting? Who are the people unfairly targeted by face recognition tools, and who are the people selling those tools to law enforcement? Who are the people involved in the creation of AI, versus the people benefiting and not benefiting? Here is a picture from a conference. I didn't take it, but I use it all the time; if they find out they might not be happy. It's from ICLR, one of the leading machine learning conferences, and in this picture I don't see anybody who looks like the people unfairly targeted by certain automated facial analysis or AI tools. So we cannot ignore social and structural problems. This is really important
because one example I want to give you is Deborah Raji, who came out with a follow-up paper showing that Amazon's Rekognition has gender biases similar to the ones I just talked about. This really scared Amazon: they wrote blog post after blog post trying to discredit the study, and then there was a letter, signed by a whole bunch of computer vision researchers, calling on them to stop selling this software to law enforcement. But Deb, the first author of the paper that showed bias in their products, almost dropped out of her studies. She didn't want to be in tech anymore; she felt too isolated, too discriminated against. She wanted to leave, until she came to Black in AI, an organization I co-founded with my colleague Rediet Abebe and many other people. Why do I spend so much time on Black in AI? Because people like Deb would have dropped out of this field without it. So even as researchers, we cannot ignore social and structural problems.

The second thing I want to say is that there are currently no laws or standards that govern how certain kinds of machine learning or AI products can be used, or for what purpose. There are no restrictions, so we don't know if the algorithms being used by law enforcement are breaking certain laws, and we don't know if algorithms being used for hiring are breaking Equal Employment Opportunity laws.
And so I advocate for standards in documentation. Other industries have been there. My prior background was in electronics, and in electronics every single component, from something as simple as a resistor to something as complicated as a CPU, comes with something called a datasheet. I'm sure a lot of people know about datasheets: a datasheet tells you the non-idealities of a particular component. I think we should have datasheets for datasets and pre-trained models too. If you have a gender classification system: what can I use this for, and what should I not use it for? Maybe I shouldn't use it for high-stakes scenarios. What kind of dataset was it trained on? Did you train it on people aged 20 to 25 in one location, or on a wider distribution? I've been spending a lot of time this last year thinking about what kinds of standardization we should have, and if you're interested in this concept you can read two papers, "Datasheets for Datasets" and "Model Cards for Model Reporting."
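Here is a minimal sketch of what machine-readable documentation in that spirit might contain. The field names and the example model are my own simplification for illustration, not the exact schema from either paper.

```python
from dataclasses import dataclass

@dataclass
class ModelCard:
    name: str
    intended_use: list
    out_of_scope_use: list  # e.g. high-stakes scenarios
    training_data: str      # who is in the data, and who is not
    evaluation: dict        # disaggregated metrics, not a single number

# Hypothetical model and numbers, for illustration only.
card = ModelCard(
    name="gender-classifier-demo",
    intended_use=["research on classifier bias"],
    out_of_scope_use=["law enforcement", "identity verification"],
    training_data="adults aged 20 to 25 from a single region (narrow!)",
    evaluation={"overall": 0.90, "darker-skinned women": 0.65},
)
```

The point of the structure is that the uncomfortable fields, out-of-scope uses, narrow training data, and disaggregated evaluation, are mandatory rather than optional fine print.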
One of those papers talks about what this looks like for datasets, and the other for models. And again, we're not the first ones to be here, so it's really interesting to look at parallels in other industries. Take cars: when cars first came on the road there were no stop signs, no driver's licenses, no seatbelts. It took a very long time even to start legislating seatbelts, I believe not until 1967, and even after seatbelt laws passed there had to be campaigns to get people to wear them. Clinical trials are a similar story: there were a lot of unethical experiments done on vulnerable populations, and trials were not mandated to include women, for example. I believe eight out of ten drugs that were pulled from the market had negative consequences for women, precisely because things were not being tested on women. So it took many years to have standards in
many other industries, and we should learn from them. One more thing I want to say: when we talk about societal biases, there are many places where these biases enter, when we formulate which problems are important, when we collect training data, when we architect our models, and when we analyze how our models are used. But for some reason, and I know why, in the research community at least there's a huge focus on just one part, architecting our models and the loss functions we use, even though there are all of these other things to think about. We know from surveys that a lot of people in industry are looking for guidelines on the data side, on what to do with data. I'm not sure if you all agree, but there are a lot of ethical concerns around data; it's just not as glamorous, I guess, as working on model architectures. So I think a lot of us, at least in my field, should be more focused
on the data side. One thing I want to conclude with is that we should end this concept of parachute science. A lot of times, when we talk about fairness and ethics, we use the pain of certain marginalized communities in our research, and we get famous, we give talks like me, we have academic careers, we get paid, and we're not really alleviating the concerns of that marginalized group. We're not making space for people in that group or lifting their voices. This has happened in many other fields, and I really hope that our field, AI, moves away from that and toward collaborating with marginalized communities, giving them a voice and lifting their voices. Thank you.
[Applause]