00:00:03
So welcome to today's Interspecies Conversations lecture. I'm Kate Armstrong, I head programming for Interspecies Internet, and it's my pleasure to welcome you all, on behalf of the trustees and
00:00:16
our organization, to this lecture, which is part of our regular online talk and workshop series that gives us the opportunity to invite leading professors, scientists, and researchers to share
00:00:28
their work with others and to contribute to accelerating our understanding of the diversity, forms, and functions of communication with other species.
00:00:40
So today we will hear from Earth Species Project on AI to unlock interspecies communication, and we're joined by Katie Zacarian and Sara Keen.
00:00:53
The exponential developments we are seeing in the field of AI, and in particular large language models such as GPT-4, offer the potential to transform
00:01:04
our world. Earth Species Project is a non-profit research lab focused on using the power of AI to decode non-human communication. From beluga
00:01:16
whales to zebra finches, the ESP team is working closely with scientists to build machine learning models that will advance our ability to find patterns and meaning in the communications of other
00:01:29
species, with the ultimate goal of transforming our relationship with the rest of nature. ESP joined our talks back in 2020; this time we are joined by CEO Katie
00:01:43
Zacarian and senior AI research scientist Sara Keen. They will explore some of the latest developments in AI, unpack key steps on ESP's technical roadmap towards
00:01:55
decoding, and delve into some of the successes and ongoing challenges inherent in using these technologies to expand our understanding of the non-human world.
00:02:08
So I will pass it now to Katie and Sara. Hi there, thank you so much. I'm going to start sharing my screen. Okay, can y'all see this first slide? Perfect.
00:02:35
All right, first off, thank you so much, Interspecies Internet community, interspecies.io. It is awesome to be back, and I love this lecture series. I really
00:02:51
enjoy it, so to be participating in it again is a lot of fun, and I hope that people walk away learning as much as I have from the speakers
00:03:02
that you all have had in the last few years, so thank you. Today we're going to talk about what's happening in machine learning,
00:03:14
set the stage of what's been going on the last couple of years to provide some context for the case studies that my colleague Sara is going to present about how we apply machine
00:03:26
learning to non-human signals, and we're also going to talk to the ethologists and the naturalists in the audience about ways to get started with machine learning.
00:03:38
So briefly, I'm one of the co-founders of Earth Species Project, a non-profit organization. The organization was ideated and developed by my co-founders Aza Raskin and Britt Selvitelle
00:03:53
and launched in 2017. The concept for this organization goes way back to the early aughts, when Aza was an undergrad at the University of
00:04:05
Chicago, but that's a completely different story for a different time. Prior to Earth Species Project, I had been working on developing animal-borne
00:04:17
sensors, remote sensing technologies, and data collection methods, and I collaborated closely with wildlife biologists, wildlife veterinarians, and park rangers, those on the front lines of
00:04:31
biodiversity conservation. In that work I learned that policy and technology solutions can sometimes have small limitations with very big implications, and that developing the
00:04:45
ability to listen and learn in very fine detail, yet at huge scale, could be one of the most important problems that we technologists could help solve. Every once in a while you meet
00:04:59
extraordinary people doing extraordinary work, and so in 2019, when I met Britt and Aza, my co-founders, it felt like opening up a clamshell and discovering pearls. Here are two very big minds
00:05:11
thinking about how they could harness their expertise in the development of new technologies and artificial intelligence research to address an extremely big challenge, which was interspecies communication and a deepening of
00:05:25
our understanding of our co-species on Earth. Given my experience, it was immediately apparent to me how machine learning could support field biologists,
00:05:37
wildlife veterinarians, and park rangers in accelerating the manual process of data analysis, and as a technologist I
00:05:49
couldn't imagine a more impactful way that I could support scientists, biodiversity conservation, and planetary health. Moreover, within our species, every major challenge we've ever solved,
00:06:02
we've solved with collaboration and communication, and the giant ones facing us, like climate change, will only be solved with communication. Indeed, those who profit from denial invest
00:06:15
massively in communication. So what we see is that the solution to the next order of giant problems will eventually be the power to communicate beyond our current capabilities, beyond
00:06:42
maybe even our current species. And so this Earthshot, as we call it, that we're aiming for at Earth Species Project is for machine learning to decode non-human communication, and then for the new knowledge and understanding that results
00:06:56
from that to reset our relationship with the rest of nature. To me, this is really compelling as a potential unlock in addressing the biodiversity and climate crises that
00:07:09
we're facing, to help us find new ways to coexist on the planet with other species. Back in October of 2020, when we met with you all,
00:07:22
we were talking about our current approach, our technological roadmap, and one of our researchers, Peter Bermant, presented his approach to the
00:07:34
cocktail party problem, or what we also call signal separation, which was later published in one of the Nature publications.
00:07:34
But a lot has happened since then. The field of artificial intelligence is advancing so quickly that, for organizations like ours focused on machine learning research, it requires that we remain very agile
00:07:48
and opportunistic, constantly reassessing what's possible and understandable to our collaborators and our supporters, and able to integrate new techniques into our roadmap. In the last
00:08:02
few years there's been the advent of many technologies, but one thing that's been a breakthrough is the productization of large foundation models, which many of you might know
00:08:15
as ChatGPT. Instead of going into the technicality of how it works, I just want to show you what I did prior to this
00:08:28
presentation: I asked GPT, hey, can you write a song for me about the benefits of machine learning to ethology research, in the style of Peter Gabriel?
00:08:43
And this is what it created. Why this is interesting is that this model is able to preserve the concept of a song, it's able to preserve the sentiment, it's able to understand
00:08:55
sentiment and bring that across in what it's giving back, and it can also understand the concept of style. I'm sure a lot of you, if you're in academia, are grappling
00:09:09
with the advent of new technologies as educators, and many of us have been playing with this. It had over 1 million users in one month, it's one of the fastest growing applications, and I think today I
00:09:23
saw a stat that it has 60 million monthly users. So large foundation models, this work that had once been relegated to academia and machine learning research, is now front and
00:09:35
center. And then this other development also happened in the last couple of years: CLIP models. These enable us to do predictive
00:09:47
modeling across domains. What do I mean by that? It means that you can provide the model information in one modality and it can essentially translate it into another.
00:09:59
In this example here, I put into an application called Midjourney: give me an image based off of text, and the text was "interspecies internet,"
00:10:11
and this is what it was able to give me back. You can do this with any modality, be it music, text, images, fMRIs, and maybe also animal signals.
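To make the cross-modal idea concrete, here is a minimal sketch of the contrastive training objective behind CLIP-style models; the encoder sizes, feature dimensions, and modality choices are hypothetical, and real systems train on millions of matched pairs:

```python
# Minimal sketch of a CLIP-style two-tower model: two encoders map different
# modalities (say, audio features and text notes) into one shared embedding
# space, trained so that matching pairs land close together.
# Everything here (sizes, modalities) is illustrative, not ESP's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerModel(nn.Module):
    def __init__(self, audio_dim=128, text_dim=300, shared_dim=64):
        super().__init__()
        self.audio_proj = nn.Sequential(nn.Linear(audio_dim, 256), nn.ReLU(), nn.Linear(256, shared_dim))
        self.text_proj = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU(), nn.Linear(256, shared_dim))

    def forward(self, audio_feats, text_feats):
        a = F.normalize(self.audio_proj(audio_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return a, t

def contrastive_loss(a, t, temperature=0.07):
    # Matched audio/text pairs sit on the diagonal of the similarity matrix;
    # the loss pulls them together and pushes mismatched pairs apart.
    logits = a @ t.T / temperature
    targets = torch.arange(len(a))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Toy batch: 8 paired examples with pre-extracted features.
model = TwoTowerModel()
a, t = model(torch.randn(8, 128), torch.randn(8, 300))
print(contrastive_loss(a, t))
```

Once trained this way, either tower can be used alone to embed its modality, which is what makes translation-like retrieval across modalities possible.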
00:10:25
So what is the opportunity for ethologists given these machine learning breakthroughs? At this moment I'm going to pass it over to Sara Keen, one of our senior AI research scientists, to
00:10:52
talk through some of the case studies of our work to date with ethologists and behavioral neuroscientists, and to talk about these opportunities. Thank you so much, Katie. So I joined
00:11:06
ESP this spring after working in academia for several years, doing field and lab-based research studying bioacoustics and focusing on animal communication, mostly birdsong. I was
00:11:20
drawn to this organization because I'm very hopeful that machine learning is going to help us discover patterns in animal signals that we couldn't previously detect, and I truly think it will change our understanding of animal
00:11:31
communication. So I'm going to tell you about some of our ongoing projects and collaborations, and in all of this work,
00:11:44
ESP's approach aims to mirror that of the broader machine learning community. So we are trying, specifically and methodically, to establish foundational
00:12:01
components, like datasets and tools, that are going to help researchers transform this area, much like the other realms that Katie was just talking about. One of the first pieces of our
00:12:01
machine learning toolkit is a benchmark dataset. A benchmark is an openly available dataset that anyone in the public can use, and it serves to give us
00:12:13
a way to compare the performance of different machine learning models. These have been really key in advancing the field of machine learning within the last 10 to 15 years, particularly in image recognition and
00:12:26
human speech recognition. The logos on the left here, you might be familiar with these benchmarks: ImageNet is a collection of millions of annotated photos
00:12:38
that has helped us get to the image recognition we have in Google today, and then there's also AudioSet, which is manually labeled audio recordings. Having this available to train machine
00:12:50
learning models has facilitated the software that's running on all of our phones now and helps with speech recognition. Usually when you see a benchmark dataset published, it'll be accompanied by a
00:13:03
leaderboard, like we see in the lower right corner here, and this is just comparing the performance of different machine learning models, like ResNet, looking at the accuracy and precision of
00:13:15
each model and its ability to do a classification task on this labeled data. In bioacoustics, a lot of the methods that are
00:13:27
developed, due to limitations on our data collection abilities, tend to focus on a smaller number of species or maybe a specific type of methodology. But to get the generalizable tools that we
00:13:41
already have in, say, image recognition, we need big, diverse test datasets. So some of our senior research scientists, shown in the photos on the left here, have created what we're calling BEANS,
00:13:54
a benchmark of animal sounds. It's a collection of audio recordings from more than 250 species, and this large aggregate dataset is a way to
00:14:07
test tools for classification and detection, which are outstanding problems in bioacoustics that we desperately need solutions to. In the publication of this benchmark dataset,
00:14:21
our research team has gone ahead and run some of the most popular machine learning models on the aggregate data, and in the table on the lower right you can see the performance. This is
00:14:34
giving us baseline levels of accuracy and precision against which we can measure our progress, and that's really what we're hoping the benchmark will be: a proxy for progress in this field.
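To illustrate what a benchmark enables, here is a minimal sketch of leaderboard-style evaluation; the dataset below is synthetic, standing in for labeled animal sounds, and the two models are just examples:

```python
# Minimal sketch of what a benchmark like BEANS enables: every model is scored
# on the same held-out labeled data, so accuracy and precision are directly
# comparable across methods. Dataset and models are stand-ins.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in for benchmark features/labels (e.g., embeddings of calls from 250+ species).
X, y = make_classification(n_samples=500, n_features=32, n_classes=3, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

leaderboard = {}
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("random_forest", RandomForestClassifier(random_state=0))]:
    preds = model.fit(X_train, y_train).predict(X_test)
    leaderboard[name] = (accuracy_score(y_test, preds),
                         precision_score(y_test, preds, average="macro"))

# Print a tiny "leaderboard", best accuracy first.
for name, (acc, prec) in sorted(leaderboard.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: accuracy={acc:.3f} macro-precision={prec:.3f}")
```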
00:14:50
We know that animal communication is multimodal as well, and we want to encourage the analysis of other data types beyond just audio recordings. For that reason, two of our senior
00:15:02
researchers, Benjamin Hoffman and Maddie Cusumano, have also developed a biologger benchmark dataset. A biologger is an animal-borne tag, like the one in the image on the right here,
00:15:14
and these produce very valuable data because they can inform us about animal ecophysiology and allow us to improve conservation by monitoring animal movements and behaviors with very high
00:15:27
resolution. Unfortunately, in analyzing biologger data we suffer from the same problems: we don't have a common benchmark against which to compare the merits of each
00:15:39
machine learning technique. So Ben and Maddie have worked with many collaborators from other universities to publish this recent paper that has the benchmark dataset,
00:15:52
and it includes biologger recordings from nine different taxa. Within the paper we not only include the dataset but also provide
00:16:07
a framework, a set of guidelines for evaluating machine learning models on these types of data, and a code package that will automatically run the analysis for you. So again, we're hoping that
00:16:19
this facilitates progress within the field by equipping researchers with the tools they need to iterate upon and improve methods in a standardized way.
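As a hedged illustration of what such a framework can look like, here is a minimal sketch of a standardized evaluation harness for behavior classification from accelerometer windows; the window length, features, and labels are hypothetical, not the published benchmark's actual protocol:

```python
# Minimal sketch of a standardized biologger evaluation harness: fixed windows of
# accelerometer data go in, a behavior label comes out, and every method is scored
# by the same function. All data here is synthetic.
import numpy as np
from sklearn.metrics import f1_score

def windows(acc_xyz, window=50):
    """Split an (n_samples, 3) accelerometer trace into fixed-length windows."""
    n = len(acc_xyz) // window
    return acc_xyz[: n * window].reshape(n, window, 3)

def simple_features(win):
    # Per-axis mean and variance: a classic hand-crafted baseline.
    return np.concatenate([win.mean(axis=1), win.var(axis=1)], axis=1)

def evaluate(predict_fn, acc_xyz, labels):
    """Shared scoring entry point: any model is compared with the same metric."""
    preds = predict_fn(windows(acc_xyz))
    return f1_score(labels, preds, average="macro")

# Toy trace: 200 windows of synthetic data with random behavior labels.
rng = np.random.default_rng(0)
trace, labels = rng.normal(size=(200 * 50, 3)), rng.integers(0, 3, 200)
# Any predictor with the same signature can be dropped in and scored identically.
print(evaluate(lambda w: simple_features(w).argmax(axis=1) % 3, trace, labels))
```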
00:16:36
Another thing that we're doing to advance the broader field of machine learning in bioacoustics is building foundation models, which Katie mentioned a moment ago. A foundation model is a
00:16:48
very large neural network that has been trained on huge amounts of data, and these models can perform different tasks without being explicitly trained to do so, and that's what makes them so
00:17:01
powerful. For example, ChatGPT can generate language in the form of Peter Gabriel song lyrics, answers to questions, or an essay for an exam.
00:17:13
We have built the first and only foundation model for bioacoustics. Masato Hagiwara, one of our senior researchers, has named it AVES, and it has been trained with 5,000 hours of sound
00:17:26
that includes vocalizations from many different taxa. The framework is an encoder-based audio representation model, so basically what that means is we can
00:17:40
put in recordings of animal signals and the model will output a learned representation of each signal, which is basically a simplified vector of measurements. In the publication that we have on
00:17:54
our GitHub page, Masato has compared the performance of this model to some of the other top-line machine learning models, and we're finding that in many cases AVES is outperforming existing models.
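To show how an encoder-based representation model is used in practice, here is a minimal sketch; the toy encoder below is only a stand-in for AVES, whose actual architecture and interface we don't reproduce here:

```python
# Minimal sketch of the encoder-based representation idea: a spectrogram goes in,
# a fixed-length embedding vector comes out, which can then be compared, clustered,
# or classified. The encoder is a toy stand-in, not AVES itself.
import torch
import torch.nn as nn

class ToyAudioEncoder(nn.Module):
    def __init__(self, n_mels=64, embed_dim=128):
        super().__init__()
        self.conv = nn.Conv1d(n_mels, embed_dim, kernel_size=5, padding=2)

    def forward(self, mel_spectrogram):            # (batch, n_mels, time)
        frames = torch.relu(self.conv(mel_spectrogram))
        return frames.mean(dim=-1)                 # mean-pool over time -> (batch, embed_dim)

encoder = ToyAudioEncoder()
call_a = encoder(torch.randn(1, 64, 200))          # two hypothetical animal calls
call_b = encoder(torch.randn(1, 64, 180))          # (different lengths are fine)
similarity = torch.cosine_similarity(call_a, call_b)
print(similarity.item())                           # higher = more similar structure
```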
00:18:07
We're hoping to continue to iterate on this and to build more models. As we're continuing to build this machine learning toolkit, we've also begun to analyze animal data and work
00:18:21
with other researchers. One of the projects that we've taken on within the last year is studying vocal signaling in beluga whales, and to do this we're working with collaborators at the University of
00:18:38
Windsor, specifically a doctoral student, Jaclyn Aubin, who is focused on the St. Lawrence River beluga population, which is highly endangered.
00:18:49
Despite the decades that have passed since a hunting ban, this population has failed to recover, and that's because their recovery is being threatened by high levels of underwater noise, by
00:19:03
contaminants in the water, and by limitations on prey availability. So our collaborators are hoping to study the vocal behavior and social structure of the population, with the goal of
00:19:17
designing better conservation approaches. For this work we have been specifically focused on contact calls, which are vocalizations that individuals make to maintain group cohesion or stay
00:19:33
near to one another. They're often used so mothers and calves can find one another within a pod, and this is similar to other dolphin species that
00:19:44
use calls in similar contexts. What is thought is that belugas with closer social associations will have more similar contact call structure,
00:19:58
so what that means is that we can actually measure the calls and get an idea of the relationships within this population. In the lower left corner here there's a picture of a spectrogram
00:20:11
that shows what these contact calls look like for different examples, and unlike other dolphin species, they have a very complex structure: they have tonal components as well as these broadband
00:20:22
clicks. So this is a unique problem to be solved, to analyze the differences in these calls. I have been working with these
00:20:34
researchers to do an analysis of how similar the call structure is throughout the St. Lawrence River population. I took focal recordings of belugas that were manually labeled by our
00:20:47
collaborator Jaclyn, who classified all of these 1,800 calls into six different classes. I analyzed these calls with our foundation model AVES, and from the output of that model I
00:21:01
was able to assess similarity between all of the calls in this dataset. I've made a 3D plot of the results, shown on the right here, and in this point cloud
00:21:14
each of the individual points represents a single beluga call. I've projected this into 3D space, and it's rotating so we can see the results better, but points that are closer to one another within
00:21:27
this projection are calls that have a more similar structure, as assessed by our model. The colors of each of these points are the labels that were manually
00:21:40
assigned by our research collaborators, so the fact that similarly colored points are next to one another in this projection means that our model is matching the predictions of human
00:21:54
experts. This is hopefully going to tell us two things. First, our model can do the same analysis that a human can do, hopefully with high precision, and we could use
00:22:06
these tools to accelerate the progress of research by automating some of the analysis that needs to be manual at this point. And then, number two, on the next slide, we
00:22:19
think that we can infer population structure based on the similarity of these calls, and that's important because even though all of the belugas that we're studying are in the St. Lawrence River, the researchers suspect there
00:22:33
might be sympatric populations, populations that overlap in geographic space but are socially distinct from one another. That's important to understand because we can better assess the degree of human impact on the
00:22:47
belugas and know what units we're working with in terms of different populations within the same region.
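The workflow described in this beluga example can be sketched roughly as follows, with simulated embeddings standing in for the foundation model's actual output:

```python
# Minimal sketch of the analysis described above: embed each call, project the
# embeddings into 3D for the point cloud, and check how well unsupervised structure
# agrees with the manual call classes. Embeddings here are simulated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n_calls, n_classes = 1800, 6
manual_labels = rng.integers(0, n_classes, n_calls)
# Simulated call embeddings: each manual class gets its own region of the space.
embeddings = rng.normal(size=(n_calls, 128)) + 5 * rng.normal(size=(n_classes, 128))[manual_labels]

points_3d = PCA(n_components=3).fit_transform(embeddings)   # the rotating point cloud
clusters = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(embeddings)

print(points_3d.shape)                                       # (1800, 3): one point per call
# 1.0 = perfect agreement between model structure and the human labels.
print(adjusted_rand_score(manual_labels, clusters))
```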
00:23:03
To give one other example of an ongoing collaboration, we are also working with some researchers in Spain at the University of León, Vittoria Baglione and Daniela Canestrari, to study Spanish carrion crows. This is a different ecological
00:23:15
question that we're hoping to answer: here we are aiming to link vocal signals with behaviors. We chose this species because they're very ecologically interesting,
00:23:28
because these carrion crows are cooperative breeders. That means that instead of a single pair of birds building a nest and raising their offspring just the two of them, these
00:23:41
carrion crows often live in communal groups with up to nine birds, and they work together to raise each other's offspring; they help each other find food and fight off predators. It's thought
00:23:54
that perhaps some sort of vocal signal or body stance or other communication signal might be facilitating these cooperative behaviors. So researchers have outfitted the crows with these
00:24:07
wearable biologgers; there's an image of one here on the right. They have many different sensors; we're focused on the microphone recording audio and the accelerometer recording the body
00:24:20
movements of the birds. We want to use the data from these two modalities to get at this question of whether particular calls correspond to specific behaviors.
00:24:33
We have a few different ESP researchers who have been collaborating with the University of León, Maddie, Jen-Yu, and Benjamin, and they're hoping to design one of these CLIP models that
00:24:46
Katie mentioned earlier, where we're able to translate between modalities. So for example, looking at the top right plot, given this motion, and this is a 3D rendering from an accelerometer of the
00:24:59
path of animal motion, given that movement path, can we predict what sound is being produced at the same time? Can we generate predictions with a high
00:25:11
level of accuracy? To approach this, the ESP science team has developed models that can encode both the audio streams and the
00:25:23
accelerometer streams and create different classes of either vocal signals or categories of movement, and then we're hoping to create mappings
00:25:35
between movement and audio so that we can have these accurate predictions. This is an ongoing project; our researchers meet with the University of León every few months, and data is
00:25:50
continuing to be collected, and the research team goes back, iterates and improves upon the model, and then regroups as we keep building on our understanding and expanding our work. This is currently in progress, so stay tuned for results.
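As a rough sketch of the cross-modal question, movement in, call class out, here is a toy version with simulated paired data in place of the real crow recordings; a trained cross-modal model would use learned encoders rather than this linear classifier:

```python
# Minimal sketch of the crow cross-modal question: given an accelerometer window,
# predict which call class co-occurs with it. All data is simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n_pairs, n_call_classes = 600, 4
call_class = rng.integers(0, n_call_classes, n_pairs)
# Simulated movement embeddings weakly correlated with the co-occurring call class.
movement = rng.normal(size=(n_pairs, 32)) + 2 * rng.normal(size=(n_call_classes, 32))[call_class]

X_tr, X_te, y_tr, y_te = train_test_split(movement, call_class, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# Accuracy well above chance (0.25 here) would suggest movement predicts calls.
print("movement -> call accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```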
00:26:03
One last example that I'd like to give of some of our ongoing work is very fun: we're using generative models to conduct interactive playbacks
00:26:19
with other species. A generative model is a neural network that has been trained on a large number of examples, for example recordings of speech, and it
00:26:32
can create novel synthetic samples that resemble the training data but are unique and differentiated from it in some way. This is already happening
00:26:46
with human speech, and it's best illustrated through an example. I'm going to play an audio clip from a generative model that was trained on human speech, and it will start by
00:26:59
playing a little bit of the prompt, the training data that was used to build the model, and then it will continue in a voice that
00:27:11
uses the synthetic sounds output by the model, so we're hearing the training data and then the novel output following it.
00:27:28
[Audio clip plays: a short speech prompt from the training data, followed by a novel synthetic continuation generated by the model.]
00:27:41
[Music] And so this is possible both with speech and with music, and we are working to create a
00:27:59
generative model for animal communication. Our senior researcher Jen-Yu has created one of these generative models for birdsong, and I'll
00:28:11
play an example of a chiffchaff, which is a European songbird some of you may be familiar with. The first two chirps within this are part of the training data, and the rest was generated by the model.
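The prompt-then-continue mechanism can be illustrated with a toy autoregressive model; real systems generate audio, but the sampling logic is analogous, and the "syllable" tokens and bigram statistics below are purely for illustration:

```python
# Minimal sketch of prompt-then-continue generation: a model sees the first few
# units of a song and samples the rest, one unit at a time. This toy version
# works on discrete "syllable" tokens with smoothed bigram statistics.
import numpy as np

rng = np.random.default_rng(0)
songs = [[0, 1, 2, 1, 3], [0, 1, 3, 2, 3], [0, 2, 1, 3, 3]]   # hypothetical syllable sequences

n_tokens = 4
bigram = np.ones((n_tokens, n_tokens))                        # add-one smoothing
for song in songs:
    for a, b in zip(song, song[1:]):
        bigram[a, b] += 1
bigram /= bigram.sum(axis=1, keepdims=True)                   # rows become probabilities

def continue_song(prompt, length=6):
    seq = list(prompt)
    for _ in range(length):
        seq.append(rng.choice(n_tokens, p=bigram[seq[-1]]))   # sample the next syllable
    return seq

print(continue_song([0, 1]))   # prompt: first two "chirps"; the rest is generated
```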
00:28:23
[Birdsong clip plays.] And so we are beginning to use these models to interact with animals, collaborating with some researchers at McGill University. We're working
00:28:40
with Dr. Logan James, who is a postdoc at McGill, and we're focused on zebra finches. We've selected this species because, as many of you probably know, zebra finches are a well-understood
00:28:55
model system: we know a lot about song acquisition, and we understand that there's a sensitive learning period that is controlled both genetically and by the social landscape. We even know in
00:29:10
great detail the neural architecture underpinning the acquisition of vocalizations, like in this graphic in the lower middle here. Because we understand so much already
00:29:22
about zebra finch song, we want to start here and first address this question: can a generative model interact with an animal in a realistic way? Once we have achieved that, we
00:29:36
hope to allow for many unstructured playbacks, with a computer model interacting, speaking with, or having a back and forth with a bird, and we think
00:29:48
that we can actually discover quite a bit from that. By creating a call-and-response behavioral paradigm and recording the bird's response, we expect that we'll be able to infer some of the
00:30:01
rules that govern syntax and song formation, and the patterns and timing rules in these social interactions.
00:30:15
We might also be able to identify certain features of vocalizations, or particular calls, that then influence behavior.
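A call-and-response playback loop like the one described could be organized roughly as follows; the generate, play, and record functions are placeholders for real hardware and a real generative model:

```python
# Minimal sketch of an interactive playback loop: play a generated call, record
# the bird's reply, log the exchange, and condition the next call on the reply.
# All functions are stand-ins; nothing here is ESP's actual experimental code.
import time

def generate_call(history):
    # Placeholder: a real system would condition a generative model on the exchange so far.
    return f"synthetic_call_{len(history)}"

def play(call):
    print("playing:", call)               # stand-in for audio output

def record_response(timeout_s=5.0):
    time.sleep(0.01)                      # stand-in for listening on a microphone
    return {"latency_s": 1.2, "syllables": ["a", "b"]}

exchange_log = []
for turn in range(3):
    call = generate_call(exchange_log)
    play(call)
    response = record_response()
    exchange_log.append({"turn": turn, "call": call, "response": response})

# Later, timing and syllable patterns in exchange_log can be analyzed for syntax rules.
print(exchange_log)
```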
00:30:29
I think a good analogy to explain this is ChatGPT. Many of us have probably had interactions with ChatGPT online through a text dialog box, and that's almost like a playback for humans. ChatGPT is a model that presents like a
00:30:43
human, with very realistic-sounding text, and we will respond to it. If we record all of the human responses, we can probably analyze that text and start to
00:30:55
understand how our language is formed, and maybe even something about the neural basis of language or how our brains work. This is what we're hoping to get at within the coming years
00:31:06
using generative models. So that was just a few examples of our ongoing work; we have many more collaborations that we don't have time to discuss today. If you have a dataset or a study system
00:31:22
or an idea that you'd like to discuss with ESP, we would love to work with you. Here's a link to our website; please get in touch, and we're always ready to bring on new collaborators.
00:31:37
One last note for any other researchers in the audience who are studying interspecies communication: ESP has recently started working with the Experiment Foundation, and we are
00:31:49
currently helping to put out a line of funding that provides support for researchers in this area. It's the
00:32:02
Experiment Foundation itself whose goal is to match researchers with donors who can help fill this funding gap at the early stages of curiosity. So if you have a nascent project idea that you'd like to get off the ground, this grant would
00:32:16
be well suited for you; they're funding projects up to ten thousand dollars, and the applications are open now at experiment.com. Awesome.
00:32:31
So hopefully listening to those case studies has inspired some ideation about how you might get started with machine learning. We've learned a lot over the last years working with dozens
00:32:43
of collaborators, and we want to pass on what we've learned to the ethologists in the room about good ways to start working with machine learning. We're going
00:32:55
to provide these on our GitHub and on our website, but just to review them in brief: consider hiring a machine learning researcher into your ethology lab;
00:33:07
join working groups, or start a working group at your university, that are looking at applications of machine learning in various disciplines outside of computer
00:33:20
science; start machine learning office hours in your department, bringing machine learning students into your biology department;
00:33:33
consider providing data, ecological data, to the machine learning departments and to your colleagues in different disciplines; and join online communities.
00:33:45
There are great ones: Interspecies I/O has a Slack channel, there's also WILDLABS, we have our Discord, there's an AI for Conservation Slack, and
00:33:58
if you're a grad student, consider taking on a machine learning researcher as your co-advisor. We have also learned a lot about how data can be
00:34:11
collected, annotated, pre-processed, and applied for machine learning, and we wanted to pass along those learnings to the broader field. Again, we're going to provide this
00:34:23
information on our website. Good machine learning datasets have a lot of overlap with good documentation for biology, and here are some key things that we would recommend
00:34:35
around what you might annotate and what you might share; a sketch of what one annotation record might look like follows below.
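Here is a hypothetical example of such an annotation record; the field names are illustrative, not ESP's published recommendations:

```python
# A hypothetical per-call annotation record of the kind that makes a bioacoustics
# dataset usable for machine learning; the exact fields are illustrative.
annotation = {
    "file": "recordings/2023-04-12_site3_unit07.wav",
    "species": "Delphinapterus leucas",
    "sample_rate_hz": 96000,
    "start_s": 12.40,              # onset of the call within the file
    "end_s": 13.15,                # offset of the call
    "label": "contact_call_type2", # from a documented label vocabulary
    "annotator": "initials_or_id",
    "location": {"lat": 48.12, "lon": -69.71},
    "recording_device": "hydrophone_model_x",
    "notes": "two individuals in view; moderate vessel noise",
}
```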
00:34:46
We have also highlighted two best-practice datasets: one is Dan Stowell's bird dataset, and another is MUSDB18 for sound separation. These both articulate many of the attributes of documentation that we would recommend for those who collect data.
00:35:00
And lastly, we realize there are different reasons why one might share or not, for publication, for IP protection, and other concerns,
00:35:13
but if you do feel comfortable sharing your data publicly, here are a few ways to get started. Again, we're going to share these recommendations with our colleagues
00:35:26
in the field. I guess the point I want to bring in is that we really want the tent of machine learning to be as broad as possible, so that ethologists are
00:35:39
getting these tools and benefiting just as much as other fields, in industry and in academia. So we're thinking a lot about the incentives, motivations,
00:35:52
focus, and values of the various stakeholders building machine learning, and we want to make sure that those who are focused on conservation biology, biodiversity, and planetary health have
00:36:06
access to these tools, have a seat at the table, and bring their findings and their perspectives to the conversation. An interesting observation of the field at large is that
00:36:20
foundation models, of which there are many, and many more all the time, are named for animals; like nine out of ten of them are named things like
00:36:31
LLaMA, Alpaca, Panda-GPT, Sparrow, Falcon, Chinchilla, Gorilla, and it goes on and on; it's a menagerie of animals. And yet
00:36:44
ours is the only one that is actually looking at animal communication systems and animal signals. Again, we want the field to become broader; we want to bring ethologists
00:36:58
into this conversation and have machine learning be something that benefits everyone. We realize that solving this problem of interspecies communication, or
00:37:13
deepening our understanding, is going to require collaboration and communication across many, many different disciplines, and none of our work, or the work of this field, would be possible
00:37:25
without the advice and collaboration of you all in the community who are working in ethology and neuroethology. We're just so grateful for the diversity of partners
00:37:37
that we have, and we know how essential it is to bring together multiple disciplines; even within our own team of AI research scientists, we come from a diversity of backgrounds
00:37:51
and fields. And just a little plug: we are hiring a research director, and if you are interested, please go to our website to learn more.
00:38:03
Oh, I don't know what just happened there, sorry. Can y'all see this slide? Cool. Lastly, I would be remiss
00:38:17
without mentioning something that's being discussed in the public discourse, which is ethics and safety. That is absolutely a concern and something we have a
00:38:29
responsibility to be thinking about, and we want to ensure that we stakeholders, conservationists, wildlife biologists, field biologists, are working together to define an
00:38:42
ethical framework and inspect these models. Some of the things we may want to contemplate together are: how do we create unbiased models and balance in our training sets? What are the
00:38:55
requirements we're going to have for model performance? Should these models, their weights, and these datasets be openly available, or should we have some kind of gating or safeguards? We really
00:39:09
look forward to being part of that conversation and bringing in as many stakeholders as possible to contemplate these big questions. So, in essence, I hope that we
00:39:26
understand the capability of these models. There's this analogy that my co-founder Aza often references, which is the ability of machine learning to act as a sort of
00:39:38
telescope or microscope: it enables us to see things that our human capabilities may not, based on our physiology or our neurology. In that sense,
00:39:52
it has the ability to detect meaning and unlock understanding. It also has the ability to be a productivity multiplier in so many fields and industries, including ethology,
00:40:04
and the discoveries that we are able to make as an interdisciplinary field may very well change how people relate to non-humans. At the same time,
00:40:17
I love this quote: it is these multi-stakeholder, really complex challenges that we're faced with, things like climate change, that require
00:40:29
people, decision-making among human beings. And so it's really important that ethologists, animal welfare activists, conservation biologists, behavioral neuroscientists, naturalists, wildlife veterinarians,
00:40:43
those who are listening and care about science and change, have access to these tools and are using them, creating the kinds of conversations and the ability to communicate on these
00:40:56
multi-stakeholder, big, complex issues. We really hope that the field continues to build and we move towards being able to better listen and
00:41:08
communicate with one another about these, and maybe even bring in our other species on Earth. So we've come to the end here; we're excited to answer some
00:41:23
questions. Here's our contact information if you want to be in touch with us and follow along with our research; be part of our community on Discord.
00:41:35
Please find us and connect, and thank you so much for listening; we're excited to have a discussion together. Awesome, thank you so much, both Katie and
00:41:50
Sara. That was a really great overview, and also a really great follow-up to what we heard in 2020, so this is really, I think, going to be
00:42:02
a great place for us to start a discussion. I do see many questions already popping up in the chat throughout your talk, so I don't know if anybody
00:42:14
would like to use the hand button to ask a question, if anybody wants to jump in, if anybody is interested in starting us
00:42:25
off in the discussion. Would you like to stop sharing your screen, Katie, so that we can see everybody? There's a raise-hand button
00:42:41
at the bottom if anybody would like to use that to ask a question. So we have Jeff, who would like to ask a question. Would you like to kick us off, Jeff? Yeah, thank you, guys,
00:42:53
awesome, fast-paced, digestible information. I'm going to repeat a question I had, but I've seen other people follow up with it, and I thought Rod did a good job of summarizing what I was saying. But
00:43:05
there's a lot of assumptions about phonemes versus morphemes, or when does the sound carry meaning, and so if you could help us understand some of the assumptions around how
00:43:18
you're tokenizing the sounds, and what you're learning, I guess, over time and readjusting, given that there might be timescale meaning with certain species and
00:43:30
frequency meaning in others. So that's the general question, thanks. That's a very good question: how are we tokenizing sounds? To go back to describing the foundation
00:43:44
model that we mentioned, AVES: this is based on a Google model called BERT that was first published in 2018, and that was trained on human speech. That
00:43:57
encoder model takes human words and creates a sort of shrunken-down, reduced-dimensionality representation of them, just a few measurements, and we have fine-tuned that model by training
00:44:12
it with animal sounds. So we are now expecting it to pick up on the different types of variation, such as pitch contours or different time resolutions and change over time, that we're seeing
00:44:25
in different taxa. So the tokenization is happening using that BERT architecture, the same way it's happening with human speech.
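For intuition, here is a minimal sketch of the audio analogue of tokenization, cutting a waveform into frames that the encoder treats as input units; the frame and hop sizes below are arbitrary choices, not AVES's actual parameters:

```python
# Minimal sketch of tokenization for audio: instead of words, the waveform is cut
# into short overlapping frames, and each frame becomes an input "token" for the
# encoder. Frame/hop sizes are illustrative only.
import numpy as np

def frame_audio(waveform, frame_len=400, hop=160):
    """Slice a 1-D waveform into overlapping frames, the encoder's input units."""
    starts = range(0, len(waveform) - frame_len + 1, hop)
    return np.stack([waveform[s : s + frame_len] for s in starts])

rng = np.random.default_rng(0)
waveform = rng.normal(size=16000)                  # one second at 16 kHz, stand-in for a call
frames = frame_audio(waveform)
print(frames.shape)                                # (n_frames, frame_len)

# Each frame then passes through the encoder; a BERT-style self-supervised
# objective masks some frames and predicts them from the surrounding context.
```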
00:44:38
An open problem, and a really good question, is how we are defining a word, the unit of analysis. At the moment we are using our human discretion to determine this in many cases, like where
00:44:52
a single beluga call starts and ends. We're limited by our own perceptual abilities and what we can hear and see in a spectrogram, so that does leave some room for error. The way that we refine this is to continue to
00:45:06
collect more and more data, and to transform the data or apply manipulations to see if we get different results, and then to do validations with animals when possible. So, in a
00:45:19
very controlled environment, in a way that doesn't put animals at risk, we sometimes do playbacks or measure their reactions to signals to discern whether or not they do contain the
00:45:32
meaning that we suspect. So this is an ongoing process, and one of the biggest challenges in the area. Thank you. Thank you, Jeff; I think that prompted a few questions in the chat, so that's great.
00:45:50
Darcy, would you like to ask your question? Sure. So our lab studies vocal communication in an aquatic anuran
00:46:01
frog, Xenopus. It's a very old group of species; there are about 29 species, I have about 19 of them in the lab, and we
00:46:14
study mostly courtship songs. The songs themselves have distinctive spectral features, and the spectral features are entirely due
00:46:26
to their vocal organ, their larynx. They developed a quite novel mechanism of producing sound underwater that does not involve airflow, and in a male,
00:46:39
each sound pulse includes two dominant frequencies. In genetically related groups of frogs, frogs in a clade for example, while they won't share those two
00:46:51
frequencies, they will share the ratio, and the ratios are harmonic musical intervals. So this was actually discovered by a student who was a musician; it's in an eLife paper, Kwong-
00:47:04
Brown is the first author. So I'd be very interested in collaborating with somebody to look through our entire library of underwater sounds, which is extremely extensive, and see if we could
00:47:18
make that database widely available. And I'm particularly interested to determine whether there are acoustic signals of evolutionary divergence and convergence across these species, and these are also
00:47:32
species that form new species by hybridizing, so the ploidy levels in these animals go all the way up to dodecaploid. The other part of our work has been identifying the portions
00:47:44
of the genome that are responsible for the different patterns of the song and the different acoustic features, and with respect to the patterns, we have mapped the entire neural circuit for both
00:47:58
sound production and sound reception, and identified one particular kind of cell whose intrinsic rhythmicity is responsible for the intrinsic rhythmicity of the songs themselves.
00:48:11
We've done that in a couple of species; it could be done in more. So just to let you know that those data are out there. We've been using various machine
00:48:22
learning approaches to characterize the behavior of females, these are hybrid females, towards the calls of either parental species. This is an area I think that's really quite ripe for machine
00:48:35
learning approaches, and we're going to be working some on it this summer as a group, and if anybody wants to collaborate, just let me know: dbk3 at columbia.edu.
00:48:50
Oh, thank you, that sounds so ripe for increasing our understanding, especially because you have the genetic data as well, so we'd really possibly be able to tease apart the different
00:49:04
effects that are shaping these vocalizations. So thank you so much. Wonderful. Okay, so we have the next question coming from, well, I think the only dog in the group today,
00:49:19
so this is amazing. You might not be the only dog, I'm not sure. It's fitting, because we've built an app to talk to dogs, but the question is
00:49:32
that we tried to use AVES and the results didn't get us the kind of embedding that you showed earlier, so I don't know if there have been recent updates, but we ended up having to use
00:49:45
something like an EfficientNet outside of that with our spectrograms. But by the end of the summer we'll have the largest dataset of barks that are labeled by people
00:49:57
with what their dogs are saying, both labeled and unlabeled, so we've done a lot of inference and we're really close to doing the translation.
00:50:09
But what we were finding is that with dogs specifically, you also have to train them to expand their vocabulary; it's not just inherent. I was wondering if you see that with other animals, or if this is something that's
00:50:22
unique because they're companion animals and they're communicating with us, and whether there are any other observations. We spent the last year collecting data and building models, so we've tried
00:50:34
out a lot of different approaches, but it turned out that what seemed to work best were, like, dumber models. And so the next question is: are foundation models really the right approach with a very limited dataset that
00:50:48
you're getting, especially one that's unlabeled, where a lot of foundation models are built with massive datasets that are highly labeled and have gone through a lot of
00:50:59
human-in-the-loop feedback? Oh, that's such a good question; there was a lot in there. I'll start at the beginning, with your comment about using AVES and getting different results. One
00:51:15
thing that I glossed over when I was presenting our results from the beluga whale calls and that rotating point cloud is that I did some post-processing of that foundation model output: I ran
00:51:28
a random forest on it, and I was able to get the similarity between calls using that. That's some code that will be on our GitHub soon, so that would allow for inferring those types of
00:51:41
relationships, if the model output is useful for your analysis. Also, I guess what you're saying is that in dogs you might need a different approach, and I don't
00:51:54
think we've worked yet with species that have acquired vocalizations through interactions with humans. I believe we have some of that data, but we've not done thorough analyses, so this is the
00:52:07
kind of area that we'd love to dig into more. You mentioned having a dataset that's ready to be analyzed, and I think if you're up for working with our research team, we could iterate on which
00:52:20
models are most appropriate for that, because it could be that the sort of variation you're looking for between signals is perhaps not what our foundation model is best at discerning.
00:52:34
Sounds good, yeah. Well, we could talk offline about the rest, but I'm curious about why foundation models when we're dataset-limited as well. Yeah,
00:52:48
I guess foundation models are expected to work best when we have a lot of examples that look like the data we're going to encounter in the wild. Exactly. And this is going to be a
00:53:02
hard problem when we're studying animal communication, because for all of our best foundation models for human speech, we have millions of examples of pretty much the same kind of signal, so we're getting
00:53:13
really high resolution, like thousands and thousands of examples of the same word. And I mean, we found Whisper is bad at transcribing when people are speaking, and it makes mistakes; the word error rate is
00:53:27
very, very high, I don't know exactly, but it isn't very good, and they've trained on 640,000 hours, I think, of human speech. So that's why I was asking: is a foundation
00:53:40
model the right approach at this point? Yeah, and maybe some unsupervised techniques would do better, I guess, is the idea, where we're not relying completely on a really good training dataset. Yes, yeah, and smaller models
00:53:57
as well. Yeah, what kind of dumb models worked well on your sounds? Oh, I mean, we went way back; we did SVMs, k-NNs, we did a really, really old
00:54:11
neural network; it's just whatever works. I mean, we have to build a commercial product, right? So yeah, we'll have the first dog communication thing within the next
00:54:23
couple of months, but it took a lot of going back to the basics, because there's so much that we were not able to capture from the signal with neural networks, because they oftentimes
00:54:35
will just categorize a bark and won't be able to distinguish between barks in an embedding. So we had to create a lot to identify the differences in the characteristics that could allow us to
00:54:48
distinguish between even different dogs, and the embeddings that were out there don't do that. So we spent a lot of time on just simple problems. One thing that's relevant is that
00:55:01
there's an idea in the machine learning community called the Bitter Lesson, and the idea behind it is that models will continue to get better the more data we train them with, and as much as we'd like
00:55:14
to have another solution, this is kind of a bottleneck, and if we get enough data, there's a belief that we will eventually be able to solve the problem. And in
00:55:26
speech analysis, it seems like to get to those unsupervised or few-shot approaches that work well, we've still needed to go through that hoop of first building a foundation model with lots
00:55:39
and lots of data. So I think this is something that's up for debate, it's an active area of research, but it seems as though getting all the data first and building a big foundation
00:55:51
model is the classic first step. Great, thank you for that, that was really fascinating, and I think we have a lot of dog lovers in our community, so this always brings a lot of interest.
00:56:07
Jonas, would you like to ask your question next? Sure. Regarding what you guys just talked about, what is your opinion on the
00:56:18
chances and problems of supervised versus unsupervised machine learning in this context? I'm a big fan of unsupervised approaches for repertoire discovery,
00:56:37
because they're going to allow us to find things that we're not looking for, so I think we're going to have to use both in parallel. With supervised training, we have to
00:56:49
make some decisions that we could easily be wrong about, like determining categories of signals, whereas with unsupervised approaches we don't have to have that a priori,
00:57:03
set in stone. But I'm a big believer in going back and forth and figuring out which path is giving us the best answers, and then validating those through
00:57:17
controlled, ethically done interactions with animals, to see that our predictions are actually matching reality. So in your beluga
00:57:31
experiment example, were there any unsupervised parts? Did you go back and forth like that?
00:57:42
Yeah, that analysis was done using a supervised model, I guess, our foundation model, to first take feature measurements, and then the unsupervised part of that
00:57:55
was the post-processing random forest decision tree that created the classes based on the model measurements. So it was a combination of both
00:58:08
approaches. The random forest is a favorite of mine because you can have classes of different sizes, you can have a category with two calls and a category with 200, and it allows for that, and it controls for other biases in your
00:58:21
data. So yeah, that's an example of combining the two of them in order to better understand the communication system.
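A minimal sketch of that kind of post-processing, with simulated embeddings standing in for the foundation model output, might look like this:

```python
# Minimal sketch of random-forest post-processing on embeddings: fit the forest on
# foundation-model embeddings with the manual labels, then use its class
# probabilities as a similarity signal between calls. Embeddings are simulated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
labels = rng.integers(0, 6, 300)                       # six manual call classes
embeddings = rng.normal(size=(300, 64)) + 3 * rng.normal(size=(6, 64))[labels]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(embeddings, labels)

# Calls whose probability vectors are close are treated as similar; class sizes
# can be very uneven (two calls vs. two hundred) and the forest still copes.
proba = forest.predict_proba(embeddings)
similarity_01 = proba[0] @ proba[1]                    # similarity of calls 0 and 1
print(similarity_01)
```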
00:58:36
Yeah, thanks so much. Thank you. Mark, would you like to ask your question? We don't hear you, Mark; I think you're still muted. Okay, fantastic. I'm very interested to hear more about the extent to which the models that
00:58:53
you've been talking about have the capacity to take into account environmental context data and historical context data that can be correlated with the
00:59:08
vocalization data, to help understand what the animals are actually talking about. Is that clear? All right. Yeah, Katie can comment on this too after
00:59:26
me. We're aiming to do that largely through associating text notes and other
00:59:38
environmental measurements with audio signals, or the other communication signals we're measuring. These CLIP models that go between modalities, like when we write a text description and an
00:59:50
image is generated, work by building those mappings. So, thinking about the social context or behavioral context of two individuals interacting, ideally
01:00:02
this is where the power of researchers and intense observations comes in: a researcher could write a description of two individuals having an interaction and mention their relationship, the fact that a predator
01:00:15
was just there 10 minutes ago, the fact that they had some kind of dispute two weeks ago, and we can do a text analysis using some of these machine learning tools that are already out in the world and associate those text features with
01:00:29
the vocalizations that then happen. So that's our current approach to associating context, and then linking things like GPS measurements, climate
01:00:40
data, anything else we can get our hands on, and hoping that our models are powerful enough to build these mappings across modalities and then give
01:00:52
us predictions based on that. Because that's an awful lot of data, isn't it? I asked on the basis of the work by Con Slobodchikoff with the
01:01:04
prairie dogs, where they changed one environmental variable at a time, and the painstaking work that was involved in that;
01:01:15
it seems like the models would need a huge capacity. Yeah, and that work has been so essential to understanding exactly what kind of models we need, and all of that work,
01:01:30
like a key lesson from it, has been that a behavioral interaction is never just what we're observing within the moment; there are days, months, years of context behind it
01:01:41
that are loaded into every interaction we're observing. Yeah, thank you. Great. I see, Con, I think you're in the room. Yes, there you
01:01:57
are. We often have this reference, so this is great; thanks for being here again, Con, and I don't know if you want to make a comment on this at all. Thanks, Kate.
01:02:17
I just want to say that I think it's really wonderful stuff that you guys at Earth Species Project are doing. I think these are the building blocks of
01:02:29
eventually getting us to the point where we can understand what animals are communicating. As you know, I talk about animal languages; I think that these will form the building blocks of understanding
01:02:43
that animals have discrete languages of their own, and my hope is that down the road we not only will be able to understand those languages but also be able to talk to animals, so that eventually we can get
01:02:55
to the point where we will see animals as sentient, thinking beings that we can have partnerships with, and not consider them creatures to be exploited for
01:03:07
our own uses. So anyway, I think you're doing wonderful stuff, and I applaud what you're doing. Awesome, thanks, Con. I don't know if anybody else has any questions;
01:03:21
the chat is on fire, so I think we have lots of discussion happening there, lots of potential collaborations popping up, so this is really wonderful. Does anybody else have any last-minute
01:03:32
or final questions that they wanted to bring up and discuss while we have the team here? Could I just maybe raise one
01:03:55
question that came from the group? It's in the chat, and it's from Antoine, who is joining us; he's one of our artists in the community, and I
01:04:10
think this is a really interesting question that he brings, which is around how artists can get involved in collaborations here and whether there is an opportunity for that to happen. So I don't know, Antoine, if you
01:04:21
want to ask this question, but I think it's nice to bring it up; it brings a different perspective to the discussion that we have today. Then, if you want to comment... I see you have your mic on, but we don't hear you.
01:04:43
Oh yeah, well, we hear you now. Sorry, the camera is not really working, but it's okay. Thanks for picking up on the question here; it's kind of a very general question. I've been really interested,
01:04:55
through my practice in sound and music, in animal communication, and obviously in my background there's a little bit of coding involved, and a general
01:05:08
cultural interest in using these tools in composition, but not to the level I've been hearing today in machine learning. So generally, as an artist, I'm just really interested in tapping into this subject to question it and share some of the beauty of it and the importance of
01:05:22
it, as Con was mentioning. Your work is amazing for the same reasons; I think it scopes a new relationship with other beings and other species and hopefully
01:05:35
improves our relationship with the environment in general. So yeah, as an artist, I'm just interested in creating work and thinking of art and science collaborations that could communicate some of this research, which a lot of the
01:05:48
public doesn't know about. So my question is whether you've already collaborated at ESP with artists, or if this is something that you envisage in
01:06:00
the future, and how would you see it, as part of maybe your work more on the communication side, or on the sort of broader reflection side? That is a great question.
01:06:18
The artist who captured the image at the start of our presentation, his name is Carrie Wilk, he's a photographer,
01:06:30
and images and storytelling and art have such a powerful way of connecting with human beings, who, like we said, are going to be
01:06:44
necessary, collaborating and communicating amongst each other, in order to solve these massive problems. So certainly we believe that art is going to have a very big role in this, in
01:06:57
making the science accessible and driving change in social and political decision making, and in the relationship that people have with
01:07:10
each other and other species. I would say we're deeply inspired by artists, but we're in the nascent stages of our own communications work and trying to build
01:07:22
capacity to have those relationships. But I think it's for sure something that we view as essential, not just the science itself, but making sure that the science is accessible to as many people, as many
01:07:35
stakeholders, as possible, and like I said, art is going to be a huge avenue in doing that, to drive that much-needed change, urgent change, for all of us.
01:07:46
So we'd love to be in touch; we may not have the capacity to engage and participate in work right now, but it's certainly something we hope for in the future, and we'd love
01:07:59
to just build relationships where they are. So please be in touch and join our community and reach out to us; that would be wonderful. Thanks very much for your answer, it
01:08:11
sounds great, thanks. Great, thank you. We have another question from Serge; would you like to ask your question? Yes, hi, can you hear me? Yes.
01:08:23
Okay, so I would like to thank ESP for this great presentation and the amazing work that you're doing. I finally understand what ESP is all about, and I just would like to ask a narrow
01:08:37
question on what ESP is doing with Dr. Albay and the belugas of the Saint Lawrence, which I've been following for over 50 years, from afar. And
01:08:52
I think it's a prime candidate species where we could have a real interaction with a smart animal that could actually speak to us, and that may also be interested in speaking to us. So is there any more work,
01:09:06
and what are you planning to do with Dr. Albay, or any other beluga researcher? With the example we presented, we are aiming to help the researchers publish
01:09:23
results showing that these analyses can mimic or replicate human abilities in discerning between calls, and then hopefully that will be applied more broadly to their entire database. And
01:09:37
this is part of a researcher's PhD thesis, so she'll be carrying on this work for many years, and she is part of a conservation group that is more
01:09:50
broadly trying to implement policies and change human behavior within the St. Lawrence River. So she's aiming to communicate her findings and then take
01:10:02
action by helping to put rules in place that will protect the population. ESP's role in this has been supporting her in the analysis and then helping to
01:10:14
get the results out to the public, and in the future we may also start working with the conservation groups themselves to implement plans based on our findings.
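A minimal sketch of what "replicating human abilities in discerning between calls" can look like in practice, assuming a set of human-labeled beluga call clips: featurize each clip as mel-spectrogram statistics, train a simple classifier, and measure agreement with the human labeler. The paths, label names, and model choice below are illustrative assumptions, not ESP's actual pipeline.

```python
# Minimal sketch: can a model replicate a human labeler's call-type decisions?
# Paths, labels, and model choice are illustrative assumptions only.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score

def embed(path, sr=32000, n_mels=64):
    """Summarize one clip as the mean and std of its mel-spectrogram bands."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels))
    return np.concatenate([mel.mean(axis=1), mel.std(axis=1)])

# Placeholder (path, human_label) pairs standing in for the researcher's
# database; in practice this would be hundreds of clips per call type.
clips = [("calls/c001.wav", "contact"), ("calls/c002.wav", "whistle")]

X = np.stack([embed(path) for path, _ in clips])
y = np.array([label for _, label in clips])

# Cross-validated predictions, then agreement with the human labeler.
pred = cross_val_predict(RandomForestClassifier(n_estimators=300), X, y, cv=5)
print("accuracy vs. human labels:", accuracy_score(y, pred))
print("Cohen's kappa vs. human labels:", cohen_kappa_score(y, pred))
```

Cohen's kappa is a natural metric here, since agreement with the human labeler beyond chance is exactly the claim being tested.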
01:10:25
Uh-huh, so you're not planning more communication research with the belugas? We're hoping to; I think it'll depend on what data they have available, and
01:10:39
their capacity as well. Okay, thank you. Thank you. Jeff, would you like to ask another question? Yeah, thank you for giving me the time. I live in the western Rockies near
01:10:53
Yellowstone Park, and there is kind of an ongoing renewed battle against predators, whether it's the grizzly bear debate, which is now going to hit the national stage, or wolves, or cougars; I mean, it's constant. I
01:11:07
already know of some people in what's called the hunting camera-trap world, and I hunt, so just to be clear, I'm not an anti-hunter, and I'm also not a pro predator-killer, so set those
01:11:22
emotions aside for a second. But I already know of people who are building AI detectors with camera traps, who are already thinking through how to do vocalization prompting to lure
01:11:36
in a predator, then detect it visually, and then text a message, and you're done, right?
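For concreteness, the kind of camera-trap pipeline being described here, minus the lure playback, might look like the sketch below: a stock COCO-trained detector watches frames and fires a text alert when a high-confidence "bear" detection appears. The camera source, alert addresses, and threshold are illustrative stand-ins, not anyone's deployed system.

```python
# Hypothetical camera-trap alert loop: visually detect a predator, then text.
# Camera source, SMS address, and threshold are stand-ins, not a real system.
import cv2
import smtplib
import torch
from email.message import EmailMessage
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # COCO labels, which include "bear"

def send_alert(text):
    # Many carriers expose email-to-SMS gateways; this address is a stand-in.
    msg = EmailMessage()
    msg["From"] = "trap@example.net"
    msg["To"] = "5551234567@example-sms.net"
    msg["Subject"] = "trap alert"
    msg.set_content(text)
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)

cap = cv2.VideoCapture(0)  # stand-in for the trail camera's video stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # BGR frame -> normalized RGB tensor in the format the detector expects.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        detections = model([x])[0]
    for label, score in zip(detections["labels"], detections["scores"]):
        if categories[int(label)] == "bear" and float(score) > 0.8:
            send_alert(f"bear detected, confidence {float(score):.2f}")
            break  # one text per frame is plenty
```

A real deployment would also debounce repeated detections and likely run a smaller model on-device rather than a full Faster R-CNN.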
01:11:50
One of the problems I see in this AI space is partly the race, the war, over commoditizing the hardware, not the software. And the conservation
01:12:01
world keeps producing the next great new remote-sensing device that can do audio plus context, and we don't get it at scale, right? So we use the same cheap crap from
01:12:14
years ago that names files things like DSCN. And you guys have a big who's-who funding base at Earth Species; you can go to your website and see it,
01:12:26
and yet it seems like we're taking a Microsoft approach: let a bunch of other people solve the hardware problem, and we'll be good at software. If we're going to win some of these wars, we need to take an Apple approach and
01:12:39
consolidate hardware and software into some solutions that are going to do good. Sorry for the sermon; anything you can do to help those of us in the field would be great. Yeah, I love your question, and I
01:13:00
love the issues you're raising, Jeff, as well as how you're broadening the scope of this challenge to the tools used on the front lines of conservation, in the multi-stakeholder
01:13:13
field of interacting with non-human species. Hardware is a key part of the entire data-collection process, and those stakeholders are
01:13:25
diverse in their incentives and diverse in their motivations, so we have a responsibility as technologists to consider what those incentives
01:13:38
and value systems are. So when we are innovating, and thinking about supporting our aims of changing our relationship with non-human species,
01:13:50
we're taking into account the unintended consequences, or what we might judge as bad actors in this space. As I raised earlier in the presentation, this is a very big part of
01:14:03
our work: ethics and safety for machine learning models. And whether they're in the cloud or on board in hardware, it's important to think about these things with key
01:14:17
stakeholders like you all, making sure that everyone is educated and can communicate about this, so that we might come to conclusions that we feel are aligned with our
01:14:30
principles. And human-wildlife conflict and de-escalation right now, as we know, is driven by a certain set
01:14:42
of values where animals can be destroyed or translocated in a very specific way, especially in the West. So changing those underlying social values around
01:14:54
how we relate to animals is also very important, in addition to the technological advances we might make in understanding them better. Yeah, just as a quick
01:15:08
follow-up: there are lots of positive things going on, right, and there are different value systems, but what I'm getting at is democratizing sensing hardware for the common person who lives out in the field, whether it's a rancher
01:15:20
who owns a large landscape and is for coexistence but has to deal with poaching, or a person in Nepal who is for tigers but has to go into the forest to gather wood and doesn't want
01:15:34
to get killed. And I just see us failing. We have ivory-tower sensing devices that only the few get access to, right, and they
01:15:46
produce all the data; you guys crank it out and create some good AI models. But the democratization: I'm not talking about bird-feeder images, I'm talking about audio and visual camera traps, 360
01:15:59
degrees. We know how to build them; they're in the commercial market, but we just don't seem to have the interest from the big donors to say, how do we do this so that we can help
01:16:13
people do conservation in the landscape? Because there are plenty of people who would like a gunshot detector that triangulates down to the location, and to know that a wolf howl just came through. I mean, it can be used for security and
01:16:27
policing as well. So anyway, again, sorry for the sermon; I hope there are other people interested in this and we can keep the dialogue going.
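The gunshot detector that "triangulates down to the location" is typically built on time-difference-of-arrival (TDOA): synchronized recorders at known positions receive the same sound at slightly different times, and those offsets pin down the source. A rough sketch, with the sensor layout, sample rate, and speed of sound as stated assumptions:

```python
# Rough TDOA (time-difference-of-arrival) localization sketch: estimate where
# a sound came from using arrival-time offsets at synchronized recorders.
import numpy as np
from scipy.signal import correlate
from scipy.optimize import least_squares

SPEED_OF_SOUND = 343.0  # m/s in air; varies with temperature

def tdoa(ref, sig, fs):
    """Delay of sig relative to ref, in seconds, via cross-correlation peak."""
    lag = np.argmax(correlate(sig, ref, mode="full")) - (len(ref) - 1)
    return lag / fs

def locate(sensors, signals, fs):
    """Least-squares source position from pairwise delays against sensor 0."""
    delays = [tdoa(signals[0], s, fs) for s in signals[1:]]
    def residuals(p):
        d0 = np.linalg.norm(p - sensors[0])
        return [(np.linalg.norm(p - s) - d0) / SPEED_OF_SOUND - t
                for s, t in zip(sensors[1:], delays)]
    return least_squares(residuals, x0=sensors.mean(axis=0)).x

# Hypothetical 2-D layout: four synchronized recorders at known positions (m).
sensors = np.array([[0.0, 0.0], [500.0, 0.0], [0.0, 500.0], [500.0, 500.0]])
# signals = [audio_0, audio_1, audio_2, audio_3]  # aligned clips of the event
# print(locate(sensors, signals, fs=48000))
```

With four sensors in a plane, three pairwise delays generally suffice to fix a 2-D position; real deployments add sensors and more robust peak-picking to handle echoes and wind noise.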
01:16:45
Yeah, no, I love this, and coming from the remote-sensing and animal-borne sensor space, drawing down those costs and making the tools accessible to stakeholders across the entire landscape is really important, and I love that you bring that up, Jeff. And I don't know if we still have Neil
01:17:00
Gershenfeld in the room, but I think the open-source hardware community that comes from the Fab Lab world would be somewhere very interesting to have these kinds of discussions. I don't know if anybody is
01:17:13
aware of any projects happening in that space, but that could be somewhere to start that conversation, or to continue it, Jeff. So thank you very much. I think we are almost at the half
01:17:27
hour past the hour, so in respect for everybody's time on a weekend, I think we might wrap it up there. This has been one of our best-attended lectures, so this is really exciting, and
01:17:40
it seems to be a field that is really piquing the interest of lots of different people, who come together in this field of Interspecies Internet and what we're doing in terms of bringing ourselves and our work
01:17:53
closer to the world around us. The chat is very, very full, so if you do want a copy of it, please email us at info@interspecies.io and I will send
01:18:08
that over to you; that's available for anyone. And please keep in mind that you can always join our Slack and keep this conversation going; the link to that was shared earlier in the chat, and you can jump
01:18:21
on there and find the people that you've been talking to during this session. So yeah, I hope that everybody has enjoyed this session today, and thank you again for joining us, Sarah and Katie; this has been
01:18:33
wonderful, getting an update on everything you're doing, and we hope to see more from you soon. Thanks, everyone, and have a wonderful weekend.
End of transcript