00:00:00
hi everyone welcome to the protocol labs research seminar series today's speaker is joel gustafsson joel's research is in knowledge representation and distributed systems at protocol labs he is working to create a decentralized semantic web
00:00:12
to enable more open and equal access to information his other interests span interface design media theory human computer interaction and programming language theory joel has previously worked at the media lab
00:00:25
csail and notion labs and he holds a degree in mathematics with computer science from mit today he will be talking to us about his research project the underlay joel i'll let you take it from here
00:00:38
all right thank you for the wonderful introduction jonathan um okay so i'm joel uh i want to talk about knowledge graphs today
00:00:50
uh so here's like the rough outline uh i'm gonna try to do a really quick introduction of knowledge graphs as a general concept section two i'm going to try to look at some limitations
00:01:04
of knowledge graphs and issues that you start to run into when you start trying to make really big ones and then i'm going to shift into proposing a concept of a distributed knowledge graph uh and
00:01:17
go through an overview of the underlay a distributed knowledge graph project that i've been working on for a while um just as like a first level breakdown of its different components and ingredients
00:01:29
uh so i'm gonna try to move fairly quickly but if you have any like questions or clarifications or anything feel free to uh interrupt me if uh if it feels right
00:01:43
so knowledge graphs what are they interestingly enough knowledge graph is kind of a contested term and it doesn't have any one formal definition uh it's been around for like a long time
00:01:58
since like the 70s uh in early 2020 uh this 136 page survey paper was published by a large group of authors which i
00:02:09
found to be like the best and most comprehensive introduction to like the state and history of the knowledge graph world and it helped me a lot and in it they take a sort of observational descriptive
00:02:21
approach that i'm going to borrow in this introduction here so in the next couple slides i'm going to try to highlight four like primary essential characteristics of knowledge graphs
00:02:34
and the block quotes in the next slides are from this survey paper so what's a knowledge graph well it's a graph it's data that is in a graph data model
00:02:45
in the sense of like nodes and edges uh but it's not just any graph it's specifically one where the nodes represent entities and the edges represent relationships between those entities
00:02:57
and this idea of representation is very important a knowledge graph is a graph that has this direct correspondence to the real world this like specific kind of correspondence to the real world
00:03:09
so in this uh very simple example graph in the lower left we have some entities like bob the mona lisa leonardo da vinci and they're related by the edges which are is interested in was created by
00:03:22
etc another way of looking at the idea of a graph data model is that a graph data model is basically data that has been like enriched by one level or by like
00:03:39
one degree of abstraction and that like shift is from thinking of things as like the first class things to thinking of relationships as the first class things so another way of thinking about a graph
00:03:52
data model is basically like a knowledge graph is a database of relationships which is like one step more indirect than the way that we normally interface with data so as an example rdf is
00:04:07
the most prominent graph data model that's out there it's the model for the semantic web and an rdf graph is modeled as a set of these subject predicate object triples
00:04:19
uh and so an rdf graph you know you can draw it pictorially as a graph but the way that you like represent it in this like hypothetical sort of uh very fake example database uh the way that you'd actually like
00:04:32
manipulate or represent it is just as a big list of these like subject predicate object like statements so you can see the sense of like it's a big database of relationships
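to make the "big list of statements" idea concrete, here is a minimal sketch in python — the entities and predicates are made-up stand-ins for illustration, and real rdf would use IRIs and a proper store rather than a set of tuples:

```python
# an rdf-style graph as a plain set of (subject, predicate, object) triples;
# names here are illustrative stand-ins, not real IRIs
graph = {
    ("Bob", "isInterestedIn", "MonaLisa"),
    ("MonaLisa", "wasCreatedBy", "LeonardoDaVinci"),
    ("LeonardoDaVinci", "isA", "Person"),
}

def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None is a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]

# everything the graph says about Bob
print(match(graph, s="Bob"))  # → [('Bob', 'isInterestedIn', 'MonaLisa')]
```

manipulating the graph is just adding or removing statements from the set, which is the "database of relationships" framing in miniature.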
00:04:43
that this like uh degree of indirection and abstraction is also a degree of freedom which is why knowledge graphs are very powerful because they can represent very heterogeneous data so in this other example graph in
00:04:58
the lower left this is from the survey paper we have a bunch of nodes and edges that are relating like events and venues and places but there's not any like schema or like consistent structure to like the
00:05:10
types of things that are in here there are no required properties like some of the edges have cycles around them it's this big kind of like chaotic miscellaneous mess uh
00:05:21
so like whatever we happen to know about the entities we can just like say it we don't have to refactor you know anything to add a new kind of relationship because relationships are the first class thing that we're dealing with um even though in practice they aren't uh
00:05:34
literally stored in like a sql database like this so okay so if the knowledge graph is just a database of relationships you might think well uh there have got to be other ways of like structuring
00:05:50
what a relationship is uh or like other models of relationship and sure enough you can do that and like you just derive a different kind of graph data model uh so you can imagine that the rdf model of
00:06:04
like subject predicate object that's like a powerful kind of relationship and you can represent lots of things like that but you can imagine some kinds of things that are awkward to fit into it not every kind of relationship
00:06:16
has a distinct like subject and object so if you were to say hey i want to like uh generalize this and my database of relationships are going to have like arbitrarily many named roles or something then you just independently derive the
00:06:29
hypergraph data model uh so the point of all this is like we tend to think of graphs visually but this correspondence between a graph and like the distillation of idea of relationship is very deep they're really
00:06:42
the same thing um and it also uh the other point i want to make here is like the choice of specific like graph data model the specific way you model relationships in a knowledge graph
00:06:54
has consequences but is not particularly like uh that important like you know no matter you know you can model a hyper
00:07:06
graph as a certain kind of directed graph with a little more structure and vice versa the specifics of how you pick to model your relationships aren't the biggest thing in the world um so knowledge graphs care about entities and relationships the next
00:07:21
thing to say about them is that they are typically kind of like semi-structured or half structured in a really odd interesting sense so if we look closer at the example graph in the lower left uh
00:07:34
we you know when we look at these two relatively central like event nodes eid 15 or something and we look at their outgoing relationships uh the node has a name okay we expect that
00:07:46
it has like a start time and an end time and a venue relating it to a place um but then uh it has these three outgoing edges that are not really like the others
00:07:59
that are labeled type and if you think about this this is kind of like wild so the classes that each node that each entity in here is a
00:08:11
conceptual instance of those classes are represented right there in the graph as relationships to entities that like represent those classes which kind of like collapses
00:08:25
uh levels that are normally kept separate and this practice is very common especially in the rdf world where almost every rdf graph that you get will come like bundled with a little ad hoc class
00:08:38
hierarchy right inside the graph itself um which doesn't actually inherently like mean anything the data model isn't aware of any of those semantics
00:08:50
uh it's just there so you can like query for instances of a given class in the same way that you'd query for like any other graph pattern this is what what i mean by like half structured knowledge graphs don't have like a schema
00:09:03
in a strict sense but they usually have these like weaker intermediate degrees of internal structure which are kind of like confusing to deal with conceptually
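the point about type edges can be shown with the same toy triple representation as before — finding instances of a class is just another triple-pattern query, with nothing special about the "type" label (all identifiers here are made up):

```python
# "type" is just another edge label in the graph, so class membership
# is queried exactly like any other relationship; data is illustrative
graph = {
    ("EID15", "type", "Event"),
    ("EID15", "venue", "SantaLucia"),
    ("EID16", "type", "Event"),
    ("SantaLucia", "type", "Place"),
}

def instances_of(graph, cls):
    # the same kind of pattern match we'd use for venue, name, etc.
    return sorted(s for (s, p, o) in graph if p == "type" and o == cls)

print(instances_of(graph, "Event"))  # → ['EID15', 'EID16']
```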
00:09:16
what else can we say about knowledge graphs one observation that the survey highlights is that a knowledge graph is not just a graph database and it's also not just a graph database that represents
00:09:27
things it's always a graph database that represents lots of different kinds of things so in the world when you find a knowledge graph you don't normally just like create a
00:09:41
knowledge graph or even like manipulate it directly you normally grow a knowledge graph by integrating lots of different data sources like together into the graph so a knowledge graph is like it's a
00:09:53
database of relationships that comes from the integration of many other data sources and lastly here's where i'm really going to go off
00:10:05
the rails uh there's like a continuous living process oriented aspect to knowledge graphs so according to the paper they say
00:10:17
effective methods for extraction enrichment quality assessment and refinement are required for a knowledge graph to grow and improve over time i would go even further and say that there's no real difference
00:10:29
between a knowledge graph and a knowledge graph project which is a little silly it's like saying that an administrator is part of the database but i think it's a good perspective to
00:10:41
take especially for knowledge graphs i think the best way to think about knowledge graphs is as a loose collection of ongoing processes and uh it's it's easy to overlook this
00:10:54
and just focus on like the data because those processes are often like implicit they aren't that visible and they're kind of ad hoc they aren't standardized and they're like done by people
00:11:07
and they are embedded within like organizations and have really specific workflows uh and so on and so the overall picture that i'm trying to paint here uh is that a knowledge graph is like a
00:11:21
mushroom there's this like big static produced artifact up top which is the data that's in the graph database but that's just like you know uh that's just the fruiting body and the real organism the real
00:11:33
thing of interest the actual substance of it is this network of living processes that are like constantly pulling in data from other sources and integrating and working to integrate them into the graph uh so that's what uh that's the overall
00:11:47
picture of a knowledge graph oh yeah okay so this is like a little tongue-in-cheek but um yeah like a process perspective is an important lens when you're thinking about data
00:11:58
so uh if you shift over to the process world you might say knowledge graphing is the practice of integrating reconciling and curating multi-domain data sources
00:12:11
um so uh who's actually engaged in this uh we'll really go quickly through these because they're not that important but like there are lots of big like general purpose open projects not lots but like several
00:12:24
prominent ones maybe the most prominent one is wikidata which is like all of the structured information in the boxes on wikipedia on the right uh before that was freebase which was the first real like large scale crowdsourced
00:12:37
knowledge graph it was run similar to wikipedia but it was like a knowledge graph of structured data and then freebase was sold to google in 2010. um but also enterprise knowledge graphs are really
00:12:50
a hot topic now um lots of different companies are finding it valuable to like grow their own little internal knowledge graph for the things that they're interested in so airbnb for example has an internal knowledge
00:13:04
graph relating like events and places and attractions and interests and topics all together which you can imagine to be like very heterogeneous and like interwoven um to
00:13:16
populate like recommendations and like little interested in carousels and whatnot ebay is another interesting example they have a big knowledge graph that models their internal architecture like which services interface with which
00:13:30
other services and they find that useful for yeah something and then the big elephant in the room is of course the google knowledge graph which powers all the info boxes that you see when you search
00:13:41
for movies or literally anything else so why do they do this like what's the value prop of the knowledge graph what would you like to get out of this well
00:13:53
the obvious ones are like airbnb and google and wikidata where like letting people explore the graph is of direct utility um but there are lots of other uses too so like one big one in the rdf world is
00:14:05
reasoning once you have a graph you can use like entailment rules to do logic and derive more statements from the statements you already have and that's like a whole field in itself and since you have a graph data model you can do graph analysis stuff
00:14:19
like finding like centrality uh and like spectral methods to do whatever spectral methods do um and a new one that's uh becoming really interesting is knowledge
00:14:32
graph embeddings where uh you can compute vector embeddings for all of your concepts and then use that to train ml models for like really interesting like improving domain specific search and whatnot
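the entailment-rule idea mentioned above can be sketched as naive forward chaining over triples — this is a toy in the spirit of rdfs subclass reasoning, not a real reasoner, and the class names are invented:

```python
# naive forward chaining: keep applying two rdfs-style rules until the
# graph stops growing; toy data, invented names
graph = {
    ("Concert", "subClassOf", "Event"),
    ("Event", "subClassOf", "Thing"),
    ("EID15", "type", "Concert"),
}

def entail(graph):
    graph = set(graph)
    while True:
        new = set()
        for (a, p1, x) in graph:
            for (x2, p2, y) in graph:
                if x != x2 or p2 != "subClassOf":
                    continue
                if p1 == "type":
                    new.add((a, "type", y))        # a type x, x subClassOf y
                elif p1 == "subClassOf":
                    new.add((a, "subClassOf", y))  # subclass transitivity
        if new <= graph:  # fixed point: nothing new derived
            return graph
        graph |= new

closed = entail(graph)
print(("EID15", "type", "Thing") in closed)  # → True
```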
00:14:45
so the theme behind all these use cases is just generally like if you have a lot of different data sources and you do the work to get them all in like one place in one big graph network of
00:14:58
relationships there's just a variety of ways you can capitalize on that okay awesome uh section two scaling knowledge graphs
00:15:12
i hinted at this earlier but the first interesting thing to say about knowledge graphs is that unlike traditional databases
00:15:24
which scale vertically in a sense knowledge graphs scale horizontally this analogy isn't perfect but i'll try to explain what i mean like most databases
00:15:36
pay a relatively like upfront cost to like design you know their schema and like the layout and whatnot and that schema is of course like subject to a continual process of like evolution and refinement
00:15:49
but for the most part like the dominant way in which you manipulate your database is just by like adding things and mutating things and like working within that one schema knowledge graphs are kind of interesting
00:16:02
in that uh like the dominant way in which they grow is not within any one domain but like horizontally by integrating more and more
00:16:14
different kinds of things so it's a different direction uh from this perspective that like degree of freedom using the graph data model is like that's the mechanism that's the
00:16:27
transformation that transforms like horizontal conceptual scaling into vertical physical scaling since you actually have to physically do this you switch gears into a graph data model so that when you add more different kinds of things
00:16:38
uh like physically uh you don't have to be changing things all the time so okay with that in mind let's step through this process what is
00:16:54
like a day in the life of integrating something into a knowledge graph look like so suppose we have an existing knowledge graph this is the example from the survey on the right and we have some new data that
00:17:06
we want to integrate so we have our little graph about like events and places and now we want to integrate some data that we found on the internet about like cities and their populations and locations
00:17:18
so what do we do uh well step one is we take the new data that we want to integrate and we transform it we project it onto the graph data model that we're using
00:17:31
uh very commonly this produces just like a forest of similarly structured trees kind of shape so you know here we have a bunch of different rows each of those are going to produce this node and then
00:17:43
for each column we have an outgoing edge which maybe has like a little recursive kind of like nested tree beneath it to represent that value so that's pretty straightforward um step
00:17:56
two so once we have the new data that we want to integrate and the existing knowledge graph in the same data model uh we go through and we identify the like common entities and
00:18:09
relationships between them by reconcile here i mean like co-identify or equate uh and this is like the same concept as joining but i don't want to say join because we're not really doing it on
00:18:21
like a whole like column or anything since we lost that consistent structure by projecting onto the graph data model so like mentally we kind of do it individually for every equivalence class that we want to merge
00:18:34
basically so we go through and find the shared things uh and then once we've found the common things we merge them and compute the merged graph and great
00:18:45
we've grown we've integrated data into our knowledge graph and we can just do this over and over and over again until the knowledge graph gets bigger and bigger and it's awesome
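that whole loop — project, reconcile, merge — can be sketched end to end on toy data; every identifier and the sample row here are invented for illustration:

```python
# the existing toy knowledge graph, as (subject, predicate, object) triples
existing = {
    ("EID15", "venue", "SantaLucia"),
    ("SantaLucia", "city", "Santiago"),
}

# step 1: project tabular data onto the graph model — one node per row,
# one outgoing edge per column
rows = [{"name": "Santiago", "population": 404495}]
projected = set()
for i, row in enumerate(rows):
    node = f"city-{i}"
    for column, value in row.items():
        projected.add((node, column, str(value)))

# step 2: reconcile — a hand-made judgment call that "city-0" denotes
# the same real-world thing as the existing "Santiago" entity
same_as = {"city-0": "Santiago"}

# step 3: merge — rewrite reconciled identifiers and union the graphs
def rewrite(triple):
    return tuple(same_as.get(term, term) for term in triple)

merged = existing | {rewrite(t) for t in projected}
print(("Santiago", "population", "404495") in merged)  # → True
```

the interesting (and lossy) part is of course step 2: the `same_as` table is exactly the reconciliation decision that the rest of the talk worries about.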
00:18:59
um right so uh as you can probably guess i don't think it's really that awesome my claim for this section is that there are like fundamental limitations on how well this process can possibly work
00:19:16
which depending on exactly how i say it and depending on your like philosophical predisposition might seem like incredibly obvious or like impossibly pedantic um so we'll see
00:19:30
which one it is so what are some potential issues here like what could be wrong with this well you might expect that like different sources are going to represent the same thing
00:19:42
in different ways and yeah that's definitely true and that's an issue the knowledge graph like solution to this is just to have everyone rendezvous at a common graph data model
00:19:56
and sometimes it's hard to like do that projection step and to like you know figure out how some given data structure maps onto the graph data model and sometimes you get sort of like awkward structures in like the image um
00:20:10
but graphs are like pretty general and overall most things fit them reasonably well this usually isn't like that big of a deal uh what else might be wrong well maybe you're worried about like
00:20:21
sources conflicting um and yeah like that is definitely a problem in how you handle it you know uh i don't know how do you want to handle it in the worst case you could just keep both values since graph data models are like
00:20:34
have plenty of like generality and allow that um that's usually not like a big like structural like issue um the big structural issue
00:20:48
is not that different sources represent the same things in different ways it's that different sources never actually represent the same things in the first place so the danger isn't in the
00:21:03
projection step mapping things onto the graph data model the danger is in the reconciliation step when we make this statement equating two things that are not actually strictly equal
00:21:17
um so yeah this can seem like an unbelievably pedantic thing to decide to care about and i'll be the first to tell you that but like
00:21:33
like it or not the real world doesn't actually consist of these fixed discrete entities and relationships and as you like try to compare and reconcile
00:21:45
more and more roughly similar things you'll start seeing this as more and more of a serious problem uh and you can pick like any number of
00:21:57
examples to try to illustrate this like just from the example graph you know when you hear a person in the world talking about santiago for example they might be talking about
00:22:11
like the municipality like the administrative region or they might be talking about it in the sense of like a political actor and describe like
00:22:21
desires and actions to it or they could be talking about santiago the vacation destination and the exact same thing is true for data every time you find something
00:22:34
called santiago in a data set that data set is coming from a particular context and each entity in it that you identify has a very context specific
00:22:46
often very idiosyncratic scope to it um and yeah we can like look at any any more number of like examples you know is david bowie the artist or like the touring group the
00:23:00
same as david robert jones obviously not but obviously they have some degree of correspondence user accounts obviously have some degree in some context of correspondence to like the people behind them etc
00:23:11
etc uh and so when we are trying to grow a knowledge graph like this and we're trying to integrate data into an existing ontology
00:23:23
by ontology i just mean a choice of division of the world into discrete entities nothing more don't be scared of it um so whenever we try to integrate something into a knowledge graph with an
00:23:35
existing ontology whenever we encounter like a partial correspondence between two similar entities we're faced over and over again with this decision to either
00:23:47
reconcile them or to not reconcile them and like refactor the rest of the graph around that uh and both of these options have significant
00:24:00
like drawbacks like negative consequences if we choose to reconcile two things that only partially correspond to each other then even if
00:24:13
the merged entity has like the exact same id and is represented the exact same way
00:24:25
the resulting entity effectively refers to something that is less specific in the real world remember knowledge graphs like the point of a knowledge graph is to directly correspond to the real world in
00:24:38
this like entity relationship way so when we reconcile things that are only partially connected we effectively like
00:24:49
dilute the specificity of the resulting knowledge graph on the other hand uh and this is the side that's like easier to imagine the consequences of if we choose not to
00:25:02
reconcile them then we just get this like explosion of like different but related concepts that make the graph like harder and harder to like work with so
00:25:15
this is not like really surprising anyone could have told you like from the outset that the value prop of the knowledge graph is in balancing these tradeoffs
00:25:29
the point that i'm trying to make is that in the course of integrating something we'll apply you know a combination of reconciliation
00:25:42
and refactoring to get the thing into the data set but every time we do that the overall like quality and specificity of the graph decreases no matter which option we choose
00:25:53
it's not like there's a single ideal that you know the knowledge graph can approximate but never get to it's not even that kind of situation we're like trapped on both sides to just get worse and worse and worse um yeah so it kind of sounds like i'm
00:26:09
just describing the problem of data modeling so it's worth emphasizing what is like specific to knowledge graphs about this problem because humans have been dealing with uh like the troubles and struggles of data modeling
00:26:20
for as long as like human civilization so what's the real problem here well like we said earlier knowledge graphs scale horizontally so in most applications you pay for this cost of
00:26:34
data modeling like up front and you continue to refine and adjust it but the scope of what you're trying to model doesn't grow
00:26:45
with your application but with knowledge graphs that's exactly the case you have to like re-encounter those same compromises over and over and over again even about the same entities over and over and over
00:26:59
again um let's see all right we can skip this slide okay well it's easy to overlook like the significance of that dilution aspect because when you look at
00:27:14
the knowledge graph before and after it's just a bunch of nodes and edges and like the labels are the same and you don't visually get a sense of like the consequences of merging things that are not really
00:27:26
uh identical um okay this point is also a little out on a philosophical limb but i think it's important to try to make dilution here in the way that i'm
00:27:42
talking about it is more of a serious effect in knowledge graphs because there's often no like feedback loop or no
00:27:53
like inherent pressures to keep the knowledge graph grounded so what i mean is knowledge graphs don't usually have like very tight feedback loops with the
00:28:06
things that they're representing primary data sources usually do like they're directly modeling some part of the world for some purpose and that purpose like keeps pressure on them to like be accurately modeling that big
00:28:19
knowledge graphs are like open-ended and aspire to be general purpose and what that kind of means is that
00:28:30
the effective reference of the entities in a big knowledge graph that have survived iterations of integration the things that they refer to are not actually concepts that anyone
00:28:44
really uses in real life they're not even necessarily averages of concepts even if that meant anything they kind of tend to lift off and like
00:28:56
start attaining like a life of their own um is that bad maybe not inherently but it means that a big knowledge graph's correspondence to the real world isn't like held down very tightly uh
00:29:11
the term in my head for this is uh ontology from nowhere knowledge graphs tend to accumulate an ontology from nowhere uh which is like a reference to the view from nowhere which is a term in journalism
00:29:23
i'm gonna check in with the chat to see what's happening good stuff keep it up okay so uh what else is there to say about like the seriousness of i don't know this class of like semantic
00:29:41
concern well one of the further like complicating factors is that a lot of like these processes a lot of the things that really influence the like end result of a knowledge graph
00:29:52
are like opaque like these processes the actual things that form the mycelium or whatever of the knowledge graph underground are just inside like companies or projects or are kind of like implicit in just the way that
00:30:06
things have been set up and so it's hard to like really identify them they're not very like visible and lastly the overall stakes of this kind of thing
00:30:18
are pretty high and only getting like bigger like data is a really big deal uh you don't need me to tell you that but like specifically with something like the google knowledge graph we're getting
00:30:31
to the point where the google knowledge graph isn't just modeling some part of the world the google knowledge graph actually like is the foundation or like the reality uh
00:30:43
of like a significant part of like people's like digital experience uh and so you know it's like it's serious stuff we want high quality you know semantically sound
00:30:55
like ways of working with data and that like sense of like i don't know a semantic engineering mindset isn't really common or well understood um so to summarize okay i
00:31:12
i've been using the word ontology which i don't really like to use because it's kind of antagonistic uh so we could say instead that the theme of what i'm talking about here is
00:31:25
context data is only meaningful relative to a specific context every entity in every data set has that like
00:31:38
little specific idiosyncratic scope that is defined by that data set's context um that context isn't explicitly represented because we have no idea how to
00:31:51
explicitly represent something like the scope of what an entity refers to uh so that context is like implicit you're only able to interpret
00:32:02
a csv through the fact that it has these human readable column names and also because you know from the website that you got it from what it is generally supposed to be about and that's what
00:32:16
lets us like assign a scope or facet or role to you know what santiago means in this data set every time we go through this integration process that context from each data set that we
00:32:32
integrate gets compressed which is even kind of generous you could also say that it just gets destroyed and the accumulated result of this iterated integration process
00:32:46
doesn't necessarily like we don't really understand the kinds of effects that accumulate when you try to just create a general purpose knowledge graph whose value prop is to sort of be in between a bunch of things
00:32:58
um but like those don't actually refer to any real concept that anyone actually uses uh
00:33:11
and so uh to really close it out with something sort of hyperbolic my basic message here is as you try to grow this knowledge graph it's going to get worse
00:33:23
and worse in every possible way which checks out it should be impossible to create like a perfect universal database of everything and you'd expect that the more you try the thinner and thinner you're going to stretch your ontology
00:33:35
and the harder it is to like organize things in a way that makes sense to anyone um cool so having seen that uh
00:33:53
let's go back to this slide where like i said that the theme or the thing that connects the use cases of knowledge graphs was that there's a
00:34:07
lot of value in getting many different things together in the same place okay so revisiting this with the context of thinking about context and ontology um it's definitely
00:34:21
literally true for some of those applications like the machine learning applications like it's clearly like the value prop comes directly from just having a big graph uh but i think describing it in terms of
00:34:35
like getting many different things together in the same place is overly specific and actually we could maybe rework this into like a more natural characterization
00:34:46
of the function that knowledge graphs really serve and here it is i think essentially the purpose the role that knowledge graphs play when people like use them for fun and profit is they
00:34:59
serve as an interface between contexts so for most of the uses people use knowledge graphs to like transform one context into another so you have
00:35:10
some like application something that you want like a cross domain query or like a related items carousel that you need to populate uh and you need to connect that with your like data sources
00:35:23
which are all organized like completely differently and they aren't just in different data formats they're like organized differently like they have different concepts of like what exists uh
00:35:35
and so the knowledge graph approach to addressing this is to be the universal intermediary you put all of your data into one general purpose context and then you get
00:35:49
it back from there uh going a little further you might even say that knowledge graphs solve the context translation problem through centralization and being protocol labs
00:36:01
and given a diagram like this uh you can basically just guess what the next slide is going to look like right uh perfect awesome we did it team
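To ground the "universal intermediary" picture above, here is a toy sketch (purely illustrative, not from the talk's slides): with n data sources and m applications, direct pairwise translation needs one mapping per pair, while routing everything through one shared context needs only one mapping per participant — which is exactly the trade the centralized knowledge graph makes.

```python
# Toy illustration (not from the talk): why a central intermediary
# context is appealing for the context translation problem.

def pairwise_mappings(n_sources: int, n_consumers: int) -> int:
    """Direct source-to-consumer translators: one per pair."""
    return n_sources * n_consumers

def hub_mappings(n_sources: int, n_consumers: int) -> int:
    """Hub-and-spoke: each source maps in, each consumer maps out."""
    return n_sources + n_consumers

# With 10 sources and 10 consumers the hub needs 20 mappings
# instead of 100 direct translators.
```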
00:36:12
okay so this slide and this diagram is like uh the mountaintop this is like peak abstraction actually uh how am i doing on time what have
00:36:28
we got left okay there's a little less than 20 minutes uh take all the time that you need we'll pause a little bit before the hour to make sure no one has questions if they have to leave but just do your thing okay i think we're
00:36:45
on an okay track okay so this slide is kind of like the mountaintop here this is like peak abstraction and it is now our job to try to like descend the mountain
00:36:56
and try to rope this idea back down to earth and figure out concretely what does this really mean what am i really talking about here
00:37:08
uh because it's easy to be like i don't know cavalier about decentralizing things uh so what are we talking about and what is it worth so the first point that i want to
00:37:24
really drive home is i am not talking about data formats this looks and feels very similar to like the ipld vision
00:37:38
where we are trying to build a big graph of different formats that all know how to like interpret themselves and link to each other etc etc uh that's not it the different nodes
00:37:51
in this diagram on the right are not different like serializations the different nodes here are different sets of decisions about which real world things map to the
00:38:04
same entity and which real world things have their own identity okay so if anything this diagram kind of like presupposes a common graph data model
00:38:18
and or like a way of mapping between them ipld or something so the division between like syntax and semantics isn't a perfect one but very roughly you could say i'm trying to
00:38:32
talk about semantics and ipld is about syntax but that's not exact don't pull that too far um also to illustrate this idea uh this little diagram this illustration
00:38:46
in the center is from a cmu lecture on order theory and it's just one of my favorite images in the world uh it's illustrating what's called a galois connection which is a kind of like
00:38:59
algebraic correspondence between two structures that is weaker than an isomorphism but stronger than a homomorphism um and you can see like on the bottom like
00:39:13
there are like lots of dots and some of them like kind of like roughly like relate to each other and then also those like rough groups like relate to each other in other ways
00:39:24
and a galois connection describes a sort of like summarization or like abstraction relationship between two maps of things and i don't mean that all of the arrows in the diagram on
00:39:37
the right are literally some formally typed kind of algebraic mapping but i really like that concept and that illustration of it this is
00:39:50
exactly the intuition that i'm talking about that there are different ways of reorganizing or drawing boundaries around things uh and that we should be thinking more explicitly about
00:40:02
those as like mappings and those mappings should be reified okay so yeah what are we talking about uh what does it mean to like meaningfully decentralize
00:40:17
a knowledge graph well we said before that a knowledge graph is like a set of these processes you know including people who do people things uh that all work to sort of like pull in
00:40:30
data and reorganize it and integrate it into the into that knowledge graph so decentralizing knowledge graph means like identifying those processes and lifting them out from like being implicitly embedded into things
00:40:43
into their own explicit reified resources themselves putting interfaces
00:40:56
around them which is like the real story of the whole pl playbook really not as much about decentralization but more about interfacing so one reason to do this like one thing that you get
00:41:08
out of this is like the natural application of like decentralization which is that these different things could happen in different places under different people's control but just making them explicit
00:41:22
itself is like just as important or like even more important um because when you make individual processes like that explicit
00:41:35
then you're able to like inspect and critique and maybe substitute them or change them um and like the whole like opacity of the knowledge graph uh starts to fade
00:41:48
away because you've reified the actual network that the thing actually is yeah and if you want to indulge me in a little meta comment this very roughly mirrors the core
00:42:02
idea of the knowledge graph itself which is about redefining relationships uh that there's not like too much interesting to see in that thought um yeah emphasizing like the human elements again with these processes
00:42:16
it's great if they're like literally algebraic correspondences um but that's obviously not going to happen because so much of what goes into a knowledge graph is like about teams and people and
00:42:27
review and curation um but that doesn't mean we can't do it like we can still effectively decentralize the knowledge graph uh just by coming up with interfaces for them
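One way to picture "reifying a process and putting an interface around it" is a minimal hypothetical sketch like the following — all names here are invented for illustration, this is not underlay code: the mapping itself is an ordinary function, but wrapping it in a named first-class object is what lets it be inspected, critiqued, or substituted later.

```python
# Hypothetical sketch: a context-mapping process made explicit as a
# named, inspectable object that records what it consumed and produced.

from dataclasses import dataclass, field
from typing import Callable

Record = dict  # an entity/node in some source context

@dataclass
class ContextMapping:
    """A reified process that translates one context into another."""
    name: str
    description: str
    apply: Callable[[Record], Record]
    log: list = field(default_factory=list)  # provenance trail

    def __call__(self, record: Record) -> Record:
        result = self.apply(record)
        self.log.append({"input": record, "output": result})
        return result

# Example: a summarization mapping that collapses a detailed source
# entity into a single coarser node for a broader graph.
summarize_person = ContextMapping(
    name="hr-records-to-people",
    description="collapse detailed HR records into one person node",
    apply=lambda r: {"type": "Person", "label": r["legal_name"]},
)

person = summarize_person({"legal_name": "Ada Lovelace", "employee_id": 7})
```

Because the process is now a value with a name and a log rather than logic buried inside an application, it is something you can point to, audit, or swap out.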
00:42:41
um yeah so this is kind of like the vision that i'm talking about a distributed knowledge graph uh as a like federated network of
00:42:53
context mapping processes uh so can we be more specific um sure so there's a project called the underlay that i've been
00:43:07
working on for a little over two years and anyone who's like followed the progress of the underlay uh knows that it's gone through a lot of iterations from just being like
00:43:20
uh pub sub for like signed rdf data sets to being like uh a decentralized query engine to being like git for data uh but with category theory
00:43:32
and now finally turning into a much more coherent vision that i feel better about than ever before which is this the distributed public knowledge graph
00:43:45
so what's actually like a component of this what's like the first order breakdown what are the ingredients uh so the underlay is like a project and i don't want to like describe it as anything more
00:43:57
specific than that because the things that we need in order to create this ecosystem are very varied like a collection of tools and services that all need to exist in order to get
00:44:10
an ecosystem like this off the ground but to a first order breakdown those components are like a data model so just being able to represent data and like store it this could be
00:44:24
as simple as just picking one of the existing popular graph data models or this could be something a little more sophisticated uh and this is actually like the front that we have like spent the most time on and
00:44:37
like developed the most and i don't think i'll get into it specifically but i'm like uh i'm happy to talk about like the specifics of the data model if anyone wants to like uh
00:44:48
ask afterwards yeah um but to finish the quick overview here uh the next component is once you're able to represent things you need to be able to manipulate things just in like a restful database
00:45:01
management sense and between these like the first two this basically comprises the git for data vision that a lot of people have independently come up with but i don't really like the git for data
00:45:14
analogy because when you enter from that framing you just kind of stop there and all that you think about is just working with yourself or with your
00:45:25
close collaborators on a specific resource and you don't think about like a larger like connective ecosystem which i think is the real goal that like we need to set our sights on uh so you know once you have a data
00:45:38
model and uh a means of interfacing with it the real bread and butter is a new set of like integration and
00:45:49
curation tools that are specifically ones focused on the kind of like semantic reorganization that is specific to knowledge graphs
00:46:02
the easiest like concrete kind of reorganization to imagine is that like summarization or like abstraction relationship where you know maybe in like data as it's used by people who
00:46:15
produce it there are like many different complicated you know nodes that really if you're integrating it with a broader knowledge graph could be summarized into just one node or a complex kind of relationship that
00:46:27
has lots of like metadata whatever associated with it that really for the purposes of someone else can just be summarized into one um but like summarization is really just like one dimension along a much more like
00:46:39
complex space of ways to like reorganize uh your ontology and no one's really talking about it like that directly in that sense and i think it's partially
00:46:51
due to people just over indexing on like syntactic transformation tools when you think about manipulating data uh it's easy to think about like
00:47:03
manipulating its syntax and semantics as one and in that way not seeing the difference between the two so it's a suite of tools for specifically doing the kind of semantic manipulation that is characteristic
00:47:15
of integration projects for knowledge graphs uh we also need like tools to just run locally to host things and like services that people can pay for to like
00:47:28
store things uh with the knowledge futures group which is one of the collaborators on the underlay project uh we're developing
00:47:40
basically like github.com or npmjs.com for the underlay uh which is what i've spent like the last three plus months on developing which will hopefully be publicly available
00:47:52
sometime this year um which is basically this uh for the data model and the protocol you want to be able to publish things in a way that is just universally accessible so that people
00:48:04
can clone them and manipulate them on their own and we need people to actually use it which is obviously tough uh a lot of projects
00:48:22
in like the semantic web space don't end up going anywhere because they follow the general form of like man if everyone structured data this way then things would be really great and they're not even necessarily wrong it's just like uh how do people start
00:48:35
doing that um so yeah i don't have any answers for this as much as i have just like uh this breakdown of the project and
00:48:47
like uh a commitment to working on these things this is it i guess yeah i'll take any questions i can try to expand on anything that anyone's particularly interested in yeah thanks that was really cool um in
00:49:00
terms of uh ontology dilution um a way of putting that is that when you combine all of these different knowledge graphs together you lose sort of a coherent
00:49:13
world view right is that a creative way of thinking about it yeah and uh the claim is a little more narrow which is that when you combine them you have like
00:49:25
no guarantee whatsoever that the result corresponds to something coherent it's not that it for sure doesn't happen it's just that there's nothing there to make sure that it does right and it seems like if you
00:49:38
have two separate knowledge graphs that are sort of organized according to different ontologies like you could imagine sort of two different ends of a spectrum one in which you just take both of them separately and just paste them into
00:49:50
the same space and the other end of the spectrum being like they're completely interoperable right and so you talked about how there's no force like i thought it was interesting how you framed it in terms of forces
00:50:02
for like enforcing some sort of a coherent ontology or world view yes the image in my head is like as you iterate integration a bunch of times your concepts just tend to drift or like expand and
00:50:16
that's just what they'll naturally do when you try to do that because each little delta you're like oh i can merge this with this it's close enough right and and you can kind of glue them together
00:50:27
with some connection like a galois connection or like an adjunction or something and so a way of thinking about what the underlay is doing is providing tools for making those connections explicit yeah yeah absolutely
00:50:41
on a wide spectrum of tools on one end whenever we can because it's great to be mathematical and strongly typed when appropriate making something literally like
00:50:55
establishing an adjunction and on the other hand just making tools for humans to perform an intuitively similar task i have a bunch of thoughts and some of them i wrote down
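To make the galois connection idea from this exchange a little more concrete, here is a minimal sketch using only the textbook definition — two monotone maps f and g with f(a) ≤ b exactly when a ≤ g(b); the specific maps are a toy example and nothing underlay-specific.

```python
# Sketch of a Galois connection between two posets, here both taken to
# be the integers under <=. The monotone pair (f, g) is a Galois
# connection when f(a) <= b iff a <= g(b) for all a, b -- the kind of
# summarization/abstraction relationship mentioned in the talk.

def f(a: int) -> int:
    """Lower adjoint: embed into the 'finer' context."""
    return 2 * a

def g(b: int) -> int:
    """Upper adjoint: summarize back down (integer floor division)."""
    return b // 2

def is_galois_pair(lo: int, hi: int) -> bool:
    """Exhaustively check f(a) <= b iff a <= g(b) on a finite range."""
    return all(
        (f(a) <= b) == (a <= g(b))
        for a in range(lo, hi)
        for b in range(lo, hi)
    )
```

The standard consequences of the definition — a ≤ g(f(a)) and f(g(b)) ≤ b — are exactly the "summarize and re-expand loses precision in a controlled way" intuition from the talk.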
00:51:09
um it's really interesting to me because some of the underlying philosophy of making the implicit explicit mirrors so much the progress or the changes in
00:51:21
like the humanities in academia over the last x number of decades where i mean once upon a time probably even for some of us and folks listening or watching the recording there was this really strong rule
00:51:35
that an author should never refer to themselves in in the first person when writing a paper or a book or something like that and and by the time i was in grad school like the the instructions were actually very
00:51:49
very direct that in your writing you need to name your social location explicitly like you need to acknowledge the fact of like what your gender identity is like what you are
00:52:01
i mean the view from nowhere right like the view from nowhere was this drastic ideal of objectivity which turned out to have a ton of like negative consequences when people tried to do it yeah
00:52:13
yeah so just this aspect of um sort of defining or making the implicit explicit would already be such a
00:52:26
revolution in terms of like every aspect of what a user of the internet is experiencing when they search for something with one search engine versus another search engine like something that says like this is why
00:52:39
this result is showing up um yeah i mean that's absolutely the part that needs improvement like no one's ever going to successfully make a universal knowledge graph because
00:52:52
obviously you can't but you can address like the opacity and the robustness of the steps that we produce the current ones with hey joel thanks for um giving a talk
00:53:07
yeah um so one of the potential like i guess hopes that i have for um the underlay is providing more accurate
00:53:21
provenance for the internet and um where information um and creativity originate from especially from you know the perspective
00:53:33
as a queer black mom you know a lot of times um well historically we've just gone totally uncredited folks have set up shop in our spaces and really just
00:53:45
pilfered at will kind of um from our spaces um you know usually more private spaces and um this has happened quite a bit in you
00:53:58
know academic spaces and in feminist spaces and all of these places i'm interested in sort of how the underlay can help maybe address some of those
00:54:09
issues of sort of you know improved integrity of attribution and things like that yeah this is actually the first time i think i've ever
00:54:22
given like an overview of the underlay that doesn't involve a heavy emphasis on provenance and historically that's been like one of the core central motivations for the whole
00:54:35
project is like we need you know the provenance of data what does that look like how do we build that in uh and that's still there it's just not something that was useful to
00:54:47
build directly around uh but yeah like that's still here so like specifically with these processes like in order to have
00:55:00
provenance in the first place you need to make them explicit so that you have something to point to or to talk about so ideally yeah in like the underlay world when you are looking at a thing you want to be able to know
00:55:13
not only like where that data originally came from but like all of the steps along the way of like who joined it with what how did this get here right yeah that's super exciting um
00:55:26
especially like as we continue to move into a much more socially integrated world or at least a virtually socially integrated one i don't even know if that's like an oxymoron but
00:55:38
um especially social media you know we're seeing things kind of like open source music where folks are kind of creating really amazing um compilations of music but you
00:55:51
know they're happening on top of each other kind of in layers and this creativity is sort of um birthing totally you know innovative and unique works of
00:56:04
art um and how we keep track of that um and make sure you know obviously the underlay itself wouldn't make sure that folks are appropriately credited or compensated
00:56:16
for that work but it is a foundational tool in creating tools that can help that happen so that's really exciting yeah totally we need the infrastructure for it and
00:56:29
before that we need to just understand the concepts of that infrastructure so um i guess in my mind this is similar hopefully it's not asking the same question just
00:56:44
using different words but one of the things you mentioned in your talk was the emphasis on breadth rather than on depth um and literally right before you said
00:56:56
that i was thinking about like how something like this can incentivize uh movement away from the familiar and from like the echo chambers of information and ideologies and things like that so i
00:57:10
i saw that you stated that as sort of a different orientation of knowledge graphs but i wasn't clear on how that is brought to bear
00:57:23
like at the user level interacting with something how am i moved from sort of the familiar um perspectives that i have political sources the sort of like
00:57:36
because search engines with algorithms often work in the exact opposite way right like they tighten the knowledge and obviously we think smaller and smaller instead of larger and larger so
00:57:49
is there an actual mechanism that moves the user away from that sort of shrinking yeah oh man i mean that's a huge question and that like even in the best case
00:58:04
that's like a layer built on a layer built on a layer built on the underlay um but yeah i do uh
00:58:15
dream about that like being able to be looking at a representation of a thing and then in the ui being afforded almost like
00:58:30
spatial tools to look at the thing from different perspectives or from different contexts because all the systems right now are so totalizing like there's just
00:58:43
one universal context there's just one universal thing uh and the way that we tell narratives and stories is much more limited yeah and i wouldn't expect technology
00:58:59
to solve the problem by itself it's obviously human psychology and sociology and a lot of other things wrapped into that yeah it's interesting that you say that
00:59:14
um having you know it be able to look at something from different perspectives that's just like my brain just exploded um
00:59:25
i mean we were recently having a conversation about you know kind of the rise of racist robots and um you know racist algorithms and ai
00:59:42
um and how that technology develops and like how impactful it could be for the people who you know the engineers and researchers who are at the center of developing that technology to sort of
00:59:55
find a way to kind of alchemize the perspectives um and sort of lived situations of more marginalized groups into the way they develop and perpetuate this technology and just hearing you say that i'm just
01:00:09
like uh i love joel yeah i mean it's like one of the most common and immediate effects of that context collapse or that context
01:00:21
compression is that any structural biases just propagate and propagate and propagate and you're forced to just have one way of looking at the world and then how is that way constructed well however it happened to
01:00:33
be constructed there yeah it's really interesting to think you know kind of creatively about ways to sort of penetrate that homogeneity it's really um inspiring any other thoughts
01:00:47
questions all right so thank you joel so much for that that was a fantastic talk uh absolutely loved it our next talk is tuesday january 26th at 1700 utc
01:01:00
from anca nitulescu who is a cryptography researcher at protocol labs their talk will be on verifiable computation over encrypted data if you're watching on youtube please remember to hit the like and subscribe buttons and share this video
01:01:14
if you want to suggest a topic or a speaker there are instructions on how to do that in the description or you can email us at research@protocol.ai if you haven't already please join our mailing list this is how we advertise uh future talks research funding
01:01:27
opportunities and other important news coming out of protocol labs research our mailing list subscription is very customizable so you don't have to get any emails that you aren't interested in you can subscribe in the footer at
01:01:39
research.protocol.ai thanks everyone we will see you next time
End of transcript