00:00:05
hey there welcome to today's protocol labs research seminar today we are joined again by joel chan who is an assistant professor at the university of maryland's college of information studies or ischool and human computer
00:00:18
interaction lab previously he was a postdoctoral research fellow and project scientist at the human-computer interaction institute at carnegie mellon university and received
00:00:29
his phd at the university of pittsburgh you can find a link to joel's previous talk with us below and today he will be talking about accelerating scientific discovery by lowering barriers to
00:00:42
user-generated synthesis of scientific literature which will include discussing how scholarly practices could be transformed so joel i will let you take it from here good morning good afternoon
00:00:53
good evening i'm excited to share um an update on what i talked about in the previous talk which was linked in the chat and so let's jump right in the focus of
00:01:06
this talk is on synthesis and so i thought we could start by talking about what i mean by that my goal is to remove barriers to effective synthesis so any scientist can ask better questions faster
00:01:19
and so what i mean by synthesis it's probably an intuitive concept but some examples include theories models design spaces and very good systematic or literature reviews and the
00:01:33
key intuition is that you create a new innovative conceptual whole that's greater than the sum of the parts of things that you're integrating so you wouldn't just be surveying and saying here's a bunch of literature but
00:01:45
you'll be sort of like giving insight into here are some of the gaps here are where we should be going next um driving progress forward so one concrete example of this uh led to a nobel prize uh this
00:01:58
is esther duflo a recent nobel prize winner who credited some of her key inspirations to a masterful survey of the literature in a handbook chapter on development economics and
00:02:11
it really laid out some key problems in the field that she was able to sort of connect with her expertise and experimental methods so so this is super important if we ignore it
00:02:23
we risk wasting our time on questions that are trivial impossible misframed one of my favorite phrases is you can't play 20 questions with nature and win in fact there's a delightful
00:02:36
paper that recently talked about how to play 20 questions with nature and lose in the kind of area of brain training research where they lack this kind of synthesis this theory of what's actually going on and just had a bunch of
00:02:48
experiments one of the key things that i'm interested in is this problem that synthesis is actually really hard not sure how much this resonates to your own experience but this definitely resonates with mine and lots of people
00:02:59
that i've talked to one concrete example was systematic reviews which are really important um and they're really really difficult to do they take a long time to pull off and
00:03:11
they're often not updated um because it's difficult and so often they're out of date pretty quickly after they're published i think of this as a lower bound on how
00:03:24
difficult it is to do synthesis in part because systematic reviews are singularly focused on a single question using a single relationship whereas often you need to synthesize across multiple disciplines uh across
00:03:38
different kinds of evidence at least rather than cleanly focusing on just a single dimension right so just to give another like metaphor here um there's a study an ethnography of medical
00:03:51
systematic reviews um this evocative paper uh talked about this metaphor of feeling like a slave to the data you have this sense um of fighting against uh the infrastructure
00:04:04
it's not getting any easier um you know the growing body of knowledge is making this harder and especially as we face problems that perhaps were always interdisciplinary
00:04:16
requiring us to converge and uh integrate ideas across multiple fields of knowledge um this is pretty pressing so again the
00:04:30
core conjecture of this line of work is that it's not just about the tools it's not just about our motivation it's about the infrastructure it's about the unit of analysis right why why does google scholar work
00:04:43
that way why is it indexing papers what is the data structure that it's operating on that's my focus think about this um you know when you go search for papers
00:04:57
you're looking for papers but really what you care about is what's in them the ideas the claims the arguments the theories the findings and the discourse relations between them
00:05:08
right support opposition replication lines of evidence lines of contradiction right these kinds of things are what you actually care about instead you get documents metadata and article types right so a concrete
00:05:21
example of this if you're interested in whether bans are an effective intervention for hate speech you key that into google scholar and what you get back is a bunch of papers
00:05:33
many of which well some will be relevant they're all pretty much in the same topic area but now i've got to work to answer my question about what's the evidence for this particular class of intervention in the setting that i care about
00:05:46
theories evidence problems solutions they're not first-class citizens same thing in other tools like semantic scholar um i should say it's not their fault okay but these indexing systems work with what they have the data
00:05:59
structure is what's at issue similarly um we have an entire industry of systematic review tools and processes that are essentially dedicated to working against the underlying data structure they are sort of like we've got these
00:06:12
papers and we need to work our way around it so we have all these processes for screening for data extraction to get to the thing we actually care about and then we don't share it with anybody else and everybody has to start from scratch next time
00:06:24
around uh this is a slightly cheeky meme about how these kinds of systematic reviews are sort of a band-aid on top of a to me fundamentally broken scholarly communication infrastructure that
00:06:37
doesn't allow us to ask the questions that we want so i think it's pretty important i see connections between this and i think it's an underexplored underappreciated
00:06:48
potential mechanism for what people are perceiving as a slowdown in scientific productivity and progress it's getting harder and harder
00:07:00
to win a nobel prize now you got to sort of spend a longer time to kind of cross the field um we're doing more of our high-impact science in teams uh which brings its own challenges and overhead and we have these
00:07:13
uh somewhat controversial but to me pretty concrete observations of um some slowdowns in the impact of research
00:07:26
so the core question is how to accelerate scientific discovery by lowering barriers to synthesis um today i want to talk to you about um discourse graphs um what are they why do we think that they're interesting
00:07:38
um why is it hard why don't we have them uh this core idea of an authorship bottleneck and i want to spend most of our time uh talking about this solution path of scholar-powered contributions as a
00:07:50
sustainable authorship model okay so first discourse graphs what are they so compared to what we had before where we're looking for bans and hate speech uh literature um
00:08:03
something like this is actually what we're trying to construct or look for right we have a question are bans an effective way to mitigate anti-social behavior in online forums we have
00:08:15
claims um which are answers to that question right it could be banning as a strategy is effective uh or banning as a strategy cannot scale and then we have underneath that evidence right
00:08:28
particular results that are supporting or opposing um these claims okay this is a pretty intuitive model if we think about how people talk about and evaluate papers too they'll
00:08:41
sometimes complain uh this paper makes claims that are not supported by its data right or we talked about theorizing from data this is kind of a core distinction between claims and evidence this is not an invention of mine uh this
00:08:56
has been talked about for a long time um you know decades right we've got a kind of decades of uh literature on standards for how do we actually formally represent um a discourse graph um we
00:09:08
have the scholonto ontology that was based on claims we have micropublications we have cito um we have a bunch of these um very mature well-thought-out clever standards for essentially how do
00:09:20
we represent a discourse graph to give you a little bit more intuition about why um discourse graphs are a good idea beyond just the match on the surface with the kinds of
00:09:33
questions you're trying to ask um i like to think about it in terms of three c's compression contextualizability and composability so compression is the first one that we were talking about where we actually want to find and
00:09:45
manipulate compressed units like claims not just whole papers right papers contain lots of different things in them but we actually care about manipulating thinking about the underlying uh ideas right so we've got these models of
00:09:59
scholarly argumentation but also in the creativity literature if we break things down um that enables you to more creatively combine them into new uh theories and models and concepts
00:10:11
contextualizability is super important right this distinction between claims and evidence uh turns out to be pretty important right to really understand the scope of the claim uh be able to question it be able to
00:10:24
repurpose it um it's really important to dig into right if you have a claim like most private annotations aren't useful to other people you want to know uh how was most measured what kind of annotations in what setting to really
00:10:37
understand what this means for say some question you're interested in same thing on the bans side right it really matters there for example to distinguish between different subreddits that have different subcultures or different platforms for
00:10:50
different kinds of behaviors and interventions and so on the devil or the diamond is in the details right we have lots of recent examples of you know the importance of context where we
00:11:03
really care about unbundling for example children into different uh age bands it turns out infants' risks are quite different from young kids which is different from teenagers and in many studies that don't
00:11:15
distinguish those we lose resolution and the ability to synthesize there's a really fun twitter account called justsaysinmice that kind of riffs on this where a lot of medical studies are reported on and talked about
00:11:29
um without the context that this was a mouse model right they'll say like this implies x for people but actually the context is lost which really impairs our abilities
00:11:40
so we need access to that uh easily okay um we see this actually in user behavior as well scholars will constantly re-read during a literature review they will return to papers um
00:11:53
this is repeated also in studies of general sensemaking where you've got to keep the data and information around and in studies of computer-supported cooperative work where we have these kinds of knowledge management
00:12:05
infrastructures it's really important to retain the context of how a thing was produced how a piece of information was produced by whom and so on one fun desire path that inspires this
00:12:18
work is you know people repurposing tools for different functions so qualitative analysis software is meant for things like thematic analysis and interview analysis but we actually see scholars a
00:12:32
good amount of them actually repurposing it to do literature reviews because it supports the ability to do this dialogue between claims and evidence between the theory and the data which is really interesting
00:12:45
this core distinction between claims and evidence seems really important lastly composability um we get some of this from from formality if you have these units but you also have
00:12:58
connections between them you can construct uh more interesting representations to help you reason about things like tables or causal graphs or arguments and timelines something like this
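the three properties described here lend themselves to a small data model. here is a minimal illustrative sketch in python; the node types, relation names, and example texts are my assumptions for illustration, not the schema of any actual tool:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    type: str   # "question" | "claim" | "evidence" (compressed units)
    text: str

@dataclass
class DiscourseGraph:
    nodes: dict = field(default_factory=dict)
    # edges are (source_id, relation, target_id) triples, enabling composability
    edges: list = field(default_factory=list)

    def add_node(self, id, type, text):
        self.nodes[id] = Node(id, type, text)

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

    def related(self, node_id, relation):
        """All nodes linked to node_id by `relation`, in either direction
        (contextualizability: jump from a claim back to its evidence)."""
        outgoing = [t for (s, r, t) in self.edges if s == node_id and r == relation]
        incoming = [s for (s, r, t) in self.edges if t == node_id and r == relation]
        return outgoing + incoming

# toy example loosely based on the bans-and-hate-speech question in the talk
g = DiscourseGraph()
g.add_node("Q1", "question", "are bans effective at mitigating anti-social behavior?")
g.add_node("C1", "claim", "banning is an effective strategy")
g.add_node("E1", "evidence", "post-ban toxicity dropped in a banned-subreddit study")
g.add_edge("C1", "answers", "Q1")
g.add_edge("E1", "supports", "C1")
```

from a graph like this you can compose richer views (evidence tables, argument chains) by walking typed edges rather than re-reading whole papers.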
00:13:11
so as i mentioned these are not new ideas i'm not the first by far to talk about this there's a pretty large body of work uh building out these technical standards and infrastructures we basically have most of the warehouses
00:13:24
built but they're still mostly empty uh in the bottom right i'm not sure if you get the reference to the field of dreams if you build it they will come we have some of these people talking about uh
00:13:36
the you know we want to have an ocean of these micropublications uh but at the moment it's no more than a puddle um and this has been repeated across a lot of
00:13:48
different platforms this is our key bottleneck i call this the authorship bottleneck um we have different models for authorship that don't seem to be quite enough on their own one of the most popular ones is a
00:14:00
specialized curator model where you basically pay people or ask people to volunteer to do this extra work on top of the literature you can think of systematic reviews as one instance of that where they take in the dirty data right and then they kind of
00:14:13
add structure to it this turns out to be very difficult to sustain this is an extra piece of work that people don't want to do um one really sad recent example was this mark2cure effort they were
00:14:26
curating literature for ngly1 deficiency um and they had to shutter because they ran out of funding essentially they couldn't sustain it um in some examples
00:14:38
of platforms like research objects hub or nanopublications the number of active users is actually very very small so we need something more uh some people think that we can do this in a
00:14:51
completely automated fashion with text mining uh this happens to be partially in my field um it's very cheap but it has significant accuracy and transparency challenges um extractive summarization of research papers
00:15:03
is very hard still a very open problem um and you know some of the promising approaches that we're exploring today with language models have some pretty significant uh challenges in terms of accuracy and
00:15:17
transparency so i'm not optimistic that either of these is going to solve the problem on their own so what i want to do is extend the space of possible contributions to look at this thing that we haven't explored yet which is what i
00:15:29
call scholar-powered contributions the intuition um here the inspiration for the design patterns is some previous work that i've done where if you think about a um collective brainstorming effort right
00:15:42
where we have lots and lots of different people producing ideas sometimes it's useful to really understand like how the ideas relate to each other how they cluster um in the field of crowdsourcing they have these similarity judgments they'll have
00:15:55
people do they'll look at like three things and say which ones are related again similar to what we're seeing in the scholarly domain it's very tedious people don't like to do it it's annoying however one insight that we saw is that
00:16:07
people naturally cluster ideas when you give them a digital whiteboard and this means something when ideas are clustered together there's some meaningful relationship between them and we can exploit this we can integrate the
00:16:19
usually tedious semantic judgment work into this intrinsically motivated activity and then construct things like an idea map that helps coordinate efforts uh deliver more interesting inspirations with basically no extra work right by
00:16:32
integrating it into the work that people are already doing that's one of the things that we're not exploring enough yet i want to explore more okay so specifically the idea is to integrate scholar-powered
00:16:44
contributions of discourse graphs into individual and collaborative synthesis practices all right if you think about it people read lots of papers lots and lots of papers right uh just
00:16:58
some rough back-of-the-envelope estimates um you know multiplying the number of uh faculty by the self-reported uh number of papers read per year we're in the ballpark of about you know 100 million
00:17:12
papers read per year compare that to the size of the literature it's pretty big if you look at it by itself but it's around the same order of magnitude right so this is i think an interesting uh source
00:17:24
of untapped creative exhaust people are already doing all this work we're wasting it what if we could integrate into that work it also seems a little bit more feasible to me there's lots of really strong
00:17:38
efforts and very smart people working on this problem of changing the way that people publish work to begin with which i think is super valuable but extremely hard it's really entrenched in incentives so i'm a little bit nervous about going
00:17:52
into that area i'll leave it to people smarter than me and explore complementary paths okay so that's all uh you know we talked about what discourse graphs are gives
00:18:04
you an intuition for why they might be important why um we don't have them yet um so now i want to talk to you about how we could have them right what does this mean scholar-powered contributions right
00:18:16
the basic idea is we give people tools to build personal discourse graphs for themselves or for their labs we give them the means to share and
00:18:28
federate these discourse graphs with others and then over time we can layer protocols on top of this to start to aggregate these into decentralized commons of discourse graphs
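the aggregation step of this roadmap could look something like the following sketch, which merges two personal graphs into a shared commons by taking the union of nodes and deduplicating edges. the node/edge shapes, the `federate` function, and the example data are hypothetical illustrations, not a published protocol:

```python
def federate(*graphs):
    """Merge personal discourse graphs into a commons.

    Each graph is assumed to be {'nodes': {id: text}, 'edges': set of
    (source, relation, target) triples}; using a set makes edge
    deduplication automatic."""
    commons = {"nodes": {}, "edges": set()}
    for g in graphs:
        commons["nodes"].update(g["nodes"])  # later graphs win on id clashes
        commons["edges"] |= g["edges"]       # set union deduplicates edges
    return commons

# two hypothetical personal graphs that overlap on one claim and one edge
alice = {"nodes": {"C1": "bans are effective",
                   "E1": "banned-subreddit study"},
         "edges": {("E1", "supports", "C1")}}
bob   = {"nodes": {"C1": "bans are effective",
                   "E2": "cross-platform migration study"},
         "edges": {("E1", "supports", "C1"), ("E2", "opposes", "C1")}}

commons = federate(alice, bob)
```

a real federation layer would also need provenance (who asserted which edge) and identity resolution for equivalent claims, which this sketch deliberately omits.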
00:18:40
that's the roadmap so there's two questions here the first is um we started to touch on this right is this even a thing that people are doing that we could integrate into right are there integration points for
00:18:52
authoring and the second question is a sociotechnical one right is it actually possible to build tools that can help people build discourse graphs that are shareable i'm going to focus mostly on the second
00:19:07
one so i'll move a little bit quickly through the first one just to give a proof of concept right so the first question is about integration points for authoring discourse graphs right we've done a fair amount of user
00:19:18
research participant observation uh need finding to really understand there's actually a lot of opportunity here we've put gopros on people's heads while they do reviews uh we've done interviews with people we've done participant
00:19:30
observation in communities of hackers and users of tools for thought and um we've been kind of looking for uh where are scholars already creating artifacts
00:19:42
that have properties of compression contextualizability and composability right the things that discourse graphs provide what we find is that there are integration points in a range of
00:19:54
behaviors and tools um all the way from people using really specialized tools to just people using everyday tools like pen and paper which is really interesting right so uh on the lower end of the spectrum in
00:20:07
terms of like technical sophistication or i guess technical uh newfangledness uh we have traditional tools that are used really well by what we call virtuosos right so you have
00:20:20
things like color coding your annotations when you're reading right they mean something right you're trying to distinguish you're not just trying to you know highlight things but you're trying to distinguish between uh you know things like in blue would be like a
00:20:32
claim right and in green would be like a piece of context for example um you know writing things in structured ways in word documents and google docs right they'll talk about the templates they'll have like here's the overview
00:20:44
here are the arguments here's the evidence right here's the key takeaway um there's these structured things that people already do um there are also other tools that are slightly more specialized that are
00:20:55
mainstream like uh what we call explorers using tools like liquidtext that allows you to pull out these excerpts and then create um structured links between the items so we have this compression pulling out these
00:21:07
pieces and then uh composability having these edges between them and contextualizability you can jump back to annotations i told you about repurposing qualitative analysis software that's
00:21:19
another example right and also adoption of new tools for thought like roam research or obsidian and so on we also have a lot of hackers who create
00:21:34
home-spun system enhancements or whole systems to enable these features um the org-mode emacs community has created an open-source port of
00:21:45
roam research's idea of backlinks into emacs for example we have you know it looks like this you can sort of integrate annotation features also into essentially a terminal
00:21:58
you also have people building on top of tools like zotero to enable you to enhance contextualizability of your notes right you can extract annotations in a way that allows you to jump back to the source of the pdf
00:22:10
just very briefly right um we we see across all these settings that scholars in their everyday practice without anybody telling them to do it already create artifacts with key
00:22:22
properties of compression contextualizability and composability so this is promising it's not enough though right we have this problem of private public alignment personal notes are
00:22:35
contextual they're idiosyncratic they're informal they lack structure what we want is something that's general shareable that has some level of standardization and reliable capture how do you bridge this is it possible
00:22:48
right is it even possible to tap some of this creative exhaust that's research question two right is it sociotechnically possible to integrate authoring of shareable discourse graphs
00:23:01
so at this point i'm going to pause while i pull up the demo here's a demo of um a discourse graph tool um in the software roam research um so if
00:23:14
you're not familiar with it uh what you basically need to know is it's kind of like a document uh like a giant google drive folder but you have the ability to create outlines and you can create links
00:23:26
between documents and what's useful for us is that you see these like bullet structures and links actually um in the data structure underneath the documents that we can
00:23:39
parse so but you don't need to know that right so just looking at this right now you can imagine uh in the context of a research project you might have um a single document with a bunch of
00:23:52
sources right that you are interested in so for instance if you're interested in the susceptibility of children to covid infection um you might have a bunch of studies that you want to read right so this looks very similar to an annotated
00:24:04
bibliography a list of papers to read you know each of these things here right is a is a paper so if you look here for example you can
00:24:17
see we've got metadata this is a paper the key uh thing that the discourse graph extension allows us to do is to integrate into what people are already doing in terms of say taking structured
00:24:31
notes on papers for instance i can see here this is a paper uh it's a meta-analysis of previous household studies and you can see here looks pretty similar
00:24:43
to how you might take these structured notes as we saw in a word document right you have thinking about the aims of the paper methods what they do you can drop in screenshots have figures to
00:24:56
to keep the context right um we have definition of index cases uh and then we have like you know the results and contributions right one of the key takeaways uh so for instance you
00:25:08
can say here one main result is if you run a meta-analysis of these 11 studies then you get an estimate of lower susceptibility for children versus adults right
00:25:19
um and specifically you can see the relative risk ratios here this is not too different from what you would see what we have here is the ability to
00:25:31
mark this result as a discourse node of type evidence just in a manner that looks like annotation
00:25:44
right so you can highlight this hit a hotkey and then you say this is now a piece of evidence then it shows up in the sidebar this tells us that it's now a document by itself that we can reference and search
00:25:58
for elsewhere okay so that's just the ux right so we don't have to do it from the start we can write informally and then as we think okay this is the key takeaway i want to
00:26:10
remember this then later i mark it as a piece of evidence if i want to uh i'll often do this in my own work as needed if i want to keep the details
00:26:22
handy i could for instance just drag this and put this in here so it's just right there right i could also copy over in the roam sense the key contextual details of the study
00:26:36
so it's all in one place right this compact micro publication has everything that we need to make sense of it and use it in synthesis it's got the summary it's got the methods that produced it if i want to dig into that later which
00:26:50
we'll talk about in a second so that's node creation as annotation now let's talk about how do we create edges in a discourse graph right this workflow has nodes and has edges and the edges part
00:27:02
is one of the more annoying things to do without formality so let's go and see inside here what this might look like so inside this document we have sources but we also have a space to write an outline
00:27:14
so what i want you to notice here is right here let me just zoom in a little bit all right we talked about this first uh claim that children are equally susceptible this is a reference to a claim
00:27:27
right i've marked this as a thing that i want to remember this is of type claim and then i've indented underneath here some results right so yeah this one this is the evidence this is an evidence this
00:27:40
is an evidence yeah and i've written down here the case for this thing right i've marked this as a supports relationship with this right if we
00:27:52
go inside here right we can see right as you saw before it's got all this data in here but additionally we also have discourse context right we have this
00:28:05
relationship now that it supports this claim why how do we know that because we said so right we said this here
00:28:17
and so we have this information now and if we jump into this claim the reverse is also true right we know all the uh papers that
00:28:30
support all the pieces of evidence that support this claim right this is what we just saw that is part of what makes it formal in the sense like um you know you might specify this informally in an outline
00:28:43
but you can't query it right we need the reverse right we need to know that the evidence uh supports the claim and also that the claim is supported by the evidence and then search and query from either direction right otherwise it's horrible so this
00:28:56
essentially makes the relationships real right so what the plugin is doing is it's recognizing these relationships and essentially making them real right we now have this evidence supports claim
00:29:10
evidence supports claim same thing here we have another claim here we have these supports and we have actually a surprising amount of evidence um which i'm still digesting okay
00:29:22
that's edge creation as outlining right so creating the edges between the discourse graph nodes is not an extra task but it's integrated into uh my desire to structure my thinking here
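the edge-creation-as-outlining workflow described here could be sketched as a tiny parser: evidence indented under a claim with a supports marker becomes a typed edge. the `[[CLM]]`/`[[EVD]]` markers, the `SupportedBy::` syntax, and the example outline are illustrative assumptions loosely modeled on the demo, not the plugin's actual grammar:

```python
OUTLINE = """\
[[CLM]] children are equally susceptible
    SupportedBy:: [[EVD]] household meta-analysis of 11 studies
    SupportedBy:: [[EVD]] contact-tracing household study
"""

def parse_edges(outline):
    """Turn an informal indented outline into (evidence, relation, claim) triples."""
    edges = []
    current_claim = None
    for line in outline.splitlines():
        stripped = line.strip()
        if stripped.startswith("[[CLM]]"):
            # a claim line sets the context for the evidence indented below it
            current_claim = stripped.removeprefix("[[CLM]]").strip()
        elif stripped.startswith("SupportedBy::") and current_claim:
            evidence = stripped.removeprefix("SupportedBy::").strip()
            evidence = evidence.removeprefix("[[EVD]]").strip()
            edges.append((evidence, "supports", current_claim))
    return edges

edges = parse_edges(OUTLINE)
```

the point of the design is visible here: the author just writes an outline they wanted to write anyway, and the typed edges fall out of the writing pattern for free.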
00:29:35
so as i outline i create that structure what does this buy me right so now that the nodes are real and the edges are real let me give you uh a couple of intrinsic benefits right beyond that i just want to do this anyway to
00:29:48
structure my thinking right so if we have um whoops if we have um the nodes and the edges and we know what the type of relationships are we can do some
00:30:00
computations for example we can see uh the number of discourse relations the claim participates in right so here we can see it's supported by these guys
00:30:12
right you can get a feel for roughly how much we thought through uh a particular idea right so if i reference this in an outline this claim um has four disconfirmations this claim has 15 discourse relations
00:30:26
uh i might want to you know look more into this one for example if there's uh you know not enough evidence i'm not going to kind of really weigh those okay same thing with my hypotheses here i
00:30:38
can keep track of the fact that for this hypothesis i've got one piece of evidence maybe in the supporting list actually i have none right there's no evidence behind these so i might want to sort of look into these more
00:30:52
that i can do with uh structured querying right so if we know what the nodes are we know what the edges are we can also query um over the nodes and we can maybe create attributes for them if you want
00:31:03
to right so for instance if you wanted to look over the evidence base and say okay well you know maybe we have this evidence for lower susceptibility because there's something going on with the testing they're under
00:31:15
testing the kids right we can now in a more focused way go over the pieces of evidence to say pull up you know which of these results were
00:31:28
from what kind of testing right are they all um you know symptomatic testing or are some of them uh you know exhaustive we can filter for the ones that are um exhaustive and see what the result is
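both benefits described here, counting discourse relations as a gauge of how well developed a claim is, and filtering evidence by attributes, can be sketched over a toy edge list. the edge schema, the `testing` attribute, and the example ids are my assumptions for illustration:

```python
# toy discourse graph: (evidence_id, relation, claim_id) triples
edges = [
    ("E1", "supports", "C1"),
    ("E2", "supports", "C1"),
    ("E3", "opposes",  "C1"),
    ("E4", "supports", "C2"),
]
# attributes attached to each evidence node, e.g. how testing was done
evidence_attrs = {
    "E1": {"testing": "symptomatic"},
    "E2": {"testing": "exhaustive"},
    "E3": {"testing": "exhaustive"},
    "E4": {"testing": "symptomatic"},
}

def relation_count(claim):
    """Talkback: how many discourse relations does this claim participate in?"""
    return sum(1 for (_, _, target) in edges if target == claim)

def evidence_for(claim, relation="supports", **filters):
    """Structured query: evidence linked to a claim, filtered by attributes."""
    hits = [src for (src, rel, target) in edges
            if target == claim and rel == relation]
    return [e for e in hits
            if all(evidence_attrs[e].get(k) == v for k, v in filters.items())]
```

for example, `relation_count("C1")` gives a rough feel for how thought-through the claim is, and `evidence_for("C1", testing="exhaustive")` pulls up only the supporting results from exhaustive testing.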
00:31:41
for example that's just a quick overview right of the prototype right this is what we can do we can uh incrementally formalize integrate the work of creating nodes and edges into our normal note-taking and
00:31:55
writing workflows and we get some immediate benefits back we can also export the discourse graph so let me stop let me now switch back
00:32:08
there's a few things that make this work all right so we talked about this idea of incremental formalization this targets a key long-standing bottleneck in sort of integrating structure into this
00:32:19
early stage creative knowledge work like literature reviewing and thinking right now this is a pretty old problem uh since as early as the 80s people have identified this right this
00:32:31
this work on issue-based information systems is pretty uh pretty famous back in the 80s for structured thinking over design and even there we have this
00:32:43
observation that you know the early phases of consideration and writing uh need to proceed with contradictory incomplete forms and if you force people to only think in terms of structured nodes
00:32:55
and edges for example it sort of kills the thinking process right and so they talked about uh providing tools to aid in the structuring of raw materials where we don't have to organize them at the beginning but we sort of gradually add that
00:33:09
and that's kind of what we're going for here to enable people to write informally at first in unstructured ways and then add structure as it's useful so this is accomplished with like we said note creation as annotation right
00:33:22
we create nodes as we need them as we're ready for them and we create edges not by typing out properties necessarily but by
00:33:34
integrating that into our writing and outlining work where we have these immediately useful notes with implicit discourse structure that a plugin parses into a reusable shareable explicit discourse graph which i didn't
00:33:46
show you but you can actually export as csv as json as markdown you have this structure that's preserved right and then we also provide immediate interesting benefits we have structured
00:33:57
querying we have talkback in terms of the level of development of particular ideas and exploring your notes in a more systematic way this is a screenshot of what i wanted to
00:34:10
show you which is you can export this into an open-standards-compliant neo4j version of the nodes and edges so technically speaking all you need is three ingredients you need a convention
00:34:24
for note writing which people are actually pretty open to adopting and actually most people already do something like this we need something like a hypertext notebook that understands links between things
00:34:36
of which there are many there's roam research we have obsidian we have notion we have tinderbox we have emacs we have athens logseq foam remnote lots and lots of different tools can implement
00:34:48
this kind of protocol and then a simple plugin that parses those regularities as well i want to show you this because it turned out to be pretty important in our field study
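To make the "simple plugin" ingredient concrete, here is a minimal Python sketch of how such a parser might map writing conventions to typed edges. The `[[CLM]]`/`[[EVD]]` markers and the `SupportedBy::`/`OpposedBy::` labels follow the convention described in the talk, but the grammar entries and parsing logic are my own illustration, not the actual plugin.

```python
# Illustrative sketch of the parsing plugin: a user-customizable grammar
# maps writing patterns to typed edges. An evidence line indented under
# a claim and marked SupportedBy yields a "supports" edge; OpposedBy
# yields "opposes". The patterns below are assumptions for the example.

import re

# grammar: regex for an indented child line -> edge type (user-extendable)
GRAMMAR = {
    r"SupportedBy::\s*\[\[EVD\]\]": "supports",
    r"OpposedBy::\s*\[\[EVD\]\]": "opposes",
}

outline = """\
[[CLM]] - School openings drive little transmission
    SupportedBy:: [[EVD]] - Cohort study A
    OpposedBy:: [[EVD]] - Surveillance study B
"""

def parse_edges(text, grammar=GRAMMAR):
    """Walk an outline and emit (evidence, relation, claim) triples."""
    edges, current_claim = [], None
    for line in text.splitlines():
        if line.startswith("[[CLM]]"):
            current_claim = line.split(" - ", 1)[1]
        elif current_claim and line.strip():
            for pattern, rel in grammar.items():
                if re.search(pattern, line):
                    evidence = line.split(" - ", 1)[1]
                    edges.append((evidence, rel, current_claim))
    return edges

print(parse_edges(outline))
# → [('Cohort study A', 'supports', 'School openings drive little transmission'),
#    ('Surveillance study B', 'opposes', 'School openings drive little transmission')]
```

Because the grammar is just a dictionary, users can extend it with patterns and node types of their own.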
00:35:02
the way that the system knows how to translate the writing patterns into edges is through a grammar that is user-customizable that essentially says
00:35:14
when i write something like this which is on the left i want you to recognize that this is a particular relationship right if the evidence is indented under a claim and has a little supported
00:35:26
by label there then there's a supports relationship between them on the right here is essentially a datalog query pattern that creates the mapping and users can actually extend this and create patterns of their own create
00:35:39
nodes of their own which turns out to be really useful so some brief snapshots from the field study and then we'll close and have a conversation so right now we have about
00:35:50
100-ish alpha testers we're deploying in these different contexts in the discord community of academic roam users in note-taking courses an experimental journal club
00:36:03
just to give you some snapshots right so obviously i'm using this pretty actively in my lab finding it very helpful for my own thinking my students are finding it helpful it's useful for our discussions right
00:36:17
and this kind of gives you a flavor also this is an outline for a paper that i'm writing that integrates these claims and evidence and helps structure my thinking and right here you can see the context being easily
00:36:29
available there's another library and information science research team doing a systematic review that's using this to structure the thinking over evidence to understand the impacts of
00:36:44
a culture of stress on mental health or physical health for example we've done an experimental journal club working through a bunch of papers and structuring out the claims and questions and evidence that are in that
00:36:57
knowledge base a microbiology lab is using it to structure a project and lit review i'm not going to go through the details here because i don't understand all of them but essentially this is kind of
00:37:10
cell biology for a particular team and what we see here is structuring a project in terms of questions and also enabling you to then tie those questions to what are we thinking what are the claims that we have in play what
00:37:24
evidence do we have so far and then you can see at the bottom we have what looks like results and hypotheses that are also added right to the project right and so
00:37:38
this user actually extended the discourse graph grammar to also enable thinking through conclusions and results that are primarily produced by the lab as opposed
00:37:50
to coming from the literature and they can all be part of the same conversation so that whenever a new member joins the project they also have the full context of past results but also understand the chain of thinking
00:38:03
an international lawyer used the discourse graph to write two books as i understand it to really think carefully through a bunch of evidence it was really helpful for structuring his thinking and he also extended the grammar to
00:38:16
say distinguish between claims and conclusions for example adding relationships like substantiates for him it was useful to distinguish that flavor of support
00:38:30
we have a venture capital firm that's using this for roadmapping they're trying to tie a discourse graph of evidence to a discourse graph of outcomes of constraints and solutions to understand opportunity areas and so here again we
00:38:44
have to extend the grammar and we can right so we have constrains relationships as opposed to supports or opposes for example more excitingly also because it's out in the wild and open
00:38:56
source people are starting to port it to their other tools so tinderbox is another tool one of the original hypertext notebooks and we had a user forum post describing an implementation of
00:39:08
the discourse graph protocol in tinderbox nothing to do with me i came across this sort of by chance tinderbox actually has pretty powerful affordances for exploring a discourse graph that roam
00:39:21
lacks and implementing this is pretty interesting we have a grassroots-funded bounty to port this protocol to logseq an open source hypertext notebook we also have
00:39:33
some efforts to port this discourse graph to an html annotation standard for broader web publishing use so i'm pretty excited about what's been going on i want to talk about
00:39:46
some really key initial insights from this field study right the first is that i was mostly surprised that we didn't have to do anything that's super flashy
00:39:58
to convince people to use it fostering more careful and clearer thinking patterns was most of the time enough for people to adopt this they wanted to think in this way and the tool helps them to think in this way and
00:40:11
that's good and using it to find and access important ideas later on they can use the more advanced features but just simply having the discourse graph right that model
00:40:22
as a way to structure their thinking is really useful also extending or personalizing the grammar is crucial right many extensions were finer distinctions right we have different flavors of claims or different
00:40:34
flavors of evidence but they can also be collapsed together if you wanted to translate to a different graph and this connects to the concept of boundary objects from information science and cscw that enables
00:40:47
coordination between different social worlds we have this weak structure in common use a very minimal ontology of questions and evidence and you can extend it in local use but it enables you to translate between those
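Translating between a locally extended grammar and the minimal shared ontology can be as simple as a relabeling pass over the edges. The mapping below (for instance collapsing a finer "substantiates" into the common "supports") is an invented example of the idea, not a published spec.

```python
# Illustrative sketch: collapsing a locally extended relation vocabulary
# into the minimal shared one, so discourse graphs from different groups
# stay interoperable. The mapping entries are invented for this example.

LOCAL_TO_COMMON = {
    "substantiates": "supports",  # a finer local flavor of support
    "supports": "supports",
    "opposes": "opposes",
}

def translate(edges, mapping=LOCAL_TO_COMMON):
    """Rewrite each edge's relation into the shared minimal vocabulary."""
    return [(src, mapping.get(rel, rel), dst) for src, rel, dst in edges]

local_edges = [("EVD-9", "substantiates", "CLM-2")]
print(translate(local_edges))  # → [('EVD-9', 'supports', 'CLM-2')]
```

Unknown relations pass through unchanged, so a local extension degrades gracefully rather than breaking the exchange.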
00:41:00
this is a big promising design pattern okay so to summarize in answer to research question two i think we have a proof of concept it's possible to write close to prose and create shareable discourse graphs as
00:41:12
a byproduct i think this opens up new paths to sustainable scalable authoring so i'm going to now talk about some conclusions and then we'll address some more questions right so
00:41:25
where we want to go next is discourse graphs from here to everywhere right i think we've established the utility of the model we've established that there are integration points and i'm excited about the idea of
00:41:37
this being a synthesis protocol you know tools as clients on an open protocol integrating discourse graph extensions into mural twitter slack hypothesis and so on right what would the spec look like how do we enable
00:41:50
portability transferability interoperability in particular to avoid the one-standard problem we don't want to force everybody to use a single standard and so this idea of translating between
00:42:03
user-extendable grammars that share a similar underlying model seems like one promising way to enable this peer-to-peer to start people might be concerned about formality and machine readability
00:42:15
right now we can see there's minimal formality in terms of like we don't have any ontologies in here right there's no connection to wikidata or owl or anything like that but it could be added if you wanted
00:42:27
to you could integrate that into the document itself right you could have an owl representation of the particular relationships that are specced out in a piece of evidence for example there's no technical reason that
00:42:40
you can't do that so i'm excited about this sort of middle layer between the more structured more granular knowledge graph and the coarser documents this kind of middle layer the
00:42:52
discourse graph could enable communication across these systems i don't have much to say about this but maybe nlp would be useful for stitching things together because at some point we need some way to
00:43:05
understand which pieces of evidence are the same or similar which claims are similar or not ontologies probably won't be enough and so that's something down the line i've not worked on it yet but that's
00:43:18
something for the future okay so revisiting the larger vision we have this building block for a new infrastructure beyond itunes for papers and
00:43:30
yeah we start by just facilitating collaboration within systems and then you can scale up by prioritizing decentralization and federation and we can publish to say databases like ceramic or the graph and so on and people
00:43:43
can subscribe to graph queries you can start there as opposed to everyone publishing into a single stream immediately to have people talk to each other i like this metaphor for the way that we're working as opposed to starting from the top and saying
00:43:55
everybody's going to do this we start from the bottom and build tools for people to adopt and then you grow the infrastructure around them those are the kind of general strategies that we're excited about okay so to close the call to action for my
00:44:08
friends the tool builders the hci people there's a lot of space for tool building in science reform it's not just about the institutions or incentives or funding it's also about what we do in our day-to-day the tools that we use and so i'm hopeful that more
00:44:22
of us will work on this problem and you know not just make it required but also make it easy to be able to bridge from the bottom to the top okay so i'll wrap the slides up and we'll
00:44:35
have a chat you know so this is hard our infrastructure fails us around synthesis discourse graphs can help but we do lack sustainable means of authoring but i think we've got some promising directions with tools for scholar-powered
00:44:48
contributions so thanks for having me i'm excited to chat thanks joel for as always a very interesting and thought-provoking discussion join us again at our next research seminar i believe it is nine
00:45:01
august taking a little bit of a break for the summer
End of transcript