00:00:00
This is amazing. You guys are the fastest group to break out of a coffee break; it's fantastic. Thank you very, very much, everybody. So what I wanted to talk about is something where I've been having a conversation with a colleague of mine
00:00:32
, Anita de Waard, about the future of scholarly publication, and whether we can get to a point where we can in some sense envision this world of
00:00:44
applying this Ted Nelson notion of transclusion to scholarly publications. The notion is: are we ready for that, and do we have the Internet
00:00:55
infrastructure in order to do it? So that's what I want to talk about, and I want to get your opinions on this conversation so I can go back to my colleague and we can have a bit more of
00:01:08
a debate internally. So there's been a lot of discussion around what we call dataset-centric science or networked science. People talk
00:01:21
about the Internet of objects: interacting with objects, tracking their histories, and integrating data from multiple sources. There is the notion of research objects
00:01:33
from our friend David De Roure: being able to, again, piece the parts of a scientific paper together from their different sources, whether it's the background literature or the methods or
00:01:46
the logs or the workflows or the results. So this notion of network-enabled research has really been around for a while, and indeed you see the emergence of what I would call the modular paper
00:02:00
in around '99: the idea that, just like in hypertext studies, we can take the parts of the paper and assemble them
00:02:11
together. In 2000 there was a whole discussion around argumentation networks for scholarly literature, again building together networks of arguments, so representing the paper not as this
00:02:26
linear form of text but again as networks. And then you have, probably about two or three years ago, the rise of things like Jupyter notebooks, where you have the inclusion
00:02:40
of code and text together in one environment. So what we've seen is this changing of the unit or the form of research. Now I think this is,
00:02:53
personally, the future of how we will do and create research communication. And so I have this debate with my colleague Anita, who instead
00:03:06
emphasizes the real importance of persuasion. She always likes to say that scientific articles are stories that persuade with data, and she almost compares them to fairy tales. Really,
00:03:19
her notion is that the whole point of us writing a scientific article is to persuade, and we need all this persuasive verbiage in our text in order to convince people. And so that really
00:03:33
goes against this notion of modularity, the point of assembling papers together or assembling research communications together. So that's the debate
00:03:45
we've been having internally. Now I want to tell you a little bit about some infrastructure and some trends in scholarly communication that I think lend themselves to this notion of a modular
00:03:59
or integrated paper. One trend is the atomization of scholarly communications. What we've seen is the emergence of smaller and smaller forms
00:04:12
of articles. So you have something like MethodsX, where the idea is that you only publish the method itself, in the form of a scholarly article, but it's
00:04:25
just about the method. Or you'll have something called a data paper, so it's just a paper describing a dataset, or a paper describing software. So here, in
00:04:38
this journal called SoftwareX, there is an abstract, a link to a GitHub repo with all the details about the software, and a little bit more context. These are a
00:04:48
couple-page-long journal articles. So this is one trend, this atomization of scholarly communications, and that goes even further. People are starting to
00:05:02
do away with the text almost completely. So you have the publication of just straight-up data: this is just a dataset that you put online, but you get
00:05:15
the metadata that is directly associated, and you get a DOI associated with it, as with a scientific article. You can get a DOI for your software, so the same kind of
00:05:26
citation that we give for a journal article we can now give for software and data, and let's get rid of the text altogether in publishing our scientific output. Furthering this, you can get even
00:05:40
smaller. You can just have commentary, so you can really pick out pieces of comments using something like Hypothesis. This is an annotation on a paper that
00:05:53
you can cite specifically. You can cite a review itself, so just the commentary in a review, and you can get a DOI for that as well. You can keep getting smaller, so
00:06:07
we have, for example, a description in Elsevier of particular concepts themselves that you can specifically refer to. There's a notion
00:06:21
of something called a nanopublication, which is one triple statement, where you can cite the actual subject-verb-object pairing together as data. So this
00:06:34
is this trend of atomization. Also in the scholarly landscape we have this trend towards putting persistent identifiers on everything, putting DOIs
00:06:47
on everything. So every paper we publish has a DOI, every dataset we publish has a DOI, we have persistent identifiers for funding and versions. Oh,
00:06:58
I'm missing bubbles, oh well. We have IDs, persistent identifiers, for scholars themselves, so ORCIDs; we have persistent identifiers for chemicals; we have persistent identifiers for physical samples; we have
00:07:11
persistent identifiers for resources. So what we've done is create this web of persistent identifiers for scholarly objects, and all of those identifiers are
00:07:23
actually backed by detailed machine-readable metadata. Another trend is the trend towards duplication in the
00:07:35
scientific community and the scientific literature. What we're seeing is people publishing preprints on something like SSRN or arXiv: they put up their paper, then they send that to a
00:07:49
journal, it gets put up before being reviewed, it gets reviewed, it gets commented on, and so you have this ever-evolving notion of duplicating
00:08:05
the same thing over and over. And the question you might ask yourself is: why are you doing all this duplication of your work? So
00:08:15
my third, or my fourth, trend here is this notion of machine writing. Machine reading is kind of a hot topic, this ability for computers to do
00:08:29
natural language processing and read text, but I think an interesting strategy for us is actually having computers do the writing for us. So this is a
00:08:41
really nice article called "Towards Automating Data Narratives", and what they did was they were able
00:08:52
to actually take dataset descriptions and generate different narratives depending on the audience involved. And so what we have is, I think, an
00:09:08
existing scholarly infrastructure that actually lets us do transclusion of documents. You can take these persistent identifiers with metadata, and from that
00:09:22
metadata actually use machines to generate narrative. We can use Hypothesis and open annotation structures to quote in context. We can assemble that in a
00:09:33
Jupyter notebook, and then from that Jupyter notebook actually generate papers and presentations. So what we have is an alternative infrastructure: this DOI- and PID-based infrastructure now could
00:09:47
let us do this notion of transclusion for the future of scholarly communications. So there you go, thank you. Questions? The notion of IDs everywhere is
00:10:11
attractive, but it assumes that ground literals are recognizable, whilst a ground literals into the future. Only fifty to seventy years ago, for example,
00:10:24
women were not scientists, so we have to reconstruct the history of, say, astronomy, because people never cited work by women. How do you resist making exactly the
00:10:39
same mistake again in your assignment of ground literals? So, I could talk about technically how these persistent identifiers get made, right?
00:10:51
Usually there are organizations that are minting these persistent identifiers. That's kind of one of the downsides of the scholarly infrastructure that's there, through things like Crossref or
00:11:04
DataCite, these organizations building these PIDs. So you probably could end up in the same place. The difference is that you have some sort of social or institutional control, versus what we
00:11:17
have just with the web itself, where everybody can mint IDs. So that's kind of the difference. You could end up having the same problem, I guess. Yeah? Oh, so there's this
00:11:43
notion of something called a Jupyter notebook. It's a very popular format for doing the combination of code and
00:11:54
text, okay, and data in the same open file format. Yeah? Well, I'm sure you have a question. No, I mean, I think you need to
00:13:04
look at it all. So I guess the question is what you qualify as innovation, right? I think in the last probably three years the emergence of people putting up
00:13:17
datasets and being able to cite data has been a dramatic change in the way people do scholarship, and I think that is an innovation. So that's one example. Or, for example, the
00:13:32
rollout of the notion that it's important to cite data as equivalent to citing papers, and the fact that most major journal publishers now have
00:13:44
support for data citation. I think that's a big step. Or this emergence of smaller micro-articles around different artifacts. So I
00:13:57
think all those are innovations per se. If you go look at the environment around something like NumFOCUS or
00:14:12
matplotlib or Jupyter notebooks, it's pretty amazing how fast that's gone in terms of people producing and putting up these interactive pieces of scholarship where you have the
00:14:25
combination of code and text and data all in one. So I qualify all those as innovation. I think a place where people are frustrated is on the
00:14:38
social factors, where those kinds of outputs are not seen as equivalent to what would be traditional academic outputs. So I see a lot of social
00:14:51
frustrations, but I see a lot of cool technical developments, and I think that sometimes... Thank you, everyone. So I was
00:15:09
trying to get a big picture on this. I'm not a researcher. It turns out I tried to do a degree once in scholarly communication, a PhD, sorry, and it turns out I hate reading and writing research papers, which really was a problem, but it inspired me a lot to
00:15:21
study what's wrong. Okay, so my background's open data, open access, open source software; you begin to get a theme. I was trying to find something online to show that information is complicated,
00:15:35
and the playing cards, that's from analyzing an RDF dataset. And I was thinking about this whole thing: what is text? Text is a bunch of symbols in a row,
00:15:47
mostly putting some ideas into order. And I'm thinking: why do we bother? Beyond the actual grammar and sentence structure, why do we actually put things
00:16:00
into order at all? Why don't we humans just do RDF instead? Wouldn't that be easier? And I've had a lot of lunches with Frode lately, with this being a sounding board, so some of this may sound familiar. But
00:16:12
one of the obvious things is that humans experience things in a linear way. Even if I happen to read through the phone book, cutting to find a thing, my experience of time is in a straight line. Even if I'm playing a computer game
00:16:26
where I can make choices, I'm having an experience in a straight line. And it's been discussed that books are random-access scrolls; I love that. But books are by their very nature in a straight line, and
00:16:38
that's what most of our things have been written for in the past. And even books which are nonlinear in themselves, the books where you have to find specific sections, are ordered because
00:16:50
you need to find something in them. So, thinking of some good reasons, other than the medium, to put stuff in order: why do we bother at all? One, the huge one,
00:17:03
is to orchestrate a response, an emotional response. I've been trying to find something about spoilers without actually spoiling anything. So this indicates to me some kind of emotional response due to things happening in order; if you have it in a different
00:17:16
order you wouldn't react in the same way. But there are other things that are still orchestrating responses. They're effectively a form of entertainment, and most sharing, even between academics, is
00:17:28
persuasion, as we said. It's not just stating the facts as is; we are still telling a story. It has a beginning, middle and end. If I read a five-page article in New Scientist, if someone told me what the punchline was,
00:17:41
it's still not as enjoyable as reading the crafted story. But it's only narrative entertainment which really has spoilers. I've thrown up this, my other picture I found to illustrate
00:17:52
spoilers without actually spoiling anything you may not have seen yet: yes, I'm the butler and I am the killer. So, other good reasons for putting stuff in order. The obvious one is to help people understand
00:18:05
it: you put the ideas in an order so that you can start people with the ideas they need to understand later ideas, and you're crafting a route through it. I don't actually have this book; maybe I should.
00:18:17
Right, so a sequence of steps. This is actually a terrible recipe for a cup of tea I found on a children's website; it's actually skipping waiting for the kettle to boil. And to enable nonlinear
00:18:30
access, as we talked about: the books that actually enable you to find information in a nonlinear way in linear information. But we also do it to fill out a template, either for good reason or just
00:18:42
because it's required of us. This should be a fairly familiar one we've already seen; there are variations on this, but this is still basically what most research communication is
00:18:54
squeezed into. And I'm thinking: the whole point of academic communication might be persuasion, but it's not entertainment. It shouldn't matter if someone tells you the punchline of what the paper is about. In fact, you want the spoiler for the paper;
00:19:06
in fact that's the abstract; in fact it's the last line of the abstract, usually. If the abstract's quite long, skip to the end: what did they actually do? It's inconclusive, further study. And I think this isn't quite true. Maybe formal academic communication is not
00:19:19
entertainment, and I've been listening today and rewriting my slides as I go. So the academic record is not entertainment, but I'm not sure research papers and the academic record are quite the same thing. And I'm starting to wonder whether or not the
00:19:32
academic record is more like a blockchain: this is a bunch of things that we need a recording of, and that's not the same thing as talking to each other. All right, so this is another classic academic piece of text. I have got a
00:19:45
single line of symbols, but with a bit of experience I know that that represents the people who contributed to this work. I don't know what C. Gutteridge's first name is. Why the hell do I
00:19:58
not have my full name on that? That's my paper. No one knows I'm called Christopher from that text; that's really annoying. I also know what it's part of: it's
00:20:10
part of a journal volume, which is part of that actual journal, and I know that that is the volume and that is the pagination. It has these dimensions of relationships. I also know that it must
00:20:22
have been published, because it has a published date, and that means it was probably accepted, and I guess it was probably submitted, and must have been completed at some point. I don't know for sure it was completed before it was accepted; maybe there was a second
00:20:34
version. Yeah, it's lost. And we're crowbarring normal research into these structures, with the variations we've just seen. Your talk would have been much better after mine; you
00:20:45
answered my questions. But I think that putting things into order is a form of destruction as well as creation: you are actually deciding what not to tell people. I've deleted a lot of slides, I've deleted a lot of things. Some
00:21:00
of what I deleted was because it wasn't helpful to the argument, or wasn't worth seeing, or it was bad for the flow; some of it there just wasn't time for. I had a whole different analogy on coding. I'm a coder,
00:21:11
so: coding is not like authoring. Coding is like writing short text, where you take an instruction and turn it into a sequence of statements, like a
00:21:24
paragraph, while authoring is a much more complicated thing. And programming is not the same as authoring, but it is the same level of complexity: we're taking non-structured information and trying to structure it in a way that benefits both
00:21:38
the writer and the reader, and in this case the computer. And we've got all these tools. This is a tool for children to program in, and you can see the shapes of the things. We don't have this in writing.
00:21:50
This is a paper from 2017; this is a paper from when the era was born. They haven't changed. So, all right, I'm gonna skip to the end, and this is the bit to annoy Frode.
00:22:02
Oh, if it said one minute, that was a three-minute slide. Oh, you're waving that sign. Oh God, okay, I'll stop waffling. Okay: coding is grammar and syntax;
00:22:17
programming is more like authoring a novel. But no one tells us how to do new structures of novel; maybe Elsevier is beginning to. And we need to be designing how we communicate and how we allow communication, blah blah blah, CC-BY. Okay,
00:22:31
so linear communication is good; there were reasons for it. You're helping people, it's a courtesy, but it should be a signpost through information, not barbed wire forcing people through information. This is the bit to annoy Frode: the tools aren't the solution.
00:22:43
Well, tools are not the only solution, and we need to create the environment. This is not the right environment to function as a tool. And so we do need to create the academic
00:22:56
environment. With EPrints we created an expectation that open access could happen, and once people have seen a paper available online then they start to ask for it, and it changes the environment. And we need to create this environment,
00:23:08
and what I think we also need to ask is: is the academic record of what people have researched the same as the way we communicate with humans? [Laughter] minutes if by one person. If multiple readers were
00:24:27
reading them, they'd be a bit nonlinear, because of multiple viewpoints at once. What are the main points of your talk, to return from meeting? What are the main points of my
00:24:40
talk? You want to use them as an example, as a case study: what do we want them to take from your talk? The last slide summarizes my main points: we don't just need tools,
00:24:54
we need to change expectations so that we create consumers demanding the tools, and then they'll pay for them. People aren't paying for Liquid, and people don't know that they're wrong yet. We need to create consumers who
00:25:08
know they need these things, and that's hard. It's a chicken-and-egg problem: no one wanted RSS until LiveJournal started doing RSS and people started building tools for RSS.
00:25:30
well sources that could ibex are gifting austell applause just two slides to highlight because I think they're really
00:25:50
important. Yes, our event... I know you had important business with academic people. So yeah, the point there being that people give routes through information; that's really valuable, and that is an important part of human communication,
00:26:03
and we shouldn't be trying to stop it. We should just be stopping people putting up barbed wire and encouraging people to put up signposts through information instead. And this was the one to annoy Frode: to tell him that tools aren't useful if you don't create the correct
00:26:14
environment for them to work in. A tool all by itself, without the environment that it's going to work in, isn't enough. You've got a lot... we're going to figure out how do we help
00:26:37
the reader do a Hamiltonian through the text, to make sure that you've got... Yes, mm-hmm, and that's one for Telltale Games again: how do you get the most value out of
00:26:50
a computer game when you make the wrong choices and didn't see half the cool
00:26:53
video? I can't read two pieces of text at
00:27:16
the same time. Maybe some people can, but I think it's really important, what Chris was highlighting: it's a linear experience for the reader. Even if you jump around, for him, in his time, it's linear, which is kind of interesting.
00:27:32
Dave? No, my time is still linear. Dave: trails through the revisions, annotations, updates, so it's the property here. I'm not a fictional character; I think about one thing at a time. I can have
00:28:38
maybe very short staff. There are different kinds of linearities we're talking about, all right: there's the experience linearity
00:28:57
and then there's the authoring linearity, and then there are focus on more than
00:29:03
one thing. Yeah. Can you do me the honor of
00:29:57
saying your name? Amazing. My name is excellent; I'm pretty sure my dad was pretty drunk, pretty much lost. So a
00:30:15
little bit of my background: I graduated in microelectronics at the University of Southampton and I did a bit of electronics for a while, and recently I moved to software, and now I'm a
00:30:28
developer at Sparrho. Sparrho is basically an online platform used in 2,000 institutes in over 150 countries so users can find research that matters to them. It's pretty similar
00:30:41
to area. So, starting off, I'd like to thank Frode for giving me this opportunity to present here. In today's age there is an overload of information, and technical,
00:30:54
novel and complex text is no exception to that. Therefore there is a need for technology to help us effectively track information that's relevant to us. So for
00:31:07
today I'm gonna speak about what we at Sparrho think are the three main aspects, or pillars, of research text. They are primarily (this is what I want you to take from this presentation)
00:31:19
accessibility of the text, how easily it is discoverable, and, moreover, how reliable it is. Effective dissemination of scientific research is crucial for human progress.
00:31:32
Research texts such as patents, papers and conference proceedings are primarily the way researchers communicate their findings right now, and according to the scientific, technical and medical
00:31:46
publishers in 2015, there are over 2.5 million research papers published every year, and that volume is expected to double every eight to nine years. So, starting off with accessibility:
00:32:01
what exactly do I mean by accessibility? In the world of research, accessibility primarily means reading the full content. So why is this important? A recent example: when we had
00:32:13
coffee at the coffee break, Miss Claire just pointed out this genius kid who discovered a way to detect pancreatic cancer but was complaining that he did not have enough
00:32:28
pocket money to access the research that he wanted to study. So that's one of the cases. As more and more funders and institutes support open access, there is
00:32:40
a decreasing portion of the latest content that's behind paywalls. But that alone is not enough; accessibility on its own is good, but it's just not good enough. For example, I'll take the Stockholm Public Library,
00:32:53
which is actually the background of this slide. Once you get in you have access to over 2 million books, and without the appropriate tools it's going to be extremely time-consuming to find
00:33:06
relevant material and work out whether it's up to date. As mentioned before, the output of research is only going to increase, and it's not enough for text to be merely accessible. So
00:33:18
that's why it also has to be discoverable. With these millions of scientific research papers published up to today, how does the researcher go about finding information that's
00:33:30
relevant to them? Search engines and scientific databases have solved part of the problem: you put in certain filters and you're able to narrow down the papers that you
00:33:43
scan and find the relevant information you need. At Sparrho we basically try to make that discoverability factor a bit easier. So, for example,
00:33:55
when you have research text that uses very highly specific terminology in a very narrow context, it is difficult for researchers in even adjacent fields to actually understand the full
00:34:08
implications of these new findings and have cross-collaboration, even by serendipity. So how do we actually improve this? We believe
00:34:21
that the key factor for discoverability is actually simplicity of the language. If you go to the Sparrho platform you have access to over sixty million research papers, updated daily, from over 45,000 journals
00:34:35
across all fields of science. So if you do any single search you're still gonna get a daunting number of results. That's why, to stress our belief
00:34:48
in simplifying language for discoverability, we introduced the concept of pinboards onto the platform. The idea is that
00:35:00
a researcher can curate and summarize their area of interest. So you have the heading for the topic, the curator's name, and also a pinboard summary briefly explaining
00:35:13
what their area of research is (we stress that this has to be very simple and concise), and also the papers that they use to actually generate the pinboard.
00:35:25
So what does the pinboard actually give? It is actually a portal for readers to get initiated into a new field of science. Since it uses
00:35:38
simple language, it also helps users to understand the new area of research. The summary of the pinboard is created by the curator, but it is on
00:35:54
Sparrho's roadmap to have the AI generate an automated summary, which can later be given to the curator to be revised and improved.
00:36:08
So that's one part of us helping discoverability, that is, trying to simplify the language by the introduction of pinboards. Our second part is our recommendation engine, which uses text
00:36:22
analysis and natural language processing. So here's a simple visualization (oh sorry, wrong keyboard). Here's a simple visualization created by one of our
00:36:35
interns of how we draw connections between a single paper and a pool of, say, 70,000 papers. Sparrho's recommendation system takes various factors into
00:36:47
account, and the degree of connectivity between a single paper and the rest of the papers in the pool is based on plain text-to-text similarities, not just words but also their linguistic
00:37:00
structure; the author similarities; and also, interestingly, the network built by the Sparrho users. For example, take the pinboards: how are these papers actually used? Has the
00:37:14
user pinned two papers, and how are they interlinked? We use this information to create tailor-made recommendations for other users. It is still a work in progress, but we
00:37:28
are still working to make science more discoverable and reliable. One of the challenges of having so much information at our fingertips is that
00:37:41
whoever shouts loudest gets believed. For example, the anti-vaccination movement, which gained traction from a paper that was
00:37:53
published in 1998 in The Lancet: it has been discredited since, but the movement still persists. So we take reliability quite seriously at Sparrho, and
00:38:06
we are still exploring ideas as to what makes the content that we have found reliable. You could think of using hand-selected experts
00:38:19
to review the content, which is similar to the model that universities have, the model that exists right now. Or assess the person who's actually creating the content: their track history in creating
00:38:32
projects and datasets, content they might have authored, and also public talks they might have given. What really interests us is how we actually determine reliability at scale and
00:38:46
without bias. But right now at Sparrho our primary focus is on enhancing discoverability, to ensure that people
00:38:57
get the credit that they deserve. Oh, I can't... wrong keyboard. Very good music. So we know that making the latest
00:39:14
scientific research discoverable is extremely difficult and cannot be achieved by Sparrho's efforts alone, so we are always keen on working with other organizations who align with
00:39:25
our mission, and on any ideas or suggestions for increasing the discoverability or reliability of the content available. Thank you. and also did you dislike me we
00:39:43
are hiring as well, if you know anyone who's interested. Questions? they discovered Alyssia you do something
00:40:27
very probable reusable new content or just the usage having value? Yeah, definitely. As we get more and more users, we collect more and
00:40:54
more data as to how they behave, and so it helps us actually refine our search methods, which directly affects the Sparrho recommendation engine, so we can provide a finer result. So
00:41:07
the more users we have and the more behavior we can gather, the more information we gather, the better. Yes, that's the user. And one of the challenges right
00:41:30
now is that anybody can create a pinboard, so how do we actually validate that the pinboards are of substance? So that comes in interesting very interesting
00:41:51
things but what I'm wondering about is that the research on the current academics in social networks indicates that endings are quite relaxing to news using the same
00:42:14
questions. There's a private button feature: you press that and all your research is hidden, so your pinboard is hidden, but your public profile will still be available. But if you look at the profile page, it would
00:42:36
just be what you do, if you fill it in, and the pinboard section, or any content that you created, or any research that you've done. Thank you. So who
00:43:10
knows about the Blue Whale? No? It's
00:43:21
again beside the mentor and I got to
00:43:43
learn about that when the teacher of my boy, who was 10 years old, was telling us that my boy was talking about this at school. So it means
00:43:57
it's us maybe but also its relied on that they are cheap erosion control it was fine but it's it's a story that he that is heavy so I looked at it at a
00:44:13
time back in in May trying to understand where it was coming from and apparently that comes from Russia and you try to find any evidence
00:44:26
and there is nothing you really can't do something thinking about this presentation I looked at it again recently two weeks ago and you continue
00:44:39
to see articles in the press and you try to find evidence again nothing ok so to me provenance is something that could be really useful in that context can we
00:44:54
find the provenance of those means how memes or those stories or those texts and having something that is a bit automatic would be really useful in this
00:45:06
context so provenance 10-15 years ago people were talking about it in the context of food or in the context of art okay for art it is what gives
00:45:19
authenticity to a piece of art for food it's typically a sign of quality now look at this quote from 2010 from Jeff Jarvis good curation demands good
00:45:33
provenance provenance is no longer merely the nicety of artists academics and winemakers it is an ethic we expect and it's true that over the last 5-10
00:45:46
years you've seen something very different there are lots of contexts in which provenance is mentioned to me starting on the right hand side in the context of e-science or the reproducibility of scientific results we talked
00:45:59
about that earlier open data and open data journalism also look at provenance businesses in terms of making their supply chain transparent and being more accountable also talk about provenance
00:46:13
I'd like to give you a good example of provenance if I had known that John would have been here today I would have chosen the example from The Gazette sorry John it's a competitor but
00:46:27
it's the climate assessment a report produced every four years by the US government the last one was produced in 2014 and I
00:46:39
hope that there will be another of those reports in the future they decided in 2014 to make this report provenance enabled it means that when you navigate that document online you will come
00:46:53
across pages like this one showing precipitation change over the US and then you can read the narrative this image there is derived from a data set
00:47:06
and you can follow the link and a process was applied in order to generate that data set again you can follow the link and find instruction about it information about
00:47:19
it underpinning this narrative is a data model RDF in this case and you can see that URI this image was
00:47:30
derived from URI 2 that data set the term was derived from is one of the terms that we standardized at the W3C at the World Wide Web Consortium in
00:47:44
the provenance working group that's a definition of provenance as we defined it in the working group so provenance is a record it contains a description of
00:47:58
the people the organizations the entities the activities that have influenced a piece of data or something in general could be a book as we were
00:48:09
talking about it earlier so coming back to the problem of fake news what is the problem so first fake news is not new it has been around for a long time what is
00:48:24
different this time over the web is that you've got a number of characteristics coming together there is a lot of automation lots of bots involved we've got viral propagation on
00:48:38
social networks very little editorial control over what is being published it goes across organizations and and sometimes even you've got
00:48:50
laundering of news you've got something that is published on a social network and then a respected newspaper in quotes is going to publish an article about
00:49:01
what about the social network news and then suddenly it's becoming official and published on their website and overall a total lack of transparency and
00:49:14
from my point of view having worked trying to standardize provenance why can't we see the provenance of those stats on the web so I'm not gonna give
00:49:26
you a presentation about the standard PROV that we defined but we spent two years Paul was my other co-chair in the working group we spent two years
00:49:38
producing this specification PROV and I thought that I would be the first to show some PROV but no someone did it earlier but really PROV is a vocabulary that allows you to describe the provenance of
00:49:51
things so you've got a notion of entity for instance a piece of text some activities like writing a piece of text or editing of a piece of text and then you've got agents people services
00:50:06
organizations involved in in in these processes and then you've got relations between those different classes so when you talk about web resources you can
00:50:19
describe some minimal provenance for it you can say we've got this resource on the web there is some agent associated with that resource and that resource
00:50:31
refers to other resources by means of links over the web so what I did was to write a little script that would go over
00:50:42
in fact would crawl the web from a few of those articles about the blue whale challenge would create some provenance out of the data it was finding
00:50:55
in those web pages and then I wanted to see what the provenance looked like well I couldn't find a tool that could visualize the provenance that I was generating I had about
00:51:09
1,500 documents and I simply couldn't visualize it that you don't have it good enough what is interesting is to develop to apply some of the techniques that we've developed as researchers over
00:51:24
those provenance representations and what you see here is the simplest summary that I could generate out of those 1500 resources and it says that you've got resources that talk about the
00:51:38
blue whale challenge they refer to other resources that are not about the blue whale challenge and then there's a set of other resources that are not processed because I limited my search
00:51:52
over 1500 documents and then that's a summarization technique you can refine it and configure it in different ways and a second summarization is here which
00:52:06
is starting to present some insights and I just spent a couple of hours on this so I I don't have a final result but we can see that we've got articles
00:52:18
referring to themselves or articles of an organization referring to the same organization so we've got kind of echo chambers in this kind of phenomenon
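The two summaries described here can be sketched in a few lines of Python. This is a minimal illustration and not the speaker's script: the PAGES data, the example host names, and the helper names agent_of, kind_of, summarise and self_references are all assumptions made for the example, and the host name is used as a stand-in for the agent associated with a resource.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical crawl result: page URL -> (mentions the topic?, outgoing links).
# A real crawl would fetch pages; this toy data only illustrates the shape.
PAGES = {
    "https://news-a.example/story1": (
        True,
        ["https://news-a.example/story2", "https://blog-b.example/post"],
    ),
    "https://news-a.example/story2": (True, ["https://news-a.example/story1"]),
    "https://blog-b.example/post": (False, ["https://unseen.example/page"]),
}


def agent_of(url):
    """Use the host name as the agent associated with a resource (assumption)."""
    return urlparse(url).netloc


def kind_of(url, pages):
    """Classify a resource as on topic, off topic, or not processed by the crawl."""
    if url not in pages:
        return "unprocessed"
    return "on_topic" if pages[url][0] else "off_topic"


def summarise(pages):
    """Collapse the link graph into counts of (source kind, target kind) edges."""
    summary = Counter()
    for source, (_, links) in pages.items():
        for target in links:
            summary[(kind_of(source, pages), kind_of(target, pages))] += 1
    return summary


def self_references(pages):
    """Count links whose source and target share the same agent."""
    counts = Counter()
    for source, (_, links) in pages.items():
        for target in links:
            if agent_of(source) == agent_of(target):
                counts[agent_of(source)] += 1
    return counts
```

On this toy data summarise(PAGES) counts two links between on-topic resources, one from an on-topic to an off-topic resource, and one from an off-topic resource to an unprocessed one, while self_references(PAGES) reports two links that stay inside news-a.example, which is the echo chamber pattern in miniature.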
00:52:31
I think that we've got a very powerful tool to to understand what's happening in those kind of networks so fake news
00:52:43
are important okay we know the kind of potential very negative effects they can have let's say over elections I think the corporate world does not always have the
00:52:57
incentive to tackle the problem fully and in part because it's not something that is within a single organization it is cross organization and another thing we need solutions
00:53:09
that are auditable that allow us to understand those connections between documents and the measures of trustworthiness that are being developed and for this provenance
00:53:21
for me is critical and with others [Laughter] [Music] that's the source the fact that there is
00:54:17
a provenance is already an indication that not necessarily you should trust it I think that once we understand once there is an agreed way of
00:54:29
collecting provenance we can have tools telling us there is no provenance about this don't trust it okay or have some
00:54:41
form of rating that is related to the fact that there is no provenance associated with some resources so in fact if you start making this explicit
00:54:53
there is some form of social incentive for people to publish the provenance of the resources now you're gonna say the guys in St Petersburg may create false
00:55:05
provenance and of course there are ways of dealing with that some of them are also part of the same model your work with provenance but what really brought it to my attention the importance of what you're doing I read
00:55:23
an article during the election it was clearly absolutely horrific but there were lots of links so it looked like it had citations but if you clicked on any of those it said this is this is a spoof mmm
00:55:35
but it turns out that clicking on a link even takes energy so people don't check the citation so that's why to take the provenance front and center is so important mmm and there are websites
00:55:47
doing kind of journalistic analysis or forensic analysis even of some claims okay some in the US some in the UK everywhere and these can be part of all the same
00:56:03
set up okay so you you could have not only information about the provenance of resources but also ratings produced by various organizations and all that I
00:56:17
think could be processed automatically and give us some help because when I see this thing about the blue whale challenge at the moment I still don't know whether that's true or not
00:56:40
[Music] many of the references amor theory trail brevity and so digital signatures things like that although they are attractive sometimes still don't necessarily do the
00:56:55
job anyone who has a certificate from a certificate authority knows what I'm talking about but there's a second observation about provenance which I was quite surprised to learn in the early history
00:57:09
of the church the Catholic Church there were a number of documents written under the names of what we would call the Apostles and the only reason that those names were attached to those documents was to
00:57:23
lend the documents credibility now they were completely fake and in spite of that and despite the fact it was known some of them managed to make it into the canon anyway because I guess somebody said well I don't care if it's not true it's a good
00:57:36
story so that's included I hope I haven't insulted anybody but the point I want to make is that the idea of trying to establish provenance has no we haven't this problem has been with us
00:57:49
for a long time technology might help but it is not not the absolute solution that I'm pretty sure you didn't say that I agree with you so first I don't believe there is
00:58:02
necessarily one version of the provenance of something the way we designed the PROV model is that we support multiple provenance of provenance okay to me these are
00:58:16
assertions that someone made and those assertions have a provenance okay and we don't have I believe the tools the techniques the logics in order
00:58:29
to reason about those multiple versions of provenance but I think we have to start somewhere I don't think we are making use of the technologies that we have at the moment I'm not claiming
00:58:43
that they are sufficient to solve the problem but it's a start and more work is required first
00:58:54
question thank you very much just looking for your name slide Wendy Professor Dame Wendy Hall Frode said that I didn't need to do slides so I haven't
00:59:39
lots of wonderful slides Frode also said I should summarize so I've been here apologies to the two or three people whose talks I missed and I was gonna say no one mentioned Vannevar Bush and of course someone did so I'm gonna
00:59:54
sort of weave ten minutes around mmm how I got into this world where I think it's going and what people have said today it's been amazing set of talks and I want it before I forget I want to thank
01:00:09
Frode and Edgar and Emily of course yes thanks yeah anyway so thanks to Frode and Edgar for organizing this day let's start off with that shall we
01:00:29
later and there's tomorrow but I know you want to do something at the end as well so yeah so um okay so what well I started with with Bush because that was
01:00:43
where that was where to me this all started that paper As We May Think and I'm going to come back to that at the end as to what the future of text is also
01:00:57
as I think it was Frode said Ted was right and the more I continued to work in this field I know he was right he's
01:01:11
just released he was 80 in June he has the biggest chip on or on we're all recorded not me he's just released his as a YouTube of him talking about why he
01:01:25
should have been famous more famous all the things he but he's invented and that people have ignored we still don't do right and I'll try and
01:01:37
remember to send the link around to everyone it's it's very Ted it's quite a concise it's about 15 minutes which for Ted is quite concise but you know of the
01:01:48
things that he I he it was it was that what was Ted saying that inspired me to get into hypermedia I'm gonna speed out of it but whatever was right things like but really it was
01:02:01
nonlinear text was what he that's what that's what I used to teach about hypertext is nonlinear text he got hypermedia back in the sixties right amazing two-way links
01:02:13
he was right about that Tim I'm sorry Ted was right micropayments he was right about that transclusion which I am a great fan of and I think that was one of his most profound inventions that we
01:02:26
don't use enough of and I often ask myself but is that a transclusion I don't know actually very profound that I think we should use more of of course he wanted everything to go through his system it was very proprietary and but
01:02:39
in terms of the what we needed for this new world he knew about his Xanadu design inspired the the design of the dynamic generic links we had in
01:02:51
Microcosm again also As We May Think and then along comes Tim with the World Wide Web embedded one way links retro and we've been retrofitting hypertext to
01:03:05
the web ever since which seems the most ridiculous thing to say but it's true it was a very very basic form of hypertext in the web and we've been
01:03:17
retrofitting security to the web as well and other things and Tim would be the first to admit security but I'm not sure he'd agree about the hypertext bit and today it's all about I hate to use a term that's 20 years old
01:03:29
but it's all about multimedia actually and I maintain you can't think about text without setting it in the context of all the other things that we're using and doing because people
01:03:42
still write but they write in very different ways they write using blogs microblogs I mean I think I write using Twitter okay I love Twitter and I love
01:03:53
doing short microblogs with links in with photos in with videos in and that's all interconnected or deeply intertwingled we write using we write using
01:04:06
speech increasingly speech to text and text to speech in all the different languages very important Google's done so much on this we text we share we
01:04:18
write using photos and videos in a way that we couldn't possibly do 20 25 years ago so I'm reminded and I was reminded earlier that this this is like this is
01:04:30
2017 and in fact it's September 2017 which is the twentieth anniversary of the Hypertext 97 conference which Mark famously was the program chair for but missed because he was ill several other of
01:04:45
us in the room were here it's where we first met Nick Gibbons Dave was there Les was there who else was up here who's at hypertext I think how he was
01:04:56
there Luc was in the university but didn't attend oh you can't remember we did we did drink a lot
01:05:08
as I recall that's what everyone wrote masking email via the other week said I'd love to come see you next something the UK I still have fond memories of all the alcohol we drank in 1997 in Southampton I could tell you lots of
01:05:21
stories about the drinking team that we had then but anyway that was the conference where Ted and it clashed coincidentally it was the same dates as the sixth World Wide Web conference which was run by one Bebo
01:05:33
White who I've now got sober in Santa Clara the same week as our hypertext conference and at the time I was trying to get hypertext and the web closer together I got them to change the name of SIGLINK to SIGWEB but that's about as far as I got
01:05:46
and that was the time when Ted famously addressed over a live video link which was run by Dave De Roure which was really difficult to do at the time a live video link with Santa Clara when
01:05:58
Ted said your future is my past to the World Wide Web people at the other end and of course his phrase I've lived by that everything's deeply intertwingled Tim didn't like separable links he said to
01:06:14
us when we were doing Microcosm he did like Microcosm he thought it should have been a front end to the web but of course Mosaic came along but he said he always said to me Wendy people will lose those links we've got to embed them but they will lose them
01:06:28
but if you remember Dave we had the most amazing visions of hypertext links in those days you do we used to think about we'd think a link right we're thinking and that's what we were
01:06:41
actually trying to do we had that portfolio of projects with Microcosm where we tried links in audio and in video and in photos of course all that is even harder to preserve you try and
01:06:53
preserve social media with links in right well Brewster Kahle has led the way and so that this is I think absolutely Nigel talked about earlier it's one of Vint's main topics approach and I was
01:07:07
gonna I'm getting more and more involved in it how we preserve this stuff when it's all so deeply intertwingled I'm just thinking about things like Luc was talking about fake news do we preserve
01:07:20
the fraud and fake news and how do we tell the historians of the future that this is fake and that wasn't you know I mean it's gonna be much harder for them this fake news as Luc says existed forever but it's much much harder I
01:07:34
think to disassemble what is true and what isn't in the social media and Luc's point about provenance is going to be increasingly important that we store all
01:07:46
that with the other data that we store and the text but you see I think what I loved about Frode's system I've got a minute maybe less always is the fact that he picks
01:07:59
up with Liquid the flow of linking that we envisaged in Microcosm the generic links and applying a link to a potentially abstract concept and being
01:08:12
able to follow it and to do that in the text and I love that but I think in the future it is going to be more than that the future is about cognitive links I think picking up on the sort of things Paul was talking about and again that
01:08:25
goes back to Vannevar Bush As We May Think so for me the future of text is I'm afraid Ted was right the future of text is hypermedia because I don't
01:08:38
think text and linear text isn't how we communicate these days it's a part of it so it's hypermedia or it might even be augment or maybe it's social machines
01:08:52
augmented by hypermedia thank you since this is recorded this is special because you're talking about Ted who couldn't be here this time maybe we could all have a round of applause
01:09:08
for Ted we miss you Ted we miss you Ted we did try to get him to beam in live to this but the time differences and whatnot was hard Henry
01:09:29
[Music] something above it's not hyper as in boo boo boo it's yes it's a I mean his links were actually quite I think abstract in
01:09:56
although you always had problems with how to implement that well that's what it meant above and across so that was a
01:10:09
definitely a Ted invention he totally invented the words hypertext and hypermedia and that I think hyperlink is a derivative of that that was coined
01:10:21
in more recent times yeah make it really clear that a link is not hypertext okay Ted has clarified later that what he means by hypertext is freedom for the
01:10:33
God the author and freedom for the god of the reader that is about friends but also non-sequential because Ted's work is all about post-it notes
01:10:46
it's the copying and pasting cuz Ted is all about the copyright thing which is one of the first talks today all about who owns that information that's why micropayments were there that's why transclusions were there and in his
01:10:59
model and I think blockchain is perfect for the original model that Ted had if my understanding of blockchain is correct because his model was very much anything that appears on Xanadu which we
01:11:12
is partially implemented by the web anything you have its provenance user who wrote it where does it come from and that is carried forward in that piece of information whether it's a
01:11:26
sentence or a paragraph or a whole book whoever copies and pastes it and that's how you track the micropayment issues how you track who gets what bit of any
01:11:38
publication that has that in I think so you do citations and I remember the first time I met Ted I was with Andrew Fountain some of you will remember because he was one of the key proposers of the early Microcosm
01:11:50
and we were in Toronto which is where Andrew ended up running a seminary he left hypertext to become leader of a religious sect but uh we worshipped Ted
01:12:04
but the thing I remember asking him because we were this was like I think it was in late 80s early 90s we were into Microcosm anyway must have been early nineties and I first met
01:12:17
heard Ted speak there I think or maybe at second time I heard him speak and we were asking him how would you apply hypermedia links to video and he said it's easy you know you just use what I've given you
01:12:29
transclusions micropayments and that covers everything forget the implementation in terms of the ideas he was so right Vint objects in giant linking them together and imbuing them with properties that
01:12:55
that we can make you so the context for example that talks about yourself and these things so I think we're going to push this a little further we're going to have to think in much more general terms that these objects whatever they
01:13:09
are text video or anything else have properties that we can get access to the provenance being one of them what sort of software is needed to exercise the object that said so at some
01:13:23
point I think our topic may have been broadly in say we leave the last word to Vint because that he is right and he's
01:13:37
echoing what I said the we and it's objects that include extract whole social machines parts of social machines we're making those objects harder and
01:13:50
harder as we go forward but this is how we are communicating you may think that I would disagree with you in the military there is a concept of force multiplier right
01:14:02
so you need to do something bad to those guys over there you have a gun or whatever it might be but then you have other things that come into play to make that more effective right the reason I started the future of text is not
01:14:14
because I think it's only text forget everything I am educated as an artist I used to be a photographer so when it comes to multimedia images of video and all of that it's very important the reason we have the future of text is
01:14:27
only because it's getting very little attention if you search for the sentence the future of text in Google you will find this whoo that's scary as insert they squeezed in here right that's
01:14:38
really scary that it isn't something that one of the big guys are doing right but of course both Vint and Wendy there should be right we need to have the force multipliers onto each medium text makes pictures more powerful and
01:14:51
vice versa of course we need to work together but I'm a bit worried about suddenly and so pictures again well I think it's everything's deeply intertwingled so we finish with that yes
01:15:09
[Applause] did you manage my next one where she come here so we will be having drinks
01:15:24
outside till well for an hour and tomorrow we are starting at 10:00 for those of you who can make it and you weren't all here this morning so the day will be split in half first we define the issues
01:15:37
in the morning and goodness gracious today was an amazing resource for that second half of the day is discussing out of habit and I would like to thank Nicola whom you may have seen a little bit out there but she was the one who
01:15:48
made today actually happen so yeah drinks everybody