00:00:06
hello everybody I hope you had a good lunch I had a very quick one and it was delicious all right my name is Daniel Petronik and I work at
00:00:25
Fluree where we are making an immutable graph database with Clojure and it's really cool and I'm going to start with what may appear to be a digression
00:00:38
um a non-sequitur but I would like to assure you that it is in fact a sequitur and uh you're gonna love it it's great so stepping back into a serious problem of
00:00:52
shipping boxes in the days of yore and I want to time travel back to 1950 and say you're a goods manufacturer
00:01:06
maybe you manufacture sewing machines in Chicago or something and you'd like to sell your great products to somebody in London um in order to do that you've got to take
00:01:20
your stuff you've got to put it in a special box that's going to protect it for this long journey ahead um and you load these boxes onto a truck the
00:01:32
truck takes it to a train depot it's unloaded at the train depot arranged physically with other goods that are similarly destined then it's loaded onto a boxcar carefully so that nothing
00:01:44
falls over the train goes across the nation it reaches a port city eventually it's unloaded from the boxcar loaded back onto a truck the truck takes it to the docks there again it's unloaded
00:01:58
now you've got to rearrange and put the boxes close to the ships that are going to the places where you're going uh you load them onto a pallet the pallet is covered with a net it is
00:02:10
lifted by a crane over the ship dropped into the hold there it is maybe unpacked carefully arranged because you've got to take into account how heavy these boxes are what's inside of them are they going
00:02:23
to break and what's next to them you don't want to store like anvils next to bags of coffee that's going to cause problems for the coffee not for the anvils maybe problems for the ship so uh
00:02:37
it's a very long journey and then you go across the ocean eventually you'll reach your port of call it may not be the first one so maybe these goods need to
00:02:52
be rearranged on the trip there and you reach your port of call you have to load it back onto a pallet back up on a crane over the edge onto a dock where it needs to be unpacked they will
00:03:04
literally break the box open to look inside assess any tariffs uh carpenters will have to come fix it up and then you do this whole process again to get it inland so
00:03:16
needless to say this is a long and expensive process and this is the simple version this is the easy one if you're trying to ship through New York New York has cursed
00:03:30
geography for logistics the trains can't get to the actual docks so what they do is they take the train cars put them on a boat the boat
00:03:42
is floated across the river then they're unloaded there it's ridiculous and as a consequence uh
00:03:53
at each step of the way you had to have a literal human person who had to become physically acquainted with this box and move it from place to place and it may not be a box it may be just a bundle of
00:04:07
pipes or sacks of concrete it's heavy work it's dangerous work people died often they were often injured it was irregular work you might have four ships coming to
00:04:22
port and you're busy for a week straight and then nothing for a while so these guys had a hard job um and every step of the way when somebody touches it you've got to pay for freight you've got to pay for labor
00:04:35
and if you're unlucky you also have to pay a couple bribes um it's not a great scenario which brings me to my thesis about this
00:04:55
problem shipping across oceans was expensive risky because you know storms happen right cargo tips over
00:05:07
it could get destroyed it's sitting around on a wharf guess what happens at wharfs rampant theft it was a huge problem something like 20 percent of goods could disappear just sitting on the wharf waiting to
00:05:19
get picked up so this just was not worth it for many categories of goods it was just too expensive but that's not the world we live in today
00:05:33
shipping things isn't that expensive we have containers steel or aluminum boxes of a very regular size that can fit on a boat can fit on a train can fit on a truck there
00:05:49
are standardized cranes that can lift them no matter where they came from and move them around they can be stacked they could be stored anywhere you can have refrigerated ones
00:06:03
and I can knit hats in my living room and sell them to almost anyone in the world I've got a laptop here that is the result of a global supply chain that I
00:06:17
paid less for than a computer with a tiny fraction of its performance would have cost 20 years ago so
00:06:28
containers didn't happen until 1956 and it's an obvious idea like you just had to stand on a dock and look at the chaos and say oh gosh there's got to be a better way
00:06:44
and there was but there are political social economic reasons why it just didn't happen and so 1956 wasn't the first container it was the first integrated logistical chain where you've
00:06:56
got the boxes you've got the containers you've got the ships you've got the cranes you've got the trucks and it all works together for the first time the rest is history like the world
00:07:11
changed once this happened the cost of shipping dropped orders of magnitude and it's a fascinating history uh
00:07:26
but it reminds me of something or at least that story of before the container reminds me of something that we do every day sharing data is rough and I'm going to specifically say
00:07:43
sharing data across trust boundaries is expensive risky and not worth it for many categories of applications and let's be clear on what I'm saying because it is just what it says but
00:07:55
let's define it what do I mean by sharing data I just mean bringing it together from two different places n different places wherever sometimes you want to look at two different sets of data at the same
00:08:08
time and make a decision based on that maybe you're going to dispatch some fancy side effects flash some lights maybe you're going to generate new data do some synthesis
00:08:19
this is what we do all day long and what's a trust boundary I'm just talking about any place where like there's a set of people who are allowed to read and write the data right this can be between two different
00:08:33
companies this could be between two different departments in a company I've seen that this can be in a Byzantine peer-to-peer network with one person versus the entire world
00:08:48
and I'm just claiming that that's expensive and I'm not just talking about having to fork over some dollar bucks for like literal bytes of data I'm talking more
00:08:59
about the indirect costs the time and the labor and the emotional anguish of trying to make it happen because we've got to do a little dance when we want to make it happen
00:09:16
I've got to figure out the transport what server is this on how do I talk to it is there an HTTP API do I have to find a TCP socket uh
00:09:27
what are the rules here and when I finally can connect to the database it has to know who I am somehow do I have to figure out OAuth or OAuth
00:09:39
2.0 or OpenID Connect or do I have to email an admin somewhere to get me set up and once that hoop has been jumped through now I've got to obtain access there's authorization
00:09:54
rules what am I allowed to see what am I allowed to change we've all lived through this but even once you've got access to the data you're allowed to see it
00:10:12
it's not easy sometimes it's like what am I looking at I'm going to steal a story from uh a guy we work with our president Eliud uh who worked at Citibank in 2008 and in
00:10:26
September of 2008 and specifically on a Friday after the close of trading in 2008 Citibank got a call and the call was hey
00:10:39
Lehman Brothers Investment Bank is not going to exist when markets open on Monday and we're going to try to auction off all their assets uh to recover some
00:10:50
value from this debacle and here's their data here like you've got full access we're giving you their literal data files please look at what we've got look at
00:11:03
what they've got figure out what's worth bidding on and inform your CEO and he will get a bid in and we'll have us a grand old auction real fast
00:11:15
and they couldn't do it JP Morgan Chase Bank uh Goldman Sachs they're all but the bids are rolling in Saturday is gone
00:11:26
and the CEO is calling in hourly give me something that I can bid on all the good stuff's getting taken up finally Sunday afternoon they are like okay we've got that we've
00:11:39
finally figured out how to correlate the data in our database with their database and we got some bids out and then Sunday evening they get another call
00:11:51
and Eliud talks about seeing analysts around him at their desks literally weeping because they just received news that AIG the insurance company was doing
00:12:04
the exact same thing it was going to be gone and AIG was an order of magnitude larger and they just couldn't handle it they had full access consider if you want to write to the
00:12:19
data I want to add to this store of knowledge do I just send information to you and maybe I get a validation error maybe
00:12:30
I get an error back and I know okay that wasn't the right shape but uh what if you don't get an error does that mean that you did it correctly is somebody having a bad day because
00:12:42
something screwed up that data needs to be validated and who knows maybe it is you don't know because it's on that side of the trust boundary so but sometimes we don't we usually don't
00:12:56
want to just write the data we want to do synthesis like this is the good stuff this is where the value comes from we want to combine it with other stuff and extend it and I'm not going to write that on the
00:13:08
other side of a trust boundary I had to fight so hard to get access to this I'm not giving away the stuff that I've got I'm putting it over there over this clones and spikes and Thorn traps
00:13:21
no I'm going to create my own this is my data you see the problem right this like it propagates and now everybody's miserable
00:13:34
and this project is not fun maybe documentation can save us right just follow the docs and you know sometimes that is a happy path but you know there's a lot of
00:13:48
things that documentation has to get right before you can actually use it and anything is missing you're unhappy and so that leads me to this thesis sharing data across trust boundaries is
00:14:05
expensive risky and not worth it for many categories of applications even if you get all five of those steps worked out this is risky who here has worked on a data
00:14:17
integration that failed to produce any value because of reasons completely removed from technical implementation yeah we once spent four months integrating our product with another
00:14:30
product and nine months after we released it that business disappeared they went bankrupt that's just a waste all in vain we made it work and it was nice
00:14:45
but for reasons completely outside our control it just didn't work so this doesn't happen you don't share data unless it's too expensive not to
00:14:58
so there's an entire class of applications that just are not even feasible not from a technical standpoint but from just a
00:15:09
the system is gunked up perspective however I don't think it has to be that way um here's a proposal each of these little data Islands these
00:15:34
trust boundaries the reason they're so expensive to access is because each one is this special snowflake a unique and beautiful creation lovingly hopefully created by
00:15:47
people who care about what they do but they'll choose to do it differently um so let's simplify the shape of the data and get rid of that application server
00:16:02
that special snowflake um and I think we can do this and I'm going to talk about people who've been thinking about this for a long time
00:16:14
so this is another historical sequitur so walk with me uh in the 1990s Tim Berners-Lee created the World Wide Web and he saw that it was good uh
00:16:27
and then he took another look and thought well you know maybe we could make some improvements um because the web was these HTML documents with links between each other
00:16:40
these are readable by humans but these aren't readable by machines the semantic information in these documents is encoded for intelligent thinking creatures
00:16:53
the machines can't understand it it's not regular famously HTML is not regular I don't know if you've seen the Stack Overflow question about how to parse HTML with a regular expression
00:17:04
don't try it it won't work and so we developed heuristics for discovering semantics Google was like ah we can just figure it out by how things are linked together we'll just
00:17:17
follow the links but if we had linkable data a web of data instead of a web of documents we could encode explicitly the semantics of that data in a way that
00:17:32
is automation friendly and so the semantic web people thought about this uh and they argued with each other and the entire world just moved on
00:17:44
they're like oh Web 2.0 we've got Ajax JavaScript cool stuff is happening and but the semantic web people kept working and they kept arguing and they kind of
00:17:58
were grinding away until they got some really useful ideas and tools that I don't think people have really paid
00:18:10
serious attention to for a number of reasons but one of them I think is just the moment had passed people were like ah we can make documents work it's fine they're good enough worse is better
00:18:23
whatever and so at the foundation of all these things that they came up with is the Resource Description Framework RDF which is just a notation for graphs
00:18:37
um it's a simple universal way to model data without sacrificing any expressivity um and RDF is just triples
00:18:49
subject predicate object Daniel that's me favorite food mango pie I made my first one last week and it blew my mind so that's new
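The subject, predicate, object idea can be sketched in a few lines of Python. This is only an illustration of the shape, not any particular RDF library: the function name `to_triples`, the way child subjects are named, and the sample record are all assumptions made for the example.

```python
from typing import Any, Iterator

Triple = tuple[str, str, Any]


def to_triples(subject: str, record: dict[str, Any]) -> Iterator[Triple]:
    """Yield (subject, predicate, object) triples for a possibly nested record."""
    for predicate, value in record.items():
        if isinstance(value, dict):
            # A nested record becomes its own subject, linked from its parent.
            # Naming the child "<parent>/<predicate>" is an assumption of this sketch.
            child = f"{subject}/{predicate}"
            yield (subject, predicate, child)
            yield from to_triples(child, value)
        elif isinstance(value, list):
            # A list of scalar values becomes one triple per value.
            for item in value:
                yield (subject, predicate, item)
        else:
            yield (subject, predicate, value)


# Hypothetical sample data, loosely based on the example in the talk.
triples = list(to_triples("daniel", {
    "favoriteFood": "mango pie",
    "employer": {"name": "Fluree"},
}))
```

Every fact ends up as one row with three columns, no matter how deeply the original record was nested.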
00:19:02
which if you've been writing Clojure may be familiar to you in this form entity attribute value it's the same thing I think it actually came from the same place if you find Rich like just ask him it's like is that
00:19:15
RDF because I think it is um it's a good idea and it's the simplest schema three columns and anything can be boiled down into
00:19:28
this without losing any expressivity and the cool thing about it is you can have different notations for it because RDF is cool but who knows about RDF you
00:19:45
know what people know about JSON this is the same data same shape but this looks a lot more familiar right JSON-LD that's JSON linked data is a W3C
00:20:00
standard for encoding graph data into uh JSON and basically it boils down to adding global namespacing that's what that @context is giving global context
00:20:13
to your data and I'm not going to get into this because I don't have enough time and there's too many cool things to talk about but this is this is a rabbit hole that's worth going down if you have to talk to different systems this is
00:20:26
cool um and they built this whole stack of really useful tools you've got RDF on the bottom JSON-LD is just a different representation of that you've got this
00:20:41
Shapes Constraint Language for validation RDFS the RDF Schema sort of data um OWL the Web Ontology Language
00:20:54
and these are all they've got language in the name but they're not different languages they're just data we in Clojure have pretty explicit experience with the power of
00:21:08
data as a language as a DSL and they had the idea first and we all forgot about it uh and it's really cool
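To make the global namespacing point concrete, here is a rough Python sketch of what a JSON-LD `@context` does: it maps short local keys to globally unique names. This is a hand-rolled simplification and not the real JSON-LD expansion algorithm. The `example.org` vocabulary, the `ex` prefix and the helper names are assumptions for illustration only.

```python
# A JSON-LD style document. The vocabulary IRIs are hypothetical.
doc = {
    "@context": {
        "ex": "http://example.org/vocab#",
        "name": "ex:name",
        "favoriteFood": "ex:favoriteFood",
    },
    "@id": "http://example.org/people/daniel",
    "name": "Daniel",
    "favoriteFood": "mango pie",
}


def expand_term(term: str, context: dict) -> str:
    """Resolve a short key to a full IRI using the context (simplified)."""
    iri = context.get(term, term)
    prefix, sep, suffix = iri.partition(":")
    # "ex:name" is a compact IRI: replace the prefix with its namespace.
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return iri


def expand(document: dict) -> list:
    """Turn a flat JSON-LD style document into globally named triples."""
    context = document.get("@context", {})
    subject = document["@id"]
    return [
        (subject, expand_term(key, context), value)
        for key, value in document.items()
        if not key.startswith("@")  # skip keywords such as @context and @id
    ]


expanded = expand(doc)
```

After expansion, two systems that both use `name` can tell whether they mean the same thing, because each key now carries its full global name.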
00:21:20
and then there's SPARQL which sits on top SPARQL is a graph query language and it's magnificent you should check it out um and the entire point of this is you
00:21:35
interact with these technologies literally every day and you probably don't know about it every time you do a web search and you see sort of an info box pop up with just the answer you're looking for that's JSON-LD in that web
00:21:47
page uh if you've ever heard of Mastodon or ActivityPub the protocol that it implements and the entire fediverse it's powered by JSON-LD
00:21:59
and it's cool it works federated social networks can be a thing and the data representation helps so we talked about simplifying the shape of
00:22:16
the data so that's what that was semantic web has got that covered for us we didn't have to do anything because it's already been done we can decompose any arbitrary data into a very simple schema and now we've got to do something about
00:22:29
the special snowflakes that stand in front of the data that we want to access these application servers and we've all seen this right we've all done this I didn't put any numbers next to it
00:22:42
because who knows what order this is going to happen in but this is the basic stuff that needs to happen so what if we didn't have a special beautiful unique API to do
00:22:56
this every single time what if this was standardized and made as small as possible let's move transport into the data layer let's make
00:23:07
our database speak HTTP now it can sit there on the web anybody can talk to it with JSON because that's what they're doing anyways how many application servers are
00:23:20
glorified HTTP to JDBC converters you know and let's move authentication to the database as well
00:23:32
right the database Postgres has this who here has used row level authentication or rather authorization it's cool it's not very widespread it's there
00:23:46
we're not the first people who have this idea well that might be a good idea let's move authorization in there as well Postgres did it so it's got to be a good idea they don't make mistakes
00:23:58
um and let's make sure that what data does come in doesn't break our semantic guarantees our semantic model
00:24:09
and this goes beyond type level guarantees like is this a UUID is this a varchar 50 or var-car hmm who knows
00:24:24
uh we want semantic guarantees like is this thing valid uh compared to that thing like are you allowed to make this sort of change uh there's a
00:24:38
lot of expressivity that we put into the application server um I'm saying let's just move that into the database so it's regular and all that's left here is orchestrating side effects
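The difference between a type-level check and a semantic check can be sketched with validation rules expressed as plain data. This is only loosely inspired by shape-based validation. It is not SHACL, and the rule keys (`datatype`, `min_count`, `max_count`, `less_than`), the property names and the sample nodes are all hypothetical.

```python
# Validation rules as data: each property maps to a dictionary of constraints.
employee_shape = {
    "properties": {
        "name": {"datatype": str, "min_count": 1, "max_count": 1},
        # A semantic rule: this value is checked against another property.
        "startDate": {"datatype": str, "max_count": 1, "less_than": "endDate"},
    },
}


def validate(node: dict, shape: dict) -> list:
    """Return a list of human-readable violations (an empty list means valid).

    Each property of `node` holds a list of values. Comparable values of the
    same type are assumed for `less_than` (ISO date strings in this sketch).
    """
    violations = []
    for prop, rule in shape["properties"].items():
        values = node.get(prop, [])
        if len(values) < rule.get("min_count", 0):
            violations.append(f"{prop}: too few values")
        if "max_count" in rule and len(values) > rule["max_count"]:
            violations.append(f"{prop}: too many values")
        datatype = rule.get("datatype")
        if datatype and not all(isinstance(v, datatype) for v in values):
            violations.append(f"{prop}: expected {datatype.__name__}")
        other = rule.get("less_than")
        if other:
            for value in values:
                if any(not value < o for o in node.get(other, [])):
                    violations.append(f"{prop}: must be less than {other}")
    return violations


good = {"name": ["Alice"], "startDate": ["2020-01-01"], "endDate": ["2021-01-01"]}
bad = {"startDate": ["2022-01-01"], "endDate": ["2021-01-01"]}

good_result = validate(good, employee_shape)
bad_result = validate(bad, employee_shape)
```

Because the rules are data, they can be stored, queried and shared the same way as everything else instead of being buried in application server code.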
00:24:52
right I don't want to put that into the database no thank you but when the application server is only doing that it's not a guardian of the trust boundary anymore it's just another
00:25:05
client it issues a query looks at the results and either does or does not do the side effect right this is a Lambda function now this isn't
00:25:17
an application server it's just a Lambda function so that feels pretty good Fluree right we've got a database
00:25:32
and to tie this back to our metaphor the container is JSON-LD and all the infrastructure for handling it Fluree and other things like we're not the only ones who
00:25:47
think JSON-LD is a good idea and a lot of people are starting to figure out the benefits of having very regular self-describing data at Fluree we make our database a
00:26:03
first-class citizen of whatever system it's in it doesn't need to hide behind a guardian it can negotiate transport it can authenticate requests it can authorize access validate inputs and like persist
00:26:17
data of course um and because we're Clojure developers we chose to persist it as an immutable ledger um these are RDF triples that have been extended to add a
00:26:32
temporal dimension and give it assert and retract semantics we map things to integers to compact them and also to make
00:26:43
comparison and sorting a lot faster um so it's a it's a septuple and not a triple anymore but it's the same basic idea an extended Triple Star
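A minimal Python sketch of the idea of triples extended with a temporal dimension and an assert or retract flag follows. It is not Fluree's actual storage format: the talk describes a seven-element tuple with integer encoding, while this sketch uses five readable fields. The names `Flake`, `transact` and `as_of`, and the sample facts, are assumptions for illustration.

```python
from typing import Any, NamedTuple


class Flake(NamedTuple):
    subject: str
    predicate: str
    object: Any
    t: int      # transaction counter, the temporal dimension
    op: bool    # True = assert the fact, False = retract it


def transact(ledger: list, t: int, asserts=(), retracts=()) -> None:
    """Append new flakes. Existing entries are never modified or removed."""
    for s, p, o in retracts:
        ledger.append(Flake(s, p, o, t, False))
    for s, p, o in asserts:
        ledger.append(Flake(s, p, o, t, True))


def as_of(ledger: list, t: int) -> set:
    """Replay the ledger up to transaction t to get the facts true at that time."""
    facts = set()
    for flake in ledger:
        if flake.t > t:
            continue
        fact = (flake.subject, flake.predicate, flake.object)
        if flake.op:
            facts.add(fact)
        else:
            facts.discard(fact)
    return facts


ledger = []
# Illustrative data only.
transact(ledger, 1, asserts=[("daniel", "favoriteFood", "apple pie")])
transact(ledger, 2,
         retracts=[("daniel", "favoriteFood", "apple pie")],
         asserts=[("daniel", "favoriteFood", "mango pie")])

before = as_of(ledger, 1)
after = as_of(ledger, 2)
```

An update is recorded as a retraction plus an assertion, so the earlier state stays queryable.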
00:26:56
and data transport we just speak JSON you give us JSON okay and this is a Clojure conference so just so you know we also do EDN like you can use it
00:27:09
properly uh and it's great that way um so this transaction is just a hunk of JSON if you want more advanced we've got some
00:27:23
syntax that you can do to do more targeted updates but if you just want to schlep data in you can do that um the query language like it all boils down to an internal
00:27:37
representation right now what this is that I'm showing you is just a sort of datafied version of SPARQL um but we've also got parsers for SQL
00:27:50
and GraphQL and actual SPARQL so if you want to stick to the strings uh we should be able to handle that um and we did steal the really cool EQL
00:28:06
style select statements those are a good idea our authentication is simple it's just public key cryptography uh if if a request comes in and it's signed
00:28:19
by someone we trust we'll take it right we know who they are or we don't care who you are like we should be able to field queries and transactions that are one-offs where
00:28:31
the database doesn't know you at all so you should be able to delegate authority and has anybody here heard of verifiable credentials okay you guys don't count
00:28:43
they're my co-workers um it's just a standard for a shape of JSON again JSON-LD um because it's all just data for
00:28:54
passing around signatures as data and so if somebody we trust signs uh a credential and says yeah you can you can query this stuff well like all right as long as you're authorized to do that so
00:29:08
we do make sure that people aren't just running away with data um we've got a data representation of security policies
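What a data representation of security policies might look like can be sketched as follows. This is not Fluree's policy format. The policy keys (`role`, `action`, `predicates`), the roles and the sample triples are hypothetical, chosen only to show authorization rules living in data rather than in application code.

```python
# Security policies as plain data.
policies = [
    {"role": "hr", "action": "read", "predicates": {"name", "salary"}},
    {"role": "employee", "action": "read", "predicates": {"name"}},
]


def allowed(role: str, action: str, predicate: str, policies: list) -> bool:
    """True if any policy grants `role` this `action` on `predicate`."""
    return any(
        policy["role"] == role
        and policy["action"] == action
        and predicate in policy["predicates"]
        for policy in policies
    )


def visible(role: str, triples: list, policies: list) -> list:
    """Filter a query result down to the triples this role may read."""
    return [t for t in triples if allowed(role, "read", t[1], policies)]


triples = [
    ("emp1", "name", "Alice"),
    ("emp1", "salary", 100),
]

hr_view = visible("hr", triples, policies)
employee_view = visible("employee", triples, policies)
guest_view = visible("guest", triples, policies)
```

The same query returns different results depending on who is asking, and changing who can see what is a data change rather than a code deployment.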
00:29:21
um and then for validation we use the Shapes Constraint Language SHACL it's got a bunch of different constraints it's a very expressive
00:29:32
data DSL for specifying validation rules but like the whole database is designed a la carte like if you want to sort of make
00:29:47
data sharing cheap do this stuff please but you know not all shipping happens in containers like sometimes you're just carrying a box of
00:30:00
cookies to your neighbor or maybe you're trying to ship an orca to Sea World who knows like you're not putting that in a container or maybe you've got to get a an envelope to somebody in Japan like in 13 hours as
00:30:14
fast as you can so you give it to somebody and they hop on a plane like so this isn't going to cover every use case but just like containers didn't take over everything in order to provide
00:30:26
value they took over enough that it made a huge difference so ideally now data sharing
00:30:43
isn't so darn expensive and you don't have to jump straight in um I like to compare the experience of cheap data sharing with the process of
00:31:00
sitting and doing REPL-driven development I spent a little bit of time doing front end work in TypeScript where I had to think really hard about you know how to change the code and think about
00:31:13
how it's going to work and I went back to developing in Clojure and uh I was sitting and contemplating a change that I wanted to do and after like 30 seconds I was like wait
00:31:27
what am I thinking about this for let's just see what happens right I can play with my program while it's running same way we could play with data and there's so much that we can unlock
00:31:47
um when data is cheap Federated queries are when you just join databases you can't do that with a relational database the schemas are different so somebody has to sit down and figure out
00:32:01
like Postgres has the foreign data tables right how many people have set up a foreign data table in Postgres yeah it's awesome that you can but it's like it's a project right
00:32:15
so what if it wasn't a project what if it was just an extra clause in your query and then bam you've got the data right there
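The idea of a federated join being just an extra clause can be sketched in Python over two in-memory triple stores. This is a toy pattern matcher, not SPARQL and not Fluree's query engine. The source names `hr` and `sales`, the sample triples and the convention that variables start with `?` are assumptions of the sketch.

```python
from typing import Optional


def match(pattern: tuple, triple: tuple, bindings: dict) -> Optional[dict]:
    """Extend `bindings` so that `pattern` matches `triple`, or return None."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable: bind it, or check it against its existing binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(clauses: list, sources: dict) -> list:
    """Each clause names the source it runs against. Joins happen via shared variables."""
    results = [{}]
    for source_name, pattern in clauses:
        results = [
            extended
            for bindings in results
            for triple in sources[source_name]
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return results


# Two independent "databases" (hypothetical data).
sources = {
    "hr": [("emp1", "name", "Alice"), ("emp2", "name", "Bob")],
    "sales": [("emp1", "closedDeals", 12)],
}

rows = query(
    [
        ("hr", ("?e", "name", "?name")),
        ("sales", ("?e", "closedDeals", "?deals")),  # the extra clause
    ],
    sources,
)
```

Because both stores share the same three-column shape, joining them needs no schema mapping step, only a shared variable. SPARQL 1.1 expresses the same idea with its `SERVICE` keyword.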
00:32:26
you don't need a budget you just need to know like what's the name of the database right bam it's right there and vocabulary mapping so you say I've
00:32:39
done this and I've joined two databases or eight databases because you can do that well now I've got this semantic problem like I've got an employee ID
00:32:52
and you've got an account ID and you've got an Eid whichever whatever that is are these the same things well all you have to do is like I talked
00:33:06
about the uh the Web Ontology Language a while ago OWL OWL is for inferencing and so you can say you can add one fact that says oh uh account ID is the same
00:33:18
thing as customer number and now every time you query for one you get the results for everything you didn't have to go update everything you didn't have to go through an ETL pipeline you didn't have to scrub and
00:33:32
clean data you just inferred the relationship you just put it in the data right alongside everything else and it just does the right thing and you could go much further with
00:33:46
inference say genealogies right brother sister mother father these are all different predicates in our database and I'm like I don't want to
00:33:58
have to every time I do a query say is brother of or is sister of or is a parent of or is child of uh because that's tedious
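The one-fact vocabulary mapping described above, and the sub-property idea this passage goes on to describe, can be sketched as query-time predicate expansion. This is a toy version of inference and not an OWL reasoner. The strings `equivalentProperty` and `subPropertyOf` echo real OWL and RDFS terms, but the predicate names and sample data are hypothetical.

```python
# Schema facts live in the data, right alongside everything else.
schema = {
    ("accountId", "equivalentProperty", "customerNumber"),
    ("isSisterOf", "subPropertyOf", "isFamilyOf"),
    ("isBrotherOf", "subPropertyOf", "isFamilyOf"),
}

data = {
    ("ann", "isSisterOf", "bo"),
    ("cy", "isBrotherOf", "bo"),
    ("acme", "customerNumber", 42),
}


def expand_predicate(predicate: str, schema: set) -> set:
    """Find every predicate whose triples should answer a query for `predicate`."""
    found = {predicate}
    frontier = [predicate]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            related = None
            if p == "subPropertyOf" and o == current:
                related = s  # more specific predicates also count
            elif p == "equivalentProperty" and current in (s, o):
                related = o if s == current else s  # equivalence works both ways
            if related is not None and related not in found:
                found.add(related)
                frontier.append(related)
    return found


def query(predicate: str, data: set, schema: set) -> set:
    """Return (subject, object) pairs for the predicate and everything it implies."""
    predicates = expand_predicate(predicate, schema)
    return {(s, o) for s, p, o in data if p in predicates}


family = query("isFamilyOf", data, schema)
accounts = query("accountId", data, schema)
sisters = query("isSisterOf", data, schema)
```

No stored triple was rewritten. Adding one schema fact changed what the query returns.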
00:34:13
well more than just tedious sometimes it's just too complex but if I add a couple of facts that say oh hey guess what if you're a sister then
00:34:26
that's a sub property of family I add that one bit of data and now I can write just give me all the family members of this uh thing and because SPARQL is
00:34:40
this beautiful graph query language we can handle recursive queries really well it's very nice data viewers you want to play with your data we get
00:34:52
to do this in Clojure there are fantastic projects like Clerk or Portal or REBL or Data Rabbit have you seen Data Rabbit it's really cool if you haven't check it
00:35:05
out if you have to deal with data like we've got this rich ecosystem of tools in the Clojure world and other people deserve that you know like just because they don't know how
00:35:19
great EDN is they should still have a chance to uh to experience that because it's so much easier it's just easier we talk about simplicity a lot
00:35:33
but hopefully eventually when you compose simplicity you get easy solutions to hard problems that's the point right rapid application development
00:35:46
I just want a form right basic CRUD why do I have to spend all this time wiring up the validation the authorization all of like all this work
00:35:59
just declare what it is and bam you're off to the races right now most people their relationship with data is I am going to go to someone else and
00:36:15
they're going to hold on to my data and they're going to use it to generate more data and they may allow me to see some of that but the purpose of letting me see that isn't for my benefit per se it's so that
00:36:28
they can sell it to other people they can market to me uh this paradigm kind of stinks when the data can defend itself why not flip the paradigm
00:36:44
instead of bringing the data to the application bring the application to the data you give me your algorithm and I'll run it over here and I'll generate the data and maybe I'll share it with you you know
00:36:57
that's a better way at least for me as a user I like that and if joining data from many sources is cheap we don't need big data you can have
00:37:11
small data all over the place and just join it ad hoc when you need it that's better they say data is a liability when you get big data people
00:37:22
try to get at it small data is it worth it who knows and when you can share data freely cheaply you start to get a new class of problem
00:37:36
you start to say hmm where did this data come from who created this data do I trust that source so in Fluree we sign our transactions
00:37:51
and link them so that one you can tell that nobody's tampered with the chain of data and two so you could tell where it came from you can query the provenance of the data because we store it as triples in the
00:38:04
database right alongside the user-submitted data and I jumped ahead to federated computation that's what I'm talking about uh bringing the app to the
00:38:16
data instead of the data to the app and this has been a wild ride making this for Fluree we're currently in a
00:38:35
pre-alpha state for our JSON-LD support like we've got a database that works and people use it to solve real problems we're currently working on incorporating
00:38:46
JSON-LD into the very guts of it because we looked at this problem of sharing the data and we realized that we want to be able to do it really well and so it's
00:39:00
important to get the primitives right like every time you see an application server a different one done a different way that's
00:39:12
like we don't build those servers just as ends in themselves the purpose is for something else it's to enable data to work but the fact that we have to do it again
00:39:25
each time is because we lack the critical like base data infrastructure to make it cheap and easy and so we've got strong opinions about
00:39:37
that and we want it to be cheap and easy because it should be cheap and easy data is becoming more important all the time uh I want to say something about AI but I've only been paying attention for two
00:39:50
months and like they're doing stuff with data if we can't also do things with data wow who knows so I said we're pre-alpha we're barely
00:40:12
pre-alpha like we're gonna have a release in like the next week or so so please come check us out um these are some links uh our website our docs you can find
00:40:22
that there this is our repo check it out it's written in Clojure it's actually written in CLJC it's embeddable you can embed it into your Clojure programs uh you can embed it in any JVM
00:40:35
language you can put it in the browser it runs on Node Clojure has been amazing for this it gave us so much leverage uh and I also want to put a shout out for
00:40:48
Clojure Camp um Daniel Higginbotham helped me a lot in putting this talk together he gave a lot of really useful feedback and so did my colleagues at Fluree but Daniel's actually got a booth
00:41:01
in the back for Clojure Camp for bringing people into Clojure which I think is a really great idea so I just want to give a shout out stop by the booth say yes I will help people learn
00:41:14
because it's friendly and fun that way um here's a bibliography if you want to learn more about containers this book The Box it's worth reading it'll really suck you in
00:41:27
um and how are we doing on time what we're out of time okay thank you very much
00:41:39
[Applause]