00:00:05
Hi everyone, my name is Joel, and today I'm going to be sharing a lot of experience from working on IPFS for the last six years or so. In particular, I'm going to be talking about
00:00:17
how we can build systems for mutable data on top of IPFS. But if we think about IPFS, isn't immutability the point? We want to be able to know that things haven't
00:00:30
changed. That's what we get when we put files into IPFS: we get a hash of the content, referred to as a CID. However, this basically means
00:00:42
that every time we update some piece of content we get a new hash, so we don't have a way to keep track of the changes that have happened over time. IPNS, the InterPlanetary Naming
00:00:55
System, essentially gives you a way to have mutability. However, it doesn't really go far enough; there's a bunch of properties, which I will mention later, that we don't get with IPNS. And blockchains are great for
00:01:09
more financial use cases, but they don't really offer the type of scalability we get with something like IPFS, where data can live at the edges and we don't need
00:01:22
everyone to replicate everyone else's data. So what are some of the requirements we have for mutability in this type of decentralized
00:01:33
environment? First of all, we want something called self-certifying authority: the ability to know, when I synchronize a piece of data, that it actually comes from the source
00:01:48
it claims to come from. For example, if I author a piece of data and send it to you, you can cryptographically verify that it is actually coming from me directly.
00:02:00
It would also be nice if we could do collaboration and conflict resolution on top of this mutability system, and finally access control, so that we can control who can update and
00:02:12
modify certain objects. So why do we need this mutability system? Well, we want to be able to build different sorts of databases, whether that's web-scale, similar to Web 2.0 databases
00:02:25
today, or more peer-to-peer databases that maybe live completely in the browser and are gossiped around nodes directly. We might want data feeds that are
00:02:37
dynamic and high-traffic, capturing vehicle data or weather data or all kinds of other sources where we
00:02:50
just output a lot of data. We could also use this sort of mutable system for compute pipelines, as a way to tie different computational networks
00:03:02
together. There have actually been a few efforts trying to do this over time in the IPFS ecosystem. OrbitDB is one fairly well-known one, and Qri is
00:03:17
one that is not really around anymore. To the right here you have Bluesky, probably the one that's most deployed right now, but it's very focused on social networks; it's not really general. There was an effort called
00:03:30
IPDF, which was also trying to build a very generalized version of a data feed. But none of these have really taken off completely, and they have different
00:03:42
tradeoffs, and they don't meet all the requirements that at least I had when I wanted to build with some sort of mutability layer on top of
00:03:54
IPFS. So what are those requirements? First of all, authentication should be easy. If a user comes to an application, they shouldn't need to set up some new kind of wallet or
00:04:07
think about how they manage their keys; that should just be handled seamlessly by the application and implicitly by the protocol
00:04:19
itself. Key rotation is an important aspect, and tamper resistance; those two are very bound together. Basically, what that means is I
00:04:31
should be able to see when a key has been rotated out. Everything that was signed with that key before it was rotated should still remain valid, while things that are signed with a key after it was rotated should obviously be
00:04:44
considered invalid. And tamper resistance means simply that if something was authored at some previous point in time, I don't want the author to be able to go back in
00:04:55
time and change that data or add to it with malicious intent. Finally, we need this mutability
00:05:07
system to have data synchronization: we want to be able to synchronize data across nodes, and that could be in very different settings. For example, it could be peer-to-peer
00:05:18
directly between browsers or users' local computers, or it could be peer-to-peer between servers in larger-scale deployments. So I'm basically
00:05:31
going to dive into how we're solving these problems with Ceramic. Ceramic is the protocol that I've been building for a few years, drawing upon all the learnings I've had over the last six years in
00:05:43
the IPFS ecosystem. So, first of all, authoring data. A key thing to realize here is that users generally already have wallets that are cryptographically
00:05:54
enabled. That could be a cryptocurrency wallet, where you probably have some extension or mobile wallet that holds a key. But there are also things like passkeys, a new standard developed by
00:06:08
some of the larger companies making phones or browsers, which basically allows you to do a passwordless single sign-in to any type of Web 2.0 service. But passkeys could also be
00:06:21
used to author data in the Web3 context. And one thing I've discovered recently is that newer passports actually also have a cryptographic key embedded in them that
00:06:34
you can use to author data with. So, for example, you could delegate from a passport key to a session key and author data on behalf of your identity, and you could wrap that
00:06:46
up in a zero-knowledge proof so you don't disclose any personal information about who you are. Now, in order to make this work seamlessly: if you've ever used a Web3 blockchain,
00:06:59
you'll have noticed that every time you make a transaction you need to sign a message with your wallet, and you get a popup. If you're using a feature-rich application, you do a lot of interactions, you edit content, and so on; you don't want to sign
00:07:11
a message every time you make a small edit or an upvote or anything like that. To solve this problem we're using something called object capabilities, or OCAPs for short,
00:07:24
and we're using a format called CACAO, which is basically a way of wrapping different object capability formats. Currently it supports signing with Ethereum, supports UCAN, supports
00:07:36
passkey authentication, and it's really extensible in that way. So what does the UX flow look like? Suppose the user has some wallet and they come to the application. The application
00:07:49
generates a random session key. This session key can actually live in the secure enclave of the browser: the WebCrypto API allows you to create a key that's not extractable by the application context. Then
00:08:03
the application sends a message to the user's wallet saying, hey, can you give this session key permission to write to this resource on behalf of the user? The user
00:08:17
signs this message, most commonly using Sign-In with Ethereum, and the message contains the resource URL that the session key is allowed to edit and write to.
00:08:31
Now, once the application gets the signed message back, it can use the session key together with that signed message to make edits on the resource. As long
00:08:44
as the session key signature is valid and the signature in the object capability is valid, you can prove the whole chain from the user's account, which is a did:pkh in this case, that controls the resource, and you
00:08:58
can follow the full chain back to the session key that is actually trying to edit the resource. This is the minimal example, but you can see how it can enable a good user experience.
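To make that concrete, here's a minimal TypeScript sketch of the session-key flow using only Web APIs. The helpers walletSign and toSessionDid are hypothetical stand-ins; a real application would use something like the did-session library instead.

```typescript
// Hypothetical helpers: a wallet provider that signs a message, and an
// encoder that turns a WebCrypto key into a did:key identifier.
declare function walletSign(message: string): Promise<string>;
declare function toSessionDid(key: CryptoKeyPair): Promise<string>;

// 1. The app generates a random session key. With extractable set to
//    false, the private key never leaves the browser's crypto context.
const sessionKey = await crypto.subtle.generateKey(
  { name: "ECDSA", namedCurve: "P-256" },
  /* extractable */ false,
  ["sign", "verify"],
);

// 2. The user's wallet signs a capability (e.g. via Sign-In with
//    Ethereum) delegating a resource to the session key.
const capability = {
  issuer: "did:pkh:eip155:1:0xab...cd",     // the user's account
  audience: await toSessionDid(sessionKey), // the session key's DID
  resources: ["ceramic://<stream-id>"],     // what it may write to
  expirationTime: "2024-01-01T17:00:00Z",   // e.g. valid until 5 pm
};
const capabilitySig = await walletSign(JSON.stringify(capability));

// 3. Every subsequent edit is signed locally with the session key and
//    carries the capability, so no wallet popup is needed per edit.
async function signEdit(payload: Uint8Array) {
  const signature = await crypto.subtle.sign(
    { name: "ECDSA", hash: "SHA-256" },
    sessionKey.privateKey,
    payload,
  );
  return { payload, signature, capability, capabilitySig };
}
```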
00:09:09
But object capabilities are really powerful: you can use them to create longer delegation chains, delegate between multiple users, and so on, and have
00:09:21
ways to limit the capabilities as well. So let's talk about the data structure of these event logs that we're using in Ceramic. This is the high-
00:09:33
level overview. You can see this linked list here; it's basically a hash-linked data structure stored in IPLD. There's an init event, which is the event that creates the event log; there are time events, which I'm going to talk about a
00:09:45
little bit more later; and the data events are basically just signed updates to this event log. So let's double-click on the actual data structure of these events. You
00:09:56
can see here we have the init event to the left and the data event to the right. The init event has a header, basically a controller that says who is allowed to author data into this
00:10:10
event stream, and then there's a piece of data. You can see there's another structure right below that, a DagJWS, which is a separate structure for encoding the signature. Then we have a very
00:10:23
similar thing on the data event. The data event also includes the ID of this event stream, which is basically the hash of the init event, and has a previous pointer, so you get this linked-list type of
00:10:37
structure. Similarly, the data event also has a detached signature that is separate and points to the payload itself.
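As a rough sketch in TypeScript, the two-part shape described here looks something like the types below; the field names approximate the Ceramic event format and are illustrative rather than the exact wire format.

```typescript
type CID = string; // stand-in for a real IPLD link

// Payload of the event that creates the stream.
interface InitEventPayload {
  header: { controllers: string[] }; // who may author into this stream
  data: unknown;                     // the initial content
}

// Payload of a signed update to the stream.
interface DataEventPayload {
  id: CID;   // CID of the init event -- identifies the stream
  prev: CID; // CID of the previous event, forming the linked list
  data: unknown;
}

// The detached DagJWS-style envelope that carries the signature and
// points at the payload by CID -- which is why each signed event is
// really two separate IPLD objects.
interface SignedEnvelope {
  payload: CID;
  signatures: { protected: string; signature: string }[];
}
```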
00:10:51
Okay, so this serves the purpose, but it's not optimal. It's not easy to understand what's going on, why there are these two different objects per event, and so on. It would be nice to just have one piece of data with a signature
00:11:03
that represents an event. So we've been working on a standard called varsig, together with Fission and Spruce, and this is done in an organization called the Chain Agnostic
00:11:15
Standards Alliance. Varsig is a really neat way to encode signatures: it basically allows you to encode any type of signature along with an IPLD
00:11:27
object. The way it works is that the signature itself is encoded as a byte string. It has a codec for which key type was used to author the signature, along with some additional
00:11:41
parameters for that key type. Then it has an encoding codec, which tells you how to take the data and encode it as a byte string before you
00:11:55
apply the signature validation algorithm, and at the end you have the actual signature bytes. How this is used in practice: for
00:12:06
example, if we have a JWS, we can take the JWS object and encode its payload as an IPLD object, since JWSs are JSON
00:12:21
essentially. So when someone synchronizes an object, they see a pure IPLD-encoded object, similar
00:12:32
to, say, the init payload here, and then there's an additional property which is the signature. You take the bytes of this
00:12:45
varsig, you look at what the encoding is, and you take the IPLD object and encode it accordingly; in this case it would be a JWS encoding, so you encode the IPLD object as a JWS,
00:12:58
and then you know how to validate the signature. The nice thing about this is that we can encode essentially any type of signature format. JWS is one; we can
00:13:10
encode Ethereum signed messages, where there are two different standards, EIP-191 and EIP-712; we can encode W3C verifiable credentials; and
00:13:24
the IETF also has a separate verifiable credential standard we can encode. There's really no limit as to which types of signature you can match to this. This basically makes it possible to append a signature to any IPLD data using any signature format you want.
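Conceptually, a varsig carries the fields below, in this order. This is a sketch of the shape only; the real encoding is a compact byte string of varints and raw bytes as defined by the varsig spec, not a structured object.

```typescript
// Conceptual fields of a varsig, in encoding order.
interface Varsig {
  keyCodec: number;      // which key type produced the signature
  keyParams: number[];   // additional parameters for that key type
  encodingCodec: number; // how to serialize the IPLD data into bytes
                         // before verifying (e.g. JWS-style, EIP-712)
  signature: Uint8Array; // the raw signature bytes at the end
}

// Verification sketch: re-encode the IPLD payload according to
// encodingCodec, then verify `signature` over those bytes using the
// key type declared by keyCodec.
```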
00:13:38
So what this means for the data structure in Ceramic is that we can essentially simplify the structure I showed before: the init event is just the header, some data, and then the
00:13:50
signature as a varsig at the end, and similarly the data event is just the same data we had before plus the signature. This is much easier to work with and much easier to understand.
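With varsig, each event collapses into a single self-contained IPLD object, roughly:

```typescript
// One object per event: the payload fields plus an inline varsig.
interface InitEvent {
  header: { controllers: string[] };
  data: unknown;
  signature: Uint8Array; // varsig bytes
}

interface DataEvent {
  id: string;   // CID of the init event
  prev: string; // CID of the previous event
  data: unknown;
  signature: Uint8Array; // varsig bytes
}
```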
00:14:04
Okay, so next up I want to talk about tamper resistance: how can we ensure that the data that's been added to the system has not been tampered with? If it's just a single file,
00:14:16
it's easy, right? If someone has tampered with the data, the hash doesn't check out; we'd get a different hash and we'd know someone tampered with it. But if we have this event log of data and someone
00:14:29
creates two alternative histories, how do we actually know which of those histories is the legitimate one, or which one was authored first? Here we can see that we really want to know when this data was
00:14:42
published. I mentioned object capabilities before; these are normally created with an expiry date. So maybe this object capability is valid until, you
00:14:56
know, 5:00, and right now it's 4:30. If I author the event right now, it should be fine; we can consider it valid. But if I author an event using the same object capability an hour from now,
00:15:09
it's obviously not valid, so it should not be seen as valid. But if I synchronize this data structure, say, two days from now, I don't really have a way to know
00:15:23
at which time the event was actually authored, if we don't have some type of external source of time. And this is where we introduce the concept
00:15:35
of a timestamping system: essentially what we're calling a blockchain anchor. Right now we are running what we call an anchoring service, and
00:15:48
the anchoring service takes a bunch of events that have happened in the network, builds a Merkle tree, and sends one transaction to the Ethereum blockchain, which timestamps all of the events under that Merkle tree at a particular point in time. You can
00:16:01
see that represented here: here's the Merkle root, we send that to Ethereum, and at the bottom we have the events that were actually included. So the time event,
00:16:13
which we include in the event log I just mentioned, basically points to the previous event, and it has a proof, which is described here, and a path.
00:16:26
The proof is this object here: it points to the root, and the chain ID says which blockchain the proof is on, in our case Ethereum mainnet. There's a transaction
00:16:39
hash, which is just the hash of the transaction that was included in the blockchain, and a type describing how we can go from the smart contract to retrieve the CID of the Merkle root. The
00:16:51
path here describes the path in the Merkle tree that you need to take to verify that the prev, the previous event CID, actually is the same as the one that's pointed to from the blockchain anchor.
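Sketched as TypeScript types following the description above (field names are approximate; the CASA timestamp-proof spec is the normative reference):

```typescript
type CID = string; // stand-in for an IPLD link

interface TimeEvent {
  id: CID;      // init event of the stream
  prev: CID;    // the event being timestamped
  proof: CID;   // link to the anchor proof below
  path: string; // path through the Merkle tree, e.g. "0/1/1"
}

interface AnchorProof {
  chainId: string; // which blockchain, e.g. "eip155:1" for Ethereum
  txHash: CID;     // hash of the anchoring transaction
  root: CID;       // Merkle root committed on-chain
  txType: string;  // how to recover the root CID from the transaction
}

// Verification sketch: starting from `root`, follow `path` through the
// Merkle tree; the leaf you reach must equal the time event's `prev`,
// proving that event existed no later than the transaction's block.
```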
00:17:05
If we go back to this, we can see that these time events have the proof metadata I just talked about, and the blockchain transaction, through the Merkle tree, points back to the previous event, which should be the same as the
00:17:16
previous event from the time event here. And this is actually also standardized within the Chain Agnostic Standards Alliance, so you can go and read it; I think it should be called CAIP-168 or something like this.
00:17:32
I'm not sure; I didn't include the name of it in here. But this is not something that's Ceramic-specific; it's something we've tried to standardize so other people can use it in their protocols. All right, so finally we get to
00:17:46
data synchronization. In a protocol like this we want to be able to synchronize a single event log if we want user-centric data: a user comes to the application, they write a bunch of data, and they want to be able to
00:17:59
have that data not only in that application but maybe in another application, or on their own local machine; I should be able to synchronize only my own data. I would also like to be able to synchronize multiple event logs: there could be a group of users that
00:18:14
are editing some information together. And finally, we probably also want to be able to synchronize all of the event logs; that could be an application that just wants to have an index of all of the data the
00:18:26
application has generated over time. To solve this problem we've implemented something called Recon. This is based on a paper called "Range-Based Set Reconciliation" and essentially takes
00:18:41
the approach of putting all of the events into an ordered list and then allowing you to synchronize any subrange of that list. We've implemented this as a
00:18:55
subprotocol for libp2p, and it essentially allows us to synchronize any growing set of data that is authenticated in the way I described, with this self-certification
00:19:08
property. In practice, what it means for Ceramic is that we can select data based on schemas, on data models: if your application has a specific data model, you can synchronize
00:19:20
only the data that's relevant to your application, and you can even shard that arbitrarily as your application grows. Just to give you an idea of how we sort the data: we have a
00:19:36
network ID, for which Ceramic network this is, the main network or the test network; this is based on a header property in the event stream. Then we have the controller, essentially the identifier of
00:19:49
the user that created the stream, then an identifier for the stream itself, then the height, which is basically the number of events in the event stream that have been created so far,
00:20:02
and finally the hash of the event itself. This allows us to deterministically order all of the events, not only in one stream but across multiple streams; actually, across the entire network we can order all of the events that have ever taken place.
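So the sort key for every event is roughly the tuple below, compared field by field, most significant first. This is a sketch of the ordering, not the exact byte encoding Recon uses.

```typescript
interface EventKey {
  networkId: number;  // mainnet vs. testnet, from a stream header field
  controller: string; // identifier of the user who created the stream
  streamId: string;   // identifier of the stream itself
  height: number;     // number of events already in the stream
  eventHash: string;  // hash of the event, as the final tiebreaker
}

// Lexicographic comparison over the tuple gives one deterministic
// total order over every event in the network.
function compareKeys(a: EventKey, b: EventKey): number {
  if (a.networkId !== b.networkId) return a.networkId - b.networkId;
  if (a.controller !== b.controller) return a.controller < b.controller ? -1 : 1;
  if (a.streamId !== b.streamId) return a.streamId < b.streamId ? -1 : 1;
  if (a.height !== b.height) return a.height - b.height;
  return a.eventHash < b.eventHash ? -1 : a.eventHash > b.eventHash ? 1 : 0;
}
```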
00:20:14
So let's look at how this protocol actually works in practice. Here we have two computers, two people using the Recon protocol to
00:20:27
synchronize names of animals. In practice, what we synchronize would actually be more like hashes of pieces of data, like CIDs, but here we're just going to demonstrate syncing these small
00:20:39
strings. This computer has ape, eel, fox, and gnu; this computer has bee, cat, doe, eel, fox, and hog. The first computer wants to sync, so it sends a message to the other
00:20:52
computer that includes ape, the hash of the two intermediate values, and the last value; so you can see the first, the last, and the hash of the things in between. The other computer receives
00:21:05
this message, adds the things it didn't have to its own store, and then it sends back a split: it sees that, okay, I don't have the same
00:21:16
hash for these values, so I'm going to split the range in two and send back the split ranges. We still start with ape, but we actually stop with hog instead
00:21:30
of gnu, and then we send the middle value with the hashes of the values on either side: in this case the hash of bee and cat on one side, and the hash of eel and
00:21:42
fox on the other. When the first participant gets this back, it can see: oh, here are some new values I didn't have, let me add them to my store; but I still can't compute the same hash, so I split that range in
00:21:55
half again to send back values that I don't know whether the other participant has. In this case, these are the values we
00:22:07
have in between doe and hog, and between ape and doe we don't actually have any values. Now the other side can see the things that are missing and send those
00:22:20
back, and at this point this computer seems to be completely synchronized with the other one, so we send the first value, the last value, and the hash in between. To finalize everything, the same
00:22:33
message is sent back, and we're now completely synchronized.
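Here's a toy TypeScript sketch of the opening step of this exchange, using the animal names from the example. Real Recon messages carry event hashes (CIDs), and the fingerprint here is a toy stand-in for summing real cryptographic hashes.

```typescript
// Toy per-value hash and an order-independent range fingerprint:
// because addition is associative and commutative, any storage layout
// can compute it -- no fixed Merkle tree is required.
const hashOne = (v: string) =>
  [...v].reduce((acc, c) => (acc * 31 + c.charCodeAt(0)) >>> 0, 17);
const fingerprint = (vals: string[]) =>
  vals.reduce((acc, v) => (acc + hashOne(v)) >>> 0, 0);

// A range message: first value, fingerprint of everything strictly in
// between, last value.
function initialMessage(store: string[]) {
  const s = [...store].sort();
  return { first: s[0], fp: fingerprint(s.slice(1, -1)), last: s[s.length - 1] };
}

// The receiver recomputes the fingerprint over the same range of its
// own store; on a mismatch it splits the range in two and answers with
// two smaller (first, fp, last) ranges, recursing until they match.
function matches(store: string[], msg: { first: string; fp: number; last: string }) {
  const between = [...store].sort().filter(v => v > msg.first && v < msg.last);
  return fingerprint(between) === msg.fp;
}

const a = ["ape", "eel", "fox", "gnu"];
const b = ["bee", "cat", "doe", "eel", "fox", "hog"];
console.log(matches(b, initialMessage(a))); // false -> B splits and replies
```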
00:22:46
This protocol is really nice because it allows us to synchronize these data sets even as they're growing and mutating, and it doesn't actually impose any intrinsic data structure internally on the clients. Normally, in
00:22:59
these peer-to-peer synchronization protocols, you have an internal Merkle tree representation to be able to walk down the tree; this protocol doesn't actually require that. The hash function used here is an
00:23:11
associative hash function, which means that here, for eel, fox, and gnu, if you hash eel, fox, and gnu
00:23:23
independently and just add the hashes together you'll get a value, and if you do that in any order you'll get the same value. This allows you to not enforce a Merkle tree
00:23:37
structure on the storage for a node; you can have different clients that implement different tree structures for different optimizations. In our implementation right now, we actually just have a SQLite store and do a
00:23:51
query over that to select all of the entries, using the SUM function in SQL to add all the hashes together. This gives implementers of the protocol a lot of flexibility, which is really cool.
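A quick illustration of why associativity matters, plus the flavor of the SQL approach (the query is illustrative, not the actual schema):

```typescript
// Combining hashes with modular addition: hashing eel, fox, and gnu in
// any order or grouping yields the same range fingerprint.
const h = (v: string) =>
  [...v].reduce((acc, c) => (acc * 31 + c.charCodeAt(0)) >>> 0, 17);
const add = (x: number, y: number) => (x + y) >>> 0;

const one = add(add(h("eel"), h("fox")), h("gnu"));
const two = add(h("gnu"), add(h("fox"), h("eel")));
console.log(one === two); // true -- storage order is irrelevant

// So a node can keep events in a plain SQL table and let the database
// compute a range fingerprint, e.g. (illustrative SQL):
//   SELECT SUM(hash) FROM events WHERE key > ? AND key < ?;
```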
00:24:04
Our implementation of this right now is written in Rust. It's based on an implementation of IPFS called Beetle, which the Iroh team implemented
00:24:17
earlier, before they moved on to a completely separate codebase from IPFS. The reason we're doing this is that we're trying to migrate: right now Ceramic has an implementation based
00:24:30
on Kubo and a JavaScript client that talks to it. We want to have more control over the libp2p layer, so we are now implementing this Rust layer, and
00:24:44
right now we're sort of mirroring what Kubo does, but we're probably going to remove the function in IPFS that publishes every object to the DHT, because that's
00:24:56
not super efficient, and instead just rely on this Recon protocol to synchronize data across the network. All right, so to recap: Ceramic provides easy authentication
00:25:09
because you can natively support wallets, and that is being extended to support things like passkeys, and potentially things like passports in the future, which I think would be super cool. We're also looking at ways
00:25:22
to author data based on on-chain Ethereum state. For example, one thing we are getting a lot of requests for is having smart contract accounts author data into
00:25:36
Ceramic, and there are various ways of taking chain state from Ethereum, creating a Merkle proof of the state of Ethereum, and using that as an object capability inside of
00:25:49
Ceramic; you basically delegate from on-chain to a session key just by looking at the state of Ethereum. And since we do this timestamping, we
00:26:01
can enable key rotation and tamper resistance. In the example I mentioned before, where the object capability for the session key was only
00:26:13
valid until 5:00: if I author an event now and it gets timestamped now, it is considered valid; if I author it an hour from now and timestamp it then, it is not. And there's
00:26:28
no way I can timestamp things in the past, because then I would need to attack the entire Ethereum blockchain.
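The validity rule here is simple to state in code. A sketch, assuming that real validation also checks the signatures and the on-chain proof itself:

```typescript
// An event is valid if its blockchain-anchored timestamp falls before
// the expiry of the capability that authorized it. Because the anchor
// comes from Ethereum, backdating an event would require attacking the
// chain itself.
interface AnchoredEvent {
  anchorTime: number;       // block timestamp from the time event (s)
  capabilityExpiry: number; // expiry from the object capability (s)
}

const isValid = (e: AnchoredEvent) => e.anchorTime <= e.capabilityExpiry;

// e.g. capability expires at 17:00: an event anchored at 16:30 is
// valid; one anchored at 18:00 is not.
```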
00:26:40
And finally, we have this neat Recon synchronization protocol. All right, so how can you use these primitives that we've built? Right now we've built a protocol on top of Ceramic called
00:26:55
ComposeDB, which is essentially a GraphQL database that you can use to build fairly feature-rich applications. But we also want to
00:27:07
separate this database system we built from the lower-level event streaming system, and essentially enable other people in the ecosystem to build different types of
00:27:19
databases. It would be super exciting to see this event streaming interface being used not only to build this larger sort of system; ComposeDB is basically aimed at applications where you
00:27:32
store all of the users' data in sort of one big index. It would be cool to see people experiment with more peer-to-peer databases based on this event log structure, where you
00:27:44
synchronize state between browsers. I think you can build those sorts of databases on top of Ceramic, but right now we're focusing on this specific use case. Since we have
00:27:56
this vision of other databases being built on top, we're separating Ceramic event streams from these database indexes, and we're in the process of designing a new API called the Ceramic API. You can
00:28:09
comment on this API in CIP-137, and any type of feedback would be very appreciated. All right, thank you.
End of transcript