00:04:21
we've come to the last lecture today, and we'll talk a little bit more about various kinds of data structures that we can instantiate as sheaves some of these you will have heard of, some of these are
00:04:34
from other data sciences. So let's get some context: we spent the past lecture talking about what a sheaf is, and now we're going to start talking about encoding different kinds of data structures
00:04:46
as sheaves, building on topological models, and then tomorrow we'll pick up with some computational invariants and pull it all together. Okay, for the kinds of data structures we're interested in playing
00:04:59
with, we should address the following kinds of questions. First of all, how can I take those data structures, put them together, and turn them into sheaves? Once I have a sheaf, how do I
00:05:10
translate between sheaves and other sheaves? This allows me to start talking about the kinds of questions I might have where I have data formats that need to be translated, or filtering in some more
00:05:25
general fashion. Just generally speaking, what can I do with this data structure? Okay, so to remind you, what is a sheaf? A sheaf is something that allows me to
00:05:37
translate between physical sources or physical realms of data and physical regions so these are various
00:05:49
open sets, or translate between them by taking a look at restrictions to overlaps,
00:06:02
and then making inferences. Well, if I don't have some kind of consistent way to talk about translating data or
00:06:20
unifying the data, then I end up with consistency checks that are either ad hoc, or not quite systematic, or not fully useful, and I end up failing to be able to do any cross-inference.
00:06:33
Okay, so we've mentioned previously this notion of flabbiness. If I start out with vertex-weighted graphs, with no further constraints at all, yes, that gives me a sheaf it's
00:06:46
got a non-trivial stalk lying on each vertex, all the restriction maps are zero, and the resulting sheaf is flabby. Okay, and there's not a whole lot to be said about what it is doing, because it's not doing a lot. On the
00:06:59
other hand, if I have edge-weighted graphs I showed you how to do that in a previous lecture naturally these things turn out to be cosheaves: the data lives on the higher-dimensional cells and extends
00:07:12
down to the lower-dimensional cells, all the extension maps are zero, and everything matches up. I think the appropriate cosheaf ought to be called co-flabby I've never heard that term
00:07:24
before, but I strongly suggest people use it, so this is new research. Anyway, we talked a little bit about flow sheaves in an interactive session; for the benefit of those online, here's a summary of what
00:07:39
a flow sheaf is. You start with a directed graph, where you think of a graph a little more generally as a directed hypergraph kind of thing, where I don't necessarily have to have endpoints on
00:07:52
all of my edges those allow me to connect to the external world, either as inputs or as outputs and you look at the direction of the edges to supply
00:08:04
some information about where a commodity is flowing. So you can imagine these are amounts of water and these are your pipes: I've got three units of water coming in from this edge, two from that edge, and what flows out has got to add up to 3
00:08:18
plus 2, which is 5, so one unit goes that way and the other four must go this way. This is some type of conservation law. This can be instantiated as a sheaf once you realize that there are some constraints, the constraints being that the sum of the
00:08:32
inputs had better equal the sum of the outputs. That seems reasonable and is not so hard to do as a sheaf if you build this thing, you can build it up like so:
00:08:43
in the most general setting you should probably use some kind of semiring and modules over it, and that all gets very nice. The reason for doing that is simply to avoid negative numbers that's a lot of abstraction just to avoid negative numbers,
00:08:56
but it turns out to be the right abstraction. If you don't care about having negative flow values, that's fine take your stalks to be real numbers. So here I've got one real-valued input, another
00:09:08
real-valued input, an output, another output, and then some subspace of the set of those four numbers instantiating the conservation law. So typically what you'll see is, if I've
00:09:20
got actual conservation sitting right here, these four numbers describe what's going on at this vertex. The stalk is four-dimensional here, but I could in fact say that I'm really
00:09:34
only assigning a certain three-dimensional subspace out of this the flows that conserve. Not so bad, and you end up with restriction maps that are sometimes projections. Okay.
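(A minimal sketch in Python of the conservation constraint just described; the helper names and numbers are illustrative, not from the lecture.)

```python
# Hypothetical helpers illustrating the conservation law at one vertex of a
# flow sheaf: a local section assigns a real number to each incident edge, and
# it is consistent only if the inputs sum to the outputs.

def is_conserved(inputs, outputs, tol=1e-9):
    """Check the conservation constraint: sum of inputs equals sum of outputs."""
    return abs(sum(inputs) - sum(outputs)) < tol

def infer_missing_output(inputs, known_outputs):
    """If exactly one outgoing edge is unmeasured, conservation pins it down."""
    return sum(inputs) - sum(known_outputs)

# The lecture's example: 3 and 2 units in, 1 unit out on one edge.
print(is_conserved([3, 2], [1, 4]))        # True: 3 + 2 = 1 + 4
print(infer_missing_output([3, 2], [1]))   # 4, the only consistent extension
print(is_conserved([3, 2], [1, 5]))        # False: an inconsistent measurement
```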
00:09:47
Fair enough. The idea is that if I have something like this, where I've got maybe three units coming in, two units coming in, one unit going out, and five units going out, that seems inconsistent. Or perhaps four units
00:10:01
going out here, three units coming in that implies five units here, and again that's inconsistent. So that would imply that I have a local section defined on everything except here,
00:10:13
where this vertex is attached. So that either says the flow doesn't conserve, or there's been an erroneous measurement it could be an error in the measurement, it could be the flow, it could be the wrong model
00:10:25
we don't know from the outset, but it certainly is not consistent. Okay. The neat thing with this is that you can in fact end up with different amounts of inferential ambiguity depending on how
00:10:36
much measurement you started with. So if you start with local sections that are defined on the black edges, in these three different instances, and you'd like to ask about extending to the red edges sometimes there's no possible inference,
00:10:50
that is what we were just seeing in the previous slide: inferring from this vertex, this number had better be four; inferring from this vertex, it had better be five; and four generally doesn't agree
00:11:01
with five. So in this case here there's exactly one possible inference it had better be four and here it's underdetermined, there are many possibilities. So the process of
00:11:13
extending sections can give you a lot of flexibility and a lot of richness in the kind of models that come out the other end, and that's kind of nice. Yes? So, you're right: if in fact I'm
00:11:35
thinking of these not as numbers but rather as some other kind of objects, then I really might be totally fine and in particular, good anticipation, Bayesian networks are a good way to manage this.
00:11:47
But I would have some amount of uncertainty now what I have on the edges are probability distributions. That sounds pretty decent. So if I start with a set of random variables, there they are, consider the set of all joint
00:12:00
probability distributions on them that's your ambient space, kind of thing. Then I've got these non-negative measures describing probability distributions. Be careful, though this is not a vector space: if I add two
00:12:12
probability distributions, I do not necessarily get a probability distribution the normalization to integrate to one kind of gets in the way. All right, seems kind of reasonable, but it is what
00:12:23
it is. I said that if you have a probability distribution and another probability distribution, and I try to add them together, I don't get a distribution, right?
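(A quick numeric illustration of that point; the sketch and its numbers are mine, not the lecture's.)

```python
# Distributions are closed under neither addition nor scaling, so they do not
# form a vector space: the sum of two of them no longer sums to one.
import numpy as np

p = np.array([0.2, 0.8])   # a distribution on a binary variable
q = np.array([0.5, 0.5])   # another one

print(p.sum(), q.sum())    # 1.0 1.0: each sums to one
print((p + q).sum())       # 2.0: the sum is a perfectly good measure,
                           # but it is no longer a probability distribution
```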
00:12:39
So it's definitely not a vector space. But what's kind of neat about this is that I can do other operations; in particular, if I have this large set of random variables, I can marginalize there's a natural way to
00:12:52
translate from n plus one random variables to n random variables. How? By integrating over that remaining random variable that's a marginalization. Well,
00:13:04
if you start doing this, if you marginalize over various other combinations of random variables, I get lots of different marginalization maps in fact I get one for every possible arrangement here. That
00:13:17
suggests that, well, I've got a lot of data. In fact, it turns out this naturally gives me a cosheaf on the complete n-simplex. So let me show you what that looks like, because this is a little bit
00:13:28
extra. So first of all let's see let's take a look at three random variables. What three random variables do I have?
00:13:46
x0, x1, and x2. Okay, and so what is the space of probability measures here? It is the space of probability measures on those three random variables
00:14:01
the joint probability of the random variables. Strictly speaking, these are functions from R3 well, not R3, but from the Cartesian product of these guys, rather to the real numbers of
00:14:15
course, non-negative real numbers such that they integrate to one you have measurable functions of that sort. Okay, so what I can do is I can marginalize out say I can marginalize out
00:14:28
the last one, I can also marginalize out the first one, or I can
00:14:40
marginalize out the middle one, of course. Now that I've marginalized, I can keep going, right? I can marginalize out again, in this case so that I've got just the x1
00:14:56
remaining here, just the x2 remaining here, and here just the x0. When you look at this diagram,
00:15:08
you realize this is the attachment diagram of a complete 2-simplex. This vertex is x0 I'll label it with the random variable this one is x2, this one is x1,
00:15:24
and this probability distribution of all three joint random variables, that's this filled-in 2-simplex. This edge, the x1 x2 edge, that's right here,
00:15:38
the x2 x0 edge, that's here, the x0 x1 edge, that's here. So this is a cosheaf on this abstract simplicial complex. Of course, if I've got more random variables, no problem I just keep building a larger simplex and build this
00:15:52
out. Now, does this diagram really commute? Yes. Why? A famous theorem Fubini's theorem, that's right. The statement that this diagram commutes is Fubini's theorem: if I integrate out
00:16:06
this way, then integrating along x2 and then x0 is the same thing as integrating along x0 and then x2. So Fubini's theorem is actually pretty cool.
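(A small sketch of the marginalization cosheaf on discrete random variables, checking that two marginalization paths agree; the arrays and shapes are made up for illustration.)

```python
# Marginalization maps on a discrete joint distribution over (x0, x1, x2):
# the joint sits over the 2-simplex, summing out a variable moves the data down
# to a face, and Fubini's theorem is the statement that the two routes from the
# 2-simplex down to the vertex {x0} agree.
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 3, 4))            # joint distribution of (x0, x1, x2)
joint /= joint.sum()                     # normalize so it sums to one

def marginalize(p, axis):
    """One marginalization (extension) map: sum out the given variable."""
    return p.sum(axis=axis)

path_a = marginalize(marginalize(joint, axis=2), axis=1)   # out x2, then x1
path_b = marginalize(marginalize(joint, axis=1), axis=1)   # out x1, then x2 (now axis 1)
print(np.allclose(path_a, path_b))       # True: the diagram commutes (Fubini)
```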
00:16:21
And so you say, all right, that's marginalization that's sort of something that comes along for free whenever I have a collection of joint random variables. What about if I talk about Bayes' rule? Bayes'
00:16:36
rule, if you will, goes the other way. What does it do? It says that I'm starting out with a joint distribution of n random variables, and I'm going to push it up to a joint distribution on n plus
00:16:50
one variables, by way of some conditional probabilities. And of course you could write this out pretty generally by using functions of various sorts, like this. (Question: Aren't you missing a vertical bar in the
00:17:03
center, so you'd see C of x n given the rest?) That's right in fact I'm saying that this function here is basically something that's very much like this, absolutely, yeah.
00:17:18
So the typical notation for this uses the vertical bar, saying all the things you're conditioning on go over here yes, but really, strictly speaking, this is just a function of n plus one variables, just a
00:17:32
special one a very special one, yes, very special indeed. So this gives me a way if you look at the same space, what's going on here it gives me a way to reverse, to go the opposite way. So the result is that when
00:17:46
I specify Bayes' rule on a portion of these spaces of random variables that are sitting here think of them as the stalks over this same simplex it's defining a sheaf over a portion of it.
00:17:59
Let me make that very concrete with a very small example. So here are two random variables I've got a given conditional probability here and the two random variables are X and
00:18:11
Y. They live over the simplicial complex from X to Y, and so when I go over the simplicial complex from X to Y, here we
00:18:22
go here is the marginalization cosheaf. In fact, this is the marginalization map these are binary random variables, so that is the marginalization map where I'm just integrating of course, integrating means summing here and the
00:18:36
conditional, on the other hand, what does it do? It takes me from given X, it tells me what the joint probability looks like so it's a matrix, a linear map, that basically multiplies out these various
00:18:49
conditional probabilities, and if you notice, these are all the possible conditional probabilities you might need. So this particular construction is a paired sheaf-cosheaf construction, and that gives rise to what a Bayes network
00:19:02
is. You then might ask, well, what is then a solution to a Bayes net? If I run this Bayes net, what do I get? Well, it's a section that is both a section of the sheaf and of the cosheaf.
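(An illustrative sketch of the two-variable case just described, with assumed numbers for the marginal and conditional; not the lecture's own notation.)

```python
# The two-variable picture: the cosheaf direction marginalizes the joint down
# to X, while the sheaf direction uses the conditional P(Y|X) to push a
# distribution on X up to a joint on (X, Y). The numbers are made up.
import numpy as np

p_x = np.array([0.3, 0.7])              # marginal distribution of X
p_y_given_x = np.array([[0.9, 0.1],     # row x: distribution of Y given X = x
                        [0.2, 0.8]])

def up(p_x, cond):
    """Sheaf direction: P(x, y) = P(x) * P(y | x)."""
    return p_x[:, None] * cond

def down(joint):
    """Cosheaf direction: marginalize Y back out."""
    return joint.sum(axis=1)

joint = up(p_x, p_y_given_x)
print(joint)                  # the joint P(X, Y)
print(down(joint))            # [0.3, 0.7]: going up then down recovers the marginal,
                              # the kind of compatibility a joint section expresses
```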
00:19:15
So there's a sort of joint section. I have to admit this is something that is not really discussed extensively at all in the sheaf literature in fact I've looked for this and haven't found much at all so I don't know a whole heck of a lot
00:19:28
about this construction, but I have a funny suspicion that since we've now got some topological structure in the base, it might be a very useful direction to say: I've got some topological structure that is, maybe I'm not looking
00:19:42
at a complete simplex, because I probably am not and this is not a complete simplex here I have some portion of that. Maybe I can use some information I know this probability distribution, I can push in this direction in such a way that
00:19:56
will then allow me to build up the rest. I don't know seems possible.
00:20:18
That seems possible, yes? (Question: The way you have it, it looks like the first entry is P of x equals 0 and y equals 0, and the
00:20:30
second and fourth entries correspond to one, one is that correct?) Yep, you're absolutely right. (Question: But typically you're given the collection of marginals and joints sorry, marginals and conditionals, yeah and you're trying to construct the
00:20:45
joint?) Yes. So in essence what you're trying to do is flow upstream, and it's underconstrained. Yes, that's right. (Comment: Algorithms use these iterative methods to find something that's consistent with those.)
00:20:58
Yes. (Comment: So in the real world, that's very much the situation there are algorithms that use junction trees, which look very much like some of the things you've shown here, where you have these
00:21:11
larger joint distributions over cliques.) Sure, there's got to be a connection there were discussions at the prior meeting, actually, about factor
00:21:25
graphs, absolutely, as being very related absolutely. So yeah, I think there are a lot of connections here, and there's a lot of good work to be done to link up that more topologically-minded viewpoint with a lot of the existing literature. Okay, cool.
00:21:39
Any further questions or comments? Okay, so here's a very different picture of things. This is now relying on some of my signal processing heritage we're looking at filters from a topological standpoint,
00:21:53
asking the question, you know, what is a filter, how does it work, and how could I tweak it so that it can do more topologically useful things? So here's a simplicial complex model for a signal. If I want a continuous
00:22:06
signal, for instance, I've built a simplicial complex for the real line imagine this is going on forever up and down and I then have, sitting over every vertex and over every edge, some
00:22:19
function space a collection of continuous functions. Sitting over this vertex are functions that are defined over an open neighborhood of that vertex; here these are functions
00:22:33
defined between 0 and 1, and then etc. What does this mean? Something kind of like this: it means that when I do restriction, I'm chopping the function in some fashion chopping it down from a larger
00:22:46
domain over that vertex to a smaller domain over the edge, and if I can paste all these things together so that they agree, I end up with a continuous function. In some sense this is the classical example of what a sheaf is: if
00:22:59
you ask a mathematician what's a good example of a sheaf, they'll say the sheaf of continuous functions. This is in fact one way to do that a sort of discretized version of it which is kind of handy.
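(A toy sketch of restriction-as-chopping and gluing for this discretized sheaf of continuous functions; the specific functions and interval are arbitrary choices of mine.)

```python
# Restriction as "chopping": a section over a vertex is a function on an open
# neighborhood, restriction samples it on the incident edge's interval, and two
# vertex sections glue only if their restrictions agree on the shared edge.
import numpy as np

def restrict(f, a, b, n=101):
    """Restriction map: chop a function down to the smaller domain [a, b]."""
    return f(np.linspace(a, b, n))

f_a = lambda t: np.sin(t)     # a section over the neighborhood of the left vertex
f_b = lambda t: np.sin(t)     # a section over the neighborhood of the right vertex
f_c = lambda t: np.cos(t)     # another candidate section over the right vertex

edge_a = restrict(f_a, 0, 1)  # both vertices restrict to the shared edge (0, 1)
print(np.allclose(edge_a, restrict(f_b, 0, 1)))   # True: these two glue
print(np.allclose(edge_a, restrict(f_c, 0, 1)))   # False: these do not
```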
00:23:12
Now what you might ask then is, well, what can I do with this sort of thing? Well, I don't ever just work with continuous functions I mess around with them and I shuttle them about. So let me tell you, how do I do that, how do I shuttle data around? Well, if I have
00:23:23
data in this sheaf and data in this sheaf, I have to translate between them, and the way to do that is to first ask, on the level of topological spaces, what do I do? Well, maybe I map this vertex to this
00:23:37
vertex, these two edges to this edge, and that top vertex to the other vertex, and what I do is I take the data in those various stalks and make some linear maps. Now, a strange thing about this is that even though the
00:23:50
map on the level of topological spaces goes this way, on the level of data it goes the other way and this is something that is not at all obvious, but when you mess around with the various constraints, it turns out this is really the
00:24:03
only way to do it. And so you end up specifying various linear maps to translate the data this is just a notional example, so it's whatever happens to be built there. Now, this being a
00:24:16
diagram in homological algebra, you look at this and you say, well, it must be a commutative diagram. So every square every possible way to get from one place to another on this diagram if I follow the matrices, it had better commute I'd better get the
00:24:30
same answer. Okay, cool. So that structure a map on the level of base spaces, and then data maps on the stalks in the opposite direction, such that the resulting big diagram commutes
00:24:43
is a sheaf morphism. And why did I tell you about sheaf morphisms? Well, because they do useful things. So if I have my sheaf of continuous functions here and I'd like to discretely sample my data this is
00:24:56
something people do in A-to-D converters and hardware what I can do is imagine evaluating these continuous functions at specific points. And what am I doing? Well, I've got a map from my
00:25:08
space of sampled data notice this one is a flabby sheaf, and this one is not a flabby sheaf, at least not as flabby, anyway. So the point being, what am I doing here?
00:25:21
This is a linear map that evaluates the function at 1, this is a linear map that evaluates the function at 0, and the diagram more or less obviously commutes, because there are a lot of zeros in there. This is a morphism
00:25:32
that samples this function. Now, when you sample data you can end up with ambiguities, which is kind of neat or kind of bad, depending on your situation. So if I take a look at each of these
00:25:44
maps this way well, each of these is a linear map, so I can ask what is the kernel, what is the set of elements here that got sent to zero? Well, in
00:25:59
the case of this map here, I'm taking this whole thing to zero, so everything got sent to zero that's this same space here. Here a little less gets sent to zero; if you look at it graphically, these are the functions that pass through zero at the specified point.
00:26:13
Hmm, this seems kind of reasonable. And the result is, if you look at the sections of this sheaf call this the ambiguity sheaf these are all the functions that pass through zero at all the integers. These
00:26:26
are functions that, if you were to sample them, what do you get? You get just a whole string of zeros. These are functions that, if you add them to your data and sample it this way, you can't tell
00:26:37
whether you've added any of these functions on there they're precisely the ambiguities of the sampling. That's kind of a useful way to think about sampling: sampling picks up the data up to the ambiguities. So then any
00:26:50
kind of restrictions you can place in this sheaf, or constraints you can place on this sheaf, will result in fewer sections over in the ambiguity sheaf. Okay, that's kind of cool, kind of useful. And, theorem: perfect
00:27:04
reconstruction is possible only when I have no interesting global sections in the ambiguity sheaf in some sense, perfect reconstruction is possible only when there are no ambiguities.
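(A small numeric sketch of the ambiguity idea: a function vanishing at every integer is invisible to integer sampling. The particular signal is an arbitrary choice.)

```python
# Sampling at the integers cannot distinguish a signal from that same signal
# plus any function that vanishes at every integer, e.g. sin(pi * t): such
# functions are the global sections of the ambiguity sheaf.
import numpy as np

def sample_at_integers(f, n_points=8):
    return f(np.arange(n_points))

signal = lambda t: np.cos(0.3 * t)
ambiguity = lambda t: np.sin(np.pi * t)          # vanishes at every integer
corrupted = lambda t: signal(t) + ambiguity(t)   # differs off the sample grid

print(np.allclose(sample_at_integers(signal),
                  sample_at_integers(corrupted)))  # True: the samples agree anyway
```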
00:27:17
This sounds like the Shannon-Nyquist theorem, and in fact you can prove the Shannon-Nyquist theorem in this language. In essence, what you do is, instead of a sheaf of continuous functions, you build a sheaf of band-limited functions, where
00:27:29
essentially we just ask for these band-limited functions to be in there that's the sheaf we're going to sample and then we sample by way of an inverse Fourier transform, the standard way to do that,
00:27:41
sampling here at the integers and using the inverse Fourier transform notice I've got a band limit there. So go ahead, do that. I've turned this diagram on its side, so here's the sheaf of band-limited
00:27:53
functions, here's the sampled data that I'm pulling out of it notice the sections are complex numbers because I'm doing this all in Fourier transform land, that's fine these are my inverse Fourier transforms pulling out
00:28:07
the various values, and then I get some ambiguities over here. The ambiguities are again precisely the functions that vanish at all the integers. When you ask, all right, how many of them are there given a particular band limit, that's a standard
00:28:19
manipulation, and you can prove that if the band limit is small enough, you don't have any which means, conversely, that if you sample at the right rate for band-limited functions, you can do
00:28:31
perfect reconstruction. Standard stuff, but cast in the language of sheaves.
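(A rough numerical sketch of that reconstruction claim using Whittaker-Shannon sinc interpolation; the signal, sampling rate, and window are assumptions of mine, and the truncated sum is only approximately exact.)

```python
# Band-limited signal, sampled at the integers (above its Nyquist rate), then
# rebuilt by sinc interpolation. With a finite sample window the reconstruction
# is only approximate, but the error is small away from the window edges.
import numpy as np

band_limited = lambda t: np.sin(2 * np.pi * 0.2 * t) + 0.5 * np.cos(2 * np.pi * 0.35 * t)

n = np.arange(-500, 501)                 # integer sample locations
samples = band_limited(n)

def reconstruct(t):
    """Whittaker-Shannon interpolation from the integer samples (rate 1)."""
    return np.sum(samples * np.sinc(t - n))

t_test = np.linspace(-5, 5, 11)
recon = np.array([reconstruct(t) for t in t_test])
print(np.max(np.abs(recon - band_limited(t_test))))   # small: near-perfect reconstruction
```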
00:28:44
Now why might you want to do this? You might want to do this if you've got some more complicated data space well, that's kind of interesting, and it leads to the question about filtering in general, because sampling is a kind of filtering. So in fact, theorem: any discrete-time linear translation-invariant filter the standard sort of machine that people like to build in hardware can
00:28:57
be encoded as a pair of sheaf morphisms: one that goes from the internal state to the output, one that goes from the internal state to the input. Again, this is a case of the arrows in the diagram looking a little funny you might think, well, gosh, those
00:29:09
arrows ought to be going the other way, but in fact, think of these arrows as being data maps they're projecting data out. I've got a more complicated, richer internal state, some
00:29:22
of which projects out to give me the input, some of which projects out to be the output and in particular, the internal state is not entirely determined by any one given input; it's determined by some
00:29:34
portion of the inputs. That internal state I'm carrying around may in fact be something like a sliding window, or maybe something that's been preloaded into the filter that gives you the ability to manipulate what the filter is
00:29:46
doing. In hardware these are typically built so that you've got an input buffer, an output buffer, some kind of shift register, and some kind of weighted sum or other nonlinear operation.
00:29:57
Fair enough. Okay, so let me show you my construction, how this thing works. You start out with the input sheaf, which looks kind of like this if you remember, this is actually a simplification of
00:30:10
the queue example given in the previous lecture, where instead of continuous functions we're looking at a time series, and this is a flabby sheaf where I've got a real number at each time step, and in the
00:30:23
intervening time there are no constraints. So what are the global sections of this sheaf? They're just real-number sequences. The output is the same thing, just
00:30:35
real-number sequences. The internal state, rather, is this queue sheaf that we talked about last time so this is a depth-3 queue, a shift register: it's storing three numbers and shifting them along as we go.
00:30:49
Again, its global sections are parametrized by sequences, but they're sequences that are grouped three at a time so I have three real numbers here, I've retained two of them and added one more on,
00:31:02
retained two of them and added one more on, etc. Essentially this is the contents of a sliding window. Now, what I need to do is tie that sliding window to the input just project. So this will say that if I'm
00:31:16
looking for a section that is valid over this portion of the sheaf, what I will end up with is that the third entry of each of these time steps' queues the third entry is tied to the
00:31:30
input it is exactly the input. Now, what about the output? Well, some kind of weighted linear function to translate these. Now, I could do something very simple like weight them all the same, but I don't have to I could
00:31:43
weight them however I like, or put some nonlinear function there, or something. Okay, great, so that's nearly everything, and then there's a bunch of zero maps to put everywhere else, and now this thing is a sheaf morphism a pair of sheaf morphisms and there we are, that's it,
00:31:55
that's the construction. So that's kind of neat, that's kind of handy, and so for example you can look at the state of this filter at any given time that's the state of the
00:32:08
internal state, now tied to the input and the average of one, one, and nine is about 3.7. Cool. Okay, and of course this extends.
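(A minimal sketch of the sliding-window filter just described a depth-3 shift register feeding an unweighted average; the code is illustrative, not the lecture's own construction.)

```python
# The internal state is a depth-3 shift register tied to the input; the output
# map is a plain average of that state (matching "one, one, and nine average to
# about 3.7"). A weighted or nonlinear output map could be swapped in.
from collections import deque

def moving_average_filter(inputs, depth=3):
    state = deque(maxlen=depth)          # contents of the internal-state queue
    outputs = []
    for x in inputs:
        state.append(x)                  # the newest slot is tied to the input
        outputs.append(sum(state) / len(state))   # output map: an unweighted average
    return outputs

print(moving_average_filter([1, 1, 9]))  # [1.0, 1.0, 3.666...]
```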
00:32:21
That's kind of handy. Now you might say, all right, why do this? I mean, it's a neat theorem, you told me a neat theorem. Well, what it does for me is, number one, it allows me to splice in nonlinear operations I didn't have to pick a linear operation, and it's not a sheaf
00:32:34
of vector spaces anymore, it's something else, and that's fine. It also allows me to play with non-trivial base spaces, and that's really very cool. So rather than asking for a sliding window that looks
00:32:46
like a sliding window over a time series, I can pick some richer base space, and in particular that's something I want to show. So here's a very simple classical problem: I've got a noisy signal in blue
00:32:58
and I'd like to recover the signal in red. Of course, this is a sinusoid, so this is a totally solved problem if I've got a sinusoid, that's not hard. If I've got a
00:33:10
chirp so the frequency changes and I use that same filter, that's obviously the wrong thing to do, because it mangles my signal badly. So I do something like an adaptive filter those are great, and this is knowing exactly what the
00:33:23
synchronization is. If I don't know the synchronization, I end up in trouble I have to now track that chirp along, I don't necessarily know what that chirp is going to look like, and I've got to retune my filter as it
00:33:36
goes. Again, this is something people know how to do in some reasonable situations, but it's good to think about this in a little more general way. So let's see what we can do about this. Really, the problem is this: here's
00:33:50
the signal, and we're averaging along a contiguous sliding window. The problem is we're averaging some of the low values with the high values what's the average of plus one and minus one?
00:34:01
It's not a less noisy estimate of plus one, it's zero, and that's kind of bad. What you would really want to do is average all the plus ones together and all the minus ones together of course, the problem is that I don't necessarily know where the plus ones are.
00:34:15
So what I need to do is figure out where I need to be looking if I can figure that out, then I can safely do a lot more averaging. So here's sort of the idea: you take your
00:34:27
noisy signal, you figure out where you should be sampling that turns out to be estimating a topological space, very much like what we talked about in the second lecture, actually and then once we've done that, that is the space over which you want to
00:34:40
build your internal state sheaf for your filter. Once you've got the internal state sheaf built over this space, you can do your averaging passing through that topological filter and you get a much better output. So roughly
00:34:53
speaking, this is a caricature of what's going on although this picture is real. Here's a function over two variables, and it's kind of periodic it kind of looks periodic. What you do is grab a neighborhood and
00:35:05
slide that neighborhood all over this function grab some pixel neighborhood, I don't exactly remember how many points, it looks like they used a four-by-four grid of pixels and slid it around and took
00:35:17
that to be a vector in R16. These are the first two principal components or three, I guess. What do you see? Well, for one thing you see this sort of loopy structure it's got some prominent
00:35:31
loop in it but we're not actually even going to leverage that. We're just going to leverage the fact that this distance matrix is telling me which points are nearby which other points that particular metric space is going to serve as my
00:35:44
topological space, and I'm going to start building my sheaf over that using a Vietoris-Rips complex. When I do that, here's what I get. This is a distance matrix, sample to sample this is now
00:35:57
not over this picture but rather over this picture here, where I have a chirp. So this is sample number versus sample number it is a distance matrix, so the diagonal is
00:36:10
zero. You can see that there's some periodic-like structure, but not completely, because a given sample, as you sweep through time, is similar to other
00:36:22
samples. These low-distance points are the points that you really want to be averaging with. So you build your topological space like that, and aggregate your data into this sheaf so now you're building the
00:36:35
internal state sheaf out of neighbors in this topological space. This is what you end up seeing this is a particular section of that sheaf and
00:36:47
then we go ahead and average pass it through that topological filter and you get much better results. In particular, here's the output of that filter: you get extremely stable output, so you're able to really stabilize the output and get
00:37:01
very good filtering. For comparison, here's a filter that has been synchronized and adaptively tuned to it you can see there's quite a bit of instability there. Yeah?
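(A compressed sketch of the topological filtering recipe: window embedding, a distance matrix to find genuine neighbors, and averaging over those neighbors instead of over time. The chirp, window length, and neighbor count are arbitrary choices of mine, and the Vietoris-Rips step is reduced to plain nearest neighbors.)

```python
# Embed sliding windows of a noisy chirp, use the window-to-window distance
# matrix to find which samples are genuinely similar, and average each sample
# with those neighbors rather than with its neighbors in time.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 800)
clean = np.sin(2 * np.pi * (5 * t + 20 * t ** 2))    # a chirp: the frequency drifts
noisy = clean + 0.5 * rng.standard_normal(t.size)

w = 16                                                # sliding-window length
windows = np.array([noisy[i:i + w] for i in range(noisy.size - w)])

# Pairwise squared distances between window vectors: the distance matrix that
# stands in for the estimated topological space.
sq = (windows ** 2).sum(axis=1)
dist2 = sq[:, None] + sq[None, :] - 2.0 * windows @ windows.T

k = 40                                                # neighbors to average over
neighbors = np.argsort(dist2, axis=1)[:, :k]
denoised = noisy[:len(windows)][neighbors].mean(axis=1)

err_raw = np.mean((noisy[:len(windows)] - clean[:len(windows)]) ** 2)
err_top = np.mean((denoised - clean[:len(windows)]) ** 2)
print(err_raw, err_top)    # neighbor averaging should cut the error substantially
```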
00:37:23
Mm-hmm, that's right, this was all done after the fact that's right, and those other ones are streaming. You're right, there is a difference in algorithmic framework, and I won't claim that the
00:37:37
comparison is a hundred percent fair. (Question: Sure, but you have all the data?) Yes, absolutely, yeah. And in fact, the typical
00:37:50
reason why I was going down this path is that I was using chirp sonar, where I had all the data stored anyway seems kind of reasonable, yeah in which case I could process the data at my leisure. Yeah, this works I did this on a one-
00:38:04
dimensional example, and it works very well on the two-dimensional example here, where building adaptive filters is a much more dicey business. Here's a noisy input image I'd like to denoise; it's additive white Gaussian noise applied on
00:38:15
to some quasi-periodic function. If you do something stupid like a fixed-frequency filter, that doesn't work, obviously, but this topological filtering approach does a pretty decent job cleaning it out I actually have a hard
00:38:30
time finding much of any visible noise in here. Of course, at this point we should quantify performance since I've got the actual signal that went into this, I can measure the actual
00:38:42
error as a function of noise. So as I crank up the noise here's an SNR of 1 this blue is just a fixed-frequency filter, not necessarily the best choice, but certainly a reasonable choice if you
00:38:54
don't have anything better. Using the topological filter, and estimating the topology as I suggested, you get this green line. If you happen to know the topology outright which is not
00:39:06
quite as much cheating as you might suspect, because I don't know the synchronization or the timing, but I know that I'm looking for something that is perhaps periodic, or feels like it's periodic, a deformation of periodic then I do
00:39:18
a little bit better. So this quantifies the amount of topological estimation error that I've incurred you can see it grows more or less linearly with the noise, and as you mess around with the filter sizes you see that it's basically a similar linear progression. This is not a
00:39:31
bad filter at all. All right, so, questions on this? Yeah?
00:39:45
So what is harder about that? What it ends up being is that this topological estimation stage requires getting enough data that you can be reasonably certain you've got the right topological space, and the difficulty
00:40:00
right now is actually that the theoretical bounds versus the practical bounds on that problem are very, very different in fact the theoretical bounds are extremely loose. The theory says
00:40:12
that you need a heck of a lot more data, and a lot cleaner data, than what appears to be the case in practice they're almost always worst-case bounds, and they're really bad. So understanding exactly how much you
00:40:24
can rely on that performance is something that is really not very well understood. Other questions? Okay, well, that is all I have. So next up, the next
00:40:38
lecture tomorrow will be on some linear algebra and homological algebra and data types this goes under the name of categorification, which is a mouthful, but it's a useful concept, and
00:40:50
it turns out to be essential when we talk about heterogeneous data. The other observation is that there is an informal social down the street that we will happily
00:41:01
welcome you all to. Here are some references. (Question: Are there applications?) Yes in fact, that is much of the
00:41:32
theory behind this, on the filtering side we're looking at signal processing networks. Yeah, that is a good question.
End of transcript