All right, well, I think we'll go ahead and get started. Hello everybody, my name is Aaron Stannard. I'm one of the creators of Akka.NET, and in the .NET and technology space that's kind of what I'm known for. This talk will mention it, but it's not really about that; it's about software architecture and some lessons that I've learned over the years, primarily by doing things wrong. We're going to talk about what I think is essentially a style, well, not really a style, more like a philosophy of software architecture that will save you a lot of time, agony, and most importantly money and risk down the road.
So, what is technical debt? Technical debt is essentially when you have a baked-in, let's say, architectural choice made early in the life cycle of a project. It doesn't necessarily have to be software: if anyone's tried installing Cat5 in a really old house, you know what technical debt is like in the physical world. The idea is this: say we build a large-scale piece of enterprise software. It could be a line-of-business application, it could be something customer-facing; all of them end up in kind of the same place. Say you build a database-driven application, which is probably the most common style of software application out there. You have your relational database with its data, table schema, stored procedures, and views. On top of that you have, say, your reporting and your object-relational mapping; then your UI layer; then the business processes that either your customers or your internal stakeholders use; and on top of that, all the historical data, business value, and built-in, human-run systems and expectations, all resting on that stack. If you want to make a significant change down at the bottom, it's going to propagate all the way up through every layer sitting on top of it. That is the essence of technical debt.

So say you want to change a stored procedure that is 2,000 lines long. And before you laugh and say that's unbelievable: no, it's very believable, it happens way more often than you think, and it probably involves horrible stuff like CLR data types. If you want to make a change down there, you have to find a way to price that change into all the layers above it. That is technical debt, and that's where the expense really comes from: the layering of additional layers on top of lower ones. So: technical debt. When you're beginning a greenfield project, it's like a newborn baby.
It's totally innocent, it has no history, it's nothing but pure potential. That's the clean slate we're starting with. Well, when you start making architectural choices really early in the project, that is where technical debt is born; it begins right there. Interest (to use a financial analogy) on your technical debt accrues in the form of layering. If I make one technical decision very early in the life cycle of a project, say we take this back 10 or 15 years and I decide to build my e-commerce system on top of .NET Framework, well, that also means I've made a decision to keep my infrastructure married to Windows, and it means all my developers need to buy Visual Studio subscriptions. This is all different now, of course, but back then those were some of the types of technical debt that would accrue. And other ways it can compound in the future come in the form of unknown or unanticipated events that arrive.
A good example I remember really clearly of unanticipated technical debt with the .NET platform: back around 2010 or 2011, WebSockets first emerged on the scene. This was when Node.js was first taking off and becoming a popular alternative to Ruby, ASP.NET, and other technologies, and one of the things that helped Node.js really take off was that it had by far and away the best WebSocket support of any platform that existed at the time. Well, us .NET developers were thinking, wow, we would love to have all those same benefits of server push. But guess what: because our infrastructure was married to Windows, in order for us to get WebSocket support we actually had to wait for a Windows Server patch to be released, which also required us to get an IIS patch, which required us to update the version of .NET Framework we were using, which required us to update the version of ASP.NET. You can see that's an example of an unanticipated change that just kind of got priced in. It's still technical debt nonetheless. The purpose of this talk, though, is to talk about technical debt you can anticipate.
Some new technology coming out of left field in the browser space, and not necessarily knowing how closely your web stack was married to your operating system: okay, that's not your fault. But the failure to account for problems you can easily see coming down the road, five or ten years in the future, is something that you as an architect need to price into your designs sooner rather than later. That's really the essence of optionality. Technical debt's full amount is only expressed at the time you have to make a change. If you have to do something like migrate from .NET Framework to .NET 6, you're not going to know the full cost until you start getting into the weeds and doing it; it's not very easy to predict ahead of time. What we're going to talk about today is a concept that originally comes from finance, where it's known as optionality.
Stock options and futures contracts are really good examples of optionality. Someone can purchase a call option: the right to buy a stock at a fixed price at some future point in time. Call options are really useful if you think the value of a stock is going to go up at some point in the future. Or, you know, I'm from Houston, the energy capital of the United States: people buy oil futures for different types of petroleum products, and they do that to try to keep their costs down. They basically know, "I'm going to be paying a hundred dollars a barrel for oil at some point in the next 12 months," and if the price of oil goes beyond that, they save a whole bunch of money. So the idea is that I pay a premium today, I put in some money now to get that future, and then I have the right to exercise it at any point until the option contract expires. In terms of software, we pay for optionality by planning ahead, and we exercise when we know that our business and our software need to adjust
accordingly in order to meet some business goal. To give you an example of optionality: the further outside the circle you are, the more optionality you have. In a greenfield project you have nothing but options. No code is written yet, no infrastructure is picked yet (unless you have a dogmatic IT department, which some of us have, but still, no infrastructure is picked), there's no schema, there's no customer data. All we have are our requirements and our experience; that's the only real thing you're bringing into the project. Once you move further in, say you have a new product with only a few users, you still have a lot more options for how you can make changes to that product than you do in the middle circle, which is where you have a mission-critical product.

For example, some of my customers at Petabridge, these are all people building large-scale event-driven applications, and some of their products carry enormous responsibility. I've got one customer with a manufacturing line worth probably close to 50 billion dollars, all being automated by Akka.NET and a bunch of other pieces of technology. They do not have a huge number of options for how they can change that, because they'd have to roll it out to a bunch of factories and change all their business procedures, not to mention the amount of disruption it would cause. So that's an example of a product that doesn't
have a lot of options, unless the developers built them into the architecture, which is what we're going to talk about. The big idea behind optionality is that when you're starting with a greenfield project and you don't have any constraints, you plan ahead for how your business might possibly change in the future, and you pay a little bit of premium now to make sure that adjusting to those new business realities isn't a tremendously expensive project later. So technical debt is basically the destruction of optionality. If I have a whole bunch of customer data, a bunch of business processes, and a bunch of built-in knowledge, say you're running an insurance business and all of the claims adjusters know how to use your internal software today, then if you want to radically change that, you have to retrain all of them. That's not just a software problem anymore; it's a big business issue. The technical debt we incur typically happens when we fail to plan for how software might evolve in the future.

Now, I'm going to give you some examples of that from my own personal experience in just a second. A really good example of a rapid-fire technical debt accumulation tool is database-driven development, and the reason why is that unless you are really confident your requirements aren't going to change, or you have made a really simple schema that's very fast and very flexible, you're going to be bound by whatever the development constraints of that database are. For a lot of applications that will never realistically be an issue: maybe they're auxiliary systems that don't get changed very often, maybe they're very low-traffic, or maybe they're in an area of your business that's fairly stable where you're not anticipating a lot of change. However, if you're from the startup world like I am, where you're launching new products and you're not totally sure what product-market fit looks like, meaning your business is probably going to adjust a fair bit over the next couple of years, this can be a disaster in the making. And that is exactly what happened to me; we'll get into that in the next slide.
Ah, these are some of my favorite Twitter fights; I get into them about once a quarter: "who ever needs to switch databases?" Quick show of hands in this room: who has ever switched a database in a production system? Okay, the few and the proud, thank you. Switching databases should be rare; if you find yourself switching databases every year, please fire your architect. But there are cases where it has to be a viable solution, and a lot of the time the reason people end up in the "who ever needs to switch databases" camp is that they've painted themselves into a corner where switching databases is never a feasible option. In other words, it's a self-fulfilling prophecy: if you believe you should never have to switch databases, you never create the options that make it viable. On top of that, things like the repository pattern get bashed all the time, probably because people try to make a repository a one-size-fits-all thing, which is not a great idea. But abstracting your data access behind some common abstraction that's tailored to a specific domain and very narrowly scoped actually can be really valuable as a form of creating optionality, and that'll come up in my example. The basic idea is that future technical debt can be mitigated by making some architectural choices today.
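As a rough illustration of what "narrowly scoped and domain-tailored" means here (all names below are hypothetical, not from the talk), the rest of the system depends on a small interface while the database-specific calls live in exactly one place:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: a narrowly scoped, domain-tailored repository.
// Domain code depends only on this interface, never on the database client.
public sealed record ClickEvent(string PageUrl, DateTimeOffset Timestamp);

public interface IClickStreamRepository
{
    Task AppendAsync(string userId, ClickEvent click, CancellationToken ct = default);
    Task<IReadOnlyList<ClickEvent>> GetRecentAsync(string userId, int count, CancellationToken ct = default);
}

// Swapping databases later means writing a new implementation of this
// interface, not hunting down database calls spread across the codebase.
public sealed class InMemoryClickStreamRepository : IClickStreamRepository
{
    private readonly ConcurrentDictionary<string, List<ClickEvent>> _store = new();

    public Task AppendAsync(string userId, ClickEvent click, CancellationToken ct = default)
    {
        var list = _store.GetOrAdd(userId, _ => new List<ClickEvent>());
        lock (list) list.Add(click);
        return Task.CompletedTask;
    }

    public Task<IReadOnlyList<ClickEvent>> GetRecentAsync(string userId, int count, CancellationToken ct = default)
    {
        var list = _store.GetOrAdd(userId, _ => new List<ClickEvent>());
        lock (list)
            return Task.FromResult<IReadOnlyList<ClickEvent>>(
                list.OrderByDescending(e => e.Timestamp).Take(count).ToList());
    }
}
```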
So I'm going to use an example from my last company, MarkedUp. We did real-time analytics and marketing automation for developers building apps for the Windows Store: Windows 8, Windows Phone, and eventually Win32 desktop applications. So we're a real-time analytics startup; this is around 2012. We originally built our minimum viable product, our first go-to-market solution, on top of RavenDB, using its map/reduce indices. These worked great for doing real-time analytics. Now, I had read Ayende's blog post knocking the repository pattern, calling it an anti-pattern, you shouldn't use it, blah blah blah. I bought in on that hook, line, and sinker, and went and had RavenDB calls everywhere in our system: in our write pipeline, in our reporting system, in our user registration system. We basically said, why bother having a repository? Ayende's right; this is going to be a way to make sure we don't have unnecessary abstractions inside our system.

Well, as I mentioned, we were way more successful with our early customer acquisition efforts than I expected. We went from about ten thousand events per day in our analytics system to about five to eight million events per day in the span of a three-day window; that's something like a 400 percent day-over-day increase, three days in a row. The amount of traffic we were producing should have been something RavenDB could handle, but alas, it could not. As a result of some of our architecture decisions, our database logic was spread out everywhere, and RavenDB could not keep our reports up to date. This was during a critical window where we were raising venture capital money and trying to prove to the market that we had a viable solution. So I was staying up until four o'clock in the morning and getting up at 10 a.m. the following day, for about three weeks, furiously trying to pay off this technical debt so we could successfully complete our fundraise and not lay everybody off.
The real risk here: we hadn't made any effort to isolate the database decision away from the rest of our domain logic around transforming events, managing customer data, and so forth, and we had also depended on a RavenDB-specific feature, those map/reduce indices. Both of those choices coupled us very tightly to our database, and there was nothing we could do to make Raven scale. We even wrote our own replication system and our own database migration system, because Raven's built-in tooling utterly failed at both. We threw everything we could at it; we had, I think, a 64-core database instance trying to process all of this, and if I showed you the graph, what we saw was CPU utilization at 100 percent, disk utilization at 100 percent, and memory at about one-twelfth of what it should have been using; it just wasn't using memory very efficiently. So this created a really high-risk situation for our business, and we had to do an emergency migration off of RavenDB to something more scalable, otherwise our business would fail and we would not be able to complete our fundraise.
So here's what we did. In the span of about three to four weeks, we refactored our system to an event-driven processing model. I created a piece of middleware that could take all the events the clients were sending to our HTTP API. That API was another form of technical debt, though in this case it didn't hurt us too badly; we had to make sure the same events already embedded in those shipped apps could be reprocessed in the new system and produce the same reports. So we created middleware that could translate those events into an analytic delta: basically a way of incrementing a counter without knowing what its full value is.
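As a sketch of that idea (names hypothetical, not MarkedUp's actual code), an analytic delta is just a commutative increment, which is what makes it safe to replay and fan out across many writers:

```csharp
using System;

// Hypothetical sketch of an "analytic delta": an increment that can be
// applied without knowing the counter's current total. Deltas for the
// same metric and time window merge by simple addition, so they can be
// replayed or processed by many writers without coordination.
public sealed record AnalyticDelta(string MetricId, DateTimeOffset Window, long Increment)
{
    // e.g. new AnalyticDelta("app:launches", hourBucket, +1)
    public AnalyticDelta Merge(AnalyticDelta other)
        => this with { Increment = Increment + other.Increment };
}
```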
We separated our read and write models into discrete services, abstracted away behind, essentially, repositories (well, services plus repositories internally). The other thing we couldn't do very well with our original implementation, because we were so closely married to our database, was unit test. We had to integration-test everything with a local RavenDB instance up and running, which again made it very difficult to start making this change and move toward a more scalable database. So we were able to add unit tests back, because we had effectively abstracted away our persistence layer. We also created a little DSL that allowed dynamic per-user filtering of events; that was super useful and greatly improved our developer throughput. The database we switched to, in case anyone's wondering, was Apache Cassandra, which is a big, super write-heavy system that's great at real-time analytics, and we were able to replace most of Raven's functionality. There were a couple of things we couldn't really do that we had to defer to another project on a future day, but this helped us quite a bit, got us past these scaling challenges, and let us scale horizontally going forward.

So what did we do differently here? Basically, we decided we were not going to repeat the mistake of putting all our eggs in one basket with one database solution. We eventually reached a scale where we were doing about 100 million events a day, from millions of concurrent users, in the span of about a three-hour window in the US. Thankfully Cassandra was able to scale with that, but we were hedging our bets: if Cassandra started having problems as we reached the hundred-million-to-billion sort of range, we wanted the ability to move to something else if we needed to. That became a valuable option for us, given the types of scaling issues we were dealing with. On top of that, we really wanted unit testing, and we wanted to keep our business logic around computing deltas completely isolated from the database.
The real issue here is the modern consensus around software design: YAGNI, "you ain't gonna need it." Build for your requirements today; anything with regard to future requirements is tomorrow's problem, right? Future you can figure it out. Well, that is a really catastrophically stupid decision in a lot of cases, and the reason why is that YAGNI comes with an expiration date. It really should be "you are eventually going to need it, probably."

A good example: if I'm launching an e-commerce startup and I go ahead and marry all of my business processes and designs to, let's say, Stripe for my credit card processing and subscriptions, that's probably fine for the first several years of my startup's business. But what if at some point I want to expand into India or other countries where Stripe isn't the best processor, or there's a local processor that works better? Or what if I want to switch to a different processor to reduce my charge fees, so I can actually make more profit per transaction? If I'm totally married to Stripe, and everything in my API and my data system is Stripe-centric, that is going to be an enormously expensive project down the road. Whereas if I design my payment processing system to be inherently pluggable from day one, it becomes a much more achievable project in the future to add a new payment provider to serve customers in a new geography, or to replace Stripe for new customers to lower my credit card fees. And that's a requirement you can easily anticipate if you're going into that business: you know there will be some point when the business asks you to support a new credit card processor. That's inevitable in any sort of e-commerce, customer-facing business. We need to build in the option to do that when we're first getting started, not engage in a giant, hair-on-fire emergency fire drill like we did at MarkedUp.
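To make that concrete, here's a minimal, hedged sketch of what a pluggable payment boundary might look like (the interface and names are my own illustration, not anything from the talk):

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: a payment-processing port defined on day one.
// Stripe is just the first adapter behind it; a regional processor is a
// new class, not a rewrite of every checkout flow.
public sealed record ChargeRequest(string CustomerId, decimal Amount, string Currency);
public sealed record ChargeResult(bool Succeeded, string? ProcessorTransactionId, string? Error);

public interface IPaymentProcessor
{
    string Name { get; }
    Task<ChargeResult> ChargeAsync(ChargeRequest request, CancellationToken ct = default);
}

// Selection can be as simple as routing by geography or by fee schedule.
public sealed class PaymentRouter
{
    private readonly IReadOnlyDictionary<string, IPaymentProcessor> _byRegion;

    public PaymentRouter(IReadOnlyDictionary<string, IPaymentProcessor> byRegion)
        => _byRegion = byRegion;

    public Task<ChargeResult> ChargeAsync(string region, ChargeRequest request)
        => _byRegion[region].ChargeAsync(request);
}
```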
So let's talk about high-optionality programming. What are some techniques that will, generally speaking, preserve optionality a lot better than something like CRUD, which does not really preserve optionality? This isn't an exhaustive list; these are the big ones that, in my personal experience, have helped me a lot, and on any project where I didn't start with them, I eventually ended up there anyway, through an expensive refactoring exercise that I'm trying to help you avoid. Event-driven programming is probably the dominant one, followed by event sourcing, CQRS, actors (naturally), and extend-only design. Let's talk about event-driven programming first and how it preserves optionality.
If I have a traditional RPC service, say an HTTP API or most types of gRPC services, I send a request and I get a response back from the same server I was talking to; that's run-of-the-mill web application development. With an event-driven system, I have a lot more inherent flexibility in the communication models between my client and my server than I do with an RPC system. With RPC you're essentially limited to request-response; that's the communication pattern you get. Maybe you could also do one-way, fire-and-forget messaging by returning an HTTP response before you do any processing, but it's all in essence request-response. With an event-driven system you have a lot more possible messaging patterns than merely request-response. For example, here's what a publish-and-subscribe messaging pattern might look like: a client sends one message to the server and can get back an infinite stream of responses as things happen in real time later. And the technology you can use to implement this is pretty diverse: you could use NServiceBus, you could use Akka.NET, you could use Orleans, or you could even use something like gRPC if you wanted to. So an inherently event-driven system gives me more possible patterns and tools to leverage than a simple RPC system does.
Now, the idea behind events is that, in essence, they have additional properties beyond just the simple payload. You do have the payload, that's your datum, but you also usually have a reply-to address. That can be explicitly exposed, like it would be in an Akka.NET system, but it might also be implicit via something like RabbitMQ, where you might have a reply-to channel and you don't necessarily know who's listening to it. That's the first bit. Messaging and event-driven systems are also almost always asynchronous. When you pass an event somewhere, you might be given a Task you can await, if you want to do request-response style over a message-driven system, but you don't have to. Most often these systems tend to be pretty fire-and-forget: I write a message into RabbitMQ, I get an acknowledgment that the message is queued, and then I don't worry about who's processing it; I move on and go back to the rest of my work. That gives us a lot of flexibility around how processing is done.

Most importantly, with messaging, the messages are always stored and serialized, so there's an actual object you can point to that correlates to the request. An HTTP request is a transient, in-flight object with a real-time limit associated with it (that's your timeout value). With messaging you're a little more flexible: you have this serialized message body, and it can be processed immediately, if you have, say, a bunch of competing consumers all trying to drain a queue, or it can be processed over and over again in the future, if you have a tool like Apache Kafka where arbitrary clients in arbitrary consumer groups read partitions over and over again. That's a lot more flexible than what an RPC system can do. On top of that, message-oriented systems have the ability to change the order in which requests are processed. In an HTTP or RPC system, everything has to be processed live, right now; you don't really have the ability to reorder how those requests are handled. In a message-driven system this is trivial: the recipient of a message can say, okay, before I process this message I need to wait for this other one to arrive, because it has the data I need in order to process this request. This is what we'd call a deferral in a message-processing system; in a tool like Akka.NET you'd use behavior switching and stashing to do that.
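Here's a minimal, hedged sketch of that deferral pattern in Akka.NET (the message types are hypothetical): the actor stashes work it can't handle yet, switches behavior when the prerequisite arrives, and then replays the stash.

```csharp
using Akka.Actor;

// Hypothetical messages: orders can't be processed until a price arrives.
public sealed record PriceUpdate(decimal Price);
public sealed record ProcessOrder(string OrderId);

public sealed class OrderActor : ReceiveActor, IWithUnboundedStash
{
    public IStash Stash { get; set; } = null!; // injected by Akka.NET

    public OrderActor()
    {
        // Initial behavior: defer orders until we know the current price.
        Receive<ProcessOrder>(_ => Stash.Stash());
        Receive<PriceUpdate>(update =>
        {
            Become(() => Ready(update.Price)); // switch behavior
            Stash.UnstashAll();                // replay the deferred orders
        });
    }

    private void Ready(decimal price)
    {
        Receive<ProcessOrder>(order =>
            Sender.Tell($"Order {order.OrderId} priced at {price}"));
        Receive<PriceUpdate>(update => Become(() => Ready(update.Price)));
    }
}
```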
00:24:44
call so the fact that you have this artifact this message that has possibly unique ID it can be serialized into a byte array and can be shared across multiple parties is an inherently more
00:24:56
flexible programming model than what a purely procedural sort of application is going to look like and so in terms of the interaction patterns for a message driven system
00:25:08
these are going to be inherently more diverse than what you get with RPC I can have a broadcast interaction where one party publishes its message to many receivers this opens the door for all sorts of interesting communication
00:25:20
patterns in a you know distributed system I can do a proxy pattern this is basically how you delegate work inside an event driven system where I can hand off the responsibility for processing a message from one party to another if I
00:25:33
want to and then we also have relationships like publish And subscribe where I can kind of invert control and the client basically receives notifications from the server when State changes rather than the client having to
00:25:44
pull the server for those changes you know repeatedly down the road so the first sort of tool we're going to use for kind of limiting our technical debt accumulation there we go
So the first tool we're going to use for limiting our technical-debt accumulation, the first stop, is one-way messaging: an event-driven architecture. Generally speaking, event-driven architectures scale really well with domain complexity, and they buy you a lot more freedom in the future if you need to change how processing is done. A procedural system is going to be much harder to adjust, because it's not inherently flexible in the same way. The other reason we want to look at event-driven architectures is that they lend themselves really well to the second, third, and fourth patterns, starting with event sourcing. Why does event sourcing help us mitigate technical debt?
Well, this is how Akka.Persistence works, for instance: we process messages in the order in which we initially receive them, for a single actor, where a single actor represents one business entity inside the system. So say I'm keeping track of session state for a user, and I want to see what that user is looking at on our e-commerce site so I can personalize a recommendation for them in real time. I might be receiving a stream of click events inside the actor for that user, and I have an in-memory representation of that user's click stream. Every time we communicate with that actor, the actor has an in-memory copy of that state, but that state is also being event-sourced, one click-stream event at a time, to whatever our database is. The database could be SQL Server, could be Azure Table Storage; it's not really relevant, it all kind of looks like a key-value store inside this system. Now, how does this preserve optionality for us, versus just writing rows, modifying a document in MongoDB, or inserting a row into SQL?
Here's how we replay it. Event sourcing inherently lends itself to providing a complete history of how something changed. It gives us the ability to see that the user did this, then that, then that, and that all of it leads to their current state, whatever that is right now. That state could be a recommendation for what products we should show the user; it could be the balance of a bank account; it could be the current state of a device operating a process-line control system in a factory. It doesn't really matter, it works for any domain: the state can always be rebuilt by replaying previous events. This means that your current application state is something you can reprogram on the fly, without having to touch your data. If you want to update the code that reconstitutes your state, that's really easy, because the code is effectively separate from the data. The data is all those past events; they're immutable, they're at rest inside your database somewhere, and you can't change what they mean. But you can change how you fold those events together to represent the state of your object.

A good example: say we change the types of bank accounts we support in our banking system, and I want the ability to show pending transactions and pending balances separately from the posted balance. I could reasonably do that by adding a new event type that represents pending operations and replaying all my old events, and I'll get the current balance in one section and the pending balance in another. That doesn't require me to do a dangerous data migration or anything else at all, because all those old events representing the user's account history are still there inside the database. The way the state is built can be changed without changing the events themselves. That is a very powerful option.
Another powerful option (again, I'll use a financial example) is that historical events can be replayed and reused in new forms, aside from what your application does. You can use them for things like simulations or predictions, or even for validating a future version of the application. For instance, one of the ways they test really complex pieces of software like massively multiplayer online RPGs is by replaying a saved game against the new client. The reason why is that it's not possible to write unit tests for every possible interaction between players, so you take all the events that occurred over the course of a game session, replay them through the new client, and look for unanticipated changes, regressions that might occur there. That's really easy to do with an event-sourced system. We used this for regression-testing future versions of MarkedUp, actually; we used it for a combination of load testing and making sure our analytics worked correctly: the same events, run through the new code, should still produce the same totals as the old one.
Lastly, another option event sourcing gives us is the ability to safely introduce new event types without modifying existing data. The immutability of existing customer and business data is actually a really important selling point from a risk standpoint. How many of you have had an automated database migration go wrong in a production system before? Let's all be honest, it can happen, and it's basically a result of the fact that you're changing an object in a way that is not intrinsically safe; you're potentially doing some sort of destructive action against your schema. We're going to address how to manage that issue in the final section, extend-only design; we'll get to that in a moment. But event sourcing naturally lends itself to that type of extension as well. The other thing that's inherently useful about event sourcing is that it typically relies on really simple key-value store architectures. This means you can use event sourcing with pretty much any database out there; even something really simple like Azure Table Storage will work fine. You're not using super-bespoke database features that don't translate well to another database in the event you need to migrate in the future. And honestly, something simple and robust like Postgres will probably scale just fine for a really large event-sourced system, because again, we're using really simple constructs: there are no left outer joins against synthetic tables in here.
Now, the next pattern (and these all kind of build on top of each other) is CQRS: Command and Query Responsibility Segregation. The idea behind CQRS, if you're not familiar with it, is to separate your read and write models from each other. One mistake a lot of developers make is having their read and write models be the exact same thing. One reason that doesn't work is impedance mismatch, but the other potential issue is that certain databases are faster at performing reads than they are at performing writes, and if your system becomes increasingly write-heavy, trying to have reads and writes share the same model is going to create a lot of inherent friction and tension inside the system. So the idea is essentially this: your write models should be optimized for writes, and your read models should be optimized for reads. You have a lot of flexibility in how you produce a read model from your writes. It's something you can do inside your application, or something you can do with a database feature, like a view or a materialized view, if you wanted to. But the basic gist is that you should optimize the two models separately.

In the case of our event-sourced system, the write side is optimized for super-fast, super-simple writes that can be done at high rates of speed, even in a relational database like SQL Server, which is traditionally a little slower at handling writes than something like MongoDB or Redis, perhaps. But with our read models we can build something that's a lot richer and a lot closer to the type of requirements your business users actually want. If you want a really nice reporting system, or you want to use SQL Server Analysis Services to produce a nice data cube, your read models are what handle that, and they are kept separate from your write models. Usually you have a processor that materializes the read view shortly after the write occurs, or you might use a database feature to do that. The real big benefit from an optionality point of view is that you can always change your read models independently of the data at rest. You can always rerun your projection process and recreate them on the fly; they're synthetic data, in other words. The thing we're trying to avoid with event sourcing and CQRS is taking the valuable business data we've already recorded and touching it in a potentially destructive way; that's what we're trying to avoid from a business-risk perspective. CQRS helps us do that by making sure all of our read models can be reproduced on the fly whenever we need them, because all the data that was written is still there, not modified as part of our projection process.
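A minimal sketch of that write/read split (all names are hypothetical): the write side is an append-only event, and the read side is a projection that can be thrown away and rebuilt at any time.

```csharp
using System;

// Write model: a simple, append-only event, optimized for fast writes.
public sealed record OrderPlaced(string OrderId, string CustomerId, decimal Total, DateTimeOffset At);

// Read model: denormalized and shaped for the report the business asked
// for. It is synthetic data; deleting and re-projecting it is always safe.
public sealed class CustomerOrderSummary
{
    public string CustomerId { get; init; } = "";
    public int OrderCount { get; private set; }
    public decimal LifetimeValue { get; private set; }

    public void Project(OrderPlaced e)
    {
        OrderCount++;
        LifetimeValue += e.Total;
    }
}

// Rebuilding the projection is just re-running the fold over stored events:
// foreach (var e in allOrderPlacedEvents) summary.Project(e);
```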
The next option we get out of this is that the performance characteristics of the read and write models can be tuned separately if you need to. We might need a super-efficient write model that can handle, let's say, millions of operations per second, while our read model needs to be optimized for taking a fairly large amount of data and compressing it into a really small HTTP response, or a really small analytic delta we serve up over SignalR, something like that. You have the ability to performance-tune each of these individually, which is really useful. And on top of that, you're not just tuning performance but also how human-friendly they are. Your write model should probably be machine-friendly, if you care about performance; if you care about not getting fired, your read model should be human-friendly. That makes sense, all right. And then you can actually use separate databases for reads and writes if you want to. A good example: we had a customer that does really detailed financial reporting for government compliance, and they use an event-sourced system on top of Postgres for all of their inbound writes, and (don't hate me for this) an Entity Framework sort of schema for producing the real reports the auditors use inside their system. That was all done on top of Postgres as well, but it could very easily have been done on SQL Server if needed; there would have been very little cost to that. The last thing I'll mention as far as the event-driven part goes is actors.
Actors are dynamic, and they give you the ability to partition how you process the streams of events inside your system. For instance, I can have one actor per business entity, updated in real time; I can have stateless actors that perform tasks like sending transactional emails, writing to the database, or calling a web API. They're inherently flexible pieces of code, designed to run in parallel with lots of other instances of themselves in order to achieve maximum throughput and CPU utilization. What makes actors useful from an optionality standpoint is their dynamism: the fact that we can change where work is happening, and how work is done, on the fly, as we're receiving events in real time. This is what a simple Akka.NET actor looks like, for instance. We have a little base type, this ReceiveActor; we have some state, in this case just a handle to our logging system; and then we have the different message handlers that do our processing. These messages can be sent in-memory or over the network; Akka.NET doesn't care, and that's invisible to you as the end user. Then you have some C# code (or F# code, optionally) in here for doing the processing, and in this case I'm scheduling a delayed reply back to the sender; if I just called Sender.Tell I'd be replying to them immediately, in real time.
One of the things actors can inherently do (I'll skip this part) is change their behavior at runtime as they're processing a message. I can say: instead of processing this ping message using this function, process it using this other function instead, until we get the critical event we're waiting for. So imagine building something like a state machine for some part of your business. If you're doing transactional processing for an e-commerce system, you might say: okay, the first thing we need to do is submit the payment information to our payment gateway and see if the result comes back successfully. If it doesn't, we have a whole error flow to go through: we let the user know why their card was declined, we probably have to send them an email, we probably still have to preserve everything in their cart, or maybe we schedule a quick retry to see if it goes through a second time, whatever the case may be. And if the credit card transaction goes through successfully, then there's a fulfillment process the actor starts running through, where we might talk to our fulfillment server and make sure there's an entry for getting this product loaded up with our shipping partner and out the door. Actors can switch behavior really quickly, with a very minimal amount of code, to handle these types of cases. And this is a lot simpler, because it's all self-contained inside one object that owns the unit of work it's trying to coordinate, versus spreading that across a whole bunch of different microservices or a whole bunch of different procedural classes.
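A hedged sketch of that checkout state machine using behavior switching (the message types and gateway interactions are hypothetical stand-ins):

```csharp
using Akka.Actor;

public sealed record SubmitPayment(string OrderId);
public sealed record PaymentApproved(string OrderId);
public sealed record PaymentDeclined(string OrderId, string Reason);
public sealed record FulfillmentConfirmed(string OrderId);

public sealed class CheckoutActor : ReceiveActor
{
    public CheckoutActor()
    {
        Receive<SubmitPayment>(cmd =>
        {
            // ... submit cmd to the payment gateway here ...
            Become(AwaitingGateway); // switch to the next state
        });
    }

    private void AwaitingGateway()
    {
        Receive<PaymentDeclined>(msg =>
        {
            // Error flow: notify the user, keep the cart, maybe retry.
            Context.Stop(Self);
        });
        Receive<PaymentApproved>(msg =>
        {
            // ... ask the fulfillment service to ship the order ...
            Become(AwaitingFulfillment);
        });
    }

    private void AwaitingFulfillment()
    {
        Receive<FulfillmentConfirmed>(_ => Context.Stop(Self));
    }
}
```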
So the ability to dynamically shift how we do processing in real time is very powerful, as is the ability to compress what could be an enormous amount of business logic into a relatively small amount of code. So, actors and optionality. One of the things they also make possible is querying your live application state at runtime, if you want to. If I want to know the total number of orders being processed right now, that is trivial to implement with actors. Doing that with database-driven development would probably require you to maintain a whole separate set of calls to Redis, where you increment and decrement a counter, and then add error handling and retries in case those calls don't go through. With actors, there's actually a number exposed in Akka.NET that you can just poll to see how many actors are alive right now. It also makes stateful server-side applications viable. One of my biggest complaints about database-driven development is that it's inherently stateless, all of it, which is fine for probably the vast majority of applications. But the most critical applications in your business will often need some measure of state, in order to do things like keep request-processing times low. If you want to build something like a real-time banner-ad exchange, a real-time chat system, or a real-time fleet management system, you need state inside your application to make that achievable, and actors are a pitch-perfect way of doing that. Whether you use Akka.NET or Orleans doesn't really matter; the general paradigm is what gives you that set of tools.
The other thing, like we talked about, is that event processing can become dynamic. Rather than having a static set of inherently stateless functions handling our business logic, we have entities that are responsible for recovering their own state, making decisions about what to do with events in real time based on that state, and dynamically doing things like rerouting a message somewhere else, stashing it to process later once a critical event arrives, or broadcasting it to multiple parties over the network. All the different event-driven paradigms we discussed earlier are very inexpensive to implement with actors, and they don't require much infrastructure either. The other thing is that actors can be distributed over a network with essentially no code changes. Actors tend to be location-transparent, which means that if an actor moves from one process to another, say as a result of the other process being shut down, that's not going to have a tremendous impact on your code. That's a routine thing actors can handle; it's just like rebalancing a Kafka partition or adding a new web server behind your load balancer, more or less the same sort of automated process.
The last subject I'll touch on for preserving optionality is what we call extend-only design. Now, I have a full blog post that goes into a lot more detail on how to do this on my personal website, and I'll mention that at the end. But if you're doing database-driven development, this is the one pattern you can implement today that will help you a lot in terms of preserving optionality in your system. It does not require an event-driven architecture, it does not require actors; you can do this with SQL Server today. Extend-only design is a methodology for making sure there are no incompatible changes ever made to your SQL schema, at any point from the present moving forward. It's basically a way of preserving backwards and forwards compatibility. The idea is that your schema, your wire format (if you're doing serialization, that's what we're referring to there), and your APIs are frozen for updates and deletes. If you want to make a change to your HTTP endpoint, you're going to have to either introduce a new method, or introduce a new version with a separate URI prefix from what you had before, which is basically how people typically version public web APIs.
So: no destructive changes are allowed. You're not allowed to change how something worked, you're not allowed to rename or repurpose something, you're not allowed to delete stuff. Anything that's being used by live stakeholders or live clients stays frozen as-is. New things can always be added; that's the extension part. You can always add new stuff that wasn't being used before: a new HTTP endpoint, a new message type in your event-sourced system, a new table, a new column on an existing table. You just can't go back and change the past; you can't change something that's currently being used, or that currently holds data you know will be used. Old schema gets gradually made obsolete as the software updates. If we have some old SQL schema we want to get rid of, we can't delete it, but we can gradually stop using it, and then maybe go back and delete it later; that process takes a little while, though, since it has to age out of your system.
So why is extend-only design useful? It eliminates an entire risk category for updating your software in the future, and an entire area where technical debt gets created: the accidental destruction of business value. And on top of that, the unknown unknowns of, gee, what happens if I change the schema on this table? How many different calls are there to that table that I can't trace, across all the various applications that talk to it? That can be, to a degree, unknowable inside your application, and therefore it's risky to make those destructive changes. With extend-only design you avoid that entire problem, because you're not fundamentally changing the stuff that's already in use; you're just adding new things that updated clients and updated consumers will use down the road.
A good example of how we manage versioning in Akka.NET's internal message formats: we use Google Protocol Buffers for all of our internal message types, and protobuf lends itself really well to this type of extend-only design. For instance, I might have a little protobuf message where I've got five properties, and I want to add a sixth property for figuring out whether a user made this type of stock trading operation using margin, meaning they borrowed money from us (in this case not to buy the stock but to sell it, so they might be shorting, or something). This new field can be added to the protobuf message, and that message can be recompiled into C# without breaking wire compatibility. If I'm running one node in the cluster with the new version of the message, all the older nodes that don't have the definition for that field will see an unrecognized property and just ignore it. Now, that's not great from a data-loss standpoint, but it's a lot better than the alternative, which might be bricking the rest of the entire cluster as soon as that first node joins. So this extend-only ability gives us both forward and backward compatibility.
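The slide's actual message isn't reproduced here, but as a hedged C# illustration of the same extend-only move (using protobuf-net attributes; all field names are hypothetical), adding a new field with a fresh tag number keeps the wire format compatible in both directions:

```csharp
using ProtoBuf; // protobuf-net: one way to define protobuf contracts in C#

[ProtoContract]
public sealed class TradeOrder
{
    [ProtoMember(1)] public string Symbol { get; set; } = "";
    [ProtoMember(2)] public long Quantity { get; set; }
    [ProtoMember(3)] public decimal LimitPrice { get; set; }
    [ProtoMember(4)] public string AccountId { get; set; } = "";
    [ProtoMember(5)] public bool IsSell { get; set; }

    // New in v2. Old nodes ignore the unrecognized tag 6; old messages
    // deserialize on new nodes with the safe default (false = no margin).
    [ProtoMember(6)] public bool UseMargin { get; set; }
}
```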
00:46:40
in the backwards Direction if my new client gets an old message from one of the old node types it can substitute a default value for that new optional property okay if I'm getting a
00:46:52
trade order from an old client that doesn't support margin guess what that trade can't be done with margin therefore we're going to say that property is false Insider application our serializer will go ahead and just use a safe default value there if you're
00:47:05
building something like a extend only designed with SQL schema you might have a default value you specify for for basically pre-existing for a rows that didn't have that new column you're adding so it might be null might be a
00:47:18
good example or maybe if you're using an integer the value is 0 or negative one whatever kind of makes sense for your use case but the idea behind this is that by using extend only design we preserve our old schema and we don't
00:47:31
have to account for all the different parts of our application that might be talking to it we can go ahead and add the new functionality we need without destroying the old functionality that other clients might use and because the
00:47:43
new client knows how to substitute a safe default value for areas where that new data may not be available old clients and new clients can continue to interact with each other safely over a longer period of time
On top of that, this also means you can update your database schema independently from your application. I can roll out my schema update well in advance of the application that uses it, so I don't have to have an Entity Framework migration running live in my CI/CD pipeline. Our DBAs can stage it, execute it, and see it roll out, and then the application can get deployed that day, or the following day if you want. It effectively decouples those two activities and lowers the risk of a deployment failing or of customer and business data being destroyed.
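As a hedged sketch of what an extend-only schema change looks like in EF Core migration terms (the table and column names are hypothetical):

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

// An extend-only migration: it only adds; it never renames, retypes, or
// drops anything in use, so it can safely run days before the application
// code that reads the new column is deployed.
public partial class AddMarginColumn : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.AddColumn<bool>(
            name: "UseMargin",
            table: "TradeOrders",
            nullable: false,
            defaultValue: false); // safe default for all pre-existing rows
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        // Intentionally empty: extend-only design forbids destructive
        // rollbacks once the column is live.
    }
}
```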
On top of that, extend-only design is a great way to guarantee zero-downtime deployments. I imagine a lot of you have the ability to take your systems offline and eat some downtime when you do a really big deployment, but if you work in industries like software-as-a-service, or in things like manufacturing, you want to avoid downtime to the extent possible, because it represents a business outage, lost revenue, and potentially angry customers. Extend-only design is an absolute must-have if that's important to you; it lets you essentially eliminate downtime from your deployments. So extend-only design will help you tremendously.
Now, what do these high-optionality patterns all have in common? What's the essence of this programming methodology? Well, immutability is probably the foremost concern: once state is written somewhere, its meaning can't be changed and it can't be destroyed, unless you're being really intentional about it. No accidental side effects on data is what we're trying to avoid. So immutability sits at the forefront of all these patterns; it's all about conserving the old datum in perpetuity for future use. We're basically assuming data storage is cheap, and honestly, compared to software development time, it really is. We're more than happy to trade a larger SQL Server instance in exchange for our developers not having to spend hundreds of man-years rewriting a piece of production code.

Dynamism is another thing we're trying to preserve: we want to dynamically route, process, and react to state changes in real time. Systems that are more static are inherently less flexible and require a lot more effort on the part of the developer to update; systems that are inherently dynamic from the get-go, like event-driven architectures or actors, are going to be easier to change on an ad hoc basis over time. And lastly, we've separated our concerns to some degree. Each of these patterns addresses a different facet of the software: actors are all about how we process, events are all about how we organize interactions between domains, event sourcing is how we write, CQRS is how we read, et cetera. These are all different facets of our application programming model, but when we put them together, we end up with a system that's going to be easier to change down the road and easier to evolve. There is a cost to doing this. For instance, extend-only design requires a lot more planning and enforcement, from a CI/CD perspective, than YOLO CRUD or whatever people do by default. So there is a cost, and that's the premium when it comes to options. But the value is that you get the flexibility to evolve your system naturally in the future, in a way that's much less expensive and, more importantly, much less risky than what you might be doing today.
00:51:23
technical debts the destruction of options that's really what it is when technical debt gets created you're basically destroying a viable future option as a result of making a choice that is basically not flexible it's the
00:51:34
idea behind it um High optionality architectures yeah I just mentioned this they tend to cost a bit more to develop up front that is absolutely true that is the trade-off basically that you're basically spending more time and money initially in your
00:51:48
design but they'll pay for themselves very quickly if your business evolves and then on top of that you're really high optionality architectures are things you should do if you anticipate change being likely in your business
00:52:00
over a long enough time Horizon change is inevitable that will happen but there are you know cases where the application you're working on is probably pretty stable and the likelihood of it changing significantly is low in those cases you
00:52:13
should feel free to use whatever you think is going to be the most expedient to getting the job done but most really critical business pieces of business software more than likely they're going to change and if you want to enable your
00:52:25
business to be agile and to be able to react quickly to those changes and you as a software developer if you want to be happy and not bitching about Legacy code all the time high optionality architecture is a really good investment
00:52:36
and I would start by learning the event-driven part of it first; that's probably where I'd begin. Or, if you want to get started with something right away, think about freezing your schema and applying extend-only design to it. That's something you can do without
00:52:48
re-architecting your software, so start with that, and think about how your CI/CD processes and your deployment systems might look if you did.
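As a rough illustration of extend-only design at the schema level, here's a hypothetical migration: it only ever adds, and never renames, drops, or retypes anything that already exists. Table and column names are invented:

```csharp
using Microsoft.Data.SqlClient;

public static class ExtendOnlyMigration
{
    public static void Apply(string connectionString)
    {
        // Extend-only: we ADD a new nullable column and a new table, but we never
        // DROP, RENAME, or ALTER the type of anything that already exists, so any
        // old reader or writer of dbo.Orders keeps working unchanged.
        const string sql = @"
            ALTER TABLE dbo.Orders ADD DiscountCode NVARCHAR(32) NULL;

            CREATE TABLE dbo.OrderAuditLog (
                Id BIGINT IDENTITY PRIMARY KEY,
                OrderId INT NOT NULL,
                RecordedAt DATETIMEOFFSET NOT NULL
            );";

        using var connection = new SqlConnection(connectionString);
        connection.Open();
        using var command = new SqlCommand(sql, connection);
        command.ExecuteNonQuery();
    }
}
```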
00:53:01
So that's it for my talk today. If you go to petabridge.com you can see the original articles I wrote about high-optionality architecture, and my handle is @Aaronontheweb; I tweeted out links to all of my more detailed articles on things like extend-only design, so you can find those on
00:53:14
there as well. So thank you very much for your time, and I'll be happy to take some questions. Show of hands? Yes. Absolutely. So in our system we use Akka.Persistence.
00:53:43
Query to do projections, where essentially I have actors that tail the events being persisted into a materialized view, and how granular that view might be really depends on the domain I'm working in.
00:53:56
I might do a per-entity Akka.Persistence.Query that spins up, tails the events live as they come in, and writes it all out into, let's say, a document or a set of SQL rows. And then,
00:54:08
if I make a really significant change to the way our view models work, I might introduce a totally new set of projection actors to do that.
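As a rough sketch of what that kind of projection looks like, here's a minimal example against Akka.Persistence.Query, assuming the SQL read journal plugin; the persistence ID "order-42" and the console write standing in for the materialized-view update are hypothetical:

```csharp
using Akka.Actor;
using Akka.Persistence.Query;
using Akka.Persistence.Query.Sql;
using Akka.Streams;
using Akka.Streams.Dsl;

public static class ProjectionBootstrap
{
    public static void Run(ActorSystem system)
    {
        // Obtain the read side of the journal (here: the SQL read journal plugin).
        var readJournal = PersistenceQuery.Get(system)
            .ReadJournalFor<SqlReadJournal>(SqlReadJournal.Identifier);

        // Tail the live event stream for a single entity and fold each event
        // into a materialized view (stubbed out here as a console write).
        readJournal.EventsByPersistenceId("order-42", 0L, long.MaxValue)
            .RunForeach(envelope =>
            {
                // In a real projection this would upsert a document or SQL rows.
                System.Console.WriteLine($"seq={envelope.SequenceNr} evt={envelope.Event}");
            }, system.Materializer());
    }
}
```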
00:54:31
Any other questions? Yes. So the question, just to repeat it for everyone here, is: how do you sell an upfront, higher-cost design methodology like optionality to startups, the companies with the fewest resources and the greatest likelihood of change?
00:54:44
So I have made two really crucial mistakes with products I've owned; MarkedUp was one of them, and I've got a second one that we're actually still struggling with, where we did a minimum viable product,
00:54:57
that is, get something to market quickly that meets all the basic requirements of your customers, and we were a little bit too minimal and not enough viable. That was the issue there. The argument you should make for high-
00:55:08
optionality architecture is that not doing this is betting against your success. That's the line you've got to use: look, if you think you're going to be successful, build it like you mean it; if you don't think you're going to be successful, why are you in this business?
00:55:22
Go do cryptocurrency or AI or something, right? Next question. Pivots? Oh yeah, that's another good argument: high optionality increases the likelihood of a
00:55:40
successful pivot. When you do make that transition from, you know, NFTs to AI tools, that'll be a really good selling point.
00:55:52
I hate that I keep knocking the startup industry; those poor guys are having a rough time right now. Any other questions? Yes. Okay, that's a good question. Extend-
00:56:13
only design is what you're referring to, right? So, does extend-only design introduce technical debt? My answer to that question is no, it doesn't. What it does leave behind is a lot of cellular waste: to some extent, some
00:56:25
old tables and old code that may not be used anymore. Once you're certain that stuff isn't being used actively anywhere in your application, you can safely get rid of it. The bet you're making is this: in a large enough company, with a
00:56:38
large enough application and a lot of different services using it, the moment I deploy a new piece of schema I am not 100% certain what all the interactions with that table, that API, or that piece of
00:56:51
data look like. So I'm going to bet that there are some systems out there still using it the old way, and therefore I want to eliminate the risk that a change is going to cause a category-five shitstorm
00:57:03
inside our ops department that day. Extend-only design is basically a way of lowering the risk on a per-deployment basis. The one thing extend-only design will do is force you to
00:57:16
enforce some rules in your CI/CD pipeline that eliminate really destructive actions. I'll give you an example of how we do that in the Akka.NET project: we use a tool called Verify, which is a way of doing snapshot
00:57:29
testing. We do an entire printout of what our public API looks like, render it as a giant string, and that gets written to a text file that's checked into source control. Verify will let us know if someone went through and,
00:57:43
let's say, added a new argument to a constructor. That is technically a breaking change, even if they made the argument optional, because it's not binary compatible, and that means every plugin built on older versions of akka.net will break until it
00:57:56
upgrades, which is something we explicitly do not allow in our versioning system except under very special circumstances.
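Here's a rough sketch of that style of public-API snapshot test, assuming recent versions of the PublicApiGenerator and Verify.Xunit packages; this is an illustration of the technique, not the Akka.NET project's actual test code:

```csharp
using System.Threading.Tasks;
using PublicApiGenerator;
using VerifyXunit;
using Xunit;

public class PublicApiSpec
{
    [Fact]
    public async Task PublicApi_should_not_change_unexpectedly()
    {
        // Render the entire public surface of an assembly as one big string.
        // (Here we snapshot the test assembly itself just so the sketch is
        // self-contained; in practice you'd point at your library assembly.)
        string publicApi = typeof(PublicApiSpec).Assembly.GeneratePublicApi();

        // Verify diffs that string against the approved snapshot checked into
        // source control; any constructor/method/property change fails the build
        // until a human explicitly approves the new snapshot.
        await Verifier.Verify(publicApi);
    }
}
```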
00:58:09
That gives us, as software developers, a habit of going and checking: nope, can't make that change unless you do it in a way that's safe, and we have instructions written down on how to do that. You would want to check in a snapshot of something like your SQL schema, for instance, to make sure this person did
00:58:21
not drop the orders table by accident, or did whatever else. Tools like an automatic Entity Framework code-first migration should seem terrifying to you, because if you
00:58:33
don't emit the output, you have no idea what it's really doing under the covers until it starts to happen. So that's something you're going to have to incorporate into your build system a little bit. Ditto with managing your wire format: if you're
00:58:46
using a tool like akka.net or Kafka where you're serializing message types, that's another thing you'll want to check into source control, with a step where someone has to review it before it gets merged in.
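In the same spirit, here's a hypothetical sketch of snapshotting an EF Core schema so destructive changes show up in review; `Order` and `OrdersDbContext` are invented for illustration, and this assumes Verify plus the SQL Server EF provider:

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using VerifyXunit;
using Xunit;

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

public class OrdersDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("Server=(local);Database=Orders;Trusted_Connection=True");
}

public class SchemaSnapshotSpec
{
    [Fact]
    public async Task Schema_changes_must_be_reviewed()
    {
        await using var context = new OrdersDbContext();

        // GenerateCreateScript renders the full DDL EF Core would emit (no live
        // connection needed), so a dropped table or column shows up as a diff in
        // the checked-in snapshot and must be approved by a human before merge.
        string ddl = context.Database.GenerateCreateScript();
        await Verifier.Verify(ddl);
    }
}
```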
00:59:00
Any other questions? Yes, go ahead. What do I mean by that? The question was: when I talk about partitioning events using actors, or really any sort of queue, since even a Kafka client can be partitioned, what
00:59:17
does that mean? Partitioning means that if you have a giant fire hose of events, rather than having a single class responsible for processing all of them, you have the ability to divide that fire hose
00:59:30
into smaller streams that are organized by, say, the entity type or the entity ID itself. So if I have a thousand users on my website, I might partition the click stream for all
00:59:42
thousand users into a thousand little streams, one for each user, and akka.net can route those messages to the single entity actor that owns that individual user. That's really what we mean by partitioning: breaking the big stream up into manageable parts.
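A minimal sketch of that per-entity routing in Akka.NET, using a consistent-hashing router; `ClickEvent`, `ClickHandler`, and the partition count are hypothetical names and numbers:

```csharp
using Akka.Actor;
using Akka.Routing;

public sealed record ClickEvent(string UserId, string Url);

// One actor instance per partition; each processes its slice of the fire hose.
public class ClickHandler : ReceiveActor
{
    public ClickHandler()
    {
        Receive<ClickEvent>(click =>
            System.Console.WriteLine($"{Self.Path.Name} handled click for {click.UserId}"));
    }
}

public static class Program
{
    public static void Main()
    {
        using var system = ActorSystem.Create("clicks");

        // A consistent-hashing router partitions the event stream by UserId,
        // so every event for a given user always lands on the same actor.
        var router = system.ActorOf(
            Props.Create<ClickHandler>()
                .WithRouter(new ConsistentHashingPool(10)
                    .WithHashMapping(msg => ((ClickEvent)msg).UserId)),
            "click-partitioner");

        router.Tell(new ClickEvent("user-1", "/home"));
        router.Tell(new ClickEvent("user-2", "/cart"));
        system.WhenTerminated.Wait(1000);
    }
}
```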
00:59:55
Any other questions? Well, hey, thanks for the great questions, and thank you everyone for attending; I really appreciate it. I'll be around, and I've got another talk on Friday
01:00:10
morning on .NET systems programming if you're interested. But otherwise, thank you very much for your time. I really appreciate it. [Applause]
End of transcript