Review
I feel like I had misaligned expectations of this book, but I'm not sure if it's because I didn't know enough about the topic to understand what to expect, or I just went to the theater to watch a thriller but got a romantic comedy instead, where the title and movie trailer guided me to make the wrong assumptions. In any case, this review might feel harsher than it actually is, since there were parts I actually did enjoy. The bad parts follow; the ones I found good are in the notes section.
Too much detail on high-level concepts like port and connector types between components.
Some diagrams feel too technical and unnecessarily detailed, like the several relationship types between components. If two components are connected, in many cases I'm not sure it makes sense to highlight whether the connection is sync or async, goes directly to the local file system or to a remote location, or uses a client or server port, by giving each its own style of dotted line.
Most of the book feels somewhat academic and doesn't match the type of language you'd typically encounter in meeting rooms at work.
Very brief focus on the code model. Not that I would've liked to see actual code, but, for instance, more concrete examples of how to apply the different strategies of dominant decomposition (folder structure). The same goes for the architectural styles.
The same concepts are repeated over and over in most chapters from slightly different perspectives: quality attributes, module viewtype, runtime viewtype, components, ports, connectors. It feels like every chapter has a piece of the puzzle of each topic, rather than each topic being covered in full in a continuous segment, which inevitably leads to repetition.
Notes
Legend, vignette levels:
- 1st, * 2nd, ^ 3rd
2)
- Risk-driven: design only as much architecture as needed to mitigate the biggest risks.
  * If you need to process data in under 50ms, put that into the architecture.
- Apart from the features you're working on, always know which failure risks you're addressing.
- There are problems to find and problems to prove.
  * "Is there a number that when squared equals 4?" is a problem to find, since you can test a solution pretty easily.
  * "Is the sequence of all prime numbers infinite?" is a problem to prove.
  * Finding is easier than proving, since proving requires demonstrating something is true for all possible cases.
4)
- Design intent will always be lost between design and code.
  * We debated design decisions, discovered tradeoffs, imposed constraints; yet none of these is expressed in code.
- Code does a good job communicating the code model, but a poor job communicating the runtime and allocation models (where and how it is deployed).
  * Runtime model: what components and connectors exist at runtime.
    ^ How to represent it: a Components and Responsibilities table, a component assembly diagram, a functionality scenario.
- A functionality scenario includes: Name, Initial state, Participants (components), Steps (see the sketch after this list).
- Quality attributes: describe the tradeoffs, architecture drivers and design decisions here.
- When integrating third-party components, make a list of failure risks.
  * Make a diagram and write a functionality scenario of the risk mitigation strategies.
- After laying out a domain model, test it with a concrete example to detect potential problems.
  * e.g. if you write a relationship model for Artist -> Song -> Album, the generic model might not highlight the potential shortcomings of handling bands, ex-band members going solo, or artists changing names.
- Make rational architecture choices: when two ideas seem equally good, figure out which quality attribute each helps achieve, e.g. usability vs. testability, then pick the one you value higher.
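Such a scenario could be captured as plain data. A minimal TypeScript sketch; the checkout example and every field value are hypothetical, only the four field names come from the book:

```typescript
// Shape of a functionality scenario, per the book's four fields.
interface FunctionalityScenario {
  name: string;
  initialState: string;
  participants: string[]; // the components involved
  steps: string[];
}

// A made-up example instance for illustration.
const checkout: FunctionalityScenario = {
  name: "User checks out shopping cart",
  initialState: "Cart contains at least one item; user is logged in",
  participants: ["WebClient", "OrderService", "PaymentGateway"],
  steps: [
    "WebClient sends the cart to OrderService",
    "OrderService requests payment authorisation from PaymentGateway",
    "PaymentGateway confirms; OrderService persists the order",
  ],
};
```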
7)
- Models help you solve problems and mitigate risks, but some problems are better solved directly, not through models.
- "Every project should document its architecture": false.
  * You should plan before going on a road trip, but you don't plan your commute every morning.
- "Your point of view is worth 80 IQ points" (Alan Kay).
- Domain model: expresses enduring truths about the world relevant to the system; if something is "just true", it belongs in the domain.
- Design model: the system to be built makes its appearance here; a set of boundaries and commitments.
8)
- Domain models express details of the domain not related to the system's implementation.
- Stop domain modelling when it provides less value than another activity, such as prototyping.
- The most valuable part of this model is a list of types and definitions.
- Invariants: things in the model that must always be true.
- Functionality scenarios here include actors (real people or real-world objects), not computer records or hardware as in the design model versions.
9)
- Module viewtype: things that exist only at compile time.
  * Modules, layers, dependencies, db schemas, interfaces, classes, component types.
  * Source code itself (the code model) is in this viewtype.
- Runtime viewtype: object and component instances, functionality scenarios, component assemblies.
  * Hard to envision by reading the source code, since you have to mentally animate the runtime instances.
11)
- Create levels of abstraction by hierarchically nesting elements, limit the number of elements at any level, and maintain encapsulation by not revealing unnecessary detail.
  * Create a story at many levels.
- Dominant decomposition: the problem is that a single organisation system must be chosen.
  * Like books in a library organised by size: it helps you find the largest books, but not so much a particular author.
  * Decomposing the system into modules and components imposes an organisation on it.
- Encapsulation: rather than just grouping related code together, hide details likely to change inside the module.
  * If you're unsure about design alternatives A and B, have the module's public interface support both, in case you need to swap (see the sketch below).
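A sketch of that last point; SongStore and both implementations are made up for illustration. The module's public interface stays neutral so either alternative can back it:

```typescript
// The public interface commits to neither storage alternative.
interface SongStore {
  save(id: string, data: Uint8Array): Promise<void>;
  load(id: string): Promise<Uint8Array>;
}

// Alternative A: a local in-memory/file-backed store.
class LocalSongStore implements SongStore {
  private files = new Map<string, Uint8Array>();
  async save(id: string, data: Uint8Array) { this.files.set(id, data); }
  async load(id: string) { return this.files.get(id) ?? new Uint8Array(); }
}

// Alternative B: a remote store (body elided; same contract).
class RemoteSongStore implements SongStore {
  async save(id: string, data: Uint8Array) { /* e.g. HTTP PUT */ }
  async load(id: string) { /* e.g. HTTP GET */ return new Uint8Array(); }
}
```

Callers depend only on `SongStore`, so swapping A for B later touches no client code.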
14)
- Pipe-and-filter style: continually and incrementally processing data (e.g. Node streams; see the sketch below).
- Batch-sequential: complete all processing before moving to the next stage.
- Client-server: asymmetric, only clients can request the server to do work.
- Peer-to-peer: any node can be client or server, no hierarchy.
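Since the note mentions Node, here's a minimal pipe-and-filter sketch using Node's standard stream API: each filter is a Transform, and the pipes are the stream connections between them.

```typescript
import { Transform, pipeline } from "node:stream";

// One filter: uppercases whatever flows through the pipe.
const toUpper = new Transform({
  transform(chunk, _encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  },
});

// Source -> filter -> sink, processing data incrementally as it arrives.
pipeline(process.stdin, toUpper, process.stdout, (err) => {
  if (err) console.error("pipeline failed:", err);
});
```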
Even if the book describes some OO concepts, to get the most out of it I suggest reading a couple of OO-related books beforehand and getting a good grasp of things like encapsulation, polymorphism and especially object collaboration. As the authors state: one of their main focuses when constructing OO software is to think about how objects will send messages to each other rather than about their roles; in other words, drive the design from how the moving pieces will collaborate with each other.
It's also recommended to read a few of the books from "The Addison-Wesley Signature Series". Apart from having a similar structure, it feels like each one builds on top of the others which makes them easier to follow along after you've gone through a few of them.
From the book: we spend much more time reading code than writing it, so we should optimise for that; writing tests is a great way of having coherent, up-to-date documentation of the system.
Unit testing doesn't impact much of the external quality of the code (from the user's perspective); it impacts the internal quality (from the developer's perspective), which in turn determines how easy it is for the code to be extended and bugs to be fixed.
Use the builder pattern so test intentions aren't buried in setup calls, improving readability.
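A minimal sketch of such a test data builder; OrderBuilder and its fields are hypothetical. Defaults absorb the setup noise so each test states only what it cares about:

```typescript
// Hypothetical builder: sensible defaults, fluent overrides.
class OrderBuilder {
  private items: string[] = ["default-item"];
  private discount = 0;

  withItems(...items: string[]): this { this.items = items; return this; }
  withDiscount(pct: number): this { this.discount = pct; return this; }
  build() { return { items: this.items, discount: this.discount }; }
}

// The test reads as intent, not setup:
const discountedOrder = new OrderBuilder().withDiscount(10).build();
```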
Don't strive for testing individual functions but rather your expected behaviour. Start each feature by writing an acceptance test, then write unit tests while you incrementally implement the feature; usually the acceptance test will be the last to pass. This can be an end-to-end test if possible, but it can also be less than that: any type of test that you think satisfies the definition of done.
Considering this book was written before many of the most popular OO languages (Java, C#, Ruby, Python) even existed, it feels pretty contemporary. It talks heavily about arguably suboptimal concepts like multiple inheritance and getting the upfront design right, but I think we can take most of the ideas and apply them to Agile environments; that is, we still do planning in Agile even though incremental, iterative development is at the heart of the process. We don't necessarily have to write a full-blown upfront design, but we can use the elaborate techniques described in this book to come up with good, supple, flexible feature/system designs that are easy to refactor down the road.
Most of these concepts have stood the test of time.
You're presented with notions like who controls what, spreading system intelligence as evenly as possible rather than encapsulating most behaviour in a single class, and wrapping classes inside subsystems (modules) to reduce coupling and increase cohesion.
The authors thoroughly break down the building blocks of OOP in a way you wish you had learned before you ever saw an OO language:
- Inheritance is just about factoring out common functionality into a single place.
- Objects communicate with each other by sending messages.
- Messages have a name and may include arguments.
- Objects handle messages by executing methods.
- The set of messages an object can respond to is known as its behaviour.
- When two or more kinds of objects can respond to the same message by executing different methods with the same message name, you get polymorphism (see the sketch below).
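A small TypeScript sketch of that last point; Shape, Circle and Square are made-up examples. Two kinds of objects respond to the same message with different methods:

```typescript
interface Shape { area(): number; }

class Circle implements Shape {
  constructor(private r: number) {}
  area() { return Math.PI * this.r ** 2; }
}

class Square implements Shape {
  constructor(private side: number) {}
  area() { return this.side ** 2; }
}

// Same message ("area"), different methods executed: polymorphism.
const shapes: Shape[] = [new Circle(1), new Square(2)];
shapes.forEach((s) => console.log(s.area()));
```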
It was great to read a book from a time when a lot of the stuff we now take for granted had to be factored into the design: classes for handling arrays, graphic operations, fixed/floating-point numbers, etc. These are things we get today from programming frameworks with rich libraries of built-in functionality. It gives you an idea of where we're coming from and is a refreshing reminder of what the underlying platform is doing for you.
Review:
A high-level overview of how to design highly scalable systems. Each component is described in detail and broken down into potential use cases, so it's easy to understand how it fits in the architecture: benefits, caveats, challenges, tradeoffs. It feels like a great combination of breadth and depth, and it also includes a vast reference list of books, white papers, talks and links to expand on each topic.
Notes:
#1: Core Concepts
- Avoid full rewrites; they always cost more and take longer than expected, and you end up with similar issues.
- A single-server configuration means having everything running off one machine; a good option for small applications, and it can be scaled vertically.
- Vertical scaling gets more expensive over time and has a fixed ceiling.
- Isolation of services means running different components off different servers (DNS, cache, data storage, static assets, etc.), which can then be vertically scaled independently.
- CDNs bring things like geoDNS for your static assets, so requests will be routed to the geographically closest servers.
- SOA is an architecture centred around loosely coupled and autonomous services that solve specific business needs.
- Layered architectures divide functionality into layers, where the lower layers provide an API for, and don't know about, the upper ones.
- Hexagonal architecture assumes the business logic is in the centre and all interactions with other components, like data stores or users, are equal; everything outside the business logic requires a strict contract (see the sketch below).
- Event-driven architecture is a different way of thinking about actions: it is about reacting to events that already happened, while traditional architectures are about responding to requests.
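A minimal sketch of the hexagonal idea; ProfilePort, ProfileService and the adapter are hypothetical names. The business logic in the centre depends only on a strict contract, and any adapter honouring it is interchangeable:

```typescript
// The "port": a strict contract at the edge of the business logic.
interface ProfilePort {
  find(id: string): Promise<{ id: string; name: string } | null>;
}

// The centre: knows only the port, never a concrete data store or UI.
class ProfileService {
  constructor(private profiles: ProfilePort) {}
  async greet(id: string): Promise<string> {
    const p = await this.profiles.find(id);
    return p ? `Hello, ${p.name}` : "Unknown user";
  }
}

// One adapter among many possible (DB, HTTP, test double, ...).
class InMemoryProfileAdapter implements ProfilePort {
  private data = new Map([["1", { id: "1", name: "Ada" }]]);
  async find(id: string) { return this.data.get(id) ?? null; }
}

new ProfileService(new InMemoryProfileAdapter()).greet("1").then(console.log);
```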
#2: Principles of Good Software Design
- Simplicity.
- Hide complexity behind abstractions.
- Avoid overengineering.
- Try TDD, use SOLID.
- Promote loose coupling; drawing diagrams helps spot coupling.
- DRY, and don't waste time following inefficient processes or reinventing the wheel.
- Functional partitioning: dividing the system based on functionality, where each subsystem has everything it needs to operate (data store, cache, message queue, queue workers), e.g. a profile web service and a scheduling web service.
- Design for self-healing.
#3: Building the FE Layer
- You can have your frontend be a traditional multipage web application, an SPA, or a hybrid.
- Web servers can be stateless (handle requests from every user indistinctly) or stateful (handle every request of some users).
- Prefer stateless servers, where session state is not kept within the servers but pushed to another layer like a shared session storage. This makes for better horizontal scalability, since you can add/remove clones without affecting users' bound sessions.
- HTTP is stateless, but cookies make it stateful.
- On the first HTTP request the server can send a Set-Cookie:SID=XYZ... response header; after that, every consecutive request needs to contain the Cookie:SID=XYZ... header.
- Session data can be stored in cookies, which means every request (css, images, etc.) contains the full body of session data; when state changes, the server needs to re-send a Set-Cookie:SDATA=ABCDE... header. This increases request payload, so only use it if session state is small. As an advantage, you don't have to store any session data on the server.
- Keep session state in a shared data store (like Memcached, Redis, Cassandra): then you only append the session id to every request while the full session data is kept in the shared store, which can be distributed and partitioned by session id (see the sketch below).
- Or use a load balancer that supports sticky sessions: this is not flexible, as it makes the load balancer know about every user, and scaling is hard since you can't restart or decommission web servers without breaking users' sessions.
- Components of scalable frontends: DNS, CDN, Load Balancer, FE web server cluster.
- DNS is the first component hit by users; use a third-party hosted service in almost all cases. There are geoDNS and latency-based DNS services like Amazon Route 53, the latter potentially better because they take into consideration realtime network congestion and outages in order to route traffic.
- Load balancers help with horizontal scaling, since users don't hit web servers directly.
  * They can provide SSL offloading (or SSL termination), where the LB encrypts and decrypts HTTPS connections and your web servers talk HTTP among themselves internally.
  * There are hosted providers like Amazon ELB, software-based LBs like Nginx and HAProxy, and hardware-based ones that can be more expensive but easier to scale vertically.
  * Apart from an external LB routing traffic to FE web servers, you can have an internal LB distributing traffic from the FE to web service instances, gaining all the benefits of an LB internally.
  * Some LBs route the TCP connections themselves, allowing the use of protocols other than HTTP.
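A toy sketch of the shared-session-store approach, using only Node's standard library; the Map stands in for a real shared store like Redis, so any web server clone could resolve the SID cookie:

```typescript
import http from "node:http";
import crypto from "node:crypto";

// Stand-in for a shared store (e.g. Redis) partitioned by session id,
// so stateless web server clones all resolve the same session data.
const sharedStore = new Map<string, { views: number }>();

http.createServer((req, res) => {
  const sid = /(?:^|;\s*)SID=([^;]+)/.exec(req.headers.cookie ?? "")?.[1];
  let session = sid ? sharedStore.get(sid) : undefined;

  if (!session) {
    const newSid = crypto.randomUUID(); // first request: issue a SID cookie
    session = { views: 0 };
    sharedStore.set(newSid, session);
    res.setHeader("Set-Cookie", `SID=${newSid}; HttpOnly`);
  }

  session.views += 1; // session data lives off-server; only the id travels
  res.end(`views: ${session.views}\n`);
}).listen(3000);
```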
#4: Web Services
- The monolithic approach: a single mvc web app contains all the web application controllers, mobile services controllers, third-party integration service controllers and shared business logic.
  * Here, every time you need to integrate a new partner, you'd need to build that into the same mvc app as a set of controllers and views, since there's no concept of a separate web services layer.
- The API-first approach: all clients talk to the web application using the same API, a set of shared web services containing all the business logic.
- Function-centric web services: the concept of calling functions or objects on remote machines without needing to know how they're implemented, like SOAP, CORBA, XML-RPC, DCOM.
  * SOAP dominates this space; it uses WSDL files for describing the methods and endpoints available (providing service discovery) and XSD files for describing data structures.
  * Your client code doesn't need to know that it is calling a remote web service; it only needs to know the objects generated from the web service's contract (WSDL + XSD files).
  * Dozens of additional specifications were created for SOAP (referred to as ws-*, like ws-security) for features like transactions, multiphase commits, authentication, encryption, etc., but this reduced interoperability: it made it difficult to integrate between development stacks, as different providers had different levels of support for the ws-* specifications, especially in dynamic languages like PHP, Ruby, Perl, Python. This led to the alternative, easier-to-implement JSON + REST.
  * You couldn't cache HTTP calls based on the URL alone either, since it was the same every time: request params and method names were part of the XML document itself.
  * Some of the ws-* specifications introduced state, preventing you from treating web service machines as stateless clones.
  * Having a strict contract and the ability to discover functions and data types provides a lot of value.
- Resource-centric web services treat every resource as a type of object. You can model them as you like, but the operations you can perform on them are standard (POST, GET, PUT, DELETE), unlike SOAP where you have arbitrary functions which take and produce arbitrary values.
  * REST is predictable: you know the operations will always be the same, while in SOAP each service had its own conventions, standards and ws-* specifications.
  * Uses JSON rather than XML, which is lighter and easier to read.
  * A REST framework or container is pretty much just a simple HTTP server that maps URLs to your code (see the sketch below).
  * It doesn't provide the discoverability and auto-generation of client code that SOAP has with WSDL and XSD, but by being more flexible it allows nonbreaking changes to be released on the server without the need to redeploy client code.
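A minimal resource-centric sketch, assuming Express; the /songs resource is made up. Standard verbs mapped straight to code, JSON in and out:

```typescript
import express from "express";

// A REST "framework" here is just an HTTP server mapping URLs to code.
const app = express();
app.use(express.json());

const songs = new Map<string, { title: string }>();

// GET reads a resource by id.
app.get("/songs/:id", (req, res) => {
  const song = songs.get(req.params.id);
  song ? res.json(song) : res.sendStatus(404);
});

// PUT creates or replaces the resource at this URL.
app.put("/songs/:id", (req, res) => {
  songs.set(req.params.id, req.body);
  res.sendStatus(204);
});

app.listen(3000);
```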
#5: Data Layer
- Replication scales reads, not writes.
  * In MySQL you can have a master and slave topology; there's a limit on the slave count, but you can then have multilayer slaves to increase the limit.
  * All writes go to the master while replicas are read-only.
  * Replication happens asynchronously.
  * Each slave has a binlog file with the list of statements from the master to be executed and a relay log with the already executed ones.
  * Promoting a slave to master is a manual process.
  * You can have a master-master topology for a faster/easier failover process, but it's still not automatic in MySQL. In this topology you write to either one, and the other replicates from its binlog.
  * The more masters you have, the longer the replication lag, hence worse write times.
- Active data set: the amount of data accessed within a time window (an hour, day, week). It's the live data that's most frequently read/written.
  * A large active data set size is a common scalability issue.
  * Replication can help increase concurrent reads, but not if you want to increase the active data set size, since an entire copy of the dataset needs to be kept on each slave.
- When sharding, you want to keep together sets of data that will be accessed together and spread the load evenly among servers (see the sketch after this list).
- You can apply sharding to object caches, message queues, file systems, etc.
  * Cross-shard queries are a big challenge and should probably be avoided.
  * Cross-shard transactions are not ACID.
- Try to avoid distributed transactions.
- NoSQL databases make compromises in order to support their priority features.
- The famous "pick two" phrase related to the CAP theorem is not entirely true.
  * If your system is still operational after a number of failures, you still have partition tolerance.
  * Quorum consistency provides consistency while still having availability, at the price of latency.
- Quorum consistency means the majority of the nodes agree on the result (reads and/or writes). You can implement this for some of your queries, trading latency for consistency on eventually consistent systems; this means you can still enforce read-after-write semantics.
- Eventual consistency: a property of a system where nodes may have different versions of the data until it eventually propagates.
  * These systems favor availability; they give no guarantees that the data you're getting is the freshest.
  * Amazon Dynamo (designed to support the checkout process) works this way: it saves all conflicts and sends them to the client, where the reconciliation happens, resulting in both shopping cart versions being merged. They prefer showing a previously removed item in a shopping cart to losing data.
- Cassandra topology: all nodes are the same, and they all accept reads and writes. When designing your cluster you decide how many nodes you want the data to be replicated to, which happens automatically; all nodes know which node has what data.
  * It's a mix of Google's BigTable and Amazon's Dynamo: it has huge tables that most of the time are not related, rows can be partitioned by the row key among nodes, and you can add columns on the fly; not all rows need to have all the columns.
  * When a client connects to a node, that node acts as the session coordinator: it finds which nodes have the requested data and delegates to them.
  * It implements a self-healing technique where 10% of all transactions trigger an async background check against all replicas and send updates to the ones with stale data.
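A toy sketch of routing rows to shards by hashing the sharding key; the shard names are made up. Everything keyed by the same user lands on the same server, and the hash spreads keys evenly:

```typescript
import crypto from "node:crypto";

// Hypothetical shard names; in practice these map to database servers.
const shards = ["db-shard-0", "db-shard-1", "db-shard-2"];

// Hash the sharding key so all of one user's rows live on one shard.
function shardFor(userId: string): string {
  const digest = crypto.createHash("md5").update(userId).digest();
  return shards[digest.readUInt32BE(0) % shards.length];
}

console.log(shardFor("user-42")); // same input always routes to the same shard
```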
#6: Caching
- Cache hit ratio is the most important metric of caching, and it's affected by 3 factors:
  * Data set size: the more unique your cache keys, the lower the chance of reusing them. If you cached based on a user's IP address you'd have up to 4 billion keys, unlike caching based on country.
  * Space: how big the data objects are.
  * TTL (time to live).
- HTTP caches are read-through caches.
  * For these you can use request/response headers or HTML metatags; try to avoid the latter to prevent confusion.
  * Favor "Expires: A-DATE" over "Cache-Control: max-age=TTL"; the latter is inconsistently implemented and less backwards compatible.
  * Browser cache: exists in most browsers.
  * Caching proxies: usually a local server in your office or at your ISP to reduce internet traffic; less used nowadays because of cheaper internet prices.
  * HTTPS prevents caching on intermediary proxies.
  * Reverse proxy: reduces the load on your web servers; can be installed on the same machine as the load balancer, like Nginx or Varnish.
  * CDNs can act as caches.
  * Most caching systems use a Least Recently Used (LRU) algorithm to reclaim memory space.
- Object caches are cache-aside caches (see the sketch after this list).
  * Client side, like the web storage specification: up to 5 and 25MB.
  * Distributed object caches: ease the load on data stores; your app will check them before making a request to the db, and you can decide whether to update the cache on read or on write.
  * Cache invalidation is hard: if you were caching each search result on an e-commerce site and then updated one product, there's no easy way of invalidating the cache without running each of the search queries and checking whether the product is included.
  * The best approach often is to set a short TTL.
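A minimal cache-aside sketch; a plain Map plays the cache and loadFromDb is a placeholder for the real data-store call. The app checks the cache first, falls back to the store on a miss, and repopulates with a short TTL:

```typescript
// Stand-in for a distributed object cache (Memcached/Redis in production).
const cache = new Map<string, { value: unknown; expiresAt: number }>();

async function getProduct(
  id: string,
  loadFromDb: (id: string) => Promise<unknown>, // placeholder db call
): Promise<unknown> {
  const hit = cache.get(id);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit

  const value = await loadFromDb(id);                       // cache miss
  cache.set(id, { value, expiresAt: Date.now() + 60_000 }); // short TTL: 60s
  return value;
}
```

The short TTL is what sidesteps the hard invalidation problem above: stale entries simply age out.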
#7: Asynchronous Processing
- Messages are fire-and-forget requests.
- Message queue: can be as simple as a database table.
- Message broker | Enterprise Service Bus (ESB) | Message-Oriented Middleware (MOM): an app that handles message queuing, routing and delivery, often optimised for concurrency and throughput.
- Message producers are clients issuing requests (messages).
- Message consumers are your servers, which process messages.
  * Can be cron-like (pull model): connects periodically to the queue to check its status; common among scripting languages without a persistent running application container: PHP, Ruby, Perl.
  * Or daemon-like (push model): consumers run in an infinite loop, usually with a permanent connection to the broker; common in languages with persistent application containers: C#, Java, Node.js.
- Messaging protocols:
  * AMQP is recommended: standardised (equally implemented among providers), a well-defined contract for publishing, consuming and transferring messages, provides delivery guarantees, transactions, etc.
  * STOMP: simple and text-based like HTTP, adds little overhead, but advanced features require extensions via custom headers which are non-standard, like prefetch-count.
  * JMS is only for JVM languages.
- Benefits of message queues:
  * Async processing.
  * Evening out traffic spikes: if you get more requests, the queue just gets longer and messages take longer to get picked up, but consumers still process the same amount of messages and eventually even out the queue.
  * Isolating failures, self-healing and decoupling.
- Challenges:
  * No message ordering. Group ids can help; some providers will guarantee ordered delivery of messages of the same group, but watch out for consumer idle time.
  * Race conditions.
  * Message re-queuing: strive for idempotent consumers (see the sketch after this list).
  * Don't couple producers with consumers.
- Event sourcing: a technique where every change to the application state is persisted in the form of an event, usually tracked in a log file, so at any point in time you can replay the events and get to a specific state. MySQL replication with binary log files is an example.
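A tiny sketch of an idempotent consumer; the in-memory Set stands in for a durable record of processed message ids. Re-delivered messages become no-ops:

```typescript
// Stand-in for a durable store of already-processed message ids.
const processed = new Set<string>();

function handleMessage(msg: { id: string; payload: string }): void {
  if (processed.has(msg.id)) return; // duplicate delivery: safe no-op
  console.log("processing", msg.payload);
  processed.add(msg.id);             // remember it so re-queuing can't double-apply
}

// Delivering the same message twice only does the work once.
handleMessage({ id: "m1", payload: "charge order 42" });
handleMessage({ id: "m1", payload: "charge order 42" });
```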
#8: Searching for Data
- High-cardinality fields (many unique values) are good index candidates because they narrow down searches.
- The distribution of values also affects how useful the index is.
- You can use compound indexes to combine a high-cardinality field with a poorly distributed one.
- Inverted index: allows searching for phrases or words in full-text search (see the sketch after this list).
  * These break text down into word tokens and store next to each token the ids of the documents that contain it.
  * When making a search you get the posting lists of each word, merge them, and then find the intersections.
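A toy inverted index in TypeScript; the tokenization is deliberately naive. Tokens map to posting lists, and a query intersects the lists of its words:

```typescript
// token -> posting list (ids of the documents containing that token)
const index = new Map<string, Set<number>>();

function addDocument(id: number, text: string): void {
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    if (!index.has(token)) index.set(token, new Set());
    index.get(token)!.add(id);
  }
}

// Intersect the posting lists of every query word.
function search(query: string): number[] {
  const lists = query.toLowerCase().split(/\W+/).filter(Boolean)
    .map((t) => index.get(t) ?? new Set<number>());
  if (lists.length === 0) return [];
  return [...lists.reduce((a, b) => new Set([...a].filter((x) => b.has(x))))];
}

addDocument(1, "the quick brown fox");
addDocument(2, "the lazy brown dog");
console.log(search("brown the")); // [1, 2]: both contain "brown" and "the"
```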
#9: Other Dimensions of Scalability
- Automate testing, builds, releases and monitoring.
- Scale yourself:
  * Overtime is not scaling.
  * Recovering from burnout can take months.
- Project management levers: scope, time, cost.
  * When modifying one, the others should recalibrate.
  * Influencing scope can be the easiest way to balance workload.
- "Without data you're just another person with an opinion".
- Stay pragmatic using the 80/20 heuristic rule.
Let's be honest, the book was published 15+ years ago; it's highly unlikely that each and every one of the 51 patterns described in it would be completely appropriate for writing software today.
There's no silver bullet in software development (as Fred Brooks said), so every piece of design/architecture advice should be taken with a grain of salt.
In that regard, if you'd like to know how your everyday frameworks work underneath, understand how they wire everything together, or perhaps write a better one yourself, this is a book for you.
I started my dev journey as a .NET web developer and remember having heated conversations with peers doing Java. I now realise how much easier things would've been had I known that .NET and Visual Studio made it easy for you to pair Table Module(125) with Record Set(508) for the business layer, so you'd do that more often than not, while in Java you'd many times use Domain Model(116), which is a more OO approach.
For the view, .NET Web Forms used Template View(350), and often Two Step View(365) through what was called master pages. The controllers were Page Controller(333): you had a controller for each server page, called the Code Behind file, which was then referenced inside the markup page; the page itself was a subclass of that Code Behind, so the view could access its protected methods.
While Java Server Pages (JSP) were a bit similar to .NET Web Forms, if you were using them through a Java framework like Struts (from Apache), you probably had a Front Controller(344), which handled all incoming requests for every page and forwarded them to the appropriate helper class depending on the application flow logic.
Being a JavaScript developer now, I see similar patterns applied in this ecosystem as well.
I really enjoyed the definition of MVC(330). The model is an object that represents some of your domain information; in OO terms, an object holds knowledge and should own the operations on that knowledge, that is, it should know how to access and modify the data it contains, which is encapsulation. The view is then the visual representation of the model, a collection of UI widgets that render the data inside the model. The controller is the link between the two: it takes inputs from the view and forwards them to the right model to be handled. In that regard, the dependency graph should look something like this: the view knows about the model (and oftentimes about the controller), and the controller knows about the view and the model. One of the main benefits of MVC is separating the presentation logic (view) from the business logic (model), which in turn is one of the fundamental traits of OO design.
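A minimal sketch of that dependency graph; the counter example is made up. The view knows the model, and the controller knows both:

```typescript
// Model: owns its knowledge and the operations on it (encapsulation).
class CounterModel {
  private count = 0;
  increment() { this.count++; }
  get value() { return this.count; }
}

// View: renders the data inside the model.
class CounterView {
  constructor(private model: CounterModel) {}
  render() { console.log(`Count: ${this.model.value}`); }
}

// Controller: forwards input from the view to the right model.
class CounterController {
  constructor(private model: CounterModel, private view: CounterView) {}
  onClick() { this.model.increment(); this.view.render(); }
}

const model = new CounterModel();
const view = new CounterView(model);
new CounterController(model, view).onClick(); // Count: 1
```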
This book is presented as the connection between Design Patterns [GoF] and Refactoring [Fowler]. Overall the content is good and thoroughly explained. I was expecting it to have a structure like: refactoring ABC from Fowler can be matched with patterns X, Y, Z from GoF. However, it feels more like a paraphrased version of Fowler's Refactoring book, which makes it feel repetitive.
And even if my expectations had been met, I now realise you wouldn't need a book to describe this relation between patterns and refactoring; that is, I think it's easy to figure it out on your own after thoroughly grasping both concepts individually.