This book introduces a novel approach to the design and operation of large ICT systems. It views the technical solutions and their stakeholders as complex adaptive systems and argues that traditional risk analyses cannot predict all future incidents with major impacts. To avoid unacceptable events, it is necessary to establish and operate anti-fragile ICT systems that limit the impact of all incidents, and which learn from small-impact incidents how to function increasingly well in changing environments. The book applies four design principles and one operational principle to achieve anti-fragility for different classes of incidents. It discusses how systems can achieve high availability, prevent malware epidemics, and detect anomalies. Analyses of Netflix’s media streaming solution, Norwegian telecom infrastructures, e-government platforms, and Numenta’s anomaly detection software show that cloud computing is essential to achieving anti-fragility for classes of events with negative impacts.
My rating should be read as 4 stars for the initial chapters, which
introduce the concepts of anti-fragility and resilience to complex software-intensive systems and their risk management,
stress the differences between simple change and adaptability,
and link high-level discussion to concrete examples and methodologies from Netflix (chaos engineering, etc.).
The chapters on malware propagation weren't of much use to me: I understand the author's focus, given his particular research topics, but I don't think they add much to the general discussion. In addition, I found the focus on HTM (Hierarchical Temporal Memory) and Numenta a little unexpected. Again, I doubt this particular focus will be of great use to practitioners whose daily job is to design and operate complex systems that should be anti-fragile, that is, systems that become more and more resilient after being subjected to stressors, attacks, errors, etc. Anomaly detection will of course be one of the central challenges for such systems; I just doubt that Numenta and HTM will be the central aspects of it.
Overall, I found the perspective and framework provided by the author valuable for thinking about complex software systems, and I can recommend the first half of the book, and the general lessons regarding anomaly detection.