STAREAST 2022 Concurrent Session: Embracing Collaborative Chaos: Running Chaos Days on Large Platforms


Thursday, April 28, 2022 - 1:30pm to 2:30pm


Chaos Engineering aims to reduce the impact of component failure. Chaos Days (aka Game Days) are one practice within this field, in which controlled failures are used to learn about and improve how both the system and the team respond. We will describe how to run a Chaos Day on a large microservices platform, drawing on our experience of doing this across 60 teams and 1,000 microservices. The session will explore why you’d run a Chaos Day, and how to know when you and your platform are ready to do so. We’ll share what we have learned about the actual mechanics of running one: how to plan, execute, and run a retrospective on a Chaos Day. We’ll also share what hasn’t worked so well, and the areas we’d like to focus on in the future. When real (unplanned) failures occur, they provide excellent opportunities to learn and to improve a system’s resilience. We’ll explore how to make the most of these events by running effective postmortems, and how Chaos Days can further refine your postmortem approach. The session will conclude by presenting several possible starting points and discussing how Chaos Engineering could be applied in attendees’ own contexts.

Equal Experts

An engineer at heart, Lyndsay Prewer has over twenty years’ experience of helping individuals, teams and organisations improve their software delivery. For the past seven years, Lyndsay has been a consultant with Equal Experts, leading a range of engineering teams in the delivery of large-scale IT services. Problem domains have ranged from public sector organisations (the UK’s tax and passport offices) to global online retailers. Lyndsay loves helping bridge the gaps between user and business needs, engineering disciplines and service operation.