Written by Murtaza Laghari
Devops Sales Development
Chaos Engineering. Applying Chaos Engineering to the infrastructure, application, and network layers can be risky, but technical experts highly recommend it. Their reasoning: deliberately running controlled failure experiments exposes a system's faults and weaknesses before they surface in real-life situations. We at Royal Cyber have covered the basic principles and benefits of Chaos Engineering that help to build a more robust system.
As cloud-based systems become increasingly complex, Chaos Engineering is becoming an essential part of software testing, helping to uncover surprises, fix problems, and build reliability into every feature. Engineering teams are also debating whether to build their own Chaos Engineering tools or adopt one of the existing ones. This article highlights some of the most popular open-source and commercial Chaos Engineering tools that are currently available. The main objective is to look at each tool's features, platform and system support, and extensibility in greater detail.
Chaos Monkey was one of the first open-source tools in the Simian Army, the fault-injection suite launched by Netflix. It was released as an open-source tool in 2012, and its current version is written in Golang. The function of Chaos Monkey is simple: it kills virtual machine instances and nodes, and it can do so only through Spinnaker. Killing a random virtual machine is an important aspect because it can provoke unexpected reactions, and uncovering those reactions is the main goal of Chaos Engineering.
The process of using Chaos Monkey includes registering an EC2 instance (an Amazon virtual machine), which requires an instance ID, machine IP, name, and other details. Once the virtual machine is added, you can complement it with services such as MongoDB, RabbitMQ, etc. Multiple virtual machine instances with different types of services can be added to create a network of instances, any of which can then be terminated to conduct Chaos Engineering.
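The idea of registering instances and then terminating one at random can be sketched in a few lines of Python. This is only an in-memory simulation under stated assumptions: the `Instance` fields and the `terminate_random_instance` helper are hypothetical names chosen for illustration, and nothing here calls Chaos Monkey, Spinnaker, or AWS.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """A registered virtual machine; the field names are illustrative only."""
    instance_id: str
    machine_ip: str
    name: str
    service: str
    running: bool = True


def terminate_random_instance(instances, rng=None):
    """Mark one randomly chosen running instance as terminated and return it.

    Returns None when no instance is running. This only flips a flag in
    memory; it does not call any cloud or Spinnaker API.
    """
    rng = rng or random.Random()
    candidates = [inst for inst in instances if inst.running]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.running = False
    return victim


# A small hypothetical fleet with different services attached.
fleet = [
    Instance("i-0001", "10.0.0.11", "orders-db", "MongoDB"),
    Instance("i-0002", "10.0.0.12", "orders-queue", "RabbitMQ"),
    Instance("i-0003", "10.0.0.13", "orders-api", "web"),
]

victim = terminate_random_instance(fleet, random.Random(42))
still_running = sum(inst.running for inst in fleet)
print(f"terminated {victim.name} ({victim.service}); "
      f"{still_running} of {len(fleet)} instances still running")
```

In a real experiment, the interesting part is what happens next: whether the remaining instances absorb the load and whether monitoring notices the loss.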
Released in 2019 by Alibaba, Chaos Blade is also written in Golang and can run on Docker, Kubernetes, or cloud platforms. The primary function of this tool is to provide attacks such as packet loss, process killing, and many other network-specific attacks.
Chaos Blade is used through its command-line program. Its advantage is the ability to attack C++ applications, Java applications, and even cloud environments. The commands include:
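As a rough illustration of the command style, the Python sketch below only assembles argument lists for the `blade` command-line program and prints them; it does not execute anything. The subcommands and flags shown (`create network loss`, `create process kill`, `destroy`) reflect the commonly documented CLI shape, but treat them as assumptions and confirm them with `blade help` for your installed version. The helper names, the process name, and the `<experiment-uid>` placeholder are hypothetical.

```python
import shlex


def blade_create(target, action, **flags):
    """Build the argument list for `blade create <target> <action> --flag value`."""
    argv = ["blade", "create", target, action]
    for name, value in flags.items():
        # Python keyword names use underscores; CLI flags use hyphens.
        argv += [f"--{name.replace('_', '-')}", str(value)]
    return argv


def blade_destroy(experiment_uid):
    """Build the argument list that stops a running experiment by its UID."""
    return ["blade", "destroy", experiment_uid]


commands = [
    # Drop half of the packets on a network interface (assumed flag names).
    blade_create("network", "loss", percent=50, interface="eth0"),
    # Kill a process by name (assumed flag name; "nginx" is a placeholder).
    blade_create("process", "kill", process="nginx"),
    # Recover from an experiment using the UID returned by `blade create`.
    blade_destroy("<experiment-uid>"),
]

for argv in commands:
    print(shlex.join(argv))
```

Building the arguments as lists rather than strings makes it straightforward to hand them to `subprocess.run` later without shell-quoting problems.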
Released in 2020, Chaos Mesh is the newest of the Chaos Engineering tools. Launched by PingCAP, it is written in Golang. This tool runs on Kubernetes and supports 17 types of attacks, including network attacks, bandwidth attacks, and system time manipulation.
Chaos Mesh is a tool for Kubernetes. A new experiment can be created through the Chaos Mesh dashboard. These experiments range from killing pods and network attacks to system I/O injection and latency. Each experiment is written in a YAML file where the parameters must be specified, after which it is deployed through Chaos Mesh. The tool also includes dashboards that make it easy to view analytics and reports.
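To make the experiment file more concrete, here is a minimal Python sketch that builds a pod-kill experiment as a dictionary and prints it as JSON, which Kubernetes tooling accepts in place of YAML. The `chaos-mesh.org/v1alpha1` API version and the `PodChaos` field layout follow the Chaos Mesh documentation as best understood here, so verify them against the release you run; the experiment name, namespace, and label are hypothetical.

```python
import json


def pod_kill_experiment(name, namespace, app_label):
    """Return a PodChaos manifest that kills one pod matching the label."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "pod-kill",   # the type of fault to inject
            "mode": "one",          # affect a single randomly selected pod
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},
            },
        },
    }


# Hypothetical names for illustration.
manifest = pod_kill_experiment("kill-one-web-pod", "demo", "web")
print(json.dumps(manifest, indent=2))
```

The printed document could be saved to a file and applied with `kubectl apply -f`, assuming Chaos Mesh is already installed in the cluster.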
Litmus is one of the best-known Chaos Engineering tools. Created by MayaData and released in 2018, Litmus is written in TypeScript. It is one of the main tools that provides a ready-made list of controlled experiments. Litmus uses Kubernetes as its platform.
To use Litmus, install it in administrative mode from a YAML manifest and download a Chaos experiment. A Chaos engine is then created and applied to run Litmus experiments against an application. Once the Chaos engine is applied, Chaos is injected and the results are delivered as metrics for further analysis. The Chaos experiments can also be configured using the Litmus Portal, a web-based user interface, which makes planning and running experiments with Litmus much easier.
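The Chaos engine mentioned above is a Kubernetes resource that links an application to an experiment. The Python sketch below builds one as a dictionary and prints it as JSON. The `litmuschaos.io/v1alpha1` API version, the `ChaosEngine` fields, and the `pod-delete` experiment name follow the Litmus documentation as best understood here and should be checked against your installed version; the engine name, namespace, label, and service account are hypothetical.

```python
import json


def chaos_engine(name, namespace, app_label, experiment, service_account):
    """Return a ChaosEngine manifest binding one experiment to a deployment."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",  # "stop" would halt the experiment
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            "chaosServiceAccount": service_account,
            "experiments": [{"name": experiment}],
        },
    }


# Hypothetical application and service account names.
engine = chaos_engine(
    name="web-chaos",
    namespace="demo",
    app_label="app=web",
    experiment="pod-delete",
    service_account="pod-delete-sa",
)
print(json.dumps(engine, indent=2))
```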
A unique aspect of Toxiproxy is that it can run on any platform. Created in 2016 by Shopify, it is written in Golang and is the most widely used tool for network attacks thanks to its resiliency-testing capabilities. Toxiproxy helps to inject failure into production traffic and expose the faults hidden in the network.
Toxiproxy is used mainly for three attacks: injecting latency, blackholing data, and rejecting connections. Toxiproxy requires a proxy server that sits between the application and database layers of the network. This proxy server holds information such as the routes between applications and applies faults called "toxics", which are used to create Chaos experiments.
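A latency toxic can be sketched as two HTTP requests to the Toxiproxy server: one that creates a proxy in front of the database, and one that attaches the toxic. The Python example below only builds the requests and does not send them. The endpoint paths, the default API port 8474, and the `latency` and `jitter` attributes follow the Toxiproxy HTTP API as best understood here; the proxy name, addresses, and helper functions are hypothetical.

```python
import json
import urllib.request

TOXIPROXY_API = "http://localhost:8474"  # assumed default API address


def _post(path, payload):
    """Build (but do not send) a JSON POST request to the Toxiproxy API."""
    return urllib.request.Request(
        TOXIPROXY_API + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_proxy_request(name, listen, upstream):
    """Request that routes traffic arriving at `listen` on to `upstream`."""
    return _post("/proxies", {"name": name, "listen": listen, "upstream": upstream})


def latency_toxic_request(proxy_name, latency_ms, jitter_ms=0):
    """Request that delays data flowing through the named proxy."""
    toxic = {
        "type": "latency",
        "attributes": {"latency": latency_ms, "jitter": jitter_ms},
    }
    return _post(f"/proxies/{proxy_name}/toxics", toxic)


# Hypothetical proxy in front of a local database.
proxy_req = create_proxy_request("app_db", "127.0.0.1:26379", "127.0.0.1:6379")
toxic_req = latency_toxic_request("app_db", latency_ms=1000, jitter_ms=100)

for req in (proxy_req, toxic_req):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Sending either request is a matter of passing it to `urllib.request.urlopen` once a Toxiproxy server is running and the application points at the proxy's listen address instead of the real database.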
Created in 2017 by leading tech companies such as Google, IBM, and Lyft, Istio is written in Golang. For Chaos Engineering purposes, the Istio feature that matters most is its support for fault injection.
Istio is a Kubernetes service mesh. It is mainly used to secure workloads through mutual Transport Layer Security (mTLS) when one service communicates with another. It is also used to configure how services within the cluster are connected. Istio helps to observe how an application is behaving as a whole because it injects proxies next to the containers, and the proxies run in the same Kubernetes pod.
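Fault injection in Istio is configured on a routing rule rather than run as a separate attack. As a hedged sketch, the Python code below builds a `VirtualService` that delays a percentage of HTTP requests to one service and prints it as JSON. The `networking.istio.io/v1alpha3` API version and the `fault.delay` fields follow the Istio documentation as best understood here, and newer releases may prefer a later API version; the service name and the delay values are hypothetical.

```python
import json


def delay_fault_virtual_service(host, percent, fixed_delay):
    """Return a VirtualService that delays `percent` of requests to `host`."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-delay"},
        "spec": {
            "hosts": [host],
            "http": [
                {
                    # The sidecar proxy holds matching requests for fixedDelay.
                    "fault": {
                        "delay": {
                            "percentage": {"value": float(percent)},
                            "fixedDelay": fixed_delay,
                        }
                    },
                    # Requests still go to the same service afterwards.
                    "route": [{"destination": {"host": host}}],
                }
            ],
        },
    }


# Hypothetical service: delay 10% of requests to "ratings" by five seconds.
virtual_service = delay_fault_virtual_service("ratings", 10, "5s")
print(json.dumps(virtual_service, indent=2))
```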
Chaos Toolkit is one of the leading tools used for Chaos Engineering with a focus on Infrastructure. Created in 2018 by ChaosIQ, it is written in Python and can run on Docker, Kubernetes, or Cloud Platforms. The advantage of the Chaos Toolkit is that it helps to define controlled experiments.
Chaos Toolkit runs from the command line. Experiments are created in JSON files and then run through the Chaos Toolkit. The JSON files operate in three parts:
An experiment is passed as input to the Chaos Toolkit, after which the tool begins to create Chaos and reports the results. The tool has different drivers that can be used to connect to AWS, Kubernetes, etc.
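A typical Chaos Toolkit experiment file has a steady-state hypothesis, a method, and rollbacks. The Python sketch below builds such an experiment and prints it as JSON. The field names follow the Chaos Toolkit experiment format as best understood here; the URL, the script paths, and the titles are hypothetical placeholders.

```python
import json


def build_experiment(health_url, fault_script, recover_script):
    """Return a Chaos Toolkit style experiment as a plain dictionary."""
    return {
        "version": "1.0.0",
        "title": "Service stays healthy when one instance is stopped",
        "description": "Illustrative experiment; all targets are placeholders.",
        # What "normal" looks like, checked before and after the method.
        "steady-state-hypothesis": {
            "title": "Health endpoint answers with HTTP 200",
            "probes": [
                {
                    "type": "probe",
                    "name": "health-endpoint-responds",
                    "tolerance": 200,
                    "provider": {"type": "http", "url": health_url},
                }
            ],
        },
        # The fault to inject.
        "method": [
            {
                "type": "action",
                "name": "stop-one-instance",
                "provider": {"type": "process", "path": fault_script},
            }
        ],
        # How to restore the system afterwards.
        "rollbacks": [
            {
                "type": "action",
                "name": "restart-instance",
                "provider": {"type": "process", "path": recover_script},
            }
        ],
    }


experiment = build_experiment(
    "http://localhost:8080/health",   # hypothetical health endpoint
    "./stop-instance.sh",             # hypothetical fault script
    "./start-instance.sh",            # hypothetical recovery script
)
print(json.dumps(experiment, indent=2))
```

Saved as `experiment.json`, a file like this would be run with `chaos run experiment.json`.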
Gremlin is less a single tool than a platform for running Chaos Engineering practices. Gremlin helps to inject attacks into hosts and containers to see how the system will react. These attacks can range from DNS attacks to traffic changes. Gremlin can also be described as a Chaos Engineering service. We at Royal Cyber use Gremlin as a Chaos Engineering platform to perform attacks.
The goal of any Chaos Engineering tool is to help the system achieve greater reliability. The question that remains is which of these tools helps to achieve that goal faster and more efficiently. The answer depends on the engineering team and the time it can dedicate to testing and evaluating each tool. We've put together a comparison that shows how these tools stack up against each other:
Building and maintaining a system carries an operational burden, and downtime is both inevitable and expensive. More than one Chaos Engineering tool may be required to achieve reliability, but these tools help to buy the time and availability needed to start making the system more dependable and robust.
We at Royal Cyber have been in the industry since 2002 and provide superior solutions that create exceptional business value, guarantee success, and transform businesses with adaptable solutions that satisfy today’s needs and unlock tomorrow’s opportunities. We will conduct free demos and a complimentary assessment to evaluate which tools fit your organization best. So, what are you waiting for?