Discover TestMetry

The world of IT is evolving every day, and it is essential to keep up with the latest industry trends and stay relevant in your workplace. I am here to share the experience and learnings around RPA, Test Automation, UX, and DevOps that I have gained over the last two decades working in both the product development and IT services worlds!


Updated: May 1

Organizations are looking to increase revenue, drive business growth, and achieve operational excellence by scaling their digital capabilities. In today's tech-savvy world, random glitches in systems have become harder to predict and nearly impossible for companies to afford. These random failures hurt a company's bottom line, making downtime a key performance indicator for engineers. The failures can take many forms: a networking glitch in one of the data centers, a misconfigured server, the unexpected failure of a node, or any other fault that propagates across systems. Such outages often bring catastrophic results and hurt an organization both financially and reputationally.

A single hour of an outage can cost a company millions of dollars. As per Gartner, the average cost of IT downtime is $5,600 per minute. Because every business operates differently, the cost of downtime can vary between $140,000 and $540,000 per hour. Since organizations cannot afford to wait for an outage to happen, they should proactively identify system weaknesses and apply chaos engineering practices to mitigate the risks.

Chaos Engineering studies how large-scale systems respond to random events. It is a disciplined approach to identifying failures before they become outages. By testing how a system responds under stress, engineers can quickly identify and fix faults. The frequency of releases to production has increased drastically, but it remains important to maintain application reliability by adhering to SLAs such as application availability and customer satisfaction. The ultimate purpose of chaos engineering is to limit the chaos behind outages caused by random events by carefully investigating ways to make a system more robust. Traditional reliability engineering practices, such as incident management and service recovery procedures, may not deliver the outcomes required to minimize the impact of failures. In chaos engineering, planned experiments are performed on systems to check how they respond when such a situation occurs. According to Gartner, by 2024 more than 50% of large enterprises will use chaos engineering practices against their digital capabilities to approach 99.999% availability.

Chaos Engineering originated at Netflix, which needed to be resilient against random host failures while migrating to AWS (Amazon Web Services). This led Netflix to release Chaos Monkey in 2010. In 2011, the Simian Army added further failure injections on top of Chaos Monkey, allowing more failure states to be tested and greater resilience to be built. Netflix also introduced a dedicated Chaos Engineer role in 2014 and announced the Failure Injection Testing (FIT) tool, built on the Simian Army concepts, to harden systems against random events. With many organizations moving to cloud and microservice architectures, the need for chaos engineering has grown in recent years. Large technology companies like Amazon, Netflix, LinkedIn, Facebook, Microsoft, Google, and a few others actively practice Chaos Engineering to improve the reliability of their systems. Chaos Engineering works on the principle of running thoughtful experiments within the system that bring out insights into how the system responds to failures. The process is similar to how a flu vaccine works: the vaccine stimulates your body's immune system to generate antibodies that help fight the flu virus. There are three steps involved:

Step 1
To begin, an application team consisting of architects, developers, testers, and support engineers needs to prioritize a few things. The first step is to identify a fault that can be injected and hypothesize about the expected outcome by mapping IT or business metrics. One may need to answer questions like "What could go wrong?", "What if X component fails?", and "What will happen if my server runs out of memory?" to arrive at possible scenarios. Approaching this with a bit of pessimism improves the overall scenario coverage. One should create a hypothesis backlog that includes details on how the application will fail, the impact, measurement criteria, restoration procedures, etc. Techniques like brainstorming and analysis of incident logs can be adopted for building this backlog. Because it is practically impossible to invest time and budget in avoiding every type of failure, the backlog items should be prioritized by likelihood of occurrence and impact of failure.

Step 2
This step involves executing an experiment to measure parameters around the availability and resilience of a system, like service level and mean time to repair. The experiments focus on creating a failure, for example by increasing CPU utilization or inducing a DNS outage. During the initial stages of a chaos engineering implementation, the experiments are performed in a sandbox or pre-production environment. It is also important to restrict the blast radius to minimize the impact of an experiment on the application. As confidence improves, the blast radius can be expanded, and the experiments can move to a production environment.
One may need to document a plan for each experiment that includes:
a) The steady-state measurement
b) The activities performed to trigger a failure
c) The activities used to monitor the application
d) The measurements used to analyze the impact of the failure
e) The actions required to roll the system back to a steady state
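The prioritization described in Step 1 can be sketched as a simple likelihood-times-impact risk score. The backlog entries and scale below are invented for illustration, not taken from any specific tool:

```python
# Minimal sketch (invented data): prioritize a hypothesis backlog by a
# likelihood-times-impact risk score, as described in Step 1.
backlog = [
    {"hypothesis": "A node failure is tolerated without dropped traffic",
     "likelihood": 4, "impact": 5},
    {"hypothesis": "A DNS outage triggers failover within 30 seconds",
     "likelihood": 2, "impact": 5},
    {"hypothesis": "A cache eviction storm only mildly degrades latency",
     "likelihood": 3, "impact": 2},
]

# Score each hypothesis and sort so the riskiest scenarios are tested first.
for item in backlog:
    item["risk"] = item["likelihood"] * item["impact"]
prioritized = sorted(backlog, key=lambda i: i["risk"], reverse=True)

print(prioritized[0]["hypothesis"])  # the node-failure hypothesis ranks first
```

With limited time and budget, the team would run experiments from the top of this ranked list first.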

Step 3
This last step determines the success of the experiments. The experiments are halted if the metrics are impacted, and the failures are analyzed; a chaos experiment is considered successful only if it surfaces a failure. Any changes required in the application are added to the product backlog. If the system is found to be resilient, the experiments are repeated with an increased blast radius.
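The three steps above can be sketched as a toy experiment harness. Everything here is simulated for illustration: in practice, the monitoring and fault-injection hooks would call real tooling (for example, a Gremlin or Chaos Monkey integration), and the names below are invented:

```python
# Hypothetical chaos experiment harness (illustrative sketch, not a real tool).
STEADY_STATE_ERROR_RATE = 0.01  # hypothesis: error rate stays below 1%

def measure_error_rate(service):
    """Return the observed error rate of the (simulated) service."""
    return service["errors"] / max(service["requests"], 1)

def inject_cpu_stress(service):
    """Simulate a fault injection that degrades the service."""
    service["degraded"] = True

def rollback(service):
    """Restore the system to its steady state after the experiment."""
    service["degraded"] = False

def run_experiment(service):
    """One experiment: verify steady state, inject, observe, roll back."""
    baseline = measure_error_rate(service)
    assert baseline <= STEADY_STATE_ERROR_RATE, "system not in steady state"
    inject_cpu_stress(service)
    try:
        # Simulate traffic under stress; a resilient service keeps errors low.
        service["requests"] += 1000
        service["errors"] += 200 if service["degraded"] and not service["resilient"] else 5
        observed = measure_error_rate(service)
        return {"baseline": baseline, "observed": observed,
                "resilient": observed <= STEADY_STATE_ERROR_RATE}
    finally:
        rollback(service)  # always return the system to steady state

resilient = {"requests": 10000, "errors": 50, "degraded": False, "resilient": True}
fragile = {"requests": 10000, "errors": 50, "degraded": False, "resilient": False}
print(run_experiment(resilient)["resilient"])  # True
print(run_experiment(fragile)["resilient"])    # False
```

The `finally` block mirrors step e) of the experiment plan: whatever the outcome, the system is rolled back to its steady state before the results are analyzed.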

After completing the experiment, the insights obtained describe the system's real-world behavior during random failures. This helps engineering teams fix issues or define rollback plans. Introducing Chaos Engineering in the organization brings both business and technical benefits. For the business, Chaos Engineering helps prevent significant revenue losses, improves the incident management response, strengthens on-call training for engineering teams, and improves the resiliency of the systems. From the technical point of view, data obtained from chaos experiments leads to a better understanding of system failure modes, improved system design, fewer repeated incidents, and a reduced on-call burden.

Many tools are available in the market for practicing Chaos Engineering; Chaos Monkey, Gremlin, the Simian Army, Jepsen, and Spinnaker are a few well-known examples that can be adopted within an organization. Using Jepsen on a distributed system, you can introduce chaos events like killing components, injecting network issues, and generating random load. Chaos Monkey, in turn, randomly terminates instances in production to improve services' resilience to sudden failures. The other tools mentioned similarly have their own ways of experimenting with and improving a product's resilience. Depending on your requirements and budget, you can use any of them. Organizations can also build their own Chaos Engineering tools using code from open-source tools. That route can be time-consuming and expensive, but it gives complete control over the tool, options to customize it, and better security.

One should not look at chaos engineering as a one-time activity, because applications undergo frequent changes to meet the demands of the business and end consumers. The possibility that a previously fixed vulnerability resurfaces is also high, so it is important to validate the application through continuous chaos tests. The team can create a regression pack comprising prioritized chaos experiments that can be used to validate the resiliency of the system. If fully automated and integrated into the DevOps pipeline, these experiments can be executed as part of the weekly build to identify failures early in the life cycle.

Predicting system failures has become difficult due to complex application architectures. As the cost of downtime is high, organizations should take a proactive approach to preventing crashes by applying chaos engineering practices. They should implement chaos engineering as part of their DevOps practice, invest in chaos engineering tools, and build competency to improve application reliability.



A quality engineering team plays a crucial role in improving the time to market for services and products and in ensuring customer delight. Architects have been developing new practices to automate life-cycle processes, reduce test cycle time, and improve product quality. Scriptless test automation is one such practice, an alternative to traditional automation, which is programming-language-dependent. The adoption rate of scriptless frameworks has improved in the last couple of years, but there are still apprehensions about their capabilities.

Difference between the Working of Scriptless and Scripted Automation Tools

Scriptless automation tools auto-generate test scripts instead of having an automation engineer write them manually in a programming language. The language used inside the tool might be proprietary or open, depending on the vendor's tool strategy. The scriptless technique is slightly different from the record-and-playback feature in first-generation automation tools like SilkTest, Rational Robot, or WinRunner. In those tools, record and playback worked as an accelerator that helped engineers enhance the recorded scripts using programming languages such as Test Script Language or SQABasic. Some scriptless tools do not reveal the source code of the automation scripts to the end user, whereas others provide the flexibility to enhance them.
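The core idea behind many scriptless tools is keyword-driven testing: a test is a table of action words and data, and an engine maps each row to an implementation. The sketch below is a toy illustration of that pattern; the keyword names and the simulated application are invented, not taken from any real tool:

```python
# Toy keyword-driven test engine (illustrative only). The test designer
# writes rows of (keyword, args) -- no programming language required.
class FakeApp:
    """Minimal simulated application under test (stand-in for a real UI)."""
    def __init__(self):
        self.page = "home"
        self.fields = {}

    def login_ok(self):
        return (self.fields.get("username") == "alice"
                and self.fields.get("password") == "s3cret")

# Keyword library: each scriptless "action word" maps to an implementation.
def open_page(app, name): app.page = name
def type_text(app, field, value): app.fields[field] = value
def click(app, target):
    if target == "submit" and app.page == "login":
        app.page = "dashboard" if app.login_ok() else "login_error"
def verify_page(app, expected):
    if app.page != expected:
        raise AssertionError(f"expected {expected}, got {app.page}")

KEYWORDS = {"open_page": open_page, "type_text": type_text,
            "click": click, "verify_page": verify_page}

def run_test(rows, app):
    """Interpret each table row by dispatching to its keyword handler."""
    for keyword, *args in rows:
        KEYWORDS[keyword](app, *args)
    return True

test_case = [
    ("open_page", "login"),
    ("type_text", "username", "alice"),
    ("type_text", "password", "s3cret"),
    ("click", "submit"),
    ("verify_page", "dashboard"),
]
print(run_test(test_case, FakeApp()))  # True
```

Whether the engine exposes this underlying code to the end user, or keeps it hidden behind a visual editor, is precisely the vendor design choice discussed above.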

Challenges Faced while Using Scriptless Test Automation Tools

The current test automation coverage levels of most organizations are low and need improvement before they can enjoy the full benefits of Agile or DevOps. Organizations are now focusing on improving regression automation coverage so they can run continuous regression testing on their daily and weekly builds. Talent availability is another significant challenge faced by most organizations. Most engineers in today's market started their careers as manual testers, and many do not have a programming background. The developer community also does not prefer to shift toward automated testing, as they feel that it kills their creativity.

Way to Address the Challenges Faced by Companies While Using Scriptless Test Automation Tools

One way to address the shortfall is to upskill manual testing engineers in programming and automation. Many organizations have succeeded in upskilling mid- and junior-level engineers, with a success rate of around 70%.

The highly experienced manual testing community, comprising mostly test managers, is still not very open to learning a programming language or an automation tool as they near the end of their IT careers. At the same time, the existing team of manual testers cannot simply be replaced with automation engineers, because domain and application expertise is also a key ingredient in ensuring quality; manual testing engineers should understand this and stay open to learning new skills.

Benefits of Using Scriptless Test Automation Tools

One of the significant benefits of scriptless test automation is that it bridges the gap between technical knowledge and domain expertise by enabling business analysts, manual testers, or subject matter experts to participate in test automation activities. The technique helps quality engineering teams reduce the overall cost of automation while improving collaboration and automation coverage.

In scriptless test automation, the time required to design automation scripts is reduced by around 40% compared to scripted test automation, as there is no need to write scripts or programs. This helps organizations running regression test automation initiatives reach their coverage goals faster. It also allows teams doing in-sprint test automation to improve sprint automation coverage, as the reduced effort increases automation throughput. The Agile model of software development causes frequent changes to both the application under test and the regression test packs, and Agile teams often struggle to keep their automation regression packs up to date due to the high effort involved in maintaining scripted automation packs. Scriptless test packs are comparatively easy to maintain and can cut the overall maintenance effort in half.

Limitations of Scriptless Test Automation Tools

There are also some limitations associated with scriptless tools that influence the choice of testing tool or automation approach. Scriptless tools have a complex underlying codebase that must be maintained and updated regularly. This maintenance is quite expensive, which is one of the reasons scriptless automation tools tend to be pricey. The open-source world also does not offer many choices when it comes to scriptless automation tools.

Many scriptless tools do not support porting automation assets to a format supported by another tool, which creates vendor lock-in. They also offer testers little flexibility to write the custom functions required to automate complex scenarios. The alternative, having the vendor implement the additional capability in the tool, is often time-consuming and expensive.


Both scripted and scriptless automation approaches have advantages and limitations, which may differ from organization to organization. The decision to adopt an automation approach should be made after considering the team's appetite for upskilling, the methodology used, the available budget, and the complexity of the applications selected for automation.



With the advancements in technology, we have moved from traditional digital computers, built on zeros and ones, to the latest frontier: quantum computers. Built on the pioneering ideas of physicists Richard Feynman and David Deutsch in the 1980s, quantum computers leverage the unique properties of matter at the nanoscale. Quantum computing uses quantum physics to solve problems that today's computers can never tackle.

Two characteristics of quantum computers make them the computers of the future. First, quantum computing is built on qubits, which can exist in superpositions of zero and one, i.e., partly zero and partly one at the same time. Second, qubits can become entangled, so that groups of qubits behave as a single connected state.
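Superposition can be illustrated with a tiny state-vector simulation. In this sketch, a qubit is a pair of amplitudes whose squares give the probabilities of measuring 0 and 1, and the Hadamard gate puts a definite 0 into an equal superposition; the code is a toy model, not a real quantum SDK:

```python
import math

# Toy single-qubit simulation: a state is a pair of amplitudes (a, b)
# with a^2 + b^2 = 1; a^2 is the probability of measuring 0, b^2 of 1.
def hadamard(state):
    """Apply the Hadamard gate, which puts |0> into an equal superposition."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

zero = (1.0, 0.0)            # a classical-like, definite |0> state
superposed = hadamard(zero)  # equal parts |0> and |1>

p0 = superposed[0] ** 2      # probability of measuring 0
p1 = superposed[1] ** 2      # probability of measuring 1
print(round(p0, 3), round(p1, 3))  # 0.5 0.5
```

After the gate, a measurement yields 0 or 1 with equal probability, which is exactly the "partly zero and partly one" behavior that classical bits cannot exhibit.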

Because of its ability to solve hard problems with high speed and accuracy, quantum computing has immediate applications in pharmaceuticals, cryptography, machine learning, and search. Recognizing this potential, global investment in quantum computing is expected to exceed $50 billion over the coming 10-15 years, and companies are working toward generating logical qubits.

The practical implementation of quantum computing has already begun at tech giants like IBM, Microsoft, Alibaba, Honeywell, and a few more. IBM and Microsoft have developed, and continue to develop, quantum computing simulators, communities, and tools that bring quantum processing techniques to developers. With programming languages, quantum algorithms, and quantum-processors-as-a-cloud-service available, it is expected that developers will incorporate these capabilities into software solutions. Quantum computing can also be applied to accelerate search and machine learning algorithms used with exponentially growing datasets.

This will help companies unlock the value of their data. Moreover, because of its potential to tackle hard problems and provide rapid solutions, countries like the US and China have already invested heavily in quantum research for secure communication. China has also launched its first satellite implementing a quantum communication channel.

Quantum computing is moving quickly from research labs to the real world, and it may soon approach supremacy in providing robust solutions across many fields.


Get in Touch!

Thanks for your interest in TestMetry. For more information, feel free to get in touch and I will get back to you soon!

