Operational Resilience Regulations Demand a New Approach to Testing

March 21st, 2025

This Industry Viewpoint was authored by Anil Kollipara, vice president of Product Management, Spirent Communications

Two decades ago, “operational resilience” in telecom was mostly about keeping the lights on—keeping voice and data services available in the event of outages or equipment failures. Today, as we conduct more of our personal and professional lives online, the stakes are much higher. Resilience now means ensuring that people can continue transacting, trading, and accessing the critical digital services that keep the global economy running. Delivering on that commitment, however, has never been more challenging.

Modern networks are increasingly virtualized and software-driven, with more vendors, more software upgrades, and many more moving parts across an increasingly complex supply chain. With so many interdependent components, it’s never been harder to predict the impact of a network change or security incident. And as we continue to see, a single flawed update can trigger widespread outages, impacting millions of users and causing billions in financial losses.

Now, regulators worldwide are taking action. In the European Union, for example, the Digital Operational Resilience Act (DORA) requires organizations responsible for important digital infrastructure to conduct ongoing resilience testing—including proactively testing cybersecurity defenses—with major penalties for failing to comply. DORA’s primary focus is financial industries, like banking and insurance. But as the backbone of connectivity for almost every business, service providers fall under these mandates too.

New automated testing methodologies can help operators stay ahead of evolving regulatory requirements, improve service quality, and gain other important benefits. But first, telecom must move beyond yesterday’s manual approaches and build testing that’s as agile and efficient as modern networks themselves.

Navigating Complexity

For organizations seeking to address operational resilience regulations, the first change is philosophical. Telecom operators (and every other business that provides important digital infrastructure) must recognize that they are now accountable for any failure that affects their users, regardless of how it occurred. Security breaches have been treated this way for years: vulnerabilities might originate with any vendor component in the stack, but it’s the party closest to end-users that’s responsible for keeping their environment secure. Regulations like DORA extend this model to operational resilience too.

The second step: building the capacity for continuous, automated testing. Operators need to go beyond basic functional testing and proactively, continually verify the resilience of their infrastructure and cyber defenses. But this is easier said than done. Too many operators still rely on largely manual testing approaches designed for yesterday’s vertically integrated, appliance-based networks. Current network environments, however, are radically different, characterized by:

  • Continuous change: Network functions and security solutions now include a broad mix of physical, virtualized, and cloud-native components, often from different vendors, each with its own update cycles and dependencies. Each new release or security patch represents a network change—and another opportunity for customer-impacting problems.
  • Third-party software and APIs: Modern network environments incorporate diverse interdependent services and software components from multiple vendors, making it more likely that a problem in one layer can cascade across the stack. The July 2024 CrowdStrike outage, for example, started with a flawed update to CrowdStrike’s security software and ended up crashing an estimated 8.5 million Windows devices.
  • Growing attack surface: As networks evolve to incorporate more software components from more sources, all in a near-constant state of flux, the risk of new security gaps and vulnerabilities grows.

The only answer is to adopt a “zero-trust” approach to operational resilience. Just as zero-trust security demands that operators “never trust, always verify” that an environment is secure, zero-trust resilience applies this principle to network changes and cyber defenses. Operators can’t trust that any update is safe. They must thoroughly validate every change in the network, and continually verify that security mechanisms are working as expected, as a basic business requirement.

Reimagining Resilience Testing

If operators want to meet mounting calls for stronger security and resilience, network testing must evolve to become more pervasive and automated. Testing should be:

  • Comprehensive: Testing should address the full network and lifecycle, including third-party software components, software updates, and the cybersecurity attack surface. Operators should be able to execute millions, even tens of millions of test cases as needed to achieve full coverage.
  • Automated: Instead of human beings deciding when and what to test, testing should be fully integrated with DevOps software processes and tools, so that it happens automatically whenever anything changes. 
  • Realistic: It’s not enough to perform just basic functional testing or test only under lab conditions. Operators should use synthetic traffic and emulation tools to test resilience under “rainy day” scenarios too, such as peak traffic loads, cyberattacks, and failure conditions.
  • End to end: Too often, current testing approaches are siloed: one team is responsible for security testing, for example, while another performs activation testing, and still another troubleshoots the live network. To protect end-users and the business, testing should span all functional areas, so operators can capture the full user experience.

The following figure illustrates what a comprehensive, automated approach to continuous testing can look like.

This model combines four basic components:

  • Abstraction Layer for Infrastructure Access, so that testing tools can dynamically access all servers, routers, switches, and security appliances
  • Library of Operational Resilience Test Methodologies that can apply millions of resiliency and security test cases, including emulating peak loads and rainy-day conditions
  • Lab Automation, where lab equipment itself becomes a shared virtualized resource that any team can access, and that can spin up and fully configure testbed topologies for any test case, in minutes
  • Test Automation, so that testbeds can execute millions of tests across diverse resilience and security categories and automatically notify the appropriate team of any issues discovered
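
As a rough illustration of how these four components might fit together, here is a minimal Python sketch. Everything in it is hypothetical and invented for this example: the `Device` interface, the `EmulatedRouter` class, the two test cases, the `bad-update` failure condition, and the capacity figures. It is not a description of any vendor's product or API, only a toy model of the idea that every change is applied to an automatically built testbed and checked against a shared test library before anyone trusts it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Protocol


class Device(Protocol):
    """Abstraction layer (hypothetical): one interface for any network element."""

    name: str
    capacity_gbps: float

    def apply_change(self, change: str) -> None: ...
    def is_healthy(self) -> bool: ...


@dataclass
class EmulatedRouter:
    """Stand-in for a real or virtualized device reached through the abstraction layer."""

    name: str
    capacity_gbps: float = 100.0  # assumed figure, for illustration only
    changes: list[str] = field(default_factory=list)

    def apply_change(self, change: str) -> None:
        self.changes.append(change)

    def is_healthy(self) -> bool:
        # Assumption: a change named "bad-update" models a flawed release.
        return "bad-update" not in self.changes


def build_testbed(device_names: list[str]) -> list[Device]:
    """Lab automation (sketch): spin up a fresh testbed on demand."""
    return [EmulatedRouter(name) for name in device_names]


def devices_healthy(testbed: list[Device]) -> bool:
    """Basic check: every device still reports healthy after the change."""
    return all(device.is_healthy() for device in testbed)


def survives_peak_load(testbed: list[Device], offered_gbps: float = 150.0) -> bool:
    """Rainy-day check: healthy capacity must cover an emulated peak load."""
    usable = sum(d.capacity_gbps for d in testbed if d.is_healthy())
    return usable >= offered_gbps


# Test methodology library (sketch): named, reusable test cases.
TEST_LIBRARY: dict[str, Callable[[list[Device]], bool]] = {
    "devices_healthy": devices_healthy,
    "survives_peak_load": survives_peak_load,
}


def validate_change(
    change: str,
    device_names: list[str],
    notify: Callable[[str], None] = print,
) -> list[str]:
    """Test automation (sketch): run every test against a change, report failures."""
    testbed = build_testbed(device_names)
    for device in testbed:
        device.apply_change(change)
    failures = [name for name, test in TEST_LIBRARY.items() if not test(testbed)]
    for name in failures:
        notify(f"change {change!r} failed test {name!r}")
    return failures
```

In a real deployment, a call like `validate_change("good-update", ["r1", "r2"])` would be triggered by the DevOps pipeline whenever a new release or patch arrives, and the `notify` callback would route failures to the responsible team rather than printing them.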

Together, these capabilities can help operators comply with stringent operational resilience mandates, without sacrificing speed or innovation. But implementing a modern, automated testing framework can drive many benefits beyond just compliance. By automating labs and testing, operators can make better, wider use of expensive testing tools, improving capital efficiency alongside resiliency. They can improve quality, fixing most problems before they impact customers and quickly isolating root causes of issues in the network. Most important, when operators tell regulators and customers that they’re committed to nonstop availability, they can continually verify that they’re delivering it. 


Categories: SDN · Software
