Nov 10, 2025

1. The Automation Imperative

“To automate or not to automate — that is the question.”

Since the rise of Software-Defined Networking (SDN) and DevOps, engineers have debated whether network automation is necessary, a luxury, or an overcomplication. The answer depends on network scale and organizational needs. For hyperscalers, automation is essential for speed, reliability, and consistency (in fact, they began this journey in the early 2010s out of necessity). For small businesses, full automation may be unnecessary or even counterproductive. Most networks fall in between, with adoption shaped by culture, skills, tool maturity, and business priorities. Today, these factors are converging, making automation inevitable.

1.1. The Perfect Storm

The beginning of this decade has confirmed the need for automation. For hyperscalers, explosive Artificial Intelligence (AI) growth pushes data center networks to their limits, interconnecting hundreds of thousands of Central Processing Units (CPUs) and Graphics Processing Units (GPUs) with high-speed, low-latency Ethernet fabrics (e.g., RDMA over Converged Ethernet (RoCE)). Meanwhile, enterprises and service/content providers must evolve legacy infrastructure, enable new services, and integrate cloud, on-prem, and edge domains under cost constraints while meeting higher operational requirements.

Other engineering domains have embraced API-first, self-service models and now expect the same from networking. Programmable infrastructure and automation-ready workflows are becoming the norm. AI- and Machine Learning (ML)-driven operations require structured network data, while security and compliance demand automated, auditable responses.

Across all environments, the reasons to adopt network automation are more compelling than ever. The question is no longer “Should we automate?” but “Why haven’t we automated yet?”

Despite clear benefits, several barriers have slowed adoption — many still persist:

  • Lack of intent-driven data models — Networks have traditionally been described through device configurations rather than a clear definition of how the network should behave. Without consistent and capable intent data, automation remains fragile and device-centric.
  • Inconsistent and unrepeatable designs — Automation thrives on predictability. Networks built with exceptions, ad-hoc architectures, or one-off implementations are inherently difficult to automate. The simpler and more standardized the design, the better.
  • Heterogeneous infrastructure — A mix of vendors, platforms, and service types complicates tooling and increases integration overhead.
  • Limited cross-domain expertise — Historically, few engineers possessed both networking and software engineering skills. This skills gap has hindered effective automation design and implementation.
  • Risk aversion and change management concerns — The critical nature of network infrastructure often leads to conservative approaches to change, making automation initiatives harder to justify or approve.
  • Lack of proper testing environments — Many organizations lack adequate lab environments that mirror production, making it difficult to test automation safely before deployment.
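
To make the first barrier in the list above concrete, here is a minimal Python sketch of the difference between describing a network through intent and through device configuration. The `InterfaceIntent` model, its fields, the device and server names, and the rendered CLI syntax are all hypothetical and not tied to any vendor or tool. The only point is that configuration becomes an output derived from structured intent instead of being the source of truth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceIntent:
    """Hypothetical intent record: what the interface should do, not how."""
    device: str
    name: str
    description: str
    vlan: int


def render_config(intent: InterfaceIntent) -> str:
    """Render illustrative, vendor-neutral CLI lines from the intent."""
    return "\n".join([
        f"interface {intent.name}",
        f" description {intent.description}",
        f" switchport access vlan {intent.vlan}",
    ])


# Example intent for one access port (all values are made up).
intent = InterfaceIntent(
    device="leaf1", name="Ethernet1", description="web-server-01", vlan=110
)
config = render_config(intent)
print(config)
```

Because the intent is plain structured data, the same record could feed other renderers, validation, or documentation without parsing device configuration.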

By 2025, most of these barriers have started to come down, and more companies and vendors are beginning the journey, as highlighted in the State of Network Automation Survey by Chris Grundemann (Network Automation Forum), which describes current adoption and trends. However, no single formula fits every scenario, so adopting the right mindset is the first step in the right direction.

1.2. How to approach network automation

This book presents the fundamental architecture concepts for successful network automation. It’s important not to associate automation with a single tool—no such thing exists yet. It is the combination of different dimensions that determines the success of a network automation project. We can group them into three pillars: People, Process, and Technology (in this exact order).

1.2.1. The three pillars of success

As in Maslow’s pyramid, where each level must be secured before you can build on the next, addressing the pillars in order supports a solid and successful network automation strategy.

flowchart LR
    A[People] --> B[Process]
    B --> C[Technology] 

    style A fill:#ffcccc
    style B fill:#ffe6cc
    style C fill:#ffffcc
  • People — Automation depends on those who design, build, and operate platforms. Understanding their needs and empowering teams through training and collaboration ensures sustainability.
  • Process — Organizational alignment is essential. Link automation outcomes to measurable value—cost reduction, faster delivery, or improved reliability.
  • Technology — Tools exist; the challenge is selecting and integrating them within a sound architecture.

Balancing these three dimensions transforms automation from a technical initiative into an organizational capability.

This book provides a foundation across:

Change is iterative—progress comes one step at a time. One recurring question to answer (with different responses at different times) is the classic ‘buy versus build’ dilemma, which we also tackle throughout the book.

1.3. What the reality looks like

Every organization follows its own maturity path, often starting with small scripts and expanding to broader tasks like configuration management, compliance validation, or troubleshooting.

1.3.1. Understanding the automation spectrum

Automation maturity progresses from manual operations to autonomous networks:

graph LR
    A[Manual Operations] --> B[Scripted Tasks]
    B --> C[Workflow Automation] 
    C --> D[Intent-Based Systems]
    D --> E[Autonomous Networks]
    
    style A fill:#ffcccc
    style B fill:#ffe6cc
    style C fill:#ffffcc
    style D fill:#ccffcc
    style E fill:#ccccff
  • Manual Operations — Traditional CLI-based configuration and troubleshooting
  • Scripted Tasks — Individual scripts that automate specific, repetitive tasks
  • Workflow Automation — Orchestrated sequences of operations with some decision logic
  • Intent-Based Systems — Declarative automation that translates high-level goals into network changes
  • Autonomous Networks — Self-healing systems that detect, diagnose, and remediate issues independently
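
One way to picture the step from scripted tasks to intent-based systems is a reconciliation function: instead of running a fixed list of commands, the automation compares the intended state with the observed state and derives the changes. The sketch below is a simplified illustration under stated assumptions; the `plan_changes` function, the device names, and the per-device VLAN model are hypothetical, and a real system would also have to collect the actual state and apply the plan safely.

```python
def plan_changes(
    intended: dict[str, set[int]],
    actual: dict[str, set[int]],
) -> dict[str, dict[str, set[int]]]:
    """Compare intended VLANs per device with observed ones and return the delta."""
    plan: dict[str, dict[str, set[int]]] = {}
    for device in sorted(intended.keys() | actual.keys()):
        want = intended.get(device, set())
        have = actual.get(device, set())
        to_add, to_remove = want - have, have - want
        # Only devices that deviate from the intent appear in the plan.
        if to_add or to_remove:
            plan[device] = {"add": to_add, "remove": to_remove}
    return plan


# Hypothetical data: what we want versus what the network reports.
intended = {"leaf1": {10, 20}, "leaf2": {10}}
actual = {"leaf1": {10}, "leaf2": {10, 30}}

plan = plan_changes(intended, actual)
print(plan)
```

The same comparison, run periodically, is also a simple way to detect drift between the declared intent and the running network.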

Understanding where your organization currently stands and where you want to go helps set realistic expectations and plan appropriate investments.

These initiatives may evolve into closed-loop or self-healing frameworks. Achieving full automation is a long-term goal. Automation is not about replacing people, but amplifying expertise—allowing engineers to focus on design and problem-solving. Cost savings may follow, but the true benefits are consistency, reliability, and speed. Automation also enables capabilities impossible to achieve manually at scale, such as real-time network optimization or instant compliance validation.

A hidden benefit of network automation is that it motivates you to simplify your network architecture as much as possible to facilitate automation.

This book explores automation practices for both large-scale and smaller networks. You’ll learn to recognize which ideas fit your context and how scale introduces unique challenges. In fairness, most of these ideas are not unique to networking: they come from lessons learned in software engineering over the years, adapted to the challenges and requirements of network operations.

To give you a general overview of what types of solutions fit into this bucket, here are a few examples of automation in different environments:

Hyperscalers

  • Expansion from a design into all the necessary data to depict the network intent (e.g., racks, devices, cables, Internet Protocol (IP) addresses, overlay, networks) that powers the creation of the Bill of Materials (BOM) and renders the bootstrap configuration that is served via Zero Touch Provisioning (ZTP) as soon as the device is connected in the data center.
  • Correlation of observability data (e.g., metrics, logs, and flows) to provide real-time events that are enriched with contextual information and trigger workflows that mitigate end-user problems, draining connections while still keeping capacity under SLA limits.
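
As a rough illustration of the design expansion described in the first hyperscaler example, the following Python sketch turns a few design parameters into per-device intent data and a simple bill of materials. The naming scheme, the `Device` fields, and the address pool are assumptions made for this example; real expansions cover far more (cabling, overlays, bootstrap configuration) and this is not any particular operator's method.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical per-device intent produced by expanding a design."""
    name: str
    rack: str
    role: str
    loopback: str


def expand_design(racks: int, leaves_per_rack: int, loopback_pool: str) -> list[Device]:
    """Expand high-level design parameters into one intent record per device."""
    # hosts() yields the usable addresses of the pool; next() raises
    # StopIteration if the pool is too small for the design.
    hosts = iter(ipaddress.ip_network(loopback_pool).hosts())
    devices = []
    for rack in range(1, racks + 1):
        for leaf in range(1, leaves_per_rack + 1):
            devices.append(Device(
                name=f"rack{rack:02d}-leaf{leaf}",
                rack=f"rack{rack:02d}",
                role="leaf",
                loopback=f"{next(hosts)}/32",
            ))
    return devices


devices = expand_design(racks=2, leaves_per_rack=2, loopback_pool="10.0.0.0/29")
bom = Counter(device.role for device in devices)  # device count per role

for device in devices:
    print(device)
print(dict(bom))
```

The resulting records are the kind of structured data that could later drive templates for bootstrap configuration.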

Service Providers

  • Full-mesh testing of Internet links via different transit providers to ensure packet loss and latency remain within tolerance levels, and to detect issues that trigger remediation actions, such as draining traffic from suspect links and bringing them back into operation once they are fixed.
  • Constantly checking for circuit maintenance notifications from providers (via email or webhooks) that are converted into structured data, which can be used to mute alerts or proactively react to planned operations to mitigate their impact.
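
The maintenance-notification example can be sketched in a few lines once the notification is available as structured data. The code below assumes the email or webhook has already been parsed into a hypothetical `MaintenanceWindow` record (the parsing itself is out of scope here), and the circuit IDs and timestamps are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical structured form of a provider maintenance notification."""
    circuit_id: str
    start: datetime
    end: datetime


def should_mute(circuit_id: str, alert_time: datetime, windows: list[MaintenanceWindow]) -> bool:
    """Return True when the alert falls inside a planned window for the same circuit."""
    return any(
        window.circuit_id == circuit_id and window.start <= alert_time <= window.end
        for window in windows
    )


windows = [
    MaintenanceWindow(
        circuit_id="CKT-1001",
        start=datetime(2025, 11, 10, 2, 0, tzinfo=timezone.utc),
        end=datetime(2025, 11, 10, 4, 0, tzinfo=timezone.utc),
    )
]

during = datetime(2025, 11, 10, 3, 0, tzinfo=timezone.utc)
after = datetime(2025, 11, 10, 5, 0, tzinfo=timezone.utc)
print(should_mute("CKT-1001", during, windows))
print(should_mute("CKT-1001", after, windows))
```

Keeping all timestamps timezone-aware avoids comparing provider-local times with monitoring timestamps by accident.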

Enterprises

  • Offer a self-service portal to end-users to define security policies that are then converted into actual firewall rules, following an enforced security policy and enabling a rule lifecycle that cleans up unused rules.
  • Support device refresh and lifecycle: detecting when a device is End of Life (EOL), identifying when its software has vulnerabilities and how to upgrade it, or facilitating configuration migration from one platform to another.
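
The self-service firewall example above can be sketched as a translation from a user request into rules, with the enforced policy applied along the way. Everything here is an assumption for illustration: the `PolicyRequest` fields, the `ALLOWED_PORTS` allow-list, and the generic `permit ...` rule syntax do not correspond to a specific product.

```python
import ipaddress
from dataclasses import dataclass
from itertools import product

# Hypothetical enforced policy: only these destination ports may be requested.
ALLOWED_PORTS = {443, 8443}


@dataclass(frozen=True)
class PolicyRequest:
    """Hypothetical self-service request submitted by an end user."""
    name: str
    sources: tuple[str, ...]
    destinations: tuple[str, ...]
    port: int
    protocol: str = "tcp"


def to_firewall_rules(request: PolicyRequest) -> list[str]:
    """Validate the request against the enforced policy and expand it into rules."""
    if request.port not in ALLOWED_PORTS:
        raise ValueError(f"port {request.port} is not allowed by the security policy")
    # ip_network() raises ValueError for malformed prefixes.
    sources = [ipaddress.ip_network(prefix) for prefix in request.sources]
    destinations = [ipaddress.ip_network(prefix) for prefix in request.destinations]
    return [
        f"permit {request.protocol} {src} {dst} eq {request.port}"
        for src, dst in product(sources, destinations)
    ]


request = PolicyRequest(
    name="web-to-api",
    sources=("10.1.0.0/24", "10.2.0.0/24"),
    destinations=("10.9.0.10/32",),
    port=443,
)
rules = to_firewall_rules(request)
print("\n".join(rules))
```

Rejecting a request before any rule is generated keeps the enforced policy in one place instead of relying on reviews of individual rules.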

The key is identifying which processes in your environment are most time-consuming, error-prone, or critical. Understanding how these processes support your business is crucial to evolving them—transforming them into more efficient and effective ones powered by automation.

These automation solutions can be simple or complex, but all share similar patterns that this book will analyze and digest for you, ending with some sophisticated real-world use cases in Part 5 – Patterns and Use Cases.

However, even with the best intentions, things can go wrong. You must be aware of common pitfalls that will impact the success of the project.

1.3.2. Common pitfalls to avoid

There are multiple things to take into account. In this book, you will discover many of them because, for better or worse, I have experienced them myself. These are just a few common ones to keep top of mind:

  • Trying to automate everything at once — Start small with high-impact, low-risk use cases to build confidence and expertise
  • Neglecting the human element — Technical solutions without proper change management and team buy-in (i.e., trust) often fail
  • Underestimating data quality requirements — Automation is only as good as the data it operates on; invest in data accuracy and consistency early in the process.
  • Building without testing — Implement robust testing and validation processes before deploying automation to production
  • Creating automation silos — Ensure different automation initiatives can work together rather than creating isolated, incompatible systems
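
For the data quality pitfall in the list above, even a small validation step run before any automation consumes the data can catch common problems. The sketch below checks a hypothetical inventory for missing, malformed, and duplicate management addresses; the record layout and the `mgmt_ip` field name are assumptions for this example.

```python
import ipaddress


def validate_inventory(records: list[dict]) -> list[str]:
    """Return a list of human-readable data quality errors (empty if clean)."""
    errors: list[str] = []
    seen: dict = {}  # management IP -> first device using it
    for record in records:
        name = record.get("name") or "<unnamed>"
        raw_ip = record.get("mgmt_ip")
        if not raw_ip:
            errors.append(f"{name}: missing mgmt_ip")
            continue
        try:
            ip = ipaddress.ip_address(raw_ip)
        except ValueError:
            errors.append(f"{name}: invalid mgmt_ip {raw_ip!r}")
            continue
        if ip in seen:
            errors.append(f"{name}: mgmt_ip {ip} already used by {seen[ip]}")
        else:
            seen[ip] = name
    return errors


# Hypothetical inventory with three deliberate problems.
records = [
    {"name": "leaf1", "mgmt_ip": "192.0.2.1"},
    {"name": "leaf2", "mgmt_ip": "192.0.2.1"},
    {"name": "leaf3", "mgmt_ip": "192.0.2.300"},
    {"name": "leaf4"},
]
errors = validate_inventory(records)
for error in errors:
    print(error)
```

A check like this could also run as an automated test, which touches on the "building without testing" pitfall as well.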

Finally, before closing this chapter, let’s remember a universal truth: let your work speak for itself. How? By defining and implementing measurements to objectively explain the benefits of network automation solutions and how they positively impact the business.

1.3.3. Measuring automation success

There are two groups of metrics to focus on: technical and business metrics. Both are important and relevant for leadership teams.

Technical Metrics:

  • Mean Time to Recovery (MTTR) — How quickly can you detect, diagnose, and resolve network issues?
  • Change Success Rate — What percentage of network changes are deployed without causing incidents?
  • Configuration Drift — How consistent are device configurations across the network?
  • Deployment Velocity — How quickly can you implement new services or configuration changes?

Business Metrics:

  • Service Availability — Are automation-managed services more reliable than manually managed ones?
  • Engineering Productivity — Are teams spending more time on strategic work versus operational tasks?
  • Compliance Posture — How quickly can you validate and remediate compliance violations?
  • Resource Utilization — Are you making better use of network capacity and performance?
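
Two of the technical metrics above reduce to simple arithmetic once change and incident records are available. The following sketch is one possible way to compute them; the `Change` record, the sample data, and the choice to express the success rate as a percentage are assumptions for illustration, and real definitions should be agreed with the teams that own the data.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Change:
    """Hypothetical change record exported from a change management system."""
    change_id: str
    caused_incident: bool


def change_success_rate(changes: list[Change]) -> float:
    """Percentage of changes deployed without causing an incident."""
    if not changes:
        raise ValueError("no changes to evaluate")
    successful = sum(1 for change in changes if not change.caused_incident)
    return 100.0 * successful / len(changes)


def mean_time_to_recovery(recovery_times: list[timedelta]) -> timedelta:
    """Average duration from detection to resolution across incidents."""
    if not recovery_times:
        raise ValueError("no incidents to evaluate")
    return sum(recovery_times, timedelta()) / len(recovery_times)


# Invented sample data: four changes (one caused an incident), two incidents.
changes = [
    Change("CHG-1", False),
    Change("CHG-2", False),
    Change("CHG-3", True),
    Change("CHG-4", False),
]
recovery_times = [timedelta(minutes=30), timedelta(minutes=90)]

success_rate = change_success_rate(changes)
mttr = mean_time_to_recovery(recovery_times)
print(f"Change success rate: {success_rate:.1f}%")
print(f"MTTR: {mttr}")
```

Tracking the same calculation before and after an automation rollout gives a like-for-like comparison instead of an anecdotal one.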

Regularly tracking these metrics justifies continued investment and identifies areas for improvement. We will explore what to measure and how in Chapter 14 - Automation as a Product.

1.4. Summary

Network automation is now a necessity, driven by scale, complexity, and rising expectations. Every organization faces the challenge of doing more, faster, and with higher reliability. The path to automation is not universal; each organization matures at its own pace, from small scripts to large-scale, intent-driven, and self-healing systems. Success requires alignment across People, Process, and Technology.

Automation’s greatest value is consistency, reliability, and speed—not just cost savings. Building a sustainable practice demands investment, thoughtful design, and collaboration.

This chapter sets the foundation for understanding why automation matters, the challenges it addresses, and how organizations can evolve toward architecture-driven automation.
