Glossary

This glossary defines key concepts used throughout the book.

Artificial Intelligence (AI)
The simulation of human intelligence processes by machines, especially computer systems, enabling them to perform tasks such as learning, reasoning, and problem-solving.
Artificial Intelligence for IT Operations (AIOps)
The application of artificial intelligence and machine learning to IT operations for automated insight generation, anomaly detection, and operational decision-making.
Address Resolution Protocol (ARP)
A protocol that maps IPv4 addresses to MAC (link-layer) addresses on a local network.
Border Gateway Protocol (BGP)
The standardized exterior gateway protocol used to exchange routing information between autonomous systems on the Internet.
BGP Monitoring Protocol (BMP)
A protocol that provides real-time monitoring of BGP routing information by streaming BGP updates and events from routers to monitoring stations.
Bill of Materials (BOM)
A comprehensive list of components, parts, and materials required to build, deploy, or maintain a network system or device.
Brownfield
An existing environment with legacy systems, manual processes, and established infrastructure. Automation is introduced gradually into these systems.
CI/CD
Continuous Integration/Continuous Deployment. Automated processes that test, validate, and deploy code changes whenever they are committed to version control.
Circuit Breaker
A distributed systems pattern that stops requests to a failing service to prevent cascading failures. Similar to electrical circuit breakers, it 'trips' to protect the system.
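As an illustration only, the following is a minimal sketch of the pattern in Python. The names CircuitBreaker, CircuitOpenError, max_failures, and reset_after are hypothetical and chosen for this example; production implementations usually add per-endpoint state, metrics, and thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before tripping
        self.reset_after = reset_after    # cool-down in seconds before a retry
        self.clock = clock                # injectable so the timing can be tested
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without touching the service.
                raise CircuitOpenError("circuit open; request not sent")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # a success fully closes the breaker
        return result
```

A caller would wrap each request to the remote service in breaker.call(...) and treat CircuitOpenError as a signal to fall back or retry later.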
Compensation Logic
An error-handling mechanism that undoes or recovers from failed operations by applying compensating transactions or reverting to a previous state.
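One possible shape for this idea, sketched in Python: each step is paired with a compensating action, and a failure triggers the compensations of the completed steps in reverse order. The function name run_with_compensation and the (action, compensate) pairing are assumptions made for this example.

```python
def run_with_compensation(steps):
    """Run (action, compensate) pairs in order.

    If an action raises, the compensations of all previously completed
    steps run in reverse order, and the original error is re-raised.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        raise
```

The failing step's own compensation is not run here, on the assumption that a failed action leaves nothing to undo; a real workflow may need to handle partially applied steps as well.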
Central Processing Unit (CPU)
The primary component of a computer that performs most of the processing, executing instructions from programs and managing system operations.
Declarative
An automation approach where the desired end state is defined, and the system determines the steps needed to achieve it. Contrast with imperative.
Dry Run
An execution mode that validates and previews what changes would be made without actually applying them.
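A small sketch of what a dry-run flag might look like, assuming device state is modeled as a Python set of VLAN IDs. The function apply_vlans and its dry_run parameter are hypothetical names for this example, not taken from any particular tool.

```python
def apply_vlans(device_vlans, desired_vlans, dry_run=True):
    """Return the planned changes; modify device_vlans only if dry_run is False."""
    to_add = sorted(set(desired_vlans) - set(device_vlans))
    to_remove = sorted(set(device_vlans) - set(desired_vlans))
    plan = [f"add vlan {v}" for v in to_add]
    plan += [f"remove vlan {v}" for v in to_remove]
    if not dry_run:
        # Apply exactly the changes that were previewed in the plan.
        device_vlans.difference_update(to_remove)
        device_vlans.update(to_add)
    return plan
```

Defaulting dry_run to True in this sketch means a caller has to opt in explicitly before anything is changed.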
Domain-Specific Language (DSL)
A specialized programming language designed for a specific application domain, such as querying or configuration management.
Extended Berkeley Packet Filter (eBPF)
A technology that allows custom programs to run in the Linux kernel for high-performance networking, observability, and security use cases.
End-to-End Test
A test that validates complete automation workflows in realistic scenarios, using lab environments or virtualized network infrastructure.
End of Life (EOL)
The stage in a product's lifecycle when it is no longer supported or sold by the manufacturer, often requiring migration or replacement planning.
Extract, Transform, Load (ETL)
A data pipeline pattern that extracts data from sources, transforms it into a desired format, and loads it into a destination system.
gRPC Network Management Interface (gNMI)
A modern network management protocol based on gRPC that provides streaming telemetry and configuration capabilities using YANG data models.
Graphics Processing Unit (GPU)
A specialized processor designed to accelerate graphics rendering and parallel processing tasks, widely used in AI, ML, and high-performance computing.
Graceful Degradation
A system design principle where components continue operating with reduced functionality when failures occur, rather than failing completely. Enables resilience in distributed systems.
Greenfield
A new environment or project built from scratch with automation-first design principles from day one.
gRPC Remote Procedure Call (gRPC)
A modern, high-performance RPC framework that uses HTTP/2 for transport and Protocol Buffers as the interface definition language.
Hypertext Transfer Protocol (HTTP)
An application-layer protocol for transmitting hypermedia documents and serving as the foundation of data communication for the World Wide Web.
Infrastructure as Code (IaC)
A practice of managing and provisioning infrastructure through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. Enables version control, testing, and automation of infrastructure changes.
Idempotency
A property of automation where running the same operation once or many times produces the same final state, with no unintended side effects.
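To make the property concrete, here is a minimal Python sketch that assumes configuration is held in a nested dictionary. The function ensure_interface_description is a hypothetical name; it checks the current state first, so a second run changes nothing.

```python
def ensure_interface_description(config, interface, description):
    """Set the description if needed; return True only when a change was made."""
    current = config.get(interface, {}).get("description")
    if current == description:
        return False  # already in the desired state: nothing to do
    config.setdefault(interface, {})["description"] = description
    return True
```

By contrast, an operation that appends a line to a configuration on every run is not idempotent, because each repetition changes the final state.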
Imperative
An automation approach where specific steps and commands are defined in exact order. Contrast with declarative.
Integration Test
A test that validates how multiple components work together in a simulated environment, introducing third-party systems and interdependencies.
Intent-Driven
An automation approach where engineers define the desired end state of the system, and the automation determines how to achieve it (declarative approach).
Interface Contracts
Formal definitions of how systems communicate, including schemas, validation rules, and backward-compatibility policies. Enable independent evolution of components and clear integration boundaries.
Internet of Things (IoT)
A network of physical devices embedded with sensors, software, and connectivity that enables them to collect and exchange data.
Internet Protocol (IP)
A set of rules governing the format of data sent over the Internet or other networks, enabling devices to communicate and route packets across interconnected networks.
IP Flow Information Export (IPFIX)
An IETF standard protocol for exporting IP flow information from routers and switches, used for network traffic analysis and monitoring.
Intermediate System to Intermediate System (IS-IS)
A link-state routing protocol used in large service provider networks to route IP and other network layer protocols.
Log Query Language (LogQL)
Grafana Loki's query language, inspired by PromQL, designed for querying and filtering log data.
Management Information Base (MIB)
A hierarchical collection of managed-object definitions, accessed through SNMP, that defines the structure and meaning of management data for entities in a network.
Machine Learning (ML)
A subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed, using algorithms and statistical models.
Mean Time To Resolution (MTTR)
The average time required to resolve an incident or problem, measured from the time the issue is detected until it is fully resolved.
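The calculation can be sketched as follows, assuming each incident is recorded as a pair of detection and resolution timestamps. The function name and the input shape are assumptions for this example.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (resolved - detected) over (detected_at, resolved_at) pairs."""
    durations = [resolved - detected for detected, resolved in incidents]
    if not durations:
        raise ValueError("at least one incident is required")
    return sum(durations, timedelta()) / len(durations)


incidents = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),    # 30 minutes
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 30)),  # 90 minutes
]
print(mean_time_to_resolution(incidents))  # prints 1:00:00
```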
NETCONF
Network Configuration Protocol. An IETF standard protocol that provides network device configuration and data retrieval capabilities with support for atomic operations and rollback.
Network Operations Center (NOC)
A centralized location where IT professionals monitor, manage, and maintain an organization's network infrastructure.
Network Operating System (NOS)
The software that runs on network devices, providing the functionality to configure, manage, and control network operations.
Observability
The ability to measure and understand a system's internal state through its external outputs. Includes logging, metrics, and tracing to detect failures and optimize automation behavior.
OpenConfig
A collaborative initiative developing vendor-neutral data models and programmatic interfaces for managing network devices using YANG.
OpenTelemetry
A vendor-neutral observability framework providing a unified set of APIs, libraries, and tools for collecting, processing, and exporting telemetry data (metrics, logs, and traces).
Open Shortest Path First (OSPF)
A link-state routing protocol used within an autonomous system to determine the best path for routing IP packets.
OpenTelemetry Protocol (OTLP)
The native protocol of OpenTelemetry for transmitting telemetry data (metrics, logs, traces) between applications and observability backends.
Packet Capture (PCAP)
A file format and API for capturing network traffic, commonly used by tools such as tcpdump and Wireshark for packet analysis.
Predictable
A quality of trustworthy automation where operations produce consistent, deterministic outcomes that engineers can anticipate.
Prometheus Query Language (PromQL)
A functional query language for Prometheus that enables selection and aggregation of time series data in real time.
Packet Sampling (PSAMP)
An IETF standard framework for packet sampling and filtering for network monitoring and measurement.
Root Cause Analysis (RCA)
A systematic process for identifying the underlying causes of problems or incidents to prevent their recurrence.
Reliable
A quality where automation handles errors gracefully, recovers from failures, and completes operations safely even under unexpected conditions.
RESTCONF
A protocol that provides a programmatic interface for accessing YANG-defined data using HTTP-based RESTful APIs, designed for web-based network management.
RDMA over Converged Ethernet (RoCE)
A network protocol that enables Remote Direct Memory Access (RDMA) over Ethernet networks, providing high-throughput, low-latency networking for data centers and high-performance computing.
Software-Defined Networking (SDN)
A network architecture approach that separates the control plane from the data plane so the network can be centrally controlled, or 'programmed,' by software applications. This helps operators manage the entire network consistently, regardless of the underlying network technology.
sFlow
A network monitoring technology that uses packet sampling to provide visibility into network traffic patterns and performance.
Service Level Agreement (SLA)
A commitment between a service provider and a client defining the expected level of service, including performance metrics and guarantees.
Service Level Indicator (SLI)
A quantitative measure of some aspect of the level of service being provided, such as response time, error rate, or availability.
Simple Network Management Protocol (SNMP)
An Internet Standard protocol for collecting and organizing information about managed devices on IP networks, widely used for network monitoring and management.
Source of Truth (SoT)
An authoritative, centralized system holding the intended state of the network, used as the single reference point for automation decisions.
Switched Port Analyzer (SPAN)
A network switch feature that mirrors traffic from one or more ports to a monitoring port for analysis and troubleshooting.
System Logging Protocol (Syslog)
A standard protocol for message logging that allows separation of the software generating messages from the system storing and analyzing them.
Test Access Point (TAP)
A hardware device that provides access to network traffic by creating a physical copy of data flowing through a network link.
Telegraf-Prometheus-Grafana (TPG)
A popular open-source observability stack combining Telegraf for data collection, Prometheus for storage, and Grafana for visualization.
Transactional
A property where multiple changes are grouped so they either complete fully or fail safely with no partial, inconsistent, or half-applied states.
Time Series Database (TSDB)
A database optimized for storing and querying time-stamped or time series data, commonly used in monitoring, IoT, and real-time analytics applications.
Understandable
A quality where automation systems expose intent, steps, results, and decisions transparently, building human confidence.
Unit Test
A test that validates a single component or function in isolation, typically using mocks or fake systems to focus on specific behavior.
Usable
A quality where automation provides interfaces that allow engineers to validate, reason about, and control behavior without excessive complexity.
Version Control System (VCS)
A tool that tracks and manages changes to code and data over time, enabling collaboration, rollback, and audit trails. Examples: Git, Mercurial.
Versioning
The practice of tracking and managing different versions of code, data, or configurations using version control systems. Enables rollback, audit trails, and collaboration.
Yet Another Next Generation (YANG)
A data modeling language used to model configuration and state data for network devices, providing a standardized way to describe network device capabilities.
Zero Touch Provisioning (ZTP)
An automated deployment process that allows network devices to be configured and brought online with minimal manual intervention, typically using pre-defined scripts or configuration files.
