Nautocon and AutoCon4 Session Notes (Unofficial)
These are my personal reflection notes from the AutoCon4 and Nautocon sessions I attended. I took notes during the sessions and expanded them the same day while everything was still fresh, trying to pull out the high-level concepts I wanted to remember.
I wrote most of this in fast shorthand during the talks, then turned those notes into fuller summaries later. That means there is always a chance I captured something imperfectly or interpreted a point differently than the speaker intended. Any mistakes are mine.
This is not a transcript or an official summary. It is simply what stood out to me and what I thought might be useful for revisiting ideas from the conference. NAF plans to release full talk recordings on their YouTube channel, which will be a more complete source.
Day 1 - Nautocon
Nautocon Notes
- Many teams are moving toward Golden Config for standardization and compliance.
- Single Source of Truth Synchronizer plugin is gaining traction for keeping data aligned across systems.
- Design Builder is becoming a key companion to Golden Config for managing structured inputs.
- Data validation is now included in Nautobot core.
- Nautobot v3 released with updated navigation visuals and a new favorites section.
- Nautobot is establishing a certification program.
Real World Stories
Transaction Network Services (Kirstin Nickerson)
- Global organization with ~20k devices today, targeting ~90k in Nautobot.
- Vendors: Cisco, Juniper, Fortinet, F5.
- Heavy use of:
- Golden Config
- Device lifecycle management
- Operational compliance (pre and post change checks)
- Engineers have self service automation available.
- May be hiring.
Intel (Greg Botts)
- Uses open source Nautobot as their Source of Truth.
- Small automation team of 2, with network engineers handling the rest.
- Nautobot Jobs used to support a large brownfield migration (~5k devices).
- Mostly uses standard plugins.
- Custom plugin built to sync their internal IPAM nightly.
Arizona State University (Todd Simmons)
- ASU recognized as a highly advanced technical organization.
- Full CI and CD pipeline managing Nautobot.
- Pulls configuration
- Watches for changes with webhooks
- Lambda functions for automation workflows
- ZTP pipeline (true ZTP for greenfield)
- Able to bring up a 50 switch site in 37 minutes.
- Does not use Golden Config.
Key Takeaways
- Senior leadership support is critical. Show cost savings and measurable impact.
- Focus on defining the desired state, not automating legacy processes.
- Involve engineers directly. Let them input and shape the data.
- Ownership increases adoption. People are more willing to use tools they helped influence.
Day 1 - AutoCon4
Jeff Gray (CEO, Gluware) - Building a Network Automation Business Case that Wins at the Top
Speaker: Jeff Gray (CEO, Gluware)
Core ideas
- Automation is how modern business gets done. Without it, progress is just more heroic manual work.
- The future belongs to teams that can explain automation in CXO language, not just teams that automate the hardest.
- You need an impact model that ties automation to:
- Strategic business alignment
- Competitive analysis
- Business case math
- Clear “why now”
Financial concepts to understand
- Net Present Value (NPV): Present value of benefits minus costs over time.
- Payback period: How long until the investment pays for itself.
- Internal Rate of Return (IRR): The discount rate at which NPV equals zero. It measures an investment's profitability while accounting for the time value of money.
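To make these three terms concrete, here is a minimal Python sketch using made-up cash flows. None of the figures come from the talk. Year 0 is the upfront investment and later years are net benefits. IRR is found by simple bisection. That assumes a conventional "invest first, benefit later" profile where NPV crosses zero exactly once.

```python
from typing import List, Optional


def npv(rate: float, cash_flows: List[float]) -> float:
    """Discount each year's cash flow back to today and sum the results."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def irr(cash_flows: List[float], low: float = -0.99, high: float = 10.0, tol: float = 1e-7) -> float:
    """Find the discount rate where NPV is zero using bisection.

    Assumes NPV falls as the rate rises and crosses zero once between
    `low` and `high`, which holds for an invest-first, benefit-later profile.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid  # still profitable at this rate, so break-even is higher
        else:
            high = mid
    return (low + high) / 2


def payback_years(cash_flows: List[float]) -> Optional[float]:
    """Years until cumulative (undiscounted) cash flow turns non-negative."""
    cumulative = cash_flows[0]
    for year, cf in enumerate(cash_flows[1:], start=1):
        if cumulative + cf >= 0:
            # Interpolate inside the year in which payback happens.
            return year - 1 + (-cumulative / cf)
        cumulative += cf
    return None  # never pays back within the horizon


# Hypothetical five year case: 500k upfront, benefits ramping up.
flows = [-500_000, 150_000, 250_000, 300_000, 300_000, 300_000]
print(f"NPV at 8%: {npv(0.08, flows):,.0f}")
print(f"IRR: {irr(flows):.1%}")
print(f"Payback: {payback_years(flows):.2f} years")
```

With these numbers, payback lands at about 2.33 years, well inside the five year horizon.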
Inputs to your impact model (top variables)
- Number of network devices
- Annual downtime and cost of outages
- Device growth per year
- Admin tasks per device per year
- Network operations, architecture, and network dev staff counts
- Fully loaded FTE cost
- Existing annual tooling cost
- Extra credit: impact on revenue acceleration
Building the case
- Learn the financial lingo and math.
- Gather the top variables.
- Manage assumptions:
- Define current baseline.
- Assume a five year ramp.
- Define realistic efficiency gains.
- Give yourself a “haircut” so you under promise and over deliver.
- Build the business case:
- Increased revenue is the strongest benefit.
- Validate with a rollout plan:
- What is achievable, by when, and in what phases.
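A rough Python sketch of how a subset of the top variables could feed a labor-savings estimate with a five year ramp and a deliberate haircut. The structure, the 1800 productive hours per FTE, the default gain, and the ramp values are all my own assumptions, not the speaker's model. Outage cost and revenue acceleration are left out to keep it short.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImpactInputs:
    devices: int
    device_growth_per_year: float           # e.g. 0.10 for 10 percent growth
    admin_tasks_per_device_per_year: float
    hours_per_task: float
    fully_loaded_fte_cost: float            # yearly cost of one engineer
    fte_hours_per_year: float = 1800        # assumed productive hours per FTE
    target_efficiency_gain: float = 0.5     # assumed share of task time automation removes
    haircut: float = 0.3                    # discount the estimate to under promise
    ramp: Tuple[float, ...] = (0.2, 0.4, 0.6, 0.8, 1.0)  # assumed five year adoption ramp


def yearly_labor_savings(inputs: ImpactInputs) -> List[float]:
    """Estimated labor savings per year, after the ramp and the haircut."""
    hourly_cost = inputs.fully_loaded_fte_cost / inputs.fte_hours_per_year
    savings = []
    for year, adoption in enumerate(inputs.ramp):
        devices = inputs.devices * (1 + inputs.device_growth_per_year) ** year
        baseline_hours = devices * inputs.admin_tasks_per_device_per_year * inputs.hours_per_task
        saved_hours = baseline_hours * inputs.target_efficiency_gain * adoption
        savings.append(saved_hours * hourly_cost * (1 - inputs.haircut))
    return savings


# Hypothetical inputs for illustration only.
example = ImpactInputs(devices=1000, device_growth_per_year=0.10,
                       admin_tasks_per_device_per_year=10, hours_per_task=0.5,
                       fully_loaded_fte_cost=180_000)
for year, value in enumerate(yearly_labor_savings(example), start=1):
    print(f"Year {year}: {value:,.0f}")
```

The yearly values can then be fed into an NPV or payback calculation as the benefit side of the cash flows.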
Competitiveness and defensibility
- Focus on lowest cost that still meets capability needs.
- It is not about pushing a specific vendor, it is about getting the job done.
- Be very careful with AI assumptions:
- Do you trust it?
- In production?
- What are you willing to sign up for, and why now?
Exit: how to sell it internally
- Move money from their pocket to your initiative by answering “What is in it for me?”
- Go to finance early and make them rationally greedy:
- Tie automation to current pains or crises.
- Buying motives to appeal to:
- Desire for gain
- Fear of loss
- Comfort and convenience
- Security and protection
- Pride of ownership
- Emotional satisfaction
NABCDE pattern
- Need
- Approach
- Benefits
- Competitiveness
- Defensibility
- Exit
Andy Lapteff - Confessions of a CLI Lifer Who Learned to Love Automation
Speaker: Andy Lapteff
Core ideas
- Resistance to automation is often emotional, not technical.
- Identity stories like “I am not a developer” block progress more than tool gaps.
- Small wins in automation can completely change someone’s internal narrative.
- Vulnerability and storytelling help others see themselves in the journey.
Mindset shifts
- Start extremely small:
- Example: one Python script that logs in and runs a show command.
- Reframe the story:
- “I cannot code” to “I am an engineer, I can learn this.”
- Learn in public:
- Share the struggle, not just polished results.
- Treat emotional resistance like technical debt:
- Fear of failure, past academic trauma, attachment to CLI.
Cognitive biases to address
- Negativity bias (one bad comment outweighs many positive ones).
- Identity anchoring (“I am a CLI person”).
- Loss aversion (“automation will replace me”).
- Past failure imprinting.
Adoption techniques
- Create safe, small automation wins for hesitant engineers.
- Use narrative reframes:
- Automation as amplification, not replacement.
- Be the “first wildebeest”:
- Go first, show it is survivable, and others follow.
- Build community:
- Study groups, internal chats, shared experiments.
Dinesh - The NAF Network Automation Framework: A Modular Reference Architecture
Speaker: Dinesh (NAF Working Group)
Core ideas
- NAF provides a map, not a mandate:
- Shared vocabulary and mental model for everyone in automation.
- Focuses on six universal building blocks that exist in every automation system.
- Tool, vendor, and protocol agnostic:
- Emphasizes characteristics over brands.
- Good automation still depends on strong networking fundamentals.
The six NAF building blocks
- Intent – What do I want?
- Desired outcomes, vendor neutral, extensible, declarative.
- Operational truth (observability) – What do I have?
- Multiple truths, needs to be timely, structured, and queryable.
- Executor – How do I change things?
- Pushes config or changes via SSH, APIs, NETCONF, gNMI, etc.
- Collector – How do I gather state?
- Telemetry and show commands, method is secondary to accuracy.
- Orchestrator – How do I coordinate?
- Orders tasks and multi device workflows, optional but valuable.
- Presentation layer – How do humans interact?
- CLI, UI, dashboards, whatever makes the system understandable and usable.
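Because NAF is tool agnostic, one way to picture the blocks is as interfaces. Below is a minimal Python sketch with in-memory fakes. The class and function names are hypothetical and not part of NAF. Intent says what should be, the collector says what is, the executor changes things, a small orchestrator compares the two, and printing the report stands in for the presentation layer.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class IntentSource(Protocol):
    def desired(self, device: str) -> Dict[str, str]: ...


class Collector(Protocol):
    def observed(self, device: str) -> Dict[str, str]: ...


class Executor(Protocol):
    def apply(self, device: str, changes: Dict[str, str]) -> None: ...


def reconcile(devices: Iterable[str], intent: IntentSource, collector: Collector,
              executor: Executor) -> Dict[str, Dict[str, str]]:
    """Orchestrator role: compare intent with observed state and push only the drift."""
    report = {}
    for device in devices:
        wanted = intent.desired(device)
        have = collector.observed(device)
        drift = {key: value for key, value in wanted.items() if have.get(key) != value}
        if drift:
            executor.apply(device, drift)
        report[device] = drift
    return report


# Hypothetical in-memory implementations of each block.
class StaticIntent:
    def __init__(self, data: Dict[str, Dict[str, str]]):
        self.data = data

    def desired(self, device: str) -> Dict[str, str]:
        return self.data[device]


class FakeCollector:
    def __init__(self, state: Dict[str, Dict[str, str]]):
        self.state = state

    def observed(self, device: str) -> Dict[str, str]:
        return dict(self.state[device])


class RecordingExecutor:
    def __init__(self, state: Dict[str, Dict[str, str]]):
        self.state = state
        self.calls: List[Tuple[str, Dict[str, str]]] = []

    def apply(self, device: str, changes: Dict[str, str]) -> None:
        self.calls.append((device, changes))
        self.state[device].update(changes)  # pretend the push succeeded


state = {"leaf1": {"ntp": "10.0.0.1"}, "leaf2": {"ntp": "10.0.0.9"}}
intent = StaticIntent({"leaf1": {"ntp": "10.0.0.1"}, "leaf2": {"ntp": "10.0.0.1"}})
executor = RecordingExecutor(state)
report = reconcile(["leaf1", "leaf2"], intent, FakeCollector(state), executor)
print(report)  # presentation layer stand-in
```

Swapping any fake for a real tool (a SoT, a telemetry collector, an SSH or gNMI executor) leaves the reconcile logic unchanged, which is the point of thinking in blocks rather than brands.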
Principles
- Keep things functional, simple, and modular.
- Focus on intent and state rather than imperative step lists.
- Use NAF as a shared language:
- “This project touches collector and intent” is clearer than “some scripts.”
The Operating Model for AI Infrastructure
Speaker: Itential (didn’t catch the speaker’s name)
Core ideas
- AI is a tool in the toolbox, not a universal solution.
- Many devices and processes will never be “AI native.”
- Safe AI for infrastructure must be:
- Secure
- Governed and accountable
- Traceable and auditable
- Practical model has three layers:
- Instrumentation
- Deterministic execution
- AI reasoning
Three layers
- Instrumentation layer
- Existing scripts, APIs, telemetry, configuration methods.
- Keep investing here regardless of AI.
- Deterministic execution (orchestration)
- Workflows that run the same way every time.
- Clear inputs, defined steps, predictable outputs.
- AI reasoning layer
- Uses context plus models to suggest or select actions.
- Calls deterministic workflows rather than touching devices directly.
Evolution of operations
- Human in the loop:
- AI recommends, humans approve.
- Human on the loop:
- AI triggers workflows, humans monitor and hold the kill switch.
- Selective autonomy:
- For bounded problems, AI operates mostly on its own with strong logging and guardrails.
Design implications
- AI does not replace automation. It rides on top of well designed automation.
- Build small, single purpose workflows that are safe for agents to call.
- Log what changed, when, and why so operations stay auditable.
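A minimal Python sketch of the idea that the AI layer only selects from registered, deterministic workflows, with an approval gate, a kill switch, and an audit log. The workflow, mode names, and gate logic are hypothetical illustrations of the talk's three operating modes, not Itential's product.

```python
import datetime
from typing import Any, Callable, Dict, List

WORKFLOWS: Dict[str, Callable[..., str]] = {}
AUDIT_LOG: List[Dict[str, Any]] = []
KILL_SWITCH = {"engaged": False}
MODES = {"human_in_the_loop", "human_on_the_loop", "selective_autonomy"}


def workflow(func: Callable[..., str]) -> Callable[..., str]:
    """Register a small, single purpose workflow that an agent is allowed to call."""
    WORKFLOWS[func.__name__] = func
    return func


@workflow
def bounce_interface(device: str, interface: str) -> str:
    # Hypothetical stand-in for a real deterministic workflow: fixed inputs, fixed steps.
    return f"{device}:{interface} bounced"


def never_approve(name: str, args: Dict[str, Any]) -> bool:
    return False


def run_agent_request(name: str, args: Dict[str, Any], reason: str, mode: str,
                      approver: Callable[[str, Dict[str, Any]], bool] = never_approve) -> str:
    """Gate an AI-proposed action. The agent names a workflow; it never touches devices."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    if KILL_SWITCH["engaged"]:
        outcome = "rejected: kill switch engaged"
    elif name not in WORKFLOWS:
        outcome = "rejected: unknown workflow"
    elif mode == "human_in_the_loop" and not approver(name, args):
        outcome = "rejected: not approved"
    else:
        # On-the-loop and selective autonomy run immediately; humans watch the log.
        outcome = WORKFLOWS[name](**args)
    AUDIT_LOG.append({
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "workflow": name, "args": args, "why": reason, "mode": mode, "outcome": outcome,
    })
    return outcome


print(run_agent_request("bounce_interface", {"device": "leaf01", "interface": "Ethernet7"},
                        reason="CRC errors rising after optic swap", mode="human_on_the_loop"))
```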
Mark Prosser - Break Down Barriers Before you Build Up Platforms
Speaker: Mark Prosser
Core ideas
- Tool wars (Ansible vs Python, etc.) distract from real blockers.
- Most networks are not stuck because “Ansible does not scale” on 15 routers.
- Automation and platforms are socio technical problems:
- People, process, data, and culture matter as much as code.
- You must understand constraints before you invest in platforms.
Constraints first, platform second
- Walk the “assembly line” of how services are delivered:
- Tickets, approvals, documentation, handoffs, monitoring, not just router CLI.
- Map signal driven workflows:
- Every actor, system, and signal from request to delivery.
- Look for bottlenecks in:
- Change management
- Notifications
- Data ownership
- Human queues
Platforms and risks
- A platform is a collection of working subsystems with a unified interface.
- Vendors already sell many platforms.
- Building your own platform too early:
- Can consume years
- Needs many skills
- Often misses the real organizational constraints
Data and ownership
- Data is scattered:
- Ticketing, CMDBs, monitoring, spreadsheets, internal tools.
- Ask:
- Do we have API access?
- Who owns this data?
- Can we normalize it?
- Do not assume “single source of truth” is always the right first step.
Skills and Moneyball analogy
- You probably cannot hire a full set of platform unicorns.
- Assemble skills in aggregate:
- Network, software, DB, process, and communication across teams.
- Use Gall’s Law:
- Start from simple systems that work and grow from there.
Greg Botts - Overhauling Data Center Network Automation: Intel’s Data-Centric Journey
Speaker: Greg Botts (Intel)
Core ideas
- Intel moved from a vendor turnkey automation stack to a data centric system built around Nautobot.
- The key pivot was “design for data, not for config.”
- Small team (about two people) supporting thousands of devices by focusing on:
- Modeling
- Standardization
- Open tooling
Old world
- Vendor turnkey system with:
- ZTP
- Inventory
- Templates and rendering
- Worked well at first, but:
- Hit scale limits
- Required many clusters
- Accumulated technical and operational overhead
Transition
- Pulled templates and logic into repos.
- Stored device data in internal DBaaS with custom schema.
- Built an internal engine to combine data and templates.
- Eventually decided to start again using Nautobot as the network source of truth.
New architecture
- Nautobot at the center, hosted on Kubernetes with DBaaS.
- ZTP integrated with Nautobot for identity, images, and base configs.
- Data modeling:
- VLANs, PVLANs, MLAG, BGP, roles, and more modeled explicitly.
- Two ingestion paths:
- Brownfield jobs to import from existing devices.
- Greenfield jobs for new builds.
Config generation and deployment
- GraphQL queries pull modeled data.
- Templates render intended config.
- Golden config style patterns provide diff and compliance.
- Custom executor tool:
- Gets intended config from Nautobot.
- Shows diff.
- Pushes changes.
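The render and diff part of this flow can be sketched with the Python standard library alone. This is not Intel's executor. The data shape stands in for a GraphQL result and is hypothetical, `string.Template` replaces whatever template engine they use, and the push step is left out.

```python
import difflib
from string import Template

# Hypothetical shape of data returned by a GraphQL query against the source of truth.
device_data = {
    "name": "leaf01",
    "vlans": [{"vid": 20, "name": "voice"}, {"vid": 10, "name": "users"}],
}

VLAN_TEMPLATE = Template("vlan $vid\n   name $name")


def render_intended(data: dict) -> str:
    """Render intended config from modeled data (sorted so output is deterministic)."""
    lines = [f"hostname {data['name']}"]
    for vlan in sorted(data["vlans"], key=lambda v: v["vid"]):
        lines.append(VLAN_TEMPLATE.substitute(vid=vlan["vid"], name=vlan["name"]))
    return "\n".join(lines) + "\n"


def config_diff(running: str, intended: str) -> str:
    """Unified diff from running to intended; empty string means compliant."""
    return "".join(difflib.unified_diff(
        running.splitlines(keepends=True), intended.splitlines(keepends=True),
        fromfile="running", tofile="intended"))


running = "hostname leaf01\nvlan 10\n   name users\n"
diff = config_diff(running, render_intended(device_data))
print(diff or "compliant: nothing to push")
```

An empty diff doubles as a compliance check, and a non-empty one is what an engineer reviews before the push.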
Lessons
- Data is king:
- Once modeled, many tools become simpler.
- Use migrations to remove “landmines” and snowflake configs.
- Design hosting, logging, and maintenance up front.
- Democratize network data via standard APIs and queries.
Cat Gurinsky - Lifecycle Automation: Troubleshooting, Upgrading & More
Speaker: Cat Gurinsky
Core ideas
- Big wins in automation are often in the middle of lifecycle:
- Troubleshooting
- Maintenance
- Upgrades
- Most incidents and changes follow repeatable patterns that can be automated.
- Small teams can run large networks if they automate these patterns carefully.
Troubleshooting
- After an alert, most checks are predictable:
- Interface state and light levels
- Neighbors
- Logs and counters
- Automation can:
- Run commands
- Parse and store structured output
- Enrich tickets with findings
- Generate work orders for onsite tasks
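A small Python sketch of the "parse, store, enrich" step for light levels. The command output format and the -14 dBm threshold are hypothetical. Real output and safe thresholds depend on the vendor and the optic.

```python
import re
from typing import Dict, List

RX_LOW_DBM = -14.0  # assumed alarm threshold for illustration only

# Matches rows like "Et2    -18.40   -2.01" (port, rx dBm, tx dBm) in a made-up format.
ROW = re.compile(r"^(?P<port>\S+)\s+(?P<rx>-?\d+\.\d+)\s+(?P<tx>-?\d+\.\d+)$")


def parse_optics(output: str) -> List[Dict]:
    """Turn raw show output into structured rows; header lines are skipped."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append({"port": match["port"], "rx_dbm": float(match["rx"]),
                         "tx_dbm": float(match["tx"])})
    return rows


def ticket_note(device: str, rows: List[Dict]) -> str:
    """Summarize findings as text suitable for appending to a ticket."""
    low = [r for r in rows if r["rx_dbm"] < RX_LOW_DBM]
    if not low:
        return f"{device}: all {len(rows)} optics within range"
    lines = [f"{device}: low receive light on {len(low)} port(s)"]
    for r in low:
        lines.append(f"  {r['port']} rx {r['rx_dbm']} dBm -> work order: clean or replace optic/fiber")
    return "\n".join(lines)


sample = "Port   Rx(dBm)  Tx(dBm)\nEt1    -2.10    -1.95\nEt2    -18.40   -2.01\n"
print(ticket_note("leaf01", parse_optics(sample)))
```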
Maintenance
- ACLs and large repetitive changes:
- Model rules in data formats (for example YAML).
- Generate device specific fragments.
- Extension management (for example Splunk on switches):
- Track OS and extension compatibility.
- Enforce install order, prerequisites, certificates, connectivity checks.
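A minimal Python sketch of "model rules in data, generate device specific fragments." The rule list is what a YAML file might look like after loading (for example with `yaml.safe_load`). The field names, roles, and the generic IOS-like output syntax are assumptions for illustration, not a specific vendor's exact ACL grammar.

```python
from typing import Dict, List

# Rules as they might look after loading a YAML file (hypothetical schema).
ACL_RULES: List[Dict] = [
    {"name": "allow-ssh-mgmt", "action": "permit", "proto": "tcp",
     "src": "10.10.0.0/24", "dst_port": 22, "roles": ["leaf", "spine"]},
    {"name": "allow-snmp", "action": "permit", "proto": "udp",
     "src": "10.20.0.5/32", "dst_port": 161, "roles": ["leaf"]},
]

DEVICES = {"leaf01": "leaf", "spine01": "spine"}


def acl_fragment(role: str, rules: List[Dict], acl_name: str = "MGMT-IN") -> str:
    """Render only the rules that apply to this device role, ending with an explicit deny."""
    lines = [f"ip access-list {acl_name}"]
    seq = 10
    for rule in rules:
        if role not in rule["roles"]:
            continue
        lines.append(f"   {seq} {rule['action']} {rule['proto']} {rule['src']} any eq {rule['dst_port']}")
        seq += 10
    lines.append(f"   {seq} deny ip any any")
    return "\n".join(lines)


for device, role in DEVICES.items():
    print(f"! {device}\n{acl_fragment(role, ACL_RULES)}\n")
```

Changing a rule means editing one data entry; every device fragment is regenerated consistently.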
Upgrades
- Constraints:
- Daytime upgrades, dual connected servers, tight maintenance windows.
- Pre upgrade automation:
- Gather facts
- Check flash and stage images
- Plan one step or two step paths
- Validate LACP, trunks, and MLAG
- Capture pre check commands and configs
- Execution:
- Set boot image
- Remove extensions if needed
- Reload with confirmation
- Watch for device return
- Post upgrade:
- Reinstall extensions
- Run post checks
- Compare states
- Summarize results
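The "compare states" and "summarize results" steps can be sketched as a comparison of structured pre and post snapshots. The check names and values are hypothetical. Keys that are expected to change, such as the OS version, are excluded so they do not raise false alarms.

```python
from typing import Dict, FrozenSet


def compare_states(pre: Dict, post: Dict, expected_changes: FrozenSet[str] = frozenset()) -> Dict:
    """Report checks that changed, disappeared, or appeared across the upgrade."""
    report = {"changed": {}, "missing": [], "new": []}
    for key in sorted((pre.keys() | post.keys()) - expected_changes):
        if key not in post:
            report["missing"].append(key)
        elif key not in pre:
            report["new"].append(key)
        elif pre[key] != post[key]:
            report["changed"][key] = (pre[key], post[key])
    return report


def summarize(device: str, report: Dict) -> str:
    """One line on success, a short review list otherwise."""
    problems = [f"{key}: {before} -> {after}" for key, (before, after) in report["changed"].items()]
    problems += [f"{key}: missing after upgrade" for key in report["missing"]]
    problems += [f"{key}: new after upgrade" for key in report["new"]]
    if not problems:
        return f"{device}: PASS, post checks match pre checks"
    return f"{device}: REVIEW\n  " + "\n  ".join(problems)


# Hypothetical snapshots captured by the pre and post check modules.
pre = {"os_version": "old-image", "bgp_established": 4, "lacp_po1_members": 2}
post = {"os_version": "new-image", "bgp_established": 3, "lacp_po1_members": 2, "mlag_state": "active"}
print(summarize("leaf01", compare_states(pre, post, expected_changes=frozenset({"os_version"}))))
```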
Implementation style
- Modular Python:
- Small modules for facts, flash, LACP, etc.
- Wrappers to build full workflows.
- Emphasis on:
- Skipping unsafe devices
- Logging
- Handling weird edge cases gracefully
Chris Grundemann - 2025 State of Network Automation Survey Results
Speaker: Chris Grundemann
Core ideas
- Network automation is still early and patchy across the industry.
- Many orgs automate only a small portion of their lifecycle.
- People and organization issues are bigger blockers than technology.
Who responded
- Hundreds of respondents from many countries and industries.
- Roles:
- Network automation engineer emerged as a top role for the first time.
- Many network engineers also doing automation.
State of automation
- Most orgs have less than half of lifecycle automated.
- A notable portion have little or no automation.
- Many jump straight to pushing config, often without automated validation.
Commonly automated functions
- Backups
- Device deployment and onboarding
- Firmware and OS upgrades
- Service provisioning
Tools and approaches
- Homegrown scripts and open source tools dominate.
- Python is the primary scripting language.
- Popular tools include:
- Ansible
- NetBox
- Nautobot
- Nornir
- Terraform
Staffing and skills
- Many orgs have zero dedicated automation engineers.
- Most rely on training existing network staff.
- Skills gap:
- High confidence in networking skills
- Lower confidence in automation skills
AI and LLMs
- Only a very small percentage are using AI with production networks today.
- More are experimenting or considering it.
- Almost half have no plans yet.
Barriers
- Top barrier: skills challenges.
- Combined people barriers (org, culture, personality) outweigh technical barriers.
- Measurement is weak:
- Many do not formally measure automation impact.
- Management often has limited or misaligned visibility into reality.
Day 1 Themes and Takeaways
- Business outcomes and impact models
- Strong push to link automation to money, risk, and revenue (Jeff, survey).
- Framework thinking
- NAF building blocks, AI operating models, and data centric architectures provide maps for designing systems (NAF, Itential, Intel).
- Lifecycle focus
- Troubleshooting, maintenance, and upgrades are high value areas that many agree should be automated but often are not (Cat, survey).
- Data and modeling
- Model first, config second:
- Nautobot and similar systems as central sources of truth (Intel, NAF).
- Skills and collaboration
- Industry wide skills gap in automation.
- Success stories involve small, cross functional teams and deliberate skill building.
Day 2 - AutoCon4
Joseph Nicholson - Scaling Network Operations with Modular Ansible: A Multi-Environment Automation Framework
Core Themes
- Ansible modularity scales repeatable NOC workflows.
- Dynamic inventory tied to the real SoT is essential.
- Automation supports humans, it does not replace them.
- Modular tasks prevent duplication across vendors and playbooks.
Key Ideas
- Use Ansible when a human is in the loop. Use Python for scheduled or continuous tasks.
- AWX adoption is gradual. Team still uses CLI but shifting to AWX pipelines.
- Dynamic inventory script runs on each play, pulling metadata from GUMS (their 25-year internal SoT).
- Inventory loads in milliseconds and is always fresh.
- Modular repo structure:
- Shared tasks
- Vendor-specific folders
- Common upgrade tasks
- Maintenance snapshots
- Router upgrade workflow:
- Validate change ticket
- Pre-snapshots
- Auto fetch image if not staged
- Traffic diversion
- Upgrade
- Post validation
- include vs import:
- import for static tasks and start-at-task
- include for dynamic decisions
- Pipeline:
- “FixMe” detector to catch commented-out debugging code
- AWX sync when GUMS data changes
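For readers who have not written one, here is a minimal Python sketch of an Ansible dynamic inventory script. Ansible calls it with `--list` and expects JSON groups plus `_meta.hostvars`. GUMS is internal to the speaker's organization, so `fetch_devices_from_sot` and its fields are hypothetical stand-ins.

```python
import argparse
import json
from typing import Dict, List, Optional


def fetch_devices_from_sot() -> List[Dict]:
    """Hypothetical stand-in for a query against the internal source of truth."""
    return [
        {"name": "rtr1", "mgmt_ip": "192.0.2.1", "vendor": "juniper", "site": "dc1"},
        {"name": "rtr2", "mgmt_ip": "192.0.2.2", "vendor": "cisco", "site": "dc1"},
    ]


def build_inventory(devices: List[Dict]) -> Dict:
    """Group hosts by vendor and site; put per-host variables under _meta."""
    inventory: Dict = {"_meta": {"hostvars": {}}}
    for dev in devices:
        for group in (dev["vendor"], f"site_{dev['site']}"):
            inventory.setdefault(group, {"hosts": []})["hosts"].append(dev["name"])
        inventory["_meta"]["hostvars"][dev["name"]] = {
            "ansible_host": dev["mgmt_ip"], "vendor": dev["vendor"], "site": dev["site"],
        }
    return inventory


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)
    if args.host:
        # _meta already carries hostvars, so per-host lookups can return an empty object.
        print(json.dumps({}))
    else:
        # Queried fresh on every run, so the inventory never goes stale.
        print(json.dumps(build_inventory(fetch_devices_from_sot()), indent=2))


if __name__ == "__main__":
    main()
```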
Takeaways
- Modularity multiplies both efficiency and risk.
- Dynamic inventory avoids stale data.
- Testing is hard without Molecule or containerlab.
- Maintenance windows require mindful timeouts.
- Humans validate diffs before changes.
Munachimso Nwaiwu - From CLI to Model: Building a Foundational Source of Truth for Brownfield Networks
Core Themes
- Brownfield is the norm. Greenfield is rare.
- You cannot automate what you cannot model.
- A simple, versioned SoT can be built with almost no budget.
- Start with discovery and modeling before intent or templates.
Key Ideas
- Revelo project:
- Bootstrap inventory from CSV or from discovery (nmap).
- Generate Netmiko inventory automatically.
- Use free labs (dCloud, CML, Juniper vLabs).
- SoT capture pipeline:
- Napalm for common facts.
- Netmiko for CLI.
- Genie for structured parsing.
- JSON snapshots per run with version history.
- Diff engine:
- Compare runs (for example sot_1 vs sot_5).
- Detect drift over time.
- AI opportunities:
- LLM generated parsers to expand command coverage.
- Use large datasets to infer templates later.
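The "JSON snapshots per run plus a diff engine" idea fits in a few lines of standard library Python. This is my own sketch, not the Revelo code. The `sot_N.json` naming mirrors the sot_1 vs sot_5 example, and the facts are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Tuple


def save_snapshot(folder: Path, facts: Dict) -> Path:
    """Write facts as the next numbered snapshot (sot_1.json, sot_2.json, ...)."""
    folder.mkdir(parents=True, exist_ok=True)
    existing = [int(p.stem.split("_")[1]) for p in folder.glob("sot_*.json")]
    path = folder / f"sot_{max(existing, default=0) + 1}.json"
    path.write_text(json.dumps(facts, indent=2, sort_keys=True))
    return path


def flatten(data: Dict, prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into dotted keys so differences are easy to list."""
    items: Dict[str, Any] = {}
    for key, value in data.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            items.update(flatten(value, path + "."))
        else:
            items[path] = value
    return items


def diff_snapshots(folder: Path, old: int, new: int) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_key: (old_value, new_value)} for everything that drifted."""
    a = flatten(json.loads((folder / f"sot_{old}.json").read_text()))
    b = flatten(json.loads((folder / f"sot_{new}.json").read_text()))
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "r1"
    save_snapshot(folder, {"hostname": "r1", "os_version": "old", "interfaces": {"Gi0/1": "up"}})
    save_snapshot(folder, {"hostname": "r1", "os_version": "new", "interfaces": {"Gi0/1": "down"}})
    print(diff_snapshots(folder, 1, 2))
```

Keeping the snapshot folder in Git adds history and review on top of this for free.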
Takeaways
- You can bootstrap SoT without NetBox or Nautobot.
- Simple JSON folders with versioning go a long way.
- Version history enables drift detection.
- Modeling comes before intent.
- This is step 1 of a longer roadmap.
Erich Crosswhite - From Manual to Marathon: My Automation Journey at HEB
Core Themes
- Small repeatable steps lead to big automation.
- AVD and IaC simplified operations dramatically.
- Pipelines with validation and documentation create trust.
- NetBox front end reduces YAML complexity.
Key Ideas
- IaC mandate required programmatic configs and code review.
- Lab:
- EVE-NG with EOS VMs
- Homemade CloudVision lab
- AVD pipeline:
- Git MR triggers AWX
- AVD builds configs
- CVP deploys diffs
- Docs auto published to Confluence
- NetBox integration:
- Script pulls objects and builds AVD YAML
- Eliminates manual port variables
- Safety:
- “state: absent” wipes switches into ZTP
- Need validators (my suggestion) to prevent this
- Pipeline stages:
- YAML validation
- AVD build for main and feature
- Diff plus Batfish checks
- Deployment with CVP tasks
- Auto docs
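Following my own suggestion above, here is a minimal Python sketch of a validator that could run in the YAML validation stage. The data layout (a `switches` map with `state`, `mgmt_ip`, `platform`) is hypothetical and is not AVD's actual schema. It blocks a decommission unless the switch is explicitly approved.

```python
from typing import Dict, FrozenSet, List

REQUIRED_KEYS = ("mgmt_ip", "platform")  # assumed required fields for illustration


def validate_fabric(data: Dict, allow_decommission: FrozenSet[str] = frozenset()) -> List[str]:
    """Return human readable errors; an empty list means the data may proceed."""
    errors = []
    for name, switch in data.get("switches", {}).items():
        if switch.get("state") == "absent" and name not in allow_decommission:
            errors.append(f"{name}: state 'absent' would wipe the switch back to ZTP; "
                          "add it to the approved decommission list first")
        for key in REQUIRED_KEYS:
            if key not in switch:
                errors.append(f"{name}: missing required key '{key}'")
    return errors


# Hypothetical data, as it might look after loading the fabric YAML.
fabric = {"switches": {
    "leaf1": {"mgmt_ip": "192.0.2.11", "platform": "example"},
    "leaf2": {"mgmt_ip": "192.0.2.12", "platform": "example", "state": "absent"},
}}
for error in validate_fabric(fabric):
    # A CI job would print these and exit non-zero to stop the merge request.
    print("ERROR:", error)
```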
Takeaways
- AVD plus CI/CD removed years of tech debt.
- Documentation in Confluence keeps teams aligned.
- Batfish adds strong guardrails.
- Access ports are the hardest part of automation.
- Many small commits look like training logs.
Scott Robohn - Toward Autonomy
Core Themes
- Autonomy is a continuum built on automation and orchestration.
- Closed loops, telemetry, and trust are mandatory.
- Autonomy already exists in networking (OSPF, BGP, LSPs).
- You do not need full autonomy. You need targeted autonomy.
Key Ideas
- Automation: deterministic, human driven.
- Autonomy: goal driven, adaptive, feedback based.
- Ingredients for autonomy:
- Telemetry signals
- Control loops
- Error handling
- Human-in / on / out-of-the-loop transitions
- Measurement of outcomes
- Examples of existing autonomy:
- Telephone switching
- MPLS LSP signaling
- BGP state machine
- AI is a “how,” not a “what.”
- Control Plane 3.0:
- More distributed intelligence
- Richer telemetry
- AI-driven reasoning
- Business value:
- Reduced toil
- Faster service enablement
- Lower operational cost
Personal Note
- Asked about “why now?” in resistant orgs.
- Scott reframed it as “why not now?” and encouraged continuous exploration.
Marc Koerner - ESnet’s new Micro-Service based Discovery Service
Core Themes
- Need a reliable, unified discovery system.
- Documents are the core data unit.
- Microservices allow modular, scalable discovery.
- Normalization is essential for multi-vendor networks.
Key Ideas
- Discovery Service:
- Fetches device configs, XML/JSON docs
- Stores compressed versions
- Publishes updates via Kafka
- Keeps history and versioning
- TAL model:
- Abstract topology representation
- Data normalizer:
- Converts raw documents into normalized data
- Architecture:
- Agents fetch data
- Controller schedules discovery
- Anneal service stores compressed docs
- Kafka broadcasts new versions
- Filtering removes high churn fields to avoid noise.
- Kubernetes deployment with:
- GitLab CI
- Harbor registry
- ArgoCD continuous delivery
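Two of the ideas above, filtering high churn fields and storing compressed versions only when something meaningful changed, can be sketched in Python. This is not ESnet's code. The churn field names are assumed examples, and the comment marks where a Kafka publish would happen in their architecture.

```python
import hashlib
import json
import zlib
from typing import Any, Dict, List, Tuple

HIGH_CHURN_FIELDS = {"uptime", "last_polled", "counters"}  # assumed examples


def strip_churn(doc: Any) -> Any:
    """Recursively drop fields that change constantly and would create noisy versions."""
    if isinstance(doc, dict):
        return {k: strip_churn(v) for k, v in doc.items() if k not in HIGH_CHURN_FIELDS}
    if isinstance(doc, list):
        return [strip_churn(v) for v in doc]
    return doc


class DocumentStore:
    def __init__(self) -> None:
        self.history: Dict[str, List[Tuple[str, bytes]]] = {}  # device -> (sha256, compressed)

    def ingest(self, device: str, doc: Dict) -> bool:
        """Store a new version only if the filtered document changed."""
        canonical = json.dumps(strip_churn(doc), sort_keys=True).encode()
        digest = hashlib.sha256(canonical).hexdigest()
        versions = self.history.setdefault(device, [])
        if versions and versions[-1][0] == digest:
            return False
        versions.append((digest, zlib.compress(canonical)))
        # In the described architecture, a new version would be published here (e.g. to Kafka).
        return True

    def latest(self, device: str) -> Dict:
        return json.loads(zlib.decompress(self.history[device][-1][1]))


store = DocumentStore()
store.ingest("r1", {"hostname": "r1", "uptime": 100})
print(store.ingest("r1", {"hostname": "r1", "uptime": 200}))  # only uptime moved
```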
Takeaways
- Normalization allows consistent application behavior.
- Microservices outperform earlier NSO-based attempts.
- History and versioning are crucial for trust.
- Open sourcing is planned once ESnet specific code is cleaned.
Karl Newell - HAWAT: An AI Assistant for Network Troubleshooting
Core Themes
- AI can run commands, interpret state, and summarize health.
- Tool calling enables safe, controlled automation.
- LLM acts as a troubleshooting assistant rather than an autonomous agent.
Key Ideas
- Use cases:
- Engineer queries
- ServiceNow ticket enrichment
- Looking Glass for members
- Architecture:
- Streamlit UI
- PydanticAI tool calling
- Containerlab topologies for testing
- LLM loops:
- Calls tools
- Gets results
- Summarizes insights
- Troubleshooting examples:
- OSPF neighbor checks
- Timer mismatch detection
- Packet loss scenarios
- Looking Glass integration:
- LLM reads API spec
- Executes REST calls
- Multi-domain pilot:
- Combine Internet2 + member networks
- Cross-domain diagnostics
Takeaways
- Tool calling with guardrails solves real troubleshooting.
- LLM provides reasoning, not direct device access.
- Refactors done with AI save massive time.
Brandon Cavett - From Band-Aids to Best Practices: USAA’s Network Automation Journey
Core Themes
- Multi-year journey moving from scripts to AVD.
- Standardization is the real unlock.
- Day 2 operations improve the moment AVD is adopted.
- Culture changed once engineers saw reduced toil.
Key Ideas
- Early era:
- Datacenter rack build automation
- Patch matrix generation
- IPAM and cabling pipelines
- Day 2 pain:
- Single-threaded pipelines
- Drift in CVP design configs
- Limited Ansible expertise
- AVD transition:
- YAML models
- Fast rebuilds
- Multi-threaded pushes
- Two full datacenters now fully AVD
- Refactoring provisioning:
- Old pre-provisioning replaced by AVD inputs
- Next steps:
- CICD pipelines
- Digital twins
- Brownfield rebuilds
- Automated validation and testing
Takeaways
- AVD created more progress in the last year than in the prior three years.
- Standardization transformed change management.
- Leadership support grew after seeing deployment speed.
John Capobianco - From CLI to GPT: How AI Is Rewriting the Rules of Network Automation
Core Themes
- AI is mandatory for future network operations.
- Complexity exceeds human scale.
- AI simplifies workflows that automation struggles with.
Key Ideas
- Historical skepticism repeats with AI.
- Agents, RAG, and MCP will shape network operations.
- Documentation generation from show commands and APIs is powerful.
- AI and automation reinforce each other.
- Air gapped advice:
- Use local models (for example Ollama)
- Walled gardens or secure cloud partitions
- Local RAG stores
- Packet Buddy and ACI tools show AI assisted troubleshooting.
Takeaways
- Curiosity and experimentation will define future engineers.
- AI adoption curve mirrors past major tech shifts.
- Engineers who embrace AI will build the next era of operations.
Day 3 - AutoCon4
Closing Keynote: Greg Freeman - The NetDevOps Journey: Manual Firefighting to Agentic Autonomy
Core Themes
- Transform traditional tiered operations into an automation and orchestration organization.
- Human wisdom first, AI second: encode workflows, then let AI select and drive them.
- Aim for a highly orchestrated network, not just a highly automated one.
Key Ideas
- Org pyramid inversion: automation engineers on top, network engineers in the middle, techs on the bottom through natural shrinkage and upskilling.
- Created “first followers” group with heavy investment, training, onsite collaboration, and cultural alignment.
- Required process discipline:
- PDD (Process Description Document)
- SDD (Solution Design Document)
- Three operational principles:
- Do not let it break.
- If it breaks, fix it fast.
- Communicate clearly.
- Platform as a service:
- Northbound ticketing and portals.
- Workflow engine with APIs in the middle.
- Southbound Ansible stacks, TL1, NETCONF, multi vendor support.
- At scale:
- 355+ workflows
- ~10 executions per minute
- 80 percent of interactions are machine to machine
- AI selects deterministic workflows, not direct freeform device interaction.
Takeaways
- Selling leadership on automation works best when framed as a reliability investment, not just cost savings.
- PDD + SDD ensures shared ownership, avoids ad hoc scripts, and makes workflows reusable.
- Track success by volume of workflows and percent of machine to machine interactions.
- A powerful narrative slide: “Do not let it break. Fix it fast. Communicate clearly.”
Vincent Phelan - The Secret to Network Config & Security Compliance at Scale
Core Themes
- Architecture and automation must be designed together.
- Compliance requires intentional design, strong standards, and a clean data pipeline.
- Avoid accidental “Winchester House” networks. Aim for deliberate “Fallingwater” design.
Key Ideas
- Anti patterns:
- Too many platforms and tools
- Losing data through premature optimization
- Tool specific implementations
- Manual glue steps that inevitably fail
- Forcing YAML or Git too early
- Compliance pipeline:
- Gather configs and state (CLI scraping where needed)
- Parse to structured JSON
- Security defines controls in Excel
- Translate Excel to manifests to OPA policies
- Evaluate policies in batch
- Remediate and update inventory
- Uses “probability of truth” across monitoring, CMDB, inventory, ticketing.
- Measured scale rather than guessed:
- Data volumes
- OPA evaluation models
- Repository layout
- Batch vs per device performance
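The pipeline above uses OPA. As a plain Python stand-in (not Rego, and not the speaker's code), here is the same shape: controls kept in a spreadsheet and exported to CSV, translated into checks, and evaluated in batch against parsed JSON configs. The column names, operators, and config fields are hypothetical.

```python
import csv
import io
from typing import Any, Dict, List, Tuple

# Hypothetical control sheet, as exported from Excel to CSV by the security team.
CONTROLS_CSV = """control_id,description,path,operator,expected
SEC-01,SSH v2 only,ssh.version,equals,2
SEC-02,Telnet disabled,services.telnet,equals,false
SEC-03,Approved NTP server,ntp.servers,contains,10.0.0.1
"""


def load_controls(text: str) -> List[Dict[str, str]]:
    return list(csv.DictReader(io.StringIO(text)))


def lookup(config: Dict, dotted: str) -> Any:
    """Follow a dotted path into parsed config; None if any part is missing."""
    node: Any = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check(control: Dict[str, str], config: Dict) -> bool:
    value = lookup(config, control["path"])
    expected = control["expected"]
    if control["operator"] == "equals":
        return str(value).lower() == expected.lower()
    if control["operator"] == "contains":
        return isinstance(value, list) and expected in [str(v) for v in value]
    raise ValueError(f"unknown operator {control['operator']!r}")


def evaluate_batch(controls: List[Dict[str, str]], configs: Dict[str, Dict]) -> List[Tuple[str, str]]:
    """Return (device, control_id) for every failed control across all devices."""
    return [(device, c["control_id"]) for device, cfg in sorted(configs.items())
            for c in controls if not check(c, cfg)]


# Hypothetical parsed configs (the "structured JSON" step).
configs = {
    "sw1": {"ssh": {"version": 2}, "services": {"telnet": False}, "ntp": {"servers": ["10.0.0.1"]}},
    "sw2": {"ssh": {"version": 1}, "services": {"telnet": True}, "ntp": {"servers": ["10.9.9.9"]}},
}
print(evaluate_batch(load_controls(CONTROLS_CSV), configs))
```

Security keeps editing the spreadsheet, and the translation step turns each row into something machine-checkable.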
Takeaways
- You can start compliance automation without a perfect inventory or SoT.
- Excel is a perfectly valid frontend; automation should translate the data.
- Keep raw data so you can apply new parsers or validators later.
- Easy storytelling slide: Winchester House vs Fallingwater architecture.
Message from Accelerating Sponsor Gluware
Core Themes
- Engineers spend too much time rebuilding foundational plumbing instead of high value business logic.
- 80 percent of the work is foundational; only 20 percent delivers most of the value.
- A platform can handle the 80 percent so engineers can focus on state and behavior.
Key Ideas
- Foundational concerns:
- Credential vaulting
- RBAC, DR
- Source of truth plumbing
- Device onboarding and patching
- Engineers’ value:
- Decisions about resiliency, latency, security behavior
- Large scale patch and remediation strategies
- RPA and event driven flows:
- Self remediation for failing circuits, bad interfaces, degraded sites
- AI integrations:
- Agents that work with NetBox, ServiceNow, workflows
- Validation layer grounded in known good configs from your own environment
Takeaways
- Use the 80/20 framing to justify investing in an automation backbone.
- Emphasize validated automation when pitching internal improvements.
- Engineers should focus on policy and behavior, not boilerplate infrastructure code.
Senad Palislamovic - Building AI with AI
I missed the first half of this talk so my notes may be incomplete.
Core Themes
- Effective AI for networking is an engineered system, not a single model.
- RAG must be constrained and purposeful.
- All changes must be validated through a lab/sandbox pipeline before production.
Key Ideas
- Pipeline structure:
- Intent → retrieval pipeline → MCP/tool layer → proposed config → validation → production
- LLM alone will not reach 100 percent provisioning accuracy.
- Real power comes from:
- LLM + constrained RAG + prompts + deterministic tools + validation loop
- MVPN analogy:
- Many knobs and components combine to create a robust system, not one feature.
Takeaways
- Treat LLM as one module among many; design retrieval and validation deliberately.
- Adopt “lab before prod” for any AI driven changes.
- Messaging line: “LLM + RAG + validation, not LLM alone.”
Eric Chou - Battle of the Bots: Which AI Models Actually Work for Network Engineering Tasks
Core Themes
- Do not ask which model is best. Ask which model is best for your task.
- Evaluate AI tools using probability, payout, and expected value.
Key Ideas
- Build decision trees:
- Options, probabilities, payouts, expected value
- Hardest part is estimating probability of success:
- Human review
- Config validation
- Trusted model evaluating another model
- Capture traces:
- Prompts, tool calls, outputs, reasoning steps
- Involve the whole team:
- Define “good” outputs together
- Make evaluation transparent and repeatable
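A worked expected value comparison in Python. The options, probabilities, and payouts are invented for illustration. In practice the probability of success is the hard part and would come from human review, config validation, or a trusted judge model, as listed above.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    p_success: float       # estimated probability the output is good
    payout: float          # value when it is good (e.g. engineer time saved, in currency)
    failure_cost: float    # cost when it is wrong (rework, outage risk)
    run_cost: float = 0.0  # licensing, tokens, review time

    def expected_value(self) -> float:
        return (self.p_success * self.payout
                - (1 - self.p_success) * self.failure_cost
                - self.run_cost)


# Hypothetical numbers for a single config-generation task.
options = [
    Option("large hosted model", p_success=0.9, payout=400, failure_cost=1000, run_cost=30),
    Option("small local model", p_success=0.7, payout=400, failure_cost=1000, run_cost=2),
    Option("no AI, manual", p_success=1.0, payout=0, failure_cost=0),
]
for option in options:
    print(f"{option.name}: EV {option.expected_value():.1f}")
best = max(options, key=Option.expected_value)
print("best:", best.name)
```

With these made-up numbers the small local model has negative expected value (about -22), because a 30 percent failure rate against a 1000 failure cost outweighs its low running cost. Doing nothing scores zero and beats it.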
Takeaways
- Use EV based evaluation when comparing AI models or tools.
- Build an AI evaluation pipeline before adopting AI tools.
- Good phrase: “Expected value beats hype.”
Dave Duggal - A Harmonized, Standards-based Ontology for Telco-grade Agents
Core Themes
- GenAI and agentic AI are not safe enough for telco grade networks alone.
- Telco automation requires deterministic behavior built on ontologies and graphs.
Key Ideas
- LLM fixes like RAG and chain of thought are still probabilistic and costly.
- Networks are graph shaped:
- Knowledge graphs and ontologies provide meaning and strong typing.
- Hypergraphs enable multi dimensional relationships and policies.
- Knowledge graph as control plane:
- Northbound for agents
- Southbound for deterministic network operations
- Context is a dynamic reasoning pipeline, not a static blob.
Takeaways
- Use ontology language when talking to leaders who demand safety and determinism.
- Position NetBox plus inventory as steps toward a network knowledge graph.
- AI must sit on top of strong, typed domain models.
Joshua McNamara - Network Troubleshooting Automation
Core Themes
- Automate early triage so engineers start with clarity instead of scrambling.
- Integrate monitoring, ITSM, and device workflows into a single troubleshooter.
Key Ideas
- Alarm → ticket → workflow → commands → unified report
- Features:
- Normalized show commands
- Historical comparisons
- Change management pre/post checks
- Future:
- Runbooks tied to alarm types
- Self service workflow changes
- AI assistant (NotBot) with vendor aware RAG
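A minimal Python sketch of "alarm → workflow → commands → unified report" with runbooks keyed by alarm type. The alarm types, command strings, and the `run_command` callable are hypothetical. A real runner would use an SSH or API library. The workflow itself is deterministic, which matches the "suggest, not execute" stance for the assistant.

```python
from typing import Callable, Dict, List

# Hypothetical mapping of monitoring alarm types to read-only diagnostic runbooks.
RUNBOOKS: Dict[str, List[str]] = {
    "interface_down": ["show interfaces {interface}", "show logging | include {interface}"],
    "bgp_peer_down": ["show ip bgp summary", "show logging | include BGP"],
}


def triage(alarm: Dict[str, str], run_command: Callable[[str, str], str]) -> str:
    """Run the runbook for this alarm type and merge the outputs into one report."""
    commands = RUNBOOKS.get(alarm["type"])
    if commands is None:
        return f"[{alarm['device']}] no runbook for alarm type '{alarm['type']}'; route to on-call engineer"
    sections = [f"Triage report for {alarm['device']} ({alarm['type']})"]
    for template in commands:
        command = template.format(**alarm)  # fill placeholders from alarm fields
        sections.append(f"$ {command}\n{run_command(alarm['device'], command)}")
    return "\n\n".join(sections)


def fake_runner(device: str, command: str) -> str:
    # Stand-in for a real device session; returns canned text.
    return f"(output of '{command}' on {device})"


print(triage({"type": "interface_down", "device": "leaf01", "interface": "Ethernet7"}, fake_runner))
```

The resulting report is what would be attached to the ticket, so the engineer starts from collected evidence instead of an empty terminal.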
Takeaways
- Troubleshooter apps are high ROI and approachable as a first automation milestone.
- AI assistants should suggest, not execute, while workflows do the deterministic work.
- Map your common alarm types to potential automated playbooks.