How NetApp’s SRE team manages internal AI tools alongside their entire dev-critical infrastructure
Session Date: Sep 16, 2025
Time: 8:00am (PT) | 3:00pm (GMT)
NetApp is the leader in intelligent data infrastructure. Their platform enables customers to store any data type and run any workload, simply and seamlessly across any environment. NetApp’s Engineering Tools and Services Site Reliability Engineering (SRE) team is continually refining how they support internal build environments, test infrastructure, and automation systems. Recently, they’ve expanded their scope to include internal AI tools for developers, introducing new, resource-intensive workloads that require close monitoring. Learn how NetApp has advanced their time series platform implementation to proactively detect performance trends before they lead to system failures, and how their practices have changed to incorporate AI tools as part of their stack.
Key takeaways:
- NetApp’s evolved approach to monitoring SRE team metrics, including advanced SLO and SLI strategies
- The unique considerations for monitoring AI tools
- How they’ve optimized InfluxDB and Telegraf to detect trends faster and coordinate more effective responses