How NetApp’s SRE team manages internal AI tools alongside their entire dev-critical infrastructure
Session Date: Sep 16, 2025
Time: 8:00am (PT) | 3:00pm (GMT)
NetApp is the leader in intelligent data infrastructure. Their platform enables customers to store any data type and run any workload, simply and seamlessly across any environment. NetApp’s Engineering Tools and Services Site Reliability Engineering team continually refines how it supports internal build environments, test infrastructure, and automation systems. Recently, the team has expanded its scope to include internal AI tools for developers, which introduce new, resource-intensive workloads that require close monitoring. Learn how NetApp has advanced its time series platform implementation to detect performance trends proactively, before they lead to system failures, and how the team’s practices have changed to incorporate AI tools as part of its stack.
Key takeaways:
- NetApp’s evolved approach to monitoring SRE team metrics, including advanced SLO and SLI strategies
- The unique considerations for monitoring AI tools
- How they’ve optimized InfluxDB and Telegraf to detect trends faster and coordinate more effective responses

Dustin Sorge
Lead Site Reliability Engineer, NetApp
Dustin currently resides in Pittsburgh, Pennsylvania, and is the Site Reliability Engineering Technical Lead for NetApp’s ONTAP Engineering organization. His team has been using InfluxDB for more than 8 years and continues to rely on it to support critical services. He is a proud alumnus of both the University of Pittsburgh and Carnegie Mellon University. Before joining NetApp, he was a High Performance Computing Operations Engineer and Software Engineer at the Pittsburgh Supercomputing Center.