Lead Site Reliability Engineer, Network Assurance Data Platform, Cisco ThousandEyes, Cisco

India, Bangalore

March 30, 2026

Full Time, Hybrid

Job Description

Cisco ThousandEyes is seeking a Lead Site Reliability Engineer to establish and maintain reliability, scalability, and security standards for the cloud and big-data platforms that support AI and machine learning operations. The engineer will work with engineering, product, operations, and security teams to create enterprise-ready software-as-a-service (SaaS) solutions.

Job ID: 1440127

Date Posted: February 25, 2026

Expiration Date: NA


Main Duties

Design, build, and optimize cloud and data infrastructure for high availability and reliability of AI/ML systems.
Implement SRE principles including monitoring, alerting, error budgets, and fault analysis.
Collaborate with software development, product management, and security teams to support ML/AI workloads.
Troubleshoot production issues, conduct root cause analysis, and drive performance improvements.
Lead architectural vision, define technical roadmap, and balance immediate and long-term goals.
Mentor engineering teams and foster a culture of operational excellence.
Engage with stakeholders to understand use cases and influence enterprise solutions.
Develop strategic roadmaps and automation for deploying software at scale.

Essential Qualifications

Demonstrate extensive practical experience with cloud technologies, especially Amazon Web Services (AWS).
Exhibit advanced knowledge of Infrastructure as Code (Terraform) as well as Kubernetes and EKS.
Demonstrate experience building AI and machine learning infrastructure with the Hadoop ecosystem and related tools, including Spark, Hive, HDFS, Gobblin, Airflow, EMR, and SageMaker.
Show programming proficiency in Python, Go, and one other programming language.
Develop solutions that scale efficiently and maintain high testing standards.

Preferred Qualifications

Knowledge of Unix and Linux operating systems, client-server networking protocols, and observability tools such as Prometheus, Grafana, and the ELK stack.
CKA, CKAD, and AWS DevOps Engineer certifications.
Experience in designing software solutions and infrastructure systems for large-scale enterprise environments.

Date Posted: March 30, 2026

Location: India, Bangalore

Expiration Date: April 29, 2026

Gender: Both

Qualification: Bachelor's Degree


Call us: +91 7207347492

Email: hr@analyticsinsight.net
