Job Description
As an SMTS/LMTS Distributed Systems Engineer, you will design, build, and operate resilient cloud-native platforms that power Salesforce applications on Google Cloud Platform. You will work on large-scale distributed systems, infrastructure automation, and agentic AI-enabled platforms, ensuring high availability, performance, and reliability for millions of users worldwide.
Date Posted: December 23, 2025
Expiration Date: NA
Experience: NA
Job ID: JR288530
Primary Responsibilities
- Designing, building, operating, and debugging highly available distributed systems on public cloud platforms.
- Automating cloud infrastructure provisioning with tools, workflows, and validation frameworks for GCP and other cloud environments.
- Building microservices and managing their deployment with containerization technologies such as Kubernetes and Docker.
- Using Terraform or other Infrastructure-as-Code (IaC) tools so that deployments are repeatable and easy to scale.
- Running large-scale distributed systems that span thousands of compute nodes and many data centers. Resolving complex production issues while improving system reliability, performance, and resilience.
- Leading teams of engineers in live-site operations, feature development, and technical debt reduction.
- Participating in an on-call rotation to ensure reliable service and maintain the highest operational quality.
- Using and contributing to open source ecosystems such as Kubernetes and Argo.
Essential Qualifications
- Comprehensive experience designing and operating very large distributed systems.
- Expertise in tools such as Terraform, Kubernetes, or Spinnaker.
- Strong programming proficiency in Java, Golang, Python, or Ruby.
- Hands-on production experience with infrastructure services such as monitoring, alerting, logging, and reporting.
- A proven history of owning and managing production-level services.
- A thorough understanding of concurrent processing, data handling, and fault-tolerant system design.
- Experience working with Agile methodologies and Test-Driven Development (TDD).
- Strong troubleshooting, debugging, and system-tuning skills.
Preferred Qualifications
- Experience with public cloud services such as GCP, AWS, Azure, or Alibaba Cloud.
- Familiarity with Falcon or similar security and operational tools.
- Understanding of AI-powered, agentic, or platform-based architectures.
- Involvement in large-scale cloud or platform engineering projects.
- Ability to operate and scale systems for millions of users.
- Experience working with multidisciplinary engineering and product teams in challenging environments.