Overview:
Medallia is the pioneer and market leader in Experience Management. Our award-winning SaaS platform, Medallia Experience Cloud, leads the market in the management of experiences, insights, and actions for candidates, customers, employees, patients, and residents alike.
We believe that every experience is a memory that can last a lifetime. Experiences shape the way people feel about a company. And they greatly influence how likely people are to advocate, contribute, and stay. At Medallia, we are committed to creating a world where organizations are loved by their customers and their employees.
We empower exceptional people to create extraordinary experiences together.
Bring your whole self.
The Role and Team
The Site Reliability Engineering organization at Medallia brings together the infrastructure and applications that power a highly reliable global SaaS platform.
As an SRE II, you will help operate and improve the reliability, scalability, and performance of services running across Kubernetes-based environments in cloud and hybrid infrastructure. You will work closely with software engineering teams to build automation, improve operational excellence, and support production services used globally by Medallia customers.
We are looking for engineers who enjoy solving complex technical problems, automating repetitive tasks, improving system reliability, and learning modern cloud-native technologies in a fast-paced environment.
We value engineers who actively seek opportunities to improve scalability and operational efficiency through automation, AI-assisted engineering workflows, and continuous process improvement.
Please note this role participates in a rotating on-call schedule supporting production systems and services.
Engineering Leverage
At Medallia, we hire engineers who scale systems, teams, and outcomes through automation, platform thinking, and AI-assisted engineering.
We value engineers who challenge manual processes, reduce operational toil, and create reusable solutions that improve reliability and productivity for the broader engineering organization.
Successful engineers do not simply solve problems—they eliminate recurring problems through automation, simplification, and self-service capabilities.
Responsibilities:
- Collaborate with software engineering teams to improve application reliability, scalability, and operational maturity.
- Operate and support production services running in Kubernetes environments.
- Troubleshoot and resolve infrastructure and application issues across the full technology stack.
- Build automation and tooling to reduce operational overhead and eliminate manual work.
- Leverage AI-assisted engineering tools and automation platforms to accelerate troubleshooting, improve productivity, and reduce operational toil.
- Identify opportunities to streamline operational processes through automation, AI-enabled workflows, and self-service solutions.
- Create reusable solutions, tooling, and operational improvements that increase engineering leverage across the team.
- Support CI/CD and GitOps-based deployment workflows.
- Develop and maintain infrastructure-as-code configurations and operational tooling.
- Monitor system health, availability, and performance using observability and alerting platforms.
- Participate in incident response, root cause analysis, and operational improvements.
- Continuously improve reliability, deployment processes, and operational standards.
Candidates based in the Mexico City area will be prioritized, as this role is hybrid with 3 days per week onsite.
Qualifications:
Minimum Qualifications
- 2+ years of experience in Site Reliability Engineering, DevOps, Systems Engineering, Cloud Operations, or related roles.
- Demonstrated experience supporting production environments running on Kubernetes or other containerized platforms.
- Demonstrated experience with cloud infrastructure platforms such as AWS, OCI, or GCP.
- Demonstrated experience with Linux systems administration and troubleshooting.
- Demonstrated experience with scripting or programming languages such as Python, Bash, or Go.
- Familiarity with CI/CD pipelines and Git-based workflows.
- Demonstrated understanding of networking fundamentals including DNS, load balancing, TLS/SSL, and routing concepts.
- Demonstrated experience troubleshooting distributed systems and production incidents.
- Ability to participate in an on-call rotation supporting production systems.
- Fluency in English, both oral and written.
Preferred Qualifications
- Experience with GitOps and tools such as ArgoCD.
- Experience with infrastructure-as-code tools such as Terraform.
- Familiarity with observability platforms such as Prometheus, Grafana, Loki, or OpenTelemetry.
- Experience operating services in hybrid-cloud or multi-region environments.
- Understanding of release strategies such as rolling deployments, canary releases, or blue/green deployments.
- Familiarity with incident management and operational best practices.
- Exposure to security and compliance concepts in production environments.
- Experience using AI-assisted development, automation, or operational tooling to improve engineering productivity and service reliability.
- Demonstrated passion for automation, process improvement, and operational efficiency.
- Strong communication and collaboration skills.
At Medallia, we celebrate diversity and recognize the value it brings to our customers and employees. Medallia is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age (40 and over), disability, genetic information, veteran status or military service, or any other status protected by state or local law. Individuals with a disability who need an accommodation to apply should contact us at [email protected]. For information regarding how Medallia collects and uses personal information, please review our Privacy Policies. Applications will be accepted for 30 days from the date this role was posted or until the role has been filled.