We are seeking a Senior Site Reliability Engineer with substantial expertise in enhancing the reliability, availability, performance and scalability of production environments. The right candidate will bring a strong software engineering mindset paired with deep operational know-how, cloud expertise, automation capabilities and practical incident management experience.
This position centers on constructing dependable systems, cutting down operational toil, strengthening observability and supporting engineering teams in delivering services that meet established reliability targets.
Responsibilities
- Architect and deploy solutions that enhance system reliability, availability and performance
- Establish and track SLIs, SLOs and error budgets
- Develop automation that minimizes manual operational effort and repetitive activities
- Enhance monitoring, logging, tracing and alerting capabilities
- Take part in incident response, root cause analysis and postmortems
- Partner with development teams to strengthen service resilience and operability
- Maintain production systems and assist in resolving complex technical problems
- Contribute toward capacity planning, performance tuning and disaster recovery strategies
- Advocate for reliability engineering practices throughout teams
Requirements
- Substantial experience in SRE, DevOps, Platform Engineering or Production Engineering positions
- Practical experience maintaining production systems at scale
- Familiarity with cloud platforms including AWS, Azure or GCP
- Deep knowledge of observability tools used for monitoring, logging, tracing and alerting
- Background in incident management, postmortems and root cause analysis
- Solid scripting or programming abilities in Python, Go, Bash or similar languages
- Familiarity with Linux systems, networking and distributed systems fundamentals
- Working knowledge of containers and orchestration platforms like Docker and Kubernetes
- Sound understanding of CI/CD, automation and Infrastructure as Code
- Excellent problem-solving abilities and capacity to perform under pressure
Nice to have
- Background defining SLIs, SLOs and error budgets
- Familiarity with Prometheus, Grafana, Datadog, New Relic, Splunk, ELK or comparable tools
- Hands-on use of Terraform or other IaC tools
- Exposure to chaos engineering or resilience testing
- Background with high-availability systems and disaster recovery planning
- Certifications in cloud technologies or Kubernetes
We offer
- Career plan and real growth opportunities
- Unlimited access to LinkedIn Learning solutions
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (algorithms club, Toastmasters, agile club and more)
- Enjoyable working environment (gaming room, napping area, amenities, events, sports teams and more)
- Flexible work schedule and dress code
- Collaboration in a multicultural environment, sharing best practices from around the globe
- Direct hire by EPAM, 100% on payroll
- Benefits required by law (IMSS, INFONAVIT, 25% vacation bonus)
- Life insurance and major medical expenses insurance with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday and Good Friday, November 2nd, December 24th and 31st)
- Monthly non-taxable amount for electricity and internet bills
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
By applying to our role, you are agreeing that your personal data may be used as set out in EPAM's Privacy Notice and Policy.