We are building a resilient cloud platform and need a Lead Site Reliability Engineer (SRE)/DevOps to drive stability, scale, and operational excellence. You will blend software engineering with systems expertise to run large, distributed, fault-tolerant services and strengthen reliability practices across teams. Apply now to help raise availability, performance, and automation across production.
Responsibilities
- Design, build and maintain infrastructure and tooling that enables fast software development and reliable releases
- Ensure continuous availability, performance and scalability of production systems and services
- Implement automation tools to streamline operations and improve response to alerts and incidents
- Collaborate with the development team to enhance system reliability and optimize performance
- Create and maintain operational documentation and specifications for system builds and operating procedures
- Monitor and report on service level objectives for a given application's services
- Define key performance indicators in cooperation with business and product owners
- Promote a culture of continuous improvement, testing and automation
Requirements
- Bachelor's or Master's degree in Computer Science, Information Technology or related field
- Proven track record with 5+ years of experience in an SRE/DevOps role scaling and automating large-scale systems
- Solid understanding of cloud computing services, preferably AWS, Azure or GCP
- Hands-on experience with scripting languages such as Python and Bash and infrastructure-as-code tools such as Terraform and CloudFormation
- Strong skills with containerization and orchestration tools such as Docker and Kubernetes
- Working knowledge of CI/CD pipelines and tools such as Jenkins and GitLab CI
- Practical familiarity with monitoring and alerting tools such as Prometheus, Grafana and New Relic
- Excellent leadership and communication skills
- English proficiency at B2 level or higher
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
EPAM is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, age, sexual orientation, gender identity or expression, disability, protected veteran status, or any other characteristic protected by applicable law.