We are looking for a Senior Site Reliability Engineer (SRE)/DevOps Engineer to spearhead efforts to optimize, maintain, and scale our IT infrastructure and operations. This role combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. The ideal candidate will have a strong background in software development and system administration, and a keen interest in network operations and architecture.
Responsibilities
- Design, build and maintain the infrastructure and tools to allow for the speedy development and release of software
- Ensure continuous availability, performance and scalability of production systems and services
- Implement automation tools for efficient operations and response to system alerts and issues
- Collaborate closely with the development team to improve the reliability and performance of the system
- Develop and maintain operational documentation and specifications on system builds and operational processes
- Monitor and report on service level objectives for a given application's services
- Establish key performance indicators in cooperation with business and product owners
- Foster a culture of continuous improvement, testing and automation
Requirements
- Bachelor's or Master's degree in Computer Science, Information Technology or related field
- 3+ years of experience in an SRE/DevOps role with a proven track record of scaling and automating large-scale systems
- Understanding of cloud computing services, preferably AWS, Azure or GCP
- Proficiency in scripting languages such as Python and Bash along with infrastructure as code tools such as Terraform and CloudFormation
- Skills in containerization and container orchestration tools such as Docker and Kubernetes
- Knowledge of CI/CD pipelines and tools such as Jenkins and GitLab CI
- Familiarity with monitoring and alerting tools such as Prometheus, Grafana and New Relic
- Excellent leadership and communication skills
- English proficiency at B2 level or higher
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
EPAM is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, age, sexual orientation, gender identity or expression, disability, protected veteran status, or any other characteristic protected by applicable law.