We are seeking a Lead DevOps Engineer with solid expertise in incident and request management, supported by hands-on experience with tools like Dynatrace, Grafana, and Splunk.
The role encompasses monitoring configuration, tool administration, and break-fix work on medium-complexity tickets.
Responsibilities
- Drive technical initiatives by offering direction and oversight to fellow team members
- Plan, design, develop, test, deploy, and maintain software solutions in line with project needs
- Work jointly with other technical and non-technical teams to support successful project delivery
- Help shape and deliver solution requirements, estimates, and project timelines
- Maintain the quality of technical outputs while observing development standards and best practices
- Track and manage incidents and requests in ServiceNow or JIRA
- Remain reachable for monitoring and escalation during off-hours and weekends, including pager duty coverage for after-hours emergencies
- Triage tickets, keep their details up to date, and assess urgency accurately
- Be willing to join a rotating on-call schedule covering both India and CST hours
Requirements
- At least 5 years of relevant professional background
- A minimum of one year guiding and managing development teams
- Practical experience with Microsoft Azure and Azure Log Analytics for cloud infrastructure and monitoring
- Background in Dynatrace administration and Dynatrace Workflows for observability and automation
- Working knowledge of event management, extensions, and integrations within Dynatrace
- Proficiency in infrastructure as code deployments with Terraform
- Competence in Kubernetes automation for container orchestration and management
- Familiarity with supporting tools such as GitHub Actions for CI/CD, ServiceNow for IT service management, and Confluence for documentation and collaboration
- Capability to work effectively on your own as well as within a team
- Strong analytical and problem-solving outlook, with demonstrated experience troubleshooting under pressure
- Strategic mindset combined with complex problem-solving and analytical skills
- Solid organizational and interpersonal abilities, including experience instilling a culture of operational maturity
- Aptitude for quickly adapting to new technologies
- Excellent spoken and written English communication abilities (C2 level)
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn