We are seeking a skilled Senior AI QA Engineer with strong experience in both manual and automated testing, and extensive exposure to testing AI-based applications. The ideal candidate will test a variety of applications, including projects involving AI agents and integrations with APIs and databases. You will help ensure our solutions are reliable, accurate, and aligned with business requirements, while also contributing to the development of our automation capabilities.
Responsibilities
- Research and evolve automation frameworks in line with Gen AI tooling and best practices
- Design and automate the evaluation of Gen AI features: grounding, answer accuracy, determinism/reproducibility, precision, recall, and criteria recall
- Build automated LLM test harnesses that scale evaluation beyond human-in-the-loop
- Select and apply Gen AI evaluation frameworks, measuring answer quality and pipeline efficiency
- Perform manual testing as needed to validate new features, integrations, and user stories
- Build and maintain test cases from requirements and user stories
- Test applications that may include AI agents, APIs, databases, and other integrations
- Collaborate with product, engineering, and operations teams to understand requirements and deployment environments
- Track and report test results, defects, and quality metrics
- Assist with troubleshooting production issues; escalate risks as needed
- Guide and support team members, including onshore and offshore consultants
Requirements
- 3+ years of experience in software QA, with at least 1 year focused on testing AI agents, agentic solutions, or LLM-based systems
- Hands-on experience with both manual and automated testing of AI agents, including prompt/instruction testing and evaluation of agentic workflows
- Strong programming skills in Python for test automation: pytest or equivalent, scripting, and AI/ML library integration
- Experience with AI agent frameworks, prompt engineering, and evaluation metrics for LLM-based systems
- Demonstrated experience in testing and evaluating Gen AI / LLM applications: grounding, answer accuracy, and hallucination/determinism checks
- Applied knowledge of Gen AI / LLM evaluation frameworks and metrics: precision, recall, criteria recall, and efficiency
- Experience with issue and test management tools (e.g., Jira, QMetry, TestRail)
- Experience with version control systems and integrating tests into CI/CD pipelines
- Experience using AI-powered tools for QA (e.g., GitHub Copilot, LLM-based test generation)
- Understanding of cloud environments, particularly AWS
- Excellent communication, collaboration, and leadership skills
- Strong English communication skills (B2 level or higher)
Nice to have
- Experience with agentic AI platforms (e.g., LangChain, OpenAI Function Calling, or similar)
- Skills in AI safety, bias, and reliability testing
- Background in test data generation for AI/ML systems
We offer
- International projects with top brands
- Work with global teams of highly skilled, diverse peers
- Healthcare benefits
- Employee financial programs
- Paid time off and sick leave
- Upskilling, reskilling and certification courses
- Unlimited access to the LinkedIn Learning library and 22,000+ courses
- Global career opportunities
- Volunteer and community involvement opportunities
- EPAM Employee Groups
- Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn