Job Type
Full-time
About the Role
This position focuses on advancing how large language models (LLMs) are tested through automated AI red teaming agents. The role sits at the intersection of adversarial machine learning and automated security testing, helping organizations identify vulnerabilities in LLMs before those models reach production. The employer is looking for candidates who can design and implement intelligent systems that systematically probe AI models for safety, bias, and security issues.
Key Responsibilities
- Develop and deploy AI-powered red teaming agents that automate adversarial probing of large language models
- Integrate existing attack frameworks such as Tree of Attacks with Pruning, Crescendo, and Skeleton Key into automated workflows
- Evaluate and combine open-source red teaming tools including Microsoft PyRIT, NVIDIA Garak, and Promptfoo
- Design scoring methods and prompt transforms that improve detection rates for bias, jailbreaks, and security vulnerabilities
- Collaborate with LLM developers and security teams to prioritize findings and refine testing coverage
- Research and prototype new attack techniques that address gaps in current red teaming coverage
- Document and present results to technical and non-technical stakeholders to drive security improvements
Requirements
- Strong background in machine learning, adversarial ML, or AI security research
- Experience with LLM evaluation frameworks and prompt engineering techniques
- Practical knowledge of open-source red teaming tools such as PyRIT, Garak, or Promptfoo
- Programming skills in Python and familiarity with ML libraries such as PyTorch or TensorFlow
- Ability to research and implement new adversarial attack methods
- Excellent communication skills for reporting findings and collaborating with cross-functional teams
Compensation & Benefits
- Competitive salary based on experience
- Health, vision, and dental insurance
- 401(k) retirement plan with employer match