Job Title: Research Assistant - AI Agents & LLM Prototyping
Location: London, UK - Hybrid - 3 days onsite
Duration: Until Dec 2025 - possible extension
Role Overview:
We are looking for a Research Assistant with 2+ years of experience in prototyping and testing AI agents or large language models (LLMs). You will design test prompts, experiment with prompt engineering, and debug AI agent tool calls within a Python/PHP software stack. You'll also help create internal benchmarks to evaluate AI agent performance.
Key Responsibilities:
- Create and refine test prompts to guide AI agents toward desired behavior.
- Implement and troubleshoot AI agent tool calls in a Python/PHP environment.
- Develop high-quality prompts to build internal evaluation benchmarks for AI agents.
- Test AI agents to assess their ability to perform tasks such as ordering, scheduling, or cancelling meetings.
- Analyze test outcomes, identify issues, and communicate findings for continuous improvement.
- Navigate and understand the Python codebase to correlate test results with underlying code.
- Improve and test existing AI agents.
- Identify where agents perform well and where they fail.
- Test and tune prompts to optimize agent responses.
- Flag test results as pass/fail based on expected behavior.
Technical Skills & Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
- 2+ years of experience in research or prototyping within machine learning, deep learning, or natural language processing.
- Proficient in Python, with the ability to understand and work with a codebase containing multiple interrelated files.
- Experience with AI agent tooling such as the Model Context Protocol (MCP) and LangChain.
- Familiarity with architectural patterns of large-scale software systems.
- Basic knowledge of SQL and data analysis is a plus.
- Experience using source control systems (e.g., Git).
- Experience in generative AI and LLM research is preferred.
Essential Skills:
- Python programming
- Generative AI knowledge
- Understanding of MCP servers and the LangChain framework
