Freelance Agent Evaluation Engineer (Remote)

Denmark
Posted 12 hours, 48 minutes ago
Research and Development

About the role

Job summary

This role involves evaluating AI coding agents by creating realistic developer environments and tasks. You will design the evaluation criteria and write the tests that verify agent solutions.

Qualifications

  • Minimum of 5 years of experience in software development.
  • Proficiency in Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
  • Experience writing functional and integration tests.
  • English proficiency at B2 level or higher.

Responsibilities

  • Build realistic developer environments that simulate a virtual company with a codebase and context.
  • Design tasks based on intermediate states of these environments, defining what constitutes a 'solved' task.
  • Write tests to verify the solutions provided by AI agents, keeping them strict enough to reject incorrect solutions yet lenient enough to accept any valid approach.
  • Iterate on tasks and tests based on quality assurance feedback, refining them to ensure fairness and robustness.

Skills

  • Strong understanding of AI model capabilities and limitations in coding tasks.
  • Ability to create challenging tasks that effectively evaluate AI performance.

Education

  • A degree in Computer Science or a related field is preferred but not required.

Tools

  • Familiarity with development tools and environments relevant to the core stack listed under Qualifications.