Freelance Agent Evaluation Engineer (Remote)

Denmark
Posted 1 day, 9 hours ago
Engineering

About the role

Job summary

This role involves creating and evaluating tasks for AI coding agents within simulated developer environments. The focus is on building realistic scenarios that challenge AI models in coding tasks.

Qualifications

  • Minimum of 5 years of experience in software development.
  • Proficiency in English at B2 level or higher.

Responsibilities

  • Develop realistic developer environments, including codebases and contextual elements.
  • Design tasks and evaluation criteria that define successful outcomes for AI agents.
  • Write tests that verify the correctness of agent solutions while balancing strictness and leniency.
  • Iterate on tasks and tests based on quality assurance feedback to enhance evaluation fairness and robustness.

Skills

  • Strong knowledge of Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
  • Experience in writing functional and integration tests.

Compensation

  • Up to $50 per hour, depending on experience and task completion pace. Tasks are estimated to take around 20 hours each, with flexible scheduling.