Freelance Agent Evaluation Engineer (Remote)

Denmark
Posted 1 week, 4 days ago
Engineering

About the role

Job summary

This role involves creating and evaluating tasks for AI coding agents within simulated developer environments. The focus is on building realistic scenarios that challenge AI models and on ensuring that evaluation criteria are fair.

Qualifications

  • Minimum of 5 years of experience in software development.
  • Proficiency in English at B2 level or higher.

Responsibilities

  • Develop realistic developer environments, including codebases and contextual elements.
  • Design tasks based on intermediate states of these environments and define the success criteria for each.
  • Write tests that verify the solutions produced by AI agents, balancing strictness and leniency so that valid solutions pass and incorrect ones fail.
  • Iterate on tasks and tests based on quality assurance feedback to refine evaluations.

Skills

  • Strong knowledge of Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
  • Experience writing functional and integration tests.

Compensation

  • Up to $50 per hour, based on experience and task complexity. Each task is estimated to take around 20 hours, with flexible scheduling.