Freelance Agent Evaluation Engineer (Remote, AI Industry)

Denmark
Posted 2 weeks, 4 days ago
Software Development

About the role

Job summary

This role involves creating and evaluating tasks for AI coding agents in simulated environments, with a focus on real-world developer work. The position is project-based and requires collaborating with AI systems to develop challenging evaluation criteria.

Qualifications

  • Degree in Computer Science, Software Engineering, or a related field
  • Over 5 years of experience in software development, primarily using Python (FastAPI, pytest, async/await)
  • Background in full-stack development with experience in React-based interfaces (JavaScript/TypeScript)
  • Experience in writing functional and integration tests
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis)
  • Understanding of CI/CD processes (GitHub Actions)
  • English proficiency at B2 level

Responsibilities

  • Create virtual companies with realistic development environments, including codebases and infrastructure
  • Assemble and calibrate tasks from intermediate states of the virtual company, ensuring tasks are solvable and evaluations are fair
  • Design tasks in isolated environments that simulate a developer's workstation
  • Write tests that accurately accept correct solutions and reject incorrect ones
  • Collaborate with AI agents to verify task effectiveness and iterate based on feedback
  • Review code from agents and analyze outcomes to design edge cases

Skills

  • Strong understanding of software development and testing methodologies
  • Ability to reason about code across the stack
  • Familiarity with AI systems and their limitations in coding tasks

Education

  • Bachelor's degree or higher in a relevant field

Tools

  • Python, FastAPI, pytest, Docker, GitHub Actions, Postgres, Kafka, Redis