Freelance Agent Evaluation Engineer (Remote, AI Industry)

Denmark
Posted 1 day, 6 hours ago
Engineering

About the role

Job summary

This role involves creating and evaluating tasks for AI coding agents within simulated environments to assess their performance on real-world developer tasks. The work is project-based and requires collaboration with AI systems to develop challenging scenarios.

Qualifications

  • Degree in Computer Science, Software Engineering, or a related field
  • Over 5 years of experience in software development, primarily using Python (FastAPI, pytest, async/await, subprocess, file operations)
  • Background in full-stack development with experience in React-based interfaces (JavaScript/TypeScript) and robust back-end systems
  • Experience in writing functional and integration tests
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis)
  • Understanding of CI/CD processes (GitHub Actions)
  • English proficiency at B2 level

Responsibilities

  • Build virtual companies with realistic development environments including codebase and infrastructure
  • Create and calibrate tasks from intermediate states of the virtual company, ensuring tasks are solvable and evaluations are fair
  • Design isolated environments for tasks, emulating a developer's workstation
  • Write tests that accurately evaluate solutions, ensuring they are neither too strict nor too lenient
  • Collaborate with AI agents to verify the effectiveness of tests and iterate based on feedback
  • Review code from agents and analyze outcomes to design edge cases and adversarial scenarios

Skills

  • Strong coding and debugging skills across the software stack
  • Ability to reason about code and understand complex scenarios
  • Experience in task creation and evaluation in AI contexts

Education

  • Bachelor's degree or higher in a relevant field

Tools

  • Python, FastAPI, pytest, Docker, GitHub Actions, Postgres, Kafka, Redis