Freelance Agent Evaluation Engineer (Remote, AI Development)

Denmark
Posted 5 hours, 25 minutes ago
Engineering

About the role

Job summary

In this role, you evaluate AI coding agents by creating realistic tasks and assessment criteria within simulated environments. The position is project-based and requires working with AI agents to develop challenging scenarios that test AI capabilities.

Qualifications

  • Degree in Computer Science, Software Engineering, or related fields
  • Over 5 years of experience in software development, primarily using Python (FastAPI, pytest, async/await, subprocess, file operations)
  • Background in full-stack development, with experience in building React-based interfaces (JavaScript/TypeScript) and robust back-end systems
  • Experience in writing functional and integration tests
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis)
  • Understanding of CI/CD processes (GitHub Actions)
  • Proficient in English (B2 level)

Responsibilities

  • Create virtual companies and realistic development environments for AI evaluation
  • Assemble and calibrate tasks, ensuring they are solvable and fairly evaluated
  • Design isolated environments for testing, including developer workstations and web application codebases
  • Write tests to validate solutions, ensuring they are neither too strict nor too lenient
  • Collaborate with AI agents to verify task effectiveness and iterate based on feedback
  • Review code from AI agents and analyze performance to design edge cases

Skills

  • Strong coding and testing skills across the software stack
  • Ability to reason about code and understand complex scenarios
  • Experience with task creation that challenges advanced AI models

Compensation

Contributors can earn up to $50 per hour based on their expertise and contribution pace. The estimated effort for tasks is around 20 hours, with flexibility in scheduling.