Freelance Agent Evaluation Engineer (Remote, AI Industry)

Denmark
Posted 1 week, 2 days ago
Software Development

About the role

Job summary

This role involves creating and evaluating tasks for AI coding agents, with a focus on real-world developer scenarios. The position is project-based and requires collaborating with AI agents to develop challenging tasks that assess the capabilities of coding models.

Qualifications

  • Degree in Computer Science, Software Engineering, or a related field
  • Over 5 years of experience in software development, primarily using Python (FastAPI, pytest, async/await, subprocess, file operations)
  • Background in full-stack development with experience in React-based interfaces (JavaScript/TypeScript) and robust back-end systems
  • Experience in writing tests (functional, integration)
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis)
  • Understanding of CI/CD processes (GitHub Actions)
  • English proficiency at B2 level

Responsibilities

  • Create virtual companies with realistic development environments, including codebases and documentation
  • Assemble and calibrate tasks from intermediate states of the virtual company's codebase, ensuring tasks are solvable and evaluations are fair
  • Design isolated environments for task execution, including developer workstation emulations
  • Write tests that accurately evaluate solutions, ensuring they are neither too strict nor too lenient
  • Collaborate with AI agents to verify the effectiveness of tests and iterate based on feedback
  • Review code produced by agents, analyze outcomes, and design edge cases for testing

Skills

  • Proficiency in Python and experience with full-stack development
  • Strong testing and evaluation skills
  • Ability to work with AI models and understand their limitations

Education

  • Bachelor's degree or higher in a relevant field

Tools

  • Python, FastAPI, pytest, Docker, GitHub Actions, Postgres, Kafka, Redis