Freelance Agent Evaluation Engineer (Remote, AI Development)

Denmark
Posted 3 weeks, 1 day ago
Engineering

About the role

Job summary

This role involves creating tasks for AI coding agents in simulated environments and evaluating how the agents perform on real-world developer tasks. The position is project-based and is not permanent employment.

Qualifications

  • Degree in Computer Science, Software Engineering, or a related field
  • Over 5 years of experience in software development, primarily using Python (FastAPI, pytest, async/await, subprocess, file operations)
  • Background in full-stack development, with experience in building React-based interfaces (JavaScript/TypeScript) and robust back-end systems
  • Experience in writing functional and integration tests
  • Familiarity with Docker containers and infrastructure tools (Postgres, Kafka, Redis)
  • Understanding of CI/CD processes (GitHub Actions)
  • English proficiency at B2 level

Responsibilities

  • Develop virtual companies that simulate realistic development environments, including codebase and infrastructure
  • Create and calibrate tasks from intermediate states of the virtual company, ensuring tasks are solvable and evaluations are fair
  • Design isolated environments that emulate a developer's workstation, including necessary development tools
  • Write tests that accurately evaluate solutions, ensuring they are neither too strict nor too lenient
  • Collaborate with AI agents to verify the effectiveness of tests and iterate based on feedback
  • Review code produced by agents, analyze agent performance, and design edge cases and adversarial scenarios

Skills

  • Strong coding and testing skills across the software development stack
  • Ability to reason about code and understand where AI models may fail
  • Experience in creating complex evaluation criteria for AI performance

Education

  • Bachelor's degree or higher in a relevant field

Tools

  • Python, FastAPI, pytest, Docker, Postgres, Kafka, Redis, GitHub Actions