Freelance Agent Evaluation Engineer (Remote)

Denmark

Posted 12 hours, 48 minutes ago

Research and Development

About the role

Job summary

This role involves evaluating AI coding agents by creating realistic developer environments and tasks. You will design the evaluation criteria and write the tests that verify agent solutions.

Qualifications

Minimum of 5 years of experience in software development.
Proficient in Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
Experience in writing functional and integration tests.
English proficiency at B2 level or higher.

Responsibilities

Build realistic developer environments that simulate a virtual company with a codebase and context.
Design tasks based on intermediate states of these environments, defining what constitutes a 'solved' task.
Write tests to verify the solutions provided by AI agents, keeping them strict enough to reject incorrect solutions and lenient enough to accept any valid approach.
Iterate on tasks and tests based on quality assurance feedback, refining them to ensure fairness and robustness.

Skills

Strong understanding of AI model capabilities and limitations in coding tasks.
Ability to create challenging tasks that effectively evaluate AI performance.

Education

A degree in Computer Science or a related field is preferred but not required.

Tools

Familiarity with the development tools and environments for the core stack listed under Qualifications.


