Freelance Agent Evaluation Engineer (Remote)

Denmark

Posted 1 week, 4 days ago

Engineering

About the role

Job summary

This role involves creating and evaluating tasks for AI coding agents within simulated developer environments. The focus is on building realistic scenarios that challenge AI models and on ensuring that the evaluation criteria are fair.

Qualifications

Minimum of 5 years of experience in software development.
Proficiency in English at B2 level or higher.

Responsibilities

Develop realistic developer environments, including codebases and contextual elements.
Design tasks based on intermediate states of these environments, defining success criteria.
Write tests to verify the solutions provided by AI agents, ensuring a balance between strictness and leniency.
Iterate on tasks and tests based on quality assurance feedback to refine evaluations.

Skills

Strong knowledge of Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
Experience in writing functional and integration tests.

Compensation

Up to $50 per hour, based on experience and task complexity. Tasks are estimated to take around 20 hours each, with flexible scheduling.
