Freelance Agent Evaluation Engineer (Remote)

Denmark

Posted 3 weeks, 4 days ago

Engineering

About the role

Job summary

This role involves creating and evaluating tasks for AI coding agents within simulated developer environments. The focus is on building realistic scenarios that challenge AI models in coding tasks.

Qualifications

Minimum of 5 years of experience in software development.
Proficiency in English at B2 level or higher.

Responsibilities

Develop realistic developer environments, including codebases and contextual elements.
Design tasks and evaluation criteria that define successful outcomes for AI agents.
Write tests to verify the correctness of agent solutions, strict enough to reject incorrect solutions yet lenient enough to accept any valid approach.
Iterate on tasks and tests based on quality assurance feedback to enhance evaluation fairness and robustness.

Skills

Strong knowledge of Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
Experience in writing functional and integration tests.

Compensation

Up to $50 per hour, depending on experience and task completion pace. Tasks are estimated to take around 20 hours each, with flexible scheduling.
