Code-Data Eval Author — Machine Learning Engineer (Pilot)

Posted 10 days ago · Hourly · Remote · English
Mercor
$45 – $140 per hour

Code-Data Eval Author — Machine Learning Engineer (Mercor · remote contract)

Mercor partners with frontier AI labs to build the evaluations their models are trained and measured against. You’ll design ML/LLM evaluation tasks and rubrics and grade model/agent outputs — your training-side knowledge directly shapes reward and eval signals.

What you’ll do

  • Design ML/LLM evaluation tasks, rubrics, and metrics
  • Grade model/agent outputs and improve eval quality through review
  • Bring training-side judgment (SFT / RLHF / reward modeling) to eval design

You are

  • 5+ years as an MLE at a product organization, with hands-on experience in training/fine-tuning and evals
  • Ideally fluent in SFT / RLHF / reward modeling / eval metrics (rare, high-leverage here)
  • Proficient with PyTorch/JAX, Hugging Face, and experiment tracking; a clear written communicator

Engagement & pay

  • Remote contract, flexible 30+ hrs/week
  • Hourly rate set to your local market (e.g., US/Canada $100–140/hr; Europe and LatAm scaled to region)

Hiring process — paid
The process has three steps: a short Mercor Technical Screen, a live Code Review Session, and a Domain Expert Interview. You’re paid $200 for completing all three, regardless of outcome.

Compensation

  • Pay: $45 – $140/hour
  • Type: Hourly contract
  • Location: Remote — Americas & Europe

3 slots remaining.
