LLM – Full Stack (Python & JavaScript)

Location: Open to candidates across Latin America & West Africa
Experience Required: 6+ Years

We’re a coding-first research team working as a trusted partner for a Frontier AI Lab. Our mission is to build high-quality coding tasks, evaluations, datasets, and tooling that directly improve how large language models (LLMs) think, reason, and write code.

This is a hands-on engineering role where precision, correctness, and reproducibility matter. You’ll work on real production-grade code, investigate subtle model failures, and design rigorous evaluations that shape next-generation AI systems.

If you enjoy solving non-obvious technical problems, breaking systems to understand them, and working in developer-centric environments, this role is for you.

What You’ll Be Working On

  • Writing, reviewing, and debugging production-quality code across multiple languages
  • Designing coding, reasoning, and debugging tasks for LLM evaluation
  • Analyzing LLM outputs to identify hallucinations, regressions, and failure patterns
  • Building reproducible dev environments using Docker and automation tools
  • Developing scripts, pipelines, and tools for data generation, scoring, and validation
  • Producing structured annotations, judgments, and high-signal datasets
  • Running systematic evaluations to improve model reliability and reasoning
  • Collaborating closely with engineers, researchers, and quality owners

What We’re Looking For

Must-Have Skills

  • Strong hands-on coding experience (professional or research-based) in:
    ◦ Python
    ◦ JavaScript / Node.js / TypeScript
  • Experience using LLM coding tools (Cursor, Copilot, CodeWhisperer)
  • Solid knowledge of Linux, Bash, and scripting
  • Strong experience with Docker, dev containers, and reproducible environments
  • Advanced Git skills (branching, diffs, patches, conflict resolution)
  • Strong understanding of testing & QA (unit, integration, edge-case testing)
  • Ability to overlap reliably with 8:00 AM – 12:00 PM PT

Nice to Have

  • Experience with dataset creation, annotation, or evaluation pipelines
  • Familiarity with benchmarks like SWE-bench or Terminal-Bench
  • Background in QA automation, DevOps, ML systems, or data engineering
  • Experience with additional languages (Go, Java, C++, C#, Rust, SQL, R, Dart, etc.)

Who Will Thrive Here

  • Engineers who enjoy breaking things and understanding why
  • Builders who like designing tasks, running experiments, and debugging deeply
  • Detail-oriented developers who catch subtle bugs and model issues
  • Engineers who prefer clean, reusable workflows over quick hacks

Why Join Us?

  • Work directly on systems that improve state-of-the-art AI models
  • Solve unique, non-routine engineering problems
  • Collaborate with smart, quality-driven engineers and researchers
  • Build tools and datasets that have real impact at scale