Country: USA
Salary: $150,000–$300,000

Hybrid, full-time

Research Engineer - RL Infrastructure

This is a unique opportunity to work on cutting-edge open AI infrastructure with very high compensation. The company offers ambitious problems, a flexible work arrangement, and relocation support.

This listing is from Quick Offer Global, a list of international companies.

Job difficulty

The role demands exceptional knowledge of systems programming, GPU architecture, and distributed training. The candidate must be able to optimize CUDA/Triton kernels and solve hard problems at the intersection of software and hardware.

Salary analysis

Median: $225,000

Market: $160,000–$320,000

The offered $150k–300k range is broadly in line with market rates for Senior/Staff Research Engineers in San Francisco, sitting slightly below the $160k–320k market band at both ends. An upper bound of $300k in cash (excluding equity) is typical of top AI startups and Tier-1 tech companies.

Cover letter

I am writing to express my strong interest in the Research Engineer - RL Infrastructure position at PrimeIntellect. With a deep background in distributed systems and ML performance optimization, I have spent my career pushing the boundaries of what hardware can achieve in large-scale training scenarios. My experience with PyTorch Distributed, DeepSpeed, and custom CUDA kernel development aligns perfectly with your mission to build an open superintelligence stack that unifies globally distributed compute.

I am particularly drawn to PrimeIntellect’s focus on the systems layer of RL post-training. Having worked on optimizing communication overhead and memory efficiency in multi-node GPU clusters, I understand the complexities of scaling async RL trainers and verifiable evaluations. I am eager to bring my expertise in hardware-aware optimization and distributed workloads to help PrimeIntellect make frontier-scale model training faster, cheaper, and more accessible to the global community.

Join PrimeIntellect to build the infrastructure for open superintelligence and optimize model training at the limits of what the hardware can do!

Job description

Building Open Superintelligence Infrastructure

Prime Intellect is building the open superintelligence stack: from frontier agentic models to the infrastructure that enables anyone to train, adapt, and deploy them.

We unify globally distributed compute into a single control plane and pair it with the full reinforcement learning post-training stack: environments, secure sandboxes, verifiable evaluations, and our async RL trainer. We enable researchers, startups, and enterprises to run end-to-end RL at frontier scale, adapting models to real tools, workflows, and deployment environments.

We are looking for a Research Engineer to work on the systems layer behind large-scale RL training. This role is for someone who enjoys going deep on performance: optimizing kernels, improving memory and communication efficiency, scaling distributed workloads, and pushing the throughput and reliability of training systems closer to hardware limits.

If you care about making large-scale model training faster, cheaper, and more robust, we’d love to talk.

What You’ll Work On

Build and optimize the systems infrastructure behind large-scale RL and distributed training workloads.
Improve end-to-end training efficiency across compute, memory, networking, and scheduling layers.
Design and implement low-level performance optimizations, including kernels, communication paths, and runtime improvements.
Work on distributed training systems spanning data, tensor, and pipeline parallel workloads.
Help shape the architecture of our RL training stack, including async rollout and post-training systems.
Contribute to open-source libraries and internal infrastructure used for frontier-scale model training.
Collaborate closely with researchers and infrastructure engineers to translate bottlenecks into concrete systems improvements.
Stay at the frontier of training systems, inference systems, compiler/runtime tooling, and hardware-aware optimization techniques.

You May Be a Fit If You Have

Strong systems engineering experience in AI/ML infrastructure, especially around large-scale model training or inference.
Deep familiarity with PyTorch and distributed training frameworks such as PyTorch Distributed, DeepSpeed, FSDP, Megatron, vLLM, Ray, or related tooling.
Experience optimizing training performance across kernels, memory movement, communication overhead, or parallelization strategy.
Hands-on experience with large-scale training techniques including data parallelism, tensor parallelism, and pipeline parallelism.
Strong understanding of GPU architecture, profiling, and performance debugging.
Ability to identify bottlenecks across the stack and drive improvements from first principles.
Comfort working in a fast-moving environment with ambiguous problems and high ownership.

Especially Exciting

Experience writing or optimizing CUDA / Triton kernels.
Experience with compiler or runtime optimization for ML systems.
Experience working on RL training infrastructure, rollout systems, or asynchronous training pipelines.
Experience with multi-node GPU clusters and high-performance networking.
Contributions to open-source ML systems or infrastructure projects.
Interest in publishing technical work or sharing insights through engineering blogs and technical writing.

Why This Role Matters

The next frontier in AI will not be unlocked by models alone. It will be unlocked by systems that let those models train faster, adapt continuously, and operate across real environments at scale.

That infrastructure does not exist yet in the form the world needs.

We’re building it.

Benefits & Perks

Cash Compensation Range of $150-300k, plus equity.
Flexible work arrangements, with the option to work remotely or in person from our San Francisco office.
Visa sponsorship and relocation support for international candidates.
Quarterly team offsites, hackathons, conferences, and learning opportunities.
A deeply technical, high-agency team working on infrastructure for open superintelligence.

If you’re excited about building the systems foundation for frontier-scale RL and open superintelligence, we’d love to hear from you.

Skills

C++
Python
PyTorch
Ray
CUDA
Performance Optimization
Reinforcement Learning
FSDP
GPU Architecture
Distributed Training
Triton
vLLM
DeepSpeed
Megatron-LM

Possible interview questions

Tests understanding of the fundamental limits of scaling training.

How would you approach minimizing communication overhead when using pipeline parallelism on a distributed cluster with a heterogeneous network?

Assesses hands-on experience with low-level optimization.

Describe your experience writing or optimizing custom CUDA or Triton kernels. What were the main memory bottlenecks you ran into?

Tests knowledge of modern frameworks for training large models.

In which scenarios would you prefer FSDP over DeepSpeed ZeRO-3, and how does that choice affect training throughput?

Assesses understanding of what is specific to RL infrastructure.

What are the main challenges in building asynchronous data-collection (rollout) systems for RL compared with standard supervised fine-tuning?

Tests performance-debugging skills.

Which tools and metrics do you use to profile GPU memory usage and identify the causes of performance stalls in distributed systems?

Similar jobs

Junior AI Engineer

AI Engineer (Agents)

Senior Python AI Developer

Ai Tech Lead

AI developer / vibe coder

Artificial Intelligence Engineer
