Country: Canada

Lead, Hybrid, Full-time

Senior Lead Research Scientist, Agentic AI

The high score reflects Upwork's prestige as a company, the opportunity to do fundamental research (publications at NeurIPS/ICML) alongside product development, and the chance to work on the most current stack in Agentic AI.

Vacancy from Quick Offer Global, a list of international companies

Job difficulty

The role requires a rare combination of a PhD, publications in top venues, and hands-on experience building high-load systems (MLOps/production). Deep expertise is expected in specific training methods (RLEF, DPO) and in autonomous agents.

Salary analysis

Median: $220,000

Market: $180,000 – $260,000

The stated range (USD 180k-260k) is in line with market standards for Senior Lead Research Scientist positions at large North American technology companies, given the strong demand for LLM agent specialists.

I am writing to express my strong interest in the Senior Lead Research Scientist (Agentic AI) position at Upwork. With a deep background in autonomous systems and a proven track record of peer-reviewed publications in venues like NeurIPS and ICLR, I am particularly drawn to Upwork's unique 50/50 split between novel research and productionalization. My experience in developing Reinforcement Learning from Execution Feedback (RLEF) and building robust benchmarking suites for long-horizon tasks aligns perfectly with your mission to push the frontier of tool-using AI.

Throughout my career, I have focused on bridging the gap between theoretical ML models and reliable, scalable agentic platforms. I have extensive experience in implementing SFT, DPO, and RLHF pipelines, as well as designing sandboxed environments for agent evaluation. I am excited by the prospect of leading cross-functional initiatives at Upwork to translate complex research into intuitive, AI-enabled solutions for millions of users. I look forward to the possibility of discussing how my expertise in agentic reasoning and alignment can contribute to your team's success.

Join Upwork to shape the future of autonomous agents and bring cutting-edge research into a product of global scale!

Job description

Upwork Inc.’s (Nasdaq: UPWK) family of companies connects businesses with global, AI-enabled talent across every contingent work type including freelance, fractional, and payrolled. This portfolio includes the Upwork Marketplace, which connects businesses with on-demand access to highly skilled talent across the globe, and Lifted, which provides a purpose-built solution for enterprise organizations to source, contract, manage, and pay talent across the full spectrum of contingent work. From Fortune 100 enterprises to entrepreneurs, businesses rely on Upwork Inc. to find and hire expert talent, leverage AI-powered work solutions, and drive business transformation. With access to professionals spanning more than 10,000 skills across AI & machine learning, software development, sales & marketing, customer support, finance & accounting, and more, the Upwork family of companies enables businesses of all sizes to scale, innovate, and transform their workforces for the age of AI and beyond.

Since its founding, Upwork Inc. has facilitated more than $30 billion in total transactions and services as it fulfills its purpose to create opportunity in every era of work. Learn more about the Upwork Marketplace at Upwork.com.

We’re seeking a Senior Lead Research Scientist (Agentic AI) to push the frontier of autonomous, tool‑using AI and ensure that innovations make it into production. You’ll split your time between novel research (benchmarks, learning algorithms, publications, and thought leadership) and building the tools, datasets, and systems required to run rigorous experiments and ship results into our agentic platform. You will partner closely with ML engineers, product, platform, and safety teams to translate research into reliable, scalable capabilities for customers and developers on Upwork.

Responsibilities

50/50 Split between research and engineering/productionalization.
Advance agentic benchmarking. Define and maintain a rigorous evaluation suite for agents (task success, reliability, recovery, safety, latency, and cost). Establish protocols, datasets, and reproducible metrics aligned to best practices in agentic evaluation; continuously harden benchmarks against loopholes and overfitting.
Invent and publish. Lead novel studies on agent planning, tool use, reflection/memory, safety, and multi‑agent coordination. Publish at top venues (e.g., NeurIPS/ICML/ICLR/ACL) and present learnings internally and externally.
Explore RLEF for agents. Develop Reinforcement Learning from Execution Feedback (RLEF) approaches that ground agent behavior in environment/run‑time signals (e.g., execution traces, tool results, test outcomes), comparing to RLHF/RLAIF on agent tasks.
Continuous/online learning. Design safe, measurable loops for continual improvement (data selection, drift detection, reward model updates, policy refresh), with guardrails that protect quality and cost.
Human‑in‑the‑loop systems. Partner on data strategy, labeling protocols, and reviewer tooling for RLHF and workflow‑level judgment; instrument quality controls and reviewer calibration.
Build research tooling. Stand up agents‑at‑scale experiment infrastructure: simulators, sandboxes, and orchestration for long‑horizon tasks; evaluation harnesses; offline/online A/B; and dashboards for longitudinal tracking.
Train & align models. Implement high‑quality pipelines for SFT, DPO, RLHF/RLAIF/RLEF; manage data provenance, safety filters, and automated red‑teaming; integrate eval signals into CI/CD.
Ship to production. Collaborate with platform teams to graduate prototypes into reliable services (APIs/SDKs, auth, observability, rate limiting) and to integrate agents with developer protocols (e.g., MCP) and runtime services.
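As an illustration of the benchmarking responsibility above (task success, latency, and cost rolled up into reproducible metrics with a pass/fail gate), here is a minimal Python sketch. The `EpisodeResult` schema, the `summarize` and `passes_gate` helpers, and all threshold values are hypothetical and chosen only for the example; the posting does not describe Upwork's actual evaluation suite.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeResult:
    """Outcome of one agent run on one benchmark task (hypothetical schema)."""
    task_id: str
    success: bool
    latency_s: float
    cost_usd: float


def summarize(results):
    """Aggregate per-episode outcomes into suite-level metrics."""
    if not results:
        raise ValueError("need at least one episode result")
    return {
        "success_rate": mean(1.0 if r.success else 0.0 for r in results),
        "mean_latency_s": mean(r.latency_s for r in results),
        "mean_cost_usd": mean(r.cost_usd for r in results),
    }


def passes_gate(summary, min_success=0.7, max_latency_s=30.0, max_cost_usd=0.5):
    """Return True only if every threshold is met (thresholds are illustrative)."""
    return (
        summary["success_rate"] >= min_success
        and summary["mean_latency_s"] <= max_latency_s
        and summary["mean_cost_usd"] <= max_cost_usd
    )


# Four made-up episodes: three successes and one slow, expensive failure.
results = [
    EpisodeResult("t1", True, 10.0, 0.25),
    EpisodeResult("t2", True, 20.0, 0.5),
    EpisodeResult("t3", False, 40.0, 1.0),
    EpisodeResult("t4", True, 10.0, 0.25),
]
summary = summarize(results)
print(summary, passes_gate(summary))
```

A real suite would also track reliability across repeated runs, recovery after tool errors, and safety violations; those are left out here to keep the sketch short.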

What it takes to catch our eye

PhD or equivalent research track record with peer‑reviewed publications in relevant venues; strong empirical methodology and scientific writing/presentation skills.
Demonstrated contributions to agentic evaluation/benchmarks or long‑horizon reasoning (e.g., designing tasks, metrics, robust protocols).
Hands‑on experience adapting LLMs for tool use and multi‑step plans; fluent in prompting, function/tool calling, and memory/critique patterns.
Practical mastery of alignment methods (SFT, DPO, RLHF, RLAIF, and RLEF) and reward‑modeling; you know when to prefer each and how to evaluate them.
Proficiency in Python and one or more of PyTorch/JAX; experience with distributed training (e.g., DDP/Ray), dataset curation, experiment tracking, and reproducibility.
Ability to build research‑grade tools that evolve into production‑grade services (APIs/SDKs, data stores, streaming/messaging, tracing/metrics).
Comfortable building end‑to‑end eval pipelines (offline + online), defining pass/fail gates, and quantifying trade‑offs (quality, safety, latency, cost).
Experience with safety testing and red‑teaming for agents; familiarity with risk taxonomies for autonomous systems.
Proven success mentoring senior ICs, leading cross‑functional initiatives, and educating internal/external audiences (talks, tutorials, blog posts, open‑source).
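For readers less familiar with the alignment methods named in the requirements, the following is a minimal sketch of the per-example DPO (Direct Preference Optimization) loss in plain Python. It assumes sequence log-probabilities have already been computed for the policy and for a frozen reference model; the function name, argument names, and the `beta` default are illustrative, and a real pipeline would compute this over batches in PyTorch or JAX.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    margin compares how much more the policy prefers the chosen response
    over the rejected one, relative to the reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = -beta * margin
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference: the margin is zero, so the loss is log(2).
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# Policy raises the chosen response and lowers the rejected one: lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(neutral, improved)
```

The loss falls as the policy shifts probability toward the preferred response relative to the reference, which is the behavior the method is designed to reward.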

Come change how the world works.

Upwork is establishing its first international operational hub in Lisbon, Portugal. The new office is expected to be fully operational by Q4 2026.

This position will initially be employed through a partner to ensure a seamless hiring process while we establish the hub. Once the hub is established, there may be opportunities to transition to employment with Upwork depending on business needs and other requirements. While employed by the partner, you’ll work as part of Upwork’s team, with access to our resources, culture, and growth opportunities.

Our partner will offer competitive benefits. When Upwork’s hub is established, we will be excited to offer employment and benefits directly as business needs require.

Upwork is committed to building a diverse, inclusive, and equitable workforce. Employment decisions are made without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, disability, or any other status protected by applicable law.

To learn more about how Upwork processes and protects your personal information as part of the application process, please review our Global Job Applicant Privacy Notice.

Skills

Python
PyTorch
JAX
LLM
Reinforcement Learning
RLHF
DPO
SFT
Ray
Distributed Training
Agentic AI
Machine Learning
MLOps
API Design

Possible interview questions

Tests depth of understanding of Agentic AI specifics and the ability to assess the quality of agents' work.

How would you design an evaluation system for an agent that performs long-horizon planning so as to avoid overfitting to specific benchmarks?

The posting emphasizes RLEF as a key direction.

What are the main challenges of implementing Reinforcement Learning from Execution Feedback (RLEF) compared with classic RLHF in the context of tool use?

The role involves spending 50% of the time on engineering and productionalization.

Describe your experience turning a research model into a stable API service: what latency and cost problems did you run into?

Agents can take unpredictable actions in their environment.

Which red-teaming methods and guardrails do you consider most effective for autonomous systems with access to external tools?

A Senior Lead position requires mentoring skills.

Tell us about a time when you had to convince a cross-functional team (Product/Engineering) to accept a difficult research decision. How did you argue your position?
