Technical Lead - AI Inferences
WEKA is a fast-growing pre-IPO startup with substantial funding and an innovative product. The position offers work with a cutting-edge AI technology stack and direct influence on product architecture, which makes it highly attractive for experienced engineers.
Role complexity
The role requires a rare combination of skills: a deep understanding of LLM internals (KV caching, speculative decoding), team-management experience, and expert-level systems programming (C++/Rust/CUDA). The difficulty is high because performance must be optimized down to the hardware level.
Salary analysis
The posting does not state an exact figure, but for a Tech Lead position in US AI infrastructure, market rates are well above the IT average. The expected range at a company of WEKA's profile (pre-IPO, hyper-growth) is $200k-$280k base plus a significant equity package.
Cover letter
I am writing to express my strong interest in the Technical Lead - AI Inferences position at WEKA. With a deep background in high-performance backend engineering and a specialized focus on LLM serving optimization, I have closely followed WEKA’s innovations in agentic AI data infrastructure. My experience in managing GPU workloads and implementing advanced techniques like continuous batching and KV cache reuse aligns perfectly with your mission to eliminate data silos and maximize GPU utilization.
In my previous roles, I have successfully led small, agile engineering teams while remaining deeply hands-on with the codebase. I have extensive experience with vLLM and CUDA, and I am particularly excited about the opportunity to work with LMCache and NIXL to push the boundaries of inference-as-a-service. I am confident that my technical leadership and passion for architecting low-latency, high-throughput systems will contribute significantly to WEKA's hyper-growth trajectory.
Join WEKA to lead the development of next-generation AI infrastructure and optimize the world's largest LLMs!
Job description
WEKA is architecting a new approach to the enterprise data stack built for the age of reasoning. NeuralMesh by WEKA sets the standard for agentic AI data infrastructure with a cloud and AI-native software solution that can be deployed anywhere. It transforms legacy data silos into data pipelines that dramatically increase GPU utilization and make AI model training and inference, machine learning, and other compute-intensive workloads run faster, work more efficiently, and consume less energy.
WEKA is a pre-IPO, growth-stage company on a hyper-growth trajectory. We’ve raised $375M in capital with dozens of world-class venture capital and strategic investors. We help the world’s largest and most innovative enterprises and research organizations, including 12 of the Fortune 50, achieve discoveries, insights, and business outcomes faster and more sustainably. We’re passionate about solving our customers’ most complex data challenges to accelerate intelligent innovation and business value. If you share our passion, we invite you to join us on this exciting journey.
We are seeking a Technical Lead - AI Inferences to spearhead our AI Inference team. In this role, you will bridge the gap between complex research and production-grade engineering. You will lead a tight-knit squad of 3 developers while remaining "hands-on-keyboard," architecting high-performance systems that optimize Large Language Model (LLM) serving.
The ideal candidate is deeply invested in inference at scale and in the evolving ecosystem of serving frameworks like vLLM and LMCache.
Responsibilities include:
- Technical Leadership: Architect and oversee the deployment of high-throughput, low-latency LLM inference pipelines.
- Team Management: Mentor and lead a small team of developers, conducting code reviews, sprint planning, and technical career coaching.
- Inference Optimization: Implement and evaluate state-of-the-art KV cache management solutions, including LMCache, and explore alternatives to minimize redundant computation.
- Framework Mastery: Deeply integrate and optimize serving engines such as vLLM, LLM-d, and NIXL to maximize hardware utilization.
- R&D: Stay at the forefront of the "Inference-as-a-Service" domain, benchmarking new tools and deciding when to pivot the stack.
Requirements:
- AI Inference Domain: Proven experience with KV cache reuse, speculative decoding, and continuous batching (a toy sketch of continuous batching follows this list).
- Specific Stack: Deep familiarity with vLLM, LMCache, and NIXL. Understanding the trade-offs between centralized vs. distributed caching.
- Backend Engineering: Expertise in Python, C++, or Rust, with a strong grasp of CUDA and GPU memory management.
- Infrastructure: Experience with Kubernetes (K8s) for scaling GPU workloads and optimizing cold-start times.
The WEKA Way:
- We are Accountable: We take full ownership, always, even when things don’t go as planned. We lead with integrity, show up with responsibility & ownership, and hold ourselves and each other to the highest standards.
- We are Brave: We question the status quo, push boundaries, and take smart risks when needed. We welcome challenges and embrace debates as opportunities for growth, turning courage into fuel for innovation.
- We are Collaborative: True collaboration isn’t only about working together. It’s about lifting one another up to succeed collectively. We are team-oriented and communicate with empathy and respect. We challenge each other and resolve conflict constructively. We are transparent about our goals and results. And together, we’re unstoppable.
- We are Customer Centric: Our customers are at the heart of everything we do. We actively listen and prioritize the success of our customers, and every decision we make is driven by how we can better serve, support, and empower them to succeed. When our customers win, we win.
USA Residents Only: The total compensation hiring wage range for this position is the range the Company reasonably and in good faith expects to pay for the position in the specified geographic areas or locations. Final compensation will depend on various factors relevant to the position and candidate, such as geographical location, candidate qualifications, certifications, relevant job-related work experience, education, skill set, and other relevant business and organizational factors, consistent with applicable law. In addition, the position may include some of the following comprehensive benefits, such as Medical, Dental, Vision, Life, 401(K), Flexible Time Off (FTO), sick time, and leave of absence as per the FMLA and other relevant leave laws.
Concerned that you don’t meet every qualification above?
Studies have shown that women and people of color may be less likely to apply for jobs if they don’t meet every qualification specified. At WEKA, we are committed to building a diverse, inclusive and authentic workplace. If you are excited about this position but are concerned that your past work experience doesn’t match up perfectly with the job description, we encourage you to apply anyway – you may be just the right candidate for this or other roles at WEKA.
WEKA is an equal opportunity employer that prohibits discrimination and harassment of any kind. We provide equal opportunities to all employees and applicants for employment without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.
Skills
- C++
- Python
- Rust
- LLM
- Kubernetes
- CUDA
- vLLM
- GPU Computing
- Inference Optimization
- LMCache
- NIXL
Possible interview questions
Tests deep understanding of memory-optimization mechanisms when serving LLMs.
How would you implement efficient KV cache reuse for multiple concurrent requests in vLLM?
Assesses experience with distributed systems and understanding of latency.
What are the main trade-offs between centralized and distributed caching in the context of serving large models?
Tests skills in managing GPU resources in a cloud environment.
What strategies do you use to minimize cold-start times for GPU containers in Kubernetes?
Assesses leadership qualities and the ability to balance coding with management.
How do you split your time between writing code and mentoring a team of three so that you don't become a bottleneck for architectural decisions?
Tests knowledge of modern inference-acceleration techniques.
Tell us about your experience rolling out speculative decoding: which performance metrics improved, and what challenges did you encounter?
Similar vacancies
Tech Lead / ML Lead
Lead AI Engineer
GenAI Team Lead
Python developer for AI center infrastructure, lead
AI Transformation Lead
Lead AI Engineer