- Country
- USA
- Salary
- $216,700 – $303,400

Senior Machine Learning Engineer, ML Training Platform
An excellent opening at a top technology company with a transparent salary range, a strong benefits package, and the option to work remotely. Working on a world-class, high-load platform makes this role highly attractive to senior specialists.
Job difficulty
The high difficulty stems from the requirements for deep Kubernetes knowledge (writing controllers and operators), experience with GPU infrastructure, and distributed systems at petabyte scale. The role calls for a rare combination of ML engineering and Go systems development skills.
Salary analysis
The offered salary ($216k - $303k) is at the upper end of market expectations for Senior ML Infrastructure roles in the US, especially given the additional compensation in the form of RSUs. This is in line with Tier-1 (Big Tech) companies.
Cover letter
I am writing to express my strong interest in the Senior Machine Learning Engineer position within the ML Training Platform team at Reddit. With over five years of experience in platform engineering and a deep specialization in Kubernetes and GPU orchestration, I have consistently focused on building scalable, developer-centric infrastructure. My background in developing custom Kubernetes controllers and managing large-scale ML workflows aligns perfectly with Reddit's mission to evolve its MLE experience and streamline the 'Idea-to-Prototype' loop.
In my previous roles, I have successfully optimized distributed training environments and managed complex Jupyter ecosystems, ensuring high availability and resource efficiency. I am particularly drawn to Reddit's unique challenge of handling petabyte-scale data and the opportunity to treat internal MLEs as customers to reduce friction in their workflows. I am confident that my proficiency in Go and Python, combined with my passion for robust ML infrastructure, will allow me to make an immediate impact on your high-performing team.
Job description
Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.
Who We Are: The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure that powers recommendations, content discovery, user and content quantification, while directly impacting other teams such as Growth, Ads, Feeds, and Core Machine Learning teams.
What You’ll Do:
As a Senior Software Engineer, Machine Learning Platform (Training Platform), you will be instrumental in architecting, implementing, and maintaining foundational Machine Learning (ML) infrastructure that powers Feeds Ranking, Content Understanding, Recommendations and much more to fulfill Reddit’s mission of bringing community and belonging to everyone in the world. You will deliver a self-service ML platform that enables the continuous iteration and improvement of systems that use ML techniques including Deep Learning, Natural Language Processing, Recommendation Systems, Representation Learning and Computer Vision.
- Lead the building, testing, and maintenance of ML training infrastructure at Reddit.
- Play a pivotal role in designing, building, and optimizing the infrastructure and tooling required to support large-scale machine learning workflows.
- Evolve the MLE experience, from provisioning interactive GPU environments through large-scale training, supporting on-demand and self-service workflows.
- Kubernetes Automation: Write custom Kubernetes Controllers and Operators to manage the lifecycle of interactive Jupyter workspaces and long-running ML training jobs, handle auto-idling, and ensure fault tolerance.
- GPU Orchestration: Work with the underlying compute team to ensure MLEs have efficient access to training hardware resources and handle resource contention gracefully.
- Developer Experience (DevX): Treat internal MLEs as your customers. Conduct user research, reduce friction in the "Idea-to-Prototype" loop, and standardize software environments (Docker images, Python dependency management).
Who You Might Be:
- 5+ years of software engineering experience, with a focus on Platform Engineering, ML Infrastructure, or Backend Systems.
- Deep Kubernetes Expertise: You know K8s beyond just "deploying pods." You understand CRDs, Controllers and the Operator pattern.
- Jupyter Ecosystem Knowledge: Experience customizing JupyterHub, JupyterLab extensions, or building similar interactive computing platforms.
- Strong Coding Skills: Proficiency in Python (for the ML ecosystem) and Go (for Kubernetes controllers/infrastructure tooling).
- GPU Experience: Hands-on practice with CUDA environments, GPU virtualization/containerization, and doing it all within Kubernetes.
- Cloud Provider Experience: Familiarity with both managed ML offerings (Vertex AI, Sagemaker, etc) and building custom ML components in AWS and/or GCP.
- Experience working with distributed training frameworks, including Ray and Kubernetes.
- Comfortable with distributed systems, big data (Petabyte scale) and data-intensive systems.
- Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
- Strong organizational & communication skills.
Benefits
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401k Match
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Flexible Vacation & Reddit Global Days off
- Generous paid Parental Leave
- Paid Volunteer time off
#LI-Remote
Pay Transparency:
This job posting may span more than one career level.
In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.
To provide greater transparency to candidates, we share base salary ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar stage growth companies. Final offer amounts are determined by multiple factors, including skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.
The base salary range for this position is:
$216,700—$303,400 USD
In select roles and locations, the interviews will be recorded, transcribed and summarized by artificial intelligence (AI). You will have the opportunity to opt out of recording, transcription and summarization prior to any scheduled interviews.
During the interview, we will collect the following categories of personal information: Identifiers, Professional and Employment-Related Information, Sensory Information (audio/video recording), and any other categories of personal information you choose to share with us. We will use this information to evaluate your application for employment or an independent contractor role, as applicable. We will not sell your personal information or disclose it to any third party for their marketing purposes. We will delete any recording of your interview promptly after making a hiring decision. For more information about how we will handle your personal information, including our retention of it, please refer to our Candidate Privacy Policy for Potential Employees and Contractors.
Reddit is proud to be an equal opportunity employer, and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If, due to a disability, you need an accommodation during the interview process, please let your recruiter know.

Skills
- Kubernetes
- Python
- Go
- CUDA
- Docker
- Jupyter
- AWS
- GCP
- Ray
- Machine Learning
- Deep Learning
- Natural Language Processing
- Computer Vision
- Distributed Systems
Possible interview questions
The role requires writing custom controllers to manage the lifecycle of training jobs.
Describe your experience developing Kubernetes Operators or custom controllers: what were the main difficulties you ran into when managing resource state?
Reddit is looking for someone who can allocate GPU resources efficiently across users.
How would you implement auto-idling and GPU quotas for interactive Jupyter workspaces in a heavily loaded cluster?
The posting mentions Ray and distributed training.
Which performance optimization strategies have you applied when scaling distributed model training (for example, deep learning) to hundreds of nodes?
The position implies a focus on Developer Experience (DevX).
How do you approach gathering requirements from internal ML engineers, and which metrics do you use to evaluate the effectiveness of an ML platform?
Experience with big data and distributed systems is required.
Describe your experience with petabyte-scale data processing systems: how did you ensure pipeline reliability and fault tolerance?