- Country
- USA
- Salary
- $216,700 – $303,400

Senior Machine Learning Systems Engineer
A high salary, a position at one of the world's largest technology companies, and the chance to solve unique problems at the intersection of ML and big data. Remote work within the US and a strong benefits package make this vacancy very attractive.
Job difficulty
The role requires deep knowledge of MLOps, distributed training, and GPU optimization. Working with graph neural networks (GNNs) and very large datasets (billions of nodes) makes the position highly demanding and high-responsibility.
Salary analysis
The offered salary ($216k to $303k) is at the upper end of market expectations for Senior ML Infrastructure roles in the US, especially given the remote format. This is in line with Tier-1 technology companies.
Cover letter
I am writing to express my strong interest in the Senior Machine Learning Systems Engineer position at Reddit. With over five years of experience in building scalable ML infrastructure and a deep focus on MLOps, I have consistently delivered platforms that streamline the model lifecycle from data preparation to distributed training. My background in optimizing GPU performance and managing large-scale data pipelines aligns perfectly with Reddit's mission to enhance content discovery and user engagement through advanced machine learning.
I am particularly drawn to this role because of the opportunity to work on graph ML at an immense scale. Having worked with distributed frameworks like Ray and Kubernetes, I am excited about the challenge of architecting pipelines for billions of nodes and edges. I am confident that my technical expertise in PyTorch and cloud-based infrastructure, combined with my passion for building developer-centric platforms, will allow me to make a significant impact on the Machine Learning Platform team at Reddit.
Job description
Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.
Who We Are: The Machine Learning Platform team at Reddit is a high-impact team that owns the infrastructure that powers recommendations, content discovery, user and content quantification, while directly impacting other teams such as Growth, Ads, Feeds, and Core Machine Learning teams.
What You’ll Do: As a Senior ML Infrastructure Engineer, you will lead development of a platform for large scale ML models at Reddit.
- Design end-to-end model lifecycle patterns (MLOps) to boost velocity of development for ML engineers, including data preparation, model management, experiment tracking, and more
- Zero-to-one development and support of a graph ML codebase and platform that abstracts away common patterns and enables greater model scalability and iteration
- Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large, distributed ML training environment
- Optimize batch data processing within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more
- Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges
Who You Might Be:
- 5+ years of experience in ML infrastructure, including model training and model deployments
- Hands-on experience with ML optimization, including memory and GPU profiling
- Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more
- Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g. MLflow or Wandb)
- Proficiency with the common programming languages and frameworks of ML, such as Python, PyTorch, TensorFlow, etc.
- Deep experience working with distributed training frameworks, including Ray and Kubernetes
- Strong focus on scalability, reliability, performance, and ease of use. You are an undying advocate for platform users and have a deep intuition for the machine learning development lifecycle.
- Strong organizational & communication skills
- Experience working with graph databases (Neo4j, JanusGraph, TigerGraph) is a big plus
- Experience working with graph neural networks (GNNs) and associated graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a big plus
Pay Transparency:
This job posting may span more than one career level.
In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission. Additionally, Reddit offers a wide range of benefits to U.S.-based employees, including medical, dental, and vision insurance, 401(k) program with employer match, generous time off for vacation, and parental leave. To learn more, please visit https://www.redditinc.com/careers/.
To provide greater transparency to candidates, we share base salary ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar stage growth companies. Final offer amounts are determined by multiple factors, including skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.
The base salary range for this position is:
$216,700—$303,400 USD
In select roles and locations, the interviews will be recorded, transcribed and summarized by artificial intelligence (AI). You will have the opportunity to opt out of recording, transcription and summarization prior to any scheduled interviews.
During the interview, we will collect the following categories of personal information: Identifiers, Professional and Employment-Related Information, Sensory Information (audio/video recording), and any other categories of personal information you choose to share with us. We will use this information to evaluate your application for employment or an independent contractor role, as applicable. We will not sell your personal information or disclose it to any third party for their marketing purposes. We will delete any recording of your interview promptly after making a hiring decision. For more information about how we will handle your personal information, including our retention of it, please refer to our Candidate Privacy Policy for Potential Employees and Contractors.
Reddit is proud to be an equal opportunity employer, and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If, due to a disability, you need an accommodation during the interview process, please let your recruiter know.

Skills
- Python
- PyTorch
- TensorFlow
- MLOps
- Kubernetes
- Ray
- GCP
- BigQuery
- Terraform
- Apache Spark
- Apache Beam
- MLflow
- Weights & Biases
- GPU Optimization
- Graph Neural Networks
- Distributed Training
Possible interview questions
Checks experience with the tools listed in the job description.
Describe your experience deploying MLOps tools such as MLflow or WandB in large-scale systems. What difficulties did you encounter?
The role involves working with massive graph structures.
How would you design a data pipeline to process a graph with 10 billion edges while keeping latency low during model training?
Resource optimization is a key task for a Senior engineer.
What strategies do you use to profile and optimize GPU memory usage when training heavy models in a distributed environment?
Reddit uses Ray and Kubernetes for scaling.
Describe your experience with Ray for distributed training. How do you handle fault tolerance in such clusters?
The position involves leading platform development.
How do you balance building flexible tools for ML engineers with maintaining strict platform reliability and performance standards?