Country: USA
Salary: $230,000 – $270,000


Hybrid, full-time

Staff Reliability Engineer - Robinhood Command Center


An exceptional position at a top fintech company with transparent, high compensation. The role offers real influence over architecture and processes, a strong benefits package, and the chance to work on unique problems at industry scale.

Listing from Quick Offer Global, a list of international companies


Job difficulty


A Staff-level role at Robinhood requires not only deep technical knowledge of distributed systems but also exceptional leadership skills for managing incidents in real time. The high bar of responsibility for the platform's financial stability and the requirement to work in a hybrid mode in New York make this position extremely demanding.

Salary analysis

Median: $250,000

Market: $220,000 – $285,000

The offered range of $230,000 to $270,000 is fully in line with market expectations for a Staff-level position in New York. With bonuses and stock options included, total compensation may significantly exceed the market average for comparable roles at Tier-1 companies.

I am writing to express my strong interest in the Staff Reliability Engineer position within the Robinhood Command Center. With over 8 years of experience in software engineering and a deep focus on reliability and incident leadership, I am drawn to the opportunity to be a founding member of this critical team. My background in managing high-severity incidents and driving measurable improvements in MTTR aligns perfectly with your mission to democratize finance through a resilient and high-performing platform.

Throughout my career, I have specialized in building observability frameworks and multi-region failover strategies that ensure system availability during peak volatility. I am particularly impressed by Robinhood's commitment to operational excellence and the proactive approach of the RCC in defining global dashboards for critical user journeys. I am eager to bring my expertise in OpenTelemetry and incident governance to help scale Robinhood’s infrastructure and mentor the next generation of reliability engineers in your New York office.


Job description

Join us in building the future of finance.

Our mission is to democratize finance for all. An estimated $124 trillion of assets will be inherited by younger generations in the next two decades, the largest transfer of wealth in human history. If you’re ready to be at the epicenter of this historic cultural and financial shift, keep reading.

About the team & role

We are building an elite team, applying frontier technologies to the world’s biggest financial problems. We’re looking for bold thinkers. Sharp problem-solvers. Builders who are wired to make an impact. Robinhood isn’t a place for complacency; it’s where ambitious people do the best work of their careers. We’re a high-performing, fast-moving team with ethics at the center of everything we do. Expectations are high, and so are the rewards.

The Robinhood Command Center (RCC) is a newly formed reliability team that serves as the front line for detecting, coordinating, and mitigating production incidents across Robinhood.

As part of Robinhood’s broader reliability initiative, RCC works closely with product engineering, reliability, observability, infrastructure, and business teams to reduce customer impact and shorten incident duration.

As a Staff Reliability Engineer, you will be part of the founding RCC team, helping define how Robinhood responds to and learns from incidents at scale. This is a highly visible role focused on incident leadership, operational excellence, and reliability tooling. You will not own product services or core infrastructure, but you will own the processes and tools that enable fast, high-quality incident response.

This role is based in our New York City, New York office, with in-person attendance expected at least 3 days per week.

What you'll do:

Serve as a senior technical leader driving the long-term reliability and observability strategy across Robinhood’s infrastructure
Partner closely across many different types of engineers to raise the bar for operational excellence and incident response
Lead incident mitigation efforts by coordinating service owners, facilitating time-sensitive decisions such as rollbacks and traffic shifts, and maintaining a clear source of truth during active incidents
Develop and maintain incident management processes and procedures to ensure timely resolution and minimize customer impact
Own incident discovery at the company level by defining and maintaining global dashboards and alerts tied to critical user journeys (CUJs), availability, and business-impact metrics
Own and evolve incident response tooling and processes, including education, adoption, and measurement of MTTD/MTTR improvements
Drive post-incident governance and learning, defining standards for postmortems, SEV reviews, and follow-up tracking to ensure durable reliability improvements
Design and implement next-generation failure mitigation strategies that avoid full-region or full-datacenter failovers
Define and build frameworks to improve monitoring, alerting, and observability across hundreds of services and systems
Define and own the roadmap of bringing observability to critical user journeys for Robinhood’s products
Deliver key insights and executive-level reporting to enable better business decisions around service quality and reliability
Act as a force multiplier through mentoring, technical influence, and contributions to hiring and engineering culture

What you bring:

8+ years of software engineering experience, including significant experience operating production systems
4+ years focused on reliability engineering, infrastructure, distributed systems, or production operations
Hands-on experience serving in incident leadership roles (e.g., IMOC, incident commander, primary oncall)
Strong communication and cross-functional collaboration skills, especially during high-severity incidents
Deep knowledge of systems reliability, observability frameworks, and fault-tolerant architecture design
Experience with multi-region or multi-cluster architectures, capacity planning, and failover strategies
Familiarity with modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana)
Demonstrated ability to drive measurable improvements in MTTD, MTTR, availability, or customer impact

What we offer:

Challenging, high-impact work to grow your career
Performance-driven compensation with multipliers for outsized impact, bonus programs, equity ownership, and 401(k) matching
Best-in-class benefits to fuel your work, including 100% paid health insurance for employees with 90% coverage for dependents
Lifestyle wallet - a highly flexible benefits spending account for wellness, learning, and more
Employer-paid life & disability insurance, fertility benefits, and mental health benefits
Time off to recharge including company holidays, paid time off, sick time, parental leave, and more!
Exceptional office experience with catered meals, events, and comfortable workspaces

In addition to the base pay range listed below, this role is also eligible for bonus opportunities + equity + benefits.

Base pay for the successful applicant will depend on a variety of job-related factors, which may include education, training, experience, location, business needs, or market demands. The expected base pay range for this role is based on the location where the work will be performed and is aligned to one of 3 compensation zones. For other locations not listed, compensation can be discussed with your recruiter during the interview process.

Base Pay Range:

Zone 1 (Menlo Park, CA; New York, NY; Bellevue, WA; Washington, DC): $230,000 to $270,000 USD
Zone 2 (Denver, CO; Westlake, TX; Chicago, IL): $203,000 to $238,000 USD
Zone 3 (Lake Mary, FL; Clearwater, FL; Gainesville, FL): $180,000 to $211,000 USD

Total Rewards vary by region and entity.

If our mission energizes you and you’re ready to build the future of finance, we look forward to seeing your application.

Robinhood provides equal opportunity for all applicants, offers reasonable accommodations upon request, and complies with applicable equal employment and privacy laws. Inclusion is built into how we hire and work—welcoming different backgrounds, perspectives, and experiences so everyone can do their best. Please review the Privacy Policy for your country of application.


Skills

Reliability Engineering
Distributed Systems
Incident Management
Observability
OpenTelemetry
Prometheus
Grafana
Capacity Planning
Architecture Design
Monitoring

Possible interview questions

Tests experience leading in critical situations and the ability to make decisions under pressure.

Describe the most complex incident you have led as Incident Commander. What technical trade-offs did you have to make to restore the system quickly?

Assesses skill in designing fault-tolerant systems, which is critical in the financial sector.

How would you design a failure mitigation strategy that avoids a full cross-region failover while preserving the availability of critical user journeys?

Tests command of the modern monitoring stack named in the posting.

What is your experience rolling out OpenTelemetry in large-scale microservice architectures? What were the main performance problems you ran into when collecting metrics?

A Staff role implies influence over company-wide processes.

How do you plan to introduce a culture of postmortems and learning from mistakes in teams that have historically resisted changes to development processes?

Assesses understanding of business metrics and how they relate to the technical state of the system.

How do you define and track Critical User Journeys (CUJs) for a fintech platform, and how should that data shape the engineering team's priorities?
