Country: Spain


Remote, full-time

Principal Site Reliability Engineer (AI-first SRE)


High score for the innovative (AI-first) approach, the option of remote work, and significant impact on a global platform. A Principal-level role at a large technology company offers excellent opportunities for professional growth and for delivering ambitious projects.

Listing from Quick Offer Global, a list of international companies


Job difficulty

The role requires an exceptional combination of 10 years of experience in systems engineering, deep knowledge of cloud platforms (GCP/AWS), and hands-on AIOps experience. The difficulty is high because the job is not just to maintain systems but to introduce predictive models and AI that automate resilience across the whole company.

Salary analysis

Median: €110,000

Market range: €90,000 – €140,000

A Principal-level role in Madrid or Prague implies a salary well above the market average for standard SRE roles, owing to the experience requirement (10+ years) and the AI specialization. The proposed range is in line with top technology companies in these regions.

Cover letter

I am writing to express my strong interest in the Principal Site Reliability Engineer position at Groupon. With over a decade of experience in systems engineering and a deep focus on SRE principles, I am particularly drawn to your vision of an AI-first approach to reliability. My background in architecting resilient Kubernetes environments on GCP and implementing automated observability pipelines aligns perfectly with your goal of moving from reactive maintenance to predictive, self-healing systems.

In my previous roles, I have successfully led SRE transformation programs where I integrated ML-driven anomaly detection to reduce MTTR and improve platform stability. I am passionate about treating reliability as a product and have a proven track record of mentoring engineering teams to adopt adaptive SLIs/SLOs. I am excited about the opportunity to leverage my expertise in AIOps and infrastructure automation to drive revenue resilience and operational excellence at Groupon.


Join Groupon to lead the AI-driven transformation of SRE and build the self-healing systems of the future!

Job description

Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.

Groupon is on a radical journey to transform our business with relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company. We're big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.

About the Role

Groupon is modernizing its global platform — and reliability is at the center of that transformation. We’re looking for a Principal Site Reliability Engineer to lead the evolution from reactive maintenance to predictive, AI-driven resilience.

You’ll design intelligent, self-healing systems that prevent incidents before they happen, ensuring our customers enjoy fast, secure, and reliable experiences across millions of daily interactions.

📍Remote work model

Key Responsibilities:

Architect and maintain self-healing systems with 99.9%+ availability targets.
Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns.
Implement adaptive SLIs/SLOs that evolve automatically from real-time data.
Build AIOps-based observability and auto-remediation pipelines.
Apply predictive modeling to forecast failures before they impact users.
Lead chaos, performance, and resilience testing programs.
Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance.
Mentor engineers and drive reliability standards across teams.
Partner with platform, data, and product teams to ensure stability aligns with business goals.
Support major incident response, incident review, and participate in on-call rotations.
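
As a rough illustration of what a 99.9% availability target implies, the following Python sketch converts an availability SLO into an error budget, that is, the downtime allowed over a window. The function names and the 30-day window are assumptions made for illustration; the posting does not specify how availability is measured.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the allowed downtime in minutes for an availability SLO.

    `slo` is a fraction such as 0.999; `window_days` is an assumed
    measurement window (30 days here, purely for illustration).
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the interval (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Return the unspent error budget in minutes (negative means the SLO is breached)."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # After a hypothetical 30-minute outage, about 13.2 minutes remain.
    print(round(budget_remaining(0.999, 30.0), 1))
```

Under these assumptions, a single 45-minute outage would already exceed the monthly budget at 99.9%.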

Key Requirements:

10+ years in software/systems engineering, including 5+ years in SRE or platform reliability.
Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform.
Proficiency in Python or Go for automation and tooling.
Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy).
Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations.
Strong communication and influencing skills — data over hierarchy.

Nice to Have:

Experience with MLOps or large-scale data infrastructure.
Exposure to FinOps or cloud cost optimization.
Previous leadership of global incident response or SRE transformation programs.

What Success Looks Like:

99.9%+ uptime sustained through predictive rather than reactive responses.
Faster MTTR via automated detection and auto-remediation.
Reliability insights used in leadership decisions.
Mentorship leading to stronger reliability practices across teams.

We Are Interested In

Technologists who see reliability as a product, not just a metric.
Engineers who use AI/ML as a tool for scale and insight.
Leaders who can balance innovation speed with operational excellence.
Engineers who understand the entire e-commerce stack and how it impacts revenue.

What We Offer:

The opportunity to work with cutting-edge technologies in a transformative environment.
Professional growth and leadership development pathways tailored to your aspirations.
A chance to leave a lasting impact by shaping the future of reliable and scalable systems.

Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world!


Groupon is an AI-First Company

We’re committed to building smarter, faster, and more innovative ways of working—and AI plays a key role in how we get there. We encourage candidates to leverage AI tools during the hiring process where it adds value, and we’re always keen to hear how technology improves the way you work. If you’re passionate about AI or curious to explore how it can elevate your role—you’ll be right at home here.

Groupon’s purpose is to build strong communities through thriving small businesses. To learn more about the world’s largest local e-commerce marketplace, click here. You can also find out more about us in the latest Groupon news as well as learning about our DEI approach. If all of this sounds like something that’s a great fit for you, then click apply and join us on a mission to become the ultimate destination for local experiences and services.

Beware of Recruitment Fraud: Groupon follows a merit-based recruitment process without charging job seekers any fees. We've noticed an increase in recruitment fraud, including fake job postings and fraudulent interviews and job offers aimed at stealing personal information or money. Be cautious of individuals falsely representing Groupon's Talent Acquisition team with fake job offers. If you encounter any suspicious job offers or interview calls demanding money, recognize these as scams. Groupon is not responsible for losses from such dealings. For legitimate job openings (and a sneak peek into life at Groupon), always check our official career website at Groupon Careers


Skills

GCP
AWS
Kubernetes
Terraform
Python
Go
Prometheus
Grafana
OpenTelemetry
Istio
Envoy
AIOps
MLOps

Possible interview questions

Checks experience with the core task of the role: the move to self-healing systems.

Describe your experience designing and implementing auto-remediation systems. What were the main difficulties you ran into when scaling them?

The role emphasizes AI-first SRE. It is important to understand how the candidate applies ML in practice.

How exactly have you used AI or machine learning in the past to predict infrastructure failures or detect anomalies?

A Principal role is expected to influence business metrics.

How do you connect technical reliability metrics (SLIs/SLOs) to business outcomes and company revenue? Give an example.

Checks leadership skills and the ability to manage change in engineering culture.

How do you approach introducing reliability standards in teams that are used to working in a reactive, firefighting mode?

Assesses technical skill in managing complex distributed systems.

What is your approach to running Chaos Engineering programs in large-scale Kubernetes environments?
