Country: USA

Senior, Remote, Full-time

Senior Site Reliability Engineer

An excellent position at a stable product company with a modern stack and a focus on innovation (AI in DevOps). Clearly defined responsibilities and demanding requirements point to mature processes within the team.

A vacancy from Quick Offer Global, a list of international companies

Job difficulty

The role requires deep knowledge of AWS architecture and security (SOC2), and command of a broad technology stack (Terraform, Chef, Java, Python). High responsibility for 24/7 system availability and participation in on-call rotations add to the difficulty of the position.

Salary analysis

Median: $185,000

Market: $155,000 to $220,000

Salaries for a Senior SRE in the US typically range from $160k to $210k, depending on the state and the specific industry. This position is in line with market expectations for experienced engineers in SaaS.

Cover letter

I am writing to express my strong interest in the Senior Site Reliability Engineer position at Duetto. With over five years of experience in DevOps and SRE roles, I have developed deep expertise in architecting scalable AWS infrastructures and managing complex CI/CD pipelines using Terraform, GitHub Actions, and Jenkins. My background in maintaining high-availability systems for SaaS products aligns perfectly with Duetto's mission to redefine hotel revenue strategy through a robust and secure platform.

Throughout my career, I have focused on bridging the gap between development and operations, fostering a culture of shared responsibility and proactive problem-solving. I am particularly drawn to this role because of Duetto's sophisticated tech stack, including MongoDB, Snowflake, and Datadog, and your commitment to adopting AI within SRE workflows. I am confident that my experience in incident management, root cause analysis, and infrastructure-as-code will allow me to contribute immediately to the reliability and performance of your global services.

Job description

1. About the Company

Duetto, the industry-leading hospitality revenue management system, leads the way in helping hotels, resorts and casinos optimize revenue and boost profit. Our leading SaaS platform, expanding suite of products, and incredibly skilled team have been at the heart of our continued success and our ambition for future growth knows no bounds.

Duetto is building the future of hotel revenue strategy. We’re not just another SaaS company — we’re redefining what’s possible for hotels through our category-creating platform, the Revenue & Profit Operating System.

2. Role Summary / Purpose

We are seeking a highly experienced Senior Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will have a proven track record of designing, implementing, and maintaining scalable, secure, and highly reliable systems. As a key contributor, you will collaborate with cross-functional teams to drive architecture decisions, implement best practices, and ensure high system availability.

Our technology stack is built on AWS and primarily consists of:

Java
Python
NoSQL
Single-page JavaScript web techniques (jQuery, Backbone, React, and RequireJS)
Patent-pending analytical methods on top of MongoDB
Postgres
Terraform/Terragrunt and Chef for IaC
Datadog and Prometheus
GitHub for source control
GitHub Actions and Jenkins for CI/CD

3. Key Responsibilities

Architect and implement infrastructure solutions to facilitate seamless migration of critical systems while ensuring uptime, reliability, and a high-quality experience for end users.
Design, develop, test, and maintain tools and processes to efficiently manage and operate SaaS products hosted on AWS, with a focus on scalability and automation.
Partner with developers to enhance the reliability, performance, scalability, and security of server and application architectures.
Build and maintain critical components of our infrastructure, emphasizing robustness, security, and high availability to meet demanding service-level expectations.
Foster strong cross-team collaboration by driving engagement, promoting shared goals, and ensuring alignment across technical and non-technical teams.
Lead efforts to ensure systems are secure by default, addressing vulnerabilities proactively and implementing best practices for cybersecurity preparedness.
Be willing to learn and adopt AI in DevOps/SRE workflows.
Be the last line of support for services that thousands of customers (hotels, resorts, casinos, etc.) around the world depend on 24/7.
Troubleshoot on-call incidents to ensure rapid resolution and minimal service disruption. Participate in detailed Root Cause Analysis (RCA) to identify underlying issues and work cross-functionally to implement preventative measures and long-term solutions, ensuring similar problems are avoided in the future.

4. Qualifications

Required:

5+ years of experience in an Ops, DevOps or SRE role.
Experience in System Design and Architecture.
Engineer-level experience with networking and security concepts.
Understanding of fundamentals behind load balancing technologies. Experience configuring Layer 7 load-balancing is a plus.
Experience collaborating with engineers on architecture decisions.
Experience administering cloud computing services such as AWS (preferred), Azure, or GCP, including working knowledge of permissions structures, multi-account management structures, and single sign-on (SSO).
Experience with AWS ecosystem tools such as AWS IAM, VPC, EC2, ELB, RDS, S3, Lambda, API Gateway, Secrets Manager, KMS, CloudWatch, CloudTrail.
Experience with security compliance certifications such as SOC2.
Experience working in an environment with a heavy emphasis on DevOps and Service Reliability mindset.
Experience provisioning, configuring, administering, and using enterprise monitoring ecosystems like Prometheus, Grafana, Datadog or similar.
Experience with CI/CD Tools such as GitHub, GitHub Actions, JFrog Artifactory, Jenkins, and GitOps methodologies.
Experience using and writing infrastructure-as-code using Terraform.
Experience with configuration-management toolsets such as Chef or Puppet.
Experience with containers and container orchestration tools such as ECS/EKS (a plus).
Experience managing infrastructure and contributing as part of a multi-user infrastructure team, using Terraform and associated toolsets. Relevant SOC2 experience is also a plus.
Fluency in reading Java, Ruby, Bash/Zsh, HCL, Python and JavaScript.
Strong experience in troubleshooting and resolving complex on-call incidents with a focus on minimizing service disruption and downtime.
Proven ability to lead and participate in detailed Root Cause Analysis (RCA) processes to identify and address underlying issues effectively.
Demonstrated expertise in implementing preventative measures and long-term solutions based on RCA findings to ensure recurring issues are mitigated.
Experience constructing and maintaining build/deploy automation tooling.
Willingness to participate in a weekly on-call rotation.
Ability to work both independently and within a team environment.
A passion for technology with a drive to stay up to date with technology and best practices.

Ideal Candidate:

Team Player - Works well with others, highly collaborative and acts as a strong partner to other team members and functions.
Execution - Desire to work on a fast-paced team and help set direction and architecture.
Creativity - Thrives in an environment without a set playbook.
Quality - Takes pride in delivering robust and high-quality implementations.
Ownership - Enjoys owning and driving projects.

Skills

AWS
Terraform
Terragrunt
Chef
Java
Python
MongoDB
PostgreSQL
Datadog
Prometheus
GitHub Actions
Jenkins
Kubernetes
Docker
Ruby
Bash
HCL
JavaScript
React

Possible interview questions

Tests experience with infrastructure as code and understanding of modularity.

Tell us about your experience using Terraform and Terragrunt to manage complex multi-account environments in AWS.

Assesses skills in resolving critical problems and working under pressure.

Describe the most difficult incident you have investigated. What was your Root Cause Analysis (RCA) process, and what preventative measures did you implement?

An SRE must be able to read developers' code to optimize performance.

How comfortable are you debugging Java or Python applications when you are not their primary developer?

Tests understanding of network security and traffic load balancing.

What is the difference between L4 and L7 load balancing, and in which cases would you prefer a Layer 7 configuration for a SaaS application?

Assesses readiness for current automation trends.

How do you see AI/ML tools being applied to monitoring and incident response within SRE?
