Agoda
Country: Thailand
Lead · In office · Full-time

Lead Software Engineer, DevOps domain (Bangkok based, relocation provided)

AI assessment

An excellent vacancy from a global industry leader, with a relocation package to Bangkok. The high requirements are offset by the scale of the work and the opportunity to influence the architecture of systems serving millions of users.


Vacancy from Quick Offer Global, a list of international companies

Job difficulty

AI assessment

The high difficulty stems from the requirement of more than 8 years of experience, the need for deep knowledge of distributed systems architecture and Kubernetes, and a leadership role in organization-wide incident management.

Salary analysis

Median: $85,000
Market range: $70,000 – $110,000
AI assessment

A Lead-level position at an international company in Thailand usually implies a salary above the local market average, given the relocation package and Agoda's status as part of Booking Holdings.

Cover letter

I am writing to express my strong interest in the Lead Software Engineer (DevOps/SRE domain) position at Agoda. With over 8 years of experience in architecting and operating mission-critical production systems, I have developed a deep expertise in building resilient distributed systems and promoting SRE best practices at scale. My background in developing Kubernetes-based platforms and implementing automated safe-deployment strategies aligns perfectly with Agoda's mission to enhance system reliability while maintaining high developer velocity.

In my previous roles, I have successfully led cross-functional initiatives to implement SLI/SLO-driven engineering and advanced observability using Prometheus and Grafana. I am particularly drawn to this opportunity at Agoda because of the scale at which the company operates and the technical challenge of managing high-QPS systems across multiple clusters. I am eager to bring my experience in incident management, service mesh technologies like Istio, and automated rollback systems to help Agoda continue its journey as a leader in the travel technology space.


Job description

About Agoda

At Agoda, we bridge the world through travel. Our story began in 2005, when two lifelong friends and entrepreneurs, driven by their passion for travel, launched Agoda to make it easier for everyone to explore the world.

Today, we are part of Booking Holdings [NASDAQ: BKNG], with a diverse team of over 7,000 people from 90 countries, working together in offices around the globe. Every day, we connect people to destinations and experiences, with our great deals across our millions of hotels and holiday properties, flights, and experiences worldwide.

No two days are the same at Agoda. Data and technology are at the heart of our culture, fueling our curiosity and innovation. If you’re ready to begin your best journey and help build travel for the world, join us.

In this role, you'll get to:

· Lead the technical vision, architecture, and execution of new SRE platforms or reliability initiatives.

· Define and promote SRE best practices across Agoda’s services, e.g., SLI/SLO-driven engineering, error budgets, and other data-driven reliability factors.

· Design, build, and operate reliability platforms, including load shedding, business signals monitoring, and safe-deployment automation, to reduce blast radius while preserving developer velocity.

· Own safe deployment strategies such as canary releases, automated rollback, and business-impact protection integrated with deployment & monitoring.

· Proactively identify and mitigate reliability and scaling risks across Agoda’s services.

· Improve system resilience and multi-cluster readiness by partnering with the platform and operations teams.

· Lead major incident response and operational excellence, driving fast detection, mitigation, root cause analysis, postmortems, and learnings focused on business impact.

· Maintain and evolve incident, observability, alerting, and on-call tooling, improving signal quality, alert enrichment, grouping, and reducing time-to-clue and time-to-mitigation for NOC and on-call engineers.

· Advance platform observability and reliability signals using Prometheus and Grafana, balancing actionability, scale, and cost efficiency.

· Define reliability roadmaps and OKRs, translating ambiguous business reliability goals into clear technical requirements.
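
For readers less familiar with the SLI/SLO and error-budget terms used in the list above, here is a minimal Python sketch of the standard arithmetic. It is not Agoda's tooling: the function name `error_budget`, the `ErrorBudget` fields, and the request-based SLI are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    allowed_failures: float    # failures the SLO tolerates in this window
    consumed_fraction: float   # share of the budget already spent (1.0 = exhausted)
    remaining_failures: float  # failures left before the SLO is breached


def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> ErrorBudget:
    """Compute the error budget for one window of a request-based SLI.

    Hypothetical helper: slo_target is the target success ratio, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")
    # The budget is the complement of the SLO applied to the traffic volume.
    allowed = (1.0 - slo_target) * total_requests
    return ErrorBudget(
        allowed_failures=allowed,
        consumed_fraction=failed_requests / allowed,
        remaining_failures=allowed - failed_requests,
    )


if __name__ == "__main__":
    print(error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250))
```

With a 99.9% target over 1,000,000 requests, about 1,000 failures are allowed, so 250 failures consume roughly a quarter of the budget. Teams commonly use the consumed fraction to decide when to slow down risky releases.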

What You’ll Need to Succeed:

· 8+ years of relevant experience.

· Demonstrated ownership of architecting, building, and operating mission-critical production systems, making long-term technical and reliability trade-off decisions.

· Proven ability to lead and coordinate complex cross-team initiatives, setting technical direction and aligning stakeholders to deliver outcomes at organizational scale.

· Expertise in one or more programming languages (e.g., Go, Python, Rust, Java) with a solid understanding of distributed systems fundamentals (concurrency, backpressure, timeouts/retries, idempotency, circuit breaking).

· Deep hands-on experience with the Kubernetes ecosystem, service mesh technologies (e.g., Istio), and Kubernetes deployment workflows (e.g., Argo CD).

· Observability & monitoring expertise, using Prometheus, Grafana, and common logging/telemetry stacks (e.g., OpenTelemetry), with an understanding of signal quality, scalability, and cost trade-offs.

· Strong command of the incident management lifecycle, with a focus on improving alert quality, alert management, incident response, RCA, and postmortems.

· Experience with reliability engineering patterns such as canary deployments, automated rollback, capacity/right-sizing automation, and production operations.

· Solid data analysis skills, including SQL (e.g., PostgreSQL, MSSQL) and data pipelines.

· Data-driven mindset, able to perform deep research, analyze complex problems, and make informed technical decisions.

· Excellent communication and collaboration skills, able to explain complex technical concepts clearly to stakeholders at all levels, and to operate effectively both as a self-directed individual contributor and as part of a team.

· Curiosity and continuous learning, staying current with industry trends, open-source advancements, and emerging reliability practices.

Nice-to-Have:

· Experience operating large-scale, high-QPS systems serving millions of users in domains such as e-commerce, travel, or fintech.

· Hands-on experience with multi-region / multi-DC architectures and traffic isolation or failover strategies.

· Background in chaos engineering and resilience testing.

· Experience defining or scaling org-wide SLO/SRE frameworks.

· Built or operated Kubernetes controllers/operators.

· Exposure to ML-assisted detection or statistical methods for signal tuning (e.g., windowing strategies, precision/recall trade-offs).

Discover more about working at Agoda

Equal Opportunity Employer

At Agoda, we pride ourselves on being a company represented by people of all different backgrounds and orientations. We prioritize attracting diverse talent and cultivating an inclusive environment that encourages collaboration and innovation. Employment at Agoda is based solely on a person’s merit and qualifications. We are committed to providing equal employment opportunity regardless of sex, age, race, color, national origin, religion, marital status, pregnancy, sexual orientation, gender identity, disability, citizenship, veteran or military status, and other legally protected characteristics.

We will keep your application on file so that we can consider you for future vacancies and you can always ask to have your details removed from the file. For more details please read our privacy policy.

Disclaimer

We do not accept any terms or conditions, nor do we recognize any agency’s representation of a candidate, from unsolicited third-party or agency submissions. If we receive unsolicited or speculative CVs, we reserve the right to contact and hire the candidate directly without any obligation to pay a recruitment fee.


Skills

  • Python
  • Rust
  • Kubernetes
  • Prometheus
  • Grafana
  • OpenTelemetry
  • SRE
  • PostgreSQL
  • Java
  • Chaos Engineering
  • SQL Server
  • Go
  • Service Mesh
  • Istio
  • Argo CD

Possible interview questions

Tests experience with high-load systems and understanding of protection mechanisms.

Describe your experience implementing load shedding and automated rollback strategies in large-scale systems.

Assesses the ability to manage reliability through metrics.

How do you approach defining SLIs/SLOs for critical services, and how do you work with error budgets?

Tests technical knowledge of the Kubernetes ecosystem.

Describe your experience using a service mesh (for example, Istio) and Argo CD to enable safe deployments.

Assesses leadership and crisis-management skills.

Describe the most difficult incident you have led. How did you organize the RCA process, and what systemic changes were implemented afterwards?

Tests the ability to balance technical excellence against cost.

How do you optimize monitoring and logging costs (Prometheus/Grafana) without losing observability quality?
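
As background for the load shedding question above, here is a minimal Python sketch of one common approach: cap the number of in-flight requests and reject the excess, while reserving some capacity for critical traffic. The class name `LoadShedder` and its parameters are hypothetical, and this is only one of several possible shedding policies, not a description of any particular company's system.

```python
import threading


class LoadShedder:
    """Concurrency-based load shedder (illustrative sketch, hypothetical API).

    Non-critical requests are rejected once in-flight work reaches
    max_in_flight - critical_reserve; critical requests may use the full limit.
    """

    def __init__(self, max_in_flight: int, critical_reserve: int = 0) -> None:
        if max_in_flight <= 0 or not 0 <= critical_reserve < max_in_flight:
            raise ValueError("invalid limits")
        self.max_in_flight = max_in_flight
        self.critical_reserve = critical_reserve
        self.shed = 0            # number of rejected requests
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self, critical: bool = False) -> bool:
        """Return True if the request may proceed, False if it is shed."""
        with self._lock:
            limit = self.max_in_flight
            if not critical:
                limit -= self.critical_reserve
            if self._in_flight >= limit:
                self.shed += 1
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Mark one admitted request as finished."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


if __name__ == "__main__":
    shedder = LoadShedder(max_in_flight=3, critical_reserve=1)
    for critical in (False, False, False, True):
        print("critical" if critical else "normal", shedder.try_acquire(critical))
```

A caller would wrap each request in `try_acquire()` and `release()`, returning a fast failure (for example HTTP 503) when the request is shed. Rejecting early keeps latency bounded for the requests that are admitted.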

Similar vacancies

Билайн (Beeline)
Salary not specified

Lead DevOps Engineer

Lead · Remote · Russia
CI/CD · Docker · Kubernetes · GitLab CI/CD · Jenkins · Harbor · Jira · Confluence · Monitoring · Logging · Tracing
+11 more skills

NDA
Salary not specified

DevOps Team Lead

Lead · In office · Poland
Kubernetes · Docker · Helm · Terraform · Ansible · Jenkins · GitLab CI · Linux · CentOS · RedHat · GCP · Bash · Python · Gradle
+14 more skills

NeuroVision
300 000 ₽ – 500 000 ₽

DevOps Engineer / Blockchain & AI Infrastructure Engineer

Senior · Remote · Russia
DevOps · Blockchain · Artificial Intelligence · Computer Vision · NVIDIA GPU · CUDA · TensorRT · Kubernetes · Bare Metal · CI/CD
+10 more skills

peloton
$190,800 – $267,250

Principal Architect, Site Reliability Engineering

Lead · Remote · USA
SRE · DevOps · NetSuite · Workday · Coupa · Salesforce · Python · Go · Terraform · Datadog · Splunk · New Relic · Prometheus · Okta · Azure AD · IAM · SD-WAN · VPN
+18 more skills

synthesishealth
$170,000 – $205,000

Principal Platform Engineer

Lead · Remote · USA
TypeScript · Node.js · Go · Kubernetes · PostgreSQL · Kafka · gRPC · REST · CQRS · Distributed Systems · Microservices · DICOM · HL7 · FHIR · HIPAA
+15 more skills

ОТП Банк (OTP Bank)
250 000 ₽ – 300 000 ₽

DevOps / SRE Engineer, Middle+

Middle · Remote · Russia
Docker · Kubernetes · Deckhouse · Helm · Ansible · Terraform · Prometheus · Grafana · Zabbix · ELK stack · Python · Bash · Go · GitLab · GitHub · BitBucket · Linux · CI/CD
+18 more skills