- Country
- Ireland
Major Incident Lead
InterSystems is a stable international company with an excellent reputation. The role offers a high level of responsibility, work with a modern stack (AWS/Azure, SRE), and the chance to influence global reliability processes.
Role difficulty
The role demands strong stress tolerance and the ability to make quick decisions under uncertainty. It requires a deep understanding of SRE and cloud platforms, plus the skills to manage stakeholders at the executive level.
Salary analysis
The Major Incident Lead role in Ireland (remote) is in line with market standards for experienced SRE and incident management specialists. Compensation depends heavily on experience with specific platforms (IRIS) and willingness to take on-call shifts.
Cover letter
I am writing to express my strong interest in the Major Incident Lead position at InterSystems. With over five years of experience managing high-severity incidents in mission-critical cloud environments, I have developed the 'reliability-first' mindset and the calm, decisive leadership required to act as an effective Incident Commander. My background in SRE principles and ITIL frameworks aligns perfectly with your managed services model, ensuring that service restoration and stakeholder communication remain top priorities during P1 and P2 events.
Throughout my career, I have excelled at bridging the gap between technical engineering teams and executive leadership. I am particularly drawn to InterSystems' commitment to excellence in healthcare and finance sectors, where platform resilience is paramount. I am confident that my expertise in utilizing observability tools like Grafana and Splunk, combined with my experience in driving post-incident reviews to automate away recurring issues, will contribute significantly to the continued stability and success of your managed services platforms.
Job description
Overview
We are seeking an experienced Major Incident Lead – Site Reliability to join our Managed Services team. This role is responsible for leading the response to high-severity, customer-impacting incidents across InterSystems’ managed services platforms. Acting as the Incident Commander, the role ensures rapid service restoration, clear and confident stakeholder communication, and disciplined coordination across SRE, engineering, support, cloud, and service delivery teams.
Operating within an SRE-aligned service model, the Major Incident Lead focuses on protecting service reliability through the effective use of service level indicators and service level objectives, prioritizing customer impact reduction over root cause analysis during live incidents. Beyond incident response, the role drives post-incident reviews, turning operational failures into measurable reliability improvements and reduced repeat incidents.
This position is critical to maintaining customer trust, platform resilience, and operational excellence in a 24x7, mission-critical, and highly regulated environment.
Key Responsibilities
- Lead end-to-end management of P1 and P2 major incidents affecting InterSystems managed services customers; serve as the single point of coordination for major incidents
- Deliver timely, accurate and audience-appropriate communications to customers, partners, internal leadership, service delivery, implementation, support and customer success teams
- Manage executive-level updates during prolonged or high-risk incidents
- Lead post-incident reviews and root cause analysis (RCA)
- Ensure corrective and preventative actions are clearly defined, owned and tracked, while identifying trends, systemic risks and recurring issues as candidates for automation and improved tooling
- Contribute to continuous improvement of incident processes, tooling and runbooks
- Ensure incidents are managed in line with contractual SLAs and regulatory requirements; maintain accurate incident documentation and audit-ready records
- Maintain and improve major incident playbooks and escalation paths; participate in on-call rotations and incident simulations
- Support customer service reviews and contractual discussions when required
Experience/Qualification Required
- 5+ years of experience leading major incidents in a managed services, service delivery, cloud or enterprise IT environment
- Strong understanding of SRE principles, availability engineering, operational resilience practices and ITIL Incident, Problem, and Change Management
- Experience operating in 24x7, mission-critical environments with incident and service management tools (ServiceNow, Jira, PagerDuty)
- Understanding of services related to cloud-native platforms (AWS, Azure or GCP) and observability solutions (Coralogix, Grafana, Prometheus, CloudWatch, Splunk)
- Strong understanding of data platform operations, high availability and scaling in support of mission-critical systems such as InterSystems IRIS, HealthShare, IntelliCare and TrakCare
- Ability to lead under pressure and make clear, structured decisions; reliability-first mindset
- Excellent verbal and written communication skills
- Strong stakeholder management, including senior leadership and customers
- Bachelor's or Master's degree in Computer Science, Engineering or a related technical field
About InterSystems
InterSystems, a creative data technology provider, delivers a unified foundation for next-generation applications for healthcare, finance, manufacturing, and supply chain customers in more than 80 countries. Our data platforms solve interoperability, speed, and scalability problems for large organizations around the globe to unlock the power of data and allow people to perceive data in imaginative ways. Established in 1978, InterSystems is committed to excellence through its 24×7 support for customers and partners around the world. Privately held and headquartered in Boston, Massachusetts, InterSystems has 38 offices in 28 countries worldwide. For more information, please visit InterSystems.com.

Skills
- SRE
- Incident Management
- ITIL
- AWS
- Azure
- GCP
- ServiceNow
- Jira
- PagerDuty
- Grafana
- Prometheus
- Splunk
- Coralogix
- CloudWatch
- InterSystems IRIS
Possible interview questions
Tests the ability to stay calm and structured in critical situations.
Describe the most difficult P1 incident you have led. What steps did you take to restore service, and how did you manage stakeholder expectations?
Assesses understanding of the SRE philosophy that restoration takes priority over finding causes during an outage.
How do you prioritize between restoring service immediately and collecting data for the subsequent root cause analysis (RCA)?
Tests communication skills with a non-technical audience.
How would you explain the technical cause of a prolonged outage to a major customer or to the company's senior management?
Assesses the candidate's contribution to long-term system stability.
How do you turn the findings of post-incident reviews into concrete improvements in system reliability?
Tests technical breadth in monitoring.
Which metrics (SLIs/SLOs) do you consider most critical for monitoring the health of a cloud data platform?
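As one way to ground an answer to the SLI/SLO question, here is a minimal Python sketch of an availability SLI and the error budget derived from an SLO target. The class name, function names, and all numbers are hypothetical illustrations. They do not come from the listing or from any InterSystems tooling.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO evaluation window (hypothetical data)."""
    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means 99.9% of requests must succeed


def availability_sli(window: SloWindow) -> float:
    """Fraction of requests that succeeded in the window."""
    if window.total_requests == 0:
        return 1.0  # assumption: no traffic is treated as fully available
    return 1 - window.failed_requests / window.total_requests


def error_budget_remaining(window: SloWindow) -> float:
    """Share of the error budget still unspent; negative once the SLO is breached."""
    allowed_failures = (1 - window.slo_target) * window.total_requests
    if allowed_failures == 0:
        # No failures are permitted (no traffic, or a 100% target).
        return 1.0 if window.failed_requests == 0 else float("-inf")
    return 1 - window.failed_requests / allowed_failures


if __name__ == "__main__":
    window = SloWindow(total_requests=1_000_000, failed_requests=250, slo_target=0.999)
    print(f"SLI: {availability_sli(window):.5f}")
    print(f"Error budget remaining: {error_budget_remaining(window):.1%}")
```

With these example numbers, 250 failures against roughly 1,000 allowed leaves about three quarters of the budget. A falling budget is the kind of signal an incident lead might use to decide when to escalate or freeze changes.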