Country
USA
Salary
$170,000 – $230,000
On-site · Full-time

Staff SRE/DevOps Engineer (Platform Reliability & Security)

AI assessment

An exceptional position at an innovation incubator, with a high salary, fully paid insurance, and the chance to work on cutting-edge AI technologies. High score for product impact and an excellent benefits package.


Listing from Quick Offer Global, a list of international companies

Job difficulty

AI assessment

A Staff-level role requires more than 10 years of experience and deep expertise not only in DevOps but also in distributed systems architecture, security, and SLO management. The high difficulty stems from the need to work with long-lived stateful workflows and strict compliance requirements.

Salary analysis

Median: $210,000
Market: $185,000 – $250,000
AI assessment

The offered range of $170k–$230k is fully in line with market standards for a Staff-level position in Seattle, where the median is about $210k excluding bonuses and stock options. The upper bound of $230k is very competitive even for major tech hubs.

Cover letter

I am writing to express my strong interest in the Staff SRE/DevOps Engineer position at Trase Systems. With over a decade of experience in designing and operating production distributed systems, I am particularly drawn to your mission of simplifying AI adoption for enterprises while maintaining rigorous security and reliability standards. My background in building resilient infrastructure for long-lived, stateful workflows aligns perfectly with the architectural challenges described for Trase OS.

Throughout my career, I have specialized in transforming infrastructure into a strategic asset rather than a support function. I have extensive experience implementing SLO-driven engineering practices, managing complex CI/CD pipelines, and securing service-to-service communications in regulated environments. I am especially excited about the opportunity to leverage LLMs to automate operational workflows and to mentor a high-performing engineering team as you scale your mission-critical applications in healthcare and national security.


Join Trase to build the reliability foundation for next-generation AI agents at Red Cell Partners!

Job description

About Us

Red Cell Partners is an incubation firm building and investing in rapidly scalable technology-led companies that are bringing revolutionary advancements to market in three distinct practice areas: healthcare, cyber, and national security. United by a shared sense of duty and deep belief in the power of innovation, Red Cell is developing powerful tools and solutions to address our Nation’s most pressing problems.

About Trase

Co-founded in 2023 by Joe Laws and Grant Verstandig, Trase Systems is AI, Uncomplicated. Trase empowers enterprise leaders to harness the full potential of AI without the associated complexity and risks. We are an end-to-end solution for deploying, managing, and optimizing AI in the enterprise. Our platform specializes in bridging the “last mile” of AI adoption, unlocking AI's full potential while driving efficiency and significant cost savings. Trase is at the forefront of AI Agent innovation, topping the Hugging Face GAIA Leaderboard for Generalized AI Assistants, ahead of industry giants such as Google, Meta, Microsoft, and OpenAI. We are leveraging our cutting-edge technologies to develop mission-critical agentic applications in complex industries such as Healthcare, Oil & Gas, and National Security.

About the Role

Location: Seattle, WA area

As a Staff DevOps Engineer, you will own the reliability, security, and operational foundations of Trase OS, the shared platform that powers every Trase deployment.

This is a core engineering role, not a support function. You will design and operate the infrastructure, delivery systems, and runtime controls that allow the OS platform to safely run long-lived, multi-step workflows under real security and compliance constraints.

Your work directly shapes the architecture of the platform and determines how confidently Trase can scale.

Why this role is needed

Trase OS is a distributed system with long-lived, stateful workflows and strict security constraints. Reliability and security are core architectural concerns, not operational afterthoughts.

Without strong infrastructure ownership, small failures can become systemic instability, and scaling introduces risk instead of leverage.

This role exists to:

  • Prevent systemic instability at the platform level
  • Establish reliability and security as first-class design properties
  • Enable safe, repeatable scaling as customer count, workload complexity, and regulatory expectations grow

What you'll do

You will work on:

  • A shared platform that powers all Trase deployments
  • Long-running, multi-step workflows that must survive failures and restarts
  • Security-first architecture including authentication, RBAC, auditability, and traceability
  • Platform-level observability so customers can trust what the system did and why
  • Infrastructure and delivery systems that turn one-off builds into reusable platform capabilities

You will:

  • Own deployment, runtime reliability, and security for Trase OS services and infrastructure
  • Design and operate cloud infrastructure supporting secure, repeatable multi-environment deployments
  • Build and maintain CI/CD systems, release orchestration, and environment management to ensure safe, predictable delivery
  • Own observability systems (metrics, logs, traces, alerting) enabling rapid detection, diagnosis, and recovery
  • Design and operate networking and traffic management, including secure service-to-service communication and rollout patterns
  • Implement and operate policy enforcement mechanisms (e.g., service mesh controls, authentication/authorization integration, runtime guardrails)
  • Define, instrument, and operate service level objectives and indicators (SLOs/SLIs) and error budgets, and use them to drive engineering decisions
  • Ensure the system is resilient by design, including:
      • Failure isolation and blast-radius control
      • Safe retries and idempotency
      • State recovery for long-lived workflows
      • Capacity planning and operational runbooks
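
To make "safe retries and idempotency" and "state recovery for long-lived workflows" concrete, here is a minimal Python sketch. It is an illustration only, not a description of how Trase OS works: the `CheckpointStore` class, the `run_workflow` function, and the file-backed JSON checkpoint are all hypothetical names and choices made for this example. It assumes each step has a unique name that can serve as its idempotency key.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple

# A step is a (name, function) pair; the function takes and returns a state dict.
Step = Tuple[str, Callable[[dict], dict]]


class CheckpointStore:
    """Hypothetical file-backed store recording the state after each completed step."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Dict[str, dict]:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def save(self, completed: Dict[str, dict]) -> None:
        # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(completed, f)
        os.replace(tmp, self.path)


def run_workflow(steps: List[Step], store: CheckpointStore, max_attempts: int = 3) -> dict:
    """Run steps in order, skipping any step already recorded in the store."""
    completed = store.load()
    state: dict = {}
    for name, fn in steps:
        if name in completed:
            # The step name is the idempotency key: after a restart, reuse the saved result.
            state = completed[name]
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                state = fn(dict(state))  # pass a copy so a failed attempt cannot corrupt state
                break
            except Exception:
                if attempt == max_attempts:
                    raise
        completed[name] = state
        store.save(completed)
    return state
```

If the process dies partway through, calling `run_workflow` again with the same store resumes at the first step that has no checkpoint. A production system would more likely use a durable database or a workflow engine than a local file, and would add backoff between retry attempts.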

Staff-level technical leadership

  • Lead infrastructure and reliability architecture across teams building on Trase OS
  • Set standards for production readiness, security posture, and operational excellence
  • Drive adoption of SLO-driven engineering practices across the platform
  • Partner with platform, product, and DevEx engineers to align architecture with developer velocity and customer trust
  • Mentor engineers (including senior engineers) and raise the bar for how Trase designs, ships, and operates distributed systems

Qualifications

Required

  • 10+ years of experience designing and operating production distributed systems
  • Significant experience with reliability and security-critical systems
  • Deep expertise in several of: cloud infrastructure, CI/CD, observability, networking, service-to-service security, and runtime operations
  • Experience defining and operating SLOs/SLIs and using them to guide engineering tradeoffs
  • Strong software engineering fundamentals and ability to automate infrastructure and operational workflows
  • Proven ability to lead cross-team initiatives and influence platform-level architecture
  • Experience using LLMs to automate operational workflows, infrastructure management, and incident investigation

Nice to have

  • Experience with service mesh or policy-as-code systems
  • Experience operating systems in regulated or security-sensitive environments
  • Experience with HIPAA and government regulations on data handling and protection
  • Background supporting long-running or stateful workloads

Up to 20% travel may be required.

If you want to be on the cutting edge of technology, building AI solutions for the future, and are up for a challenge, let’s talk!

Salary Range: $170,000-230,000. This represents the typical salary range for this position based on experience, skills, and other factors.

Our Red Cell Partners Benefits:

For full-time roles

  • Career track opportunity with potential for rapid advancement with strong performance as the firm grows
  • 100% employer paid, comprehensive health care including medical, dental, and vision for you and your family.
  • Paid maternity and paternity leave for 14 weeks at employees' normal pay.
  • Unlimited PTO, with management approval.
  • Opportunities for professional development and continued learning.
  • Optional 401K, FSA, and equity incentives available.
  • Mental health benefits are available through Tara Mind.
  • Cost effective GLP-1 solutions available through Crux.

We’re an Equal Opportunity Employer: You’ll receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or on the basis of disability.


*Applicant Data Disclosure*   

By submitting an application, you acknowledge that Red Cell Partners, LLC ("Red Cell") uses third-party service providers to facilitate its recruitment and hiring processes. These providers include applicant tracking systems, candidate verification platforms, and fraud detection tools (collectively, "Hiring Platforms"). Your application materials, including your résumé, cover letter, work samples, responses to application questions, and any other information you submit, may be transmitted to and processed by these Hiring Platforms for the following purposes:  

  • Managing and administering your application throughout the hiring process;
  • Verifying the accuracy and authenticity of application materials, including by cross-referencing information you provide against publicly available sources and proprietary databases;
  • Identifying indicators of potentially fraudulent, fabricated, or materially misleading application content, including but not limited to discrepancies between submitted materials and publicly available professional profiles, geographic anomalies, and fabricated work histories.

Applications that are flagged through this process as containing indicators of fraud or material misrepresentation may be declined from further consideration. If you have questions about the status of your application or the evaluation process, please contact talent@redcellpartners.com.

Red Cell requires its Hiring Platform providers to process your information solely for the purposes described above and in accordance with applicable law. Your information will be retained only for as long as necessary to fulfill these purposes and any applicable legal obligations, after which it will be deleted in accordance with Red Cell's data retention policies.

For more information about how your data is used, please refer to our Privacy Policy and Applicant Privacy Notice.


Skills

  • RBAC
  • LLM
  • Kubernetes
  • SRE
  • CI/CD
  • DevOps
  • Networking
  • Security
  • Distributed Systems
  • Cloud Infrastructure
  • Observability
  • SLO
  • Service Mesh
  • SLI

Possible interview questions

Tests experience with long-lived processes, which are central to Trase OS.

How would you design infrastructure to provide fault tolerance and state recovery for workflows that run for several days?

Assesses strategic reliability planning skills.

Describe your process for introducing SLOs/SLIs in an organization: how do you define the metrics, and how do you tie them to engineering decision-making?
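
As one possible way to ground an answer to the SLO/SLI question, the sketch below computes an error budget from a request-based SLI and turns it into a release decision. It is a simplified illustration under stated assumptions: the `ErrorBudget` class, the `release_policy` function, and the 10% freeze threshold are hypothetical and do not come from the posting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    slo_target: float      # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int    # requests observed in the SLO window
    failed_requests: int   # requests that violated the SLI in that window

    @property
    def sli(self) -> float:
        """Measured success ratio over the window."""
        if self.total_requests == 0:
            return 1.0
        return 1.0 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """How many failures the SLO tolerates for this volume of traffic."""
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget left; a negative value means the SLO is breached."""
        if self.allowed_failures == 0:
            return 1.0 if self.failed_requests == 0 else float("-inf")
        return 1.0 - self.failed_requests / self.allowed_failures


def release_policy(budget: ErrorBudget, freeze_below: float = 0.1) -> str:
    """Hypothetical policy: pause feature rollouts when little budget remains."""
    return "freeze" if budget.budget_remaining < freeze_below else "ship"
```

For example, with a 99.9% target over 1,000,000 requests, about 1,000 failures are allowed, so 250 failures leave roughly 75% of the budget and the policy returns `"ship"`. The link to engineering decisions lives in `release_policy`: the same measurement that reports reliability also gates how much risk the team takes on next.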

Tests security competencies that are critical for working with government and medical data.

Which runtime and network-level security mechanisms (for example, a service mesh) do you consider most effective for systems subject to HIPAA requirements?

Assesses leadership and the ability to influence architecture.

Give an example of a time you had to persuade a team to change its architectural approach in favor of reliability despite business pressure to ship features quickly.

Tests the innovative approach mentioned in the posting.

How exactly have you used, or do you plan to use, LLMs to automate incident investigation or infrastructure management?
