Country: Germany

Senior Data Engineer
Work at an iconic, world-renowned company on cutting-edge technology at the intersection of Data Engineering and AI. Excellent opportunities for professional growth and a strong technology stack.
Role Complexity
The role requires deep knowledge not only of classic Data Engineering (Airflow, dbt, SQL) but also of AI-specific technologies such as vector databases and RAG systems. High responsibility for data schema ownership and LLM integration raises the barrier to entry.
Salary Analysis
A Senior-level position in AI/Data in Germany typically commands competitive pay. The stated range matches market expectations for experienced engineers in the North Rhine-Westphalia region.
Cover Letter
I am writing to express my strong interest in the Senior Data Engineer position at Atari. With a proven track record of building production-grade data pipelines and a deep understanding of schema design for AI consumption, I am excited about the opportunity to help Atari build the knowledge systems that power its next generation of AI automation. My experience with orchestration tools like Apache Airflow and vector databases such as Pinecone aligns perfectly with your technical requirements.
In my previous roles, I have focused on transforming raw source material into structured, agent-consumable knowledge, ensuring high data quality through frameworks like Great Expectations. I am particularly drawn to Atari's unique position in the gaming industry and the challenge of integrating specialist domain knowledge into scalable data assets. I am confident that my expertise in RAG systems and cloud infrastructure will allow me to contribute immediately to your engineering team's success.

Job Description
About Atari
Atari is an interactive entertainment company and an iconic gaming industry brand recognized worldwide for its multi-platform games and licensed products. Atari owns and/or manages a portfolio of more than 400 games and franchises, including globally recognized brands such as Asteroids®, Centipede®, Missile Command®, Pong®, and RollerCoaster Tycoon®.
The Atari family of brands includes Digital Eclipse, Nightdive Studios, Infogrames, AtariAge, MobyGames, as well as Coatsink, Early Morning Studios, and Stormteller Games—spanning game development, publishing, and community experiences worldwide.
Atari operates internationally with offices in New York, Paris, Germany, and India.
Overview
Build and own the data pipelines and knowledge systems that power AI automation. You own the schema, the sources, the quality, and the completeness — working closely with the engineering team to ensure the right knowledge is always in the system and working.
What You’ll Do
Data Schema Design and Ownership
- Design and own canonical data schemas that structure knowledge for AI consumption — versioned, documented, and treated as engineering artefacts.
- Evolve schemas as AI system requirements change: assess downstream impact and coordinate with the engineering team before any rollout.
- Keep the knowledge model legible to domain experts, AI engineers, and future team members through clear documentation of schema decisions.
Source Integration and Pipeline Engineering
- Identify and connect to source systems — databases, repositories, APIs, workflow outputs, domain expert inputs — and build reliable, monitored ingestion pipelines for each.
- Build and maintain production-grade data pipelines using pipeline orchestration tooling (such as Apache Airflow, Prefect, or equivalent): extraction, transformation, validation, loading, error handling, and full observability (a minimal sketch follows this list).
- Keep pipelines current as sources change, new knowledge is produced, or schemas evolve — downstream AI systems are never fed stale or broken data.
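To make the orchestration bullet above concrete, here is a minimal sketch using Airflow's TaskFlow API (Airflow 2.x); the task bodies, record shape, and schedule are hypothetical illustrations, not details from the posting.

```python
# Minimal sketch of an ingestion DAG using Airflow's TaskFlow API (Airflow 2.x).
# The task bodies are hypothetical placeholders for real source integrations.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2},  # retry transient source failures before alerting
)
def knowledge_ingestion():
    @task
    def extract() -> list[dict]:
        # Placeholder: pull raw records from a source system (API, database, repo).
        return [{"id": 1, "title": " Asteroids design notes ", "format": "md"}]

    @task
    def transform(records: list[dict]) -> list[dict]:
        # Normalise formats so loaded rows conform to the canonical schema.
        return [{**r, "title": r["title"].strip()} for r in records]

    @task
    def load(records: list[dict]) -> None:
        # Placeholder: upsert into the knowledge store; failures surface via retries.
        print(f"loaded {len(records)} records")

    load(transform(extract()))


knowledge_ingestion()
```

Task-level retries and the daily schedule stand in for the error handling and observability the bullet calls for; a real deployment would add alerting and a DAG or task group per source.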
Data Cleaning, Transformation, and Quality
- Transform raw source material into structured, consistent, agent-consumable knowledge: normalise formats, resolve conflicts, enforce schema conformance, and eliminate ambiguity.
- Own data quality end-to-end: define standards, build validation logic at ingestion using data quality frameworks (such as Great Expectations or equivalent), and maintain quality metrics across the full knowledge estate (see the validation sketch after this list).
- Trace quality issues to their root — pipeline fault, schema gap, or source problem — and fix there, not downstream.
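A minimal sketch of validation at ingestion, assuming the classic pandas-backed Great Expectations API (pre-1.0 releases; newer versions use a context-and-suite workflow instead). The column names and allowed formats are hypothetical.

```python
# Validation-at-ingestion sketch using the classic pandas-backed Great
# Expectations API (pre-1.0). Column names are hypothetical examples.
import great_expectations as ge
import pandas as pd

raw = pd.DataFrame(
    {
        "asset_id": ["a1", "a2", None],  # one row violates the null check
        "format": ["md", "md", "pdf"],
    }
)

df = ge.from_pandas(raw)
df.expect_column_values_to_not_be_null("asset_id")                     # identity
df.expect_column_values_to_be_in_set("format", ["md", "pdf", "html"])  # conformance

result = df.validate()
if not result.success:
    # Stop the pipeline at ingestion so bad rows never reach downstream AI systems.
    raise ValueError(f"validation failed: {result.statistics}")
```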
Data Systems Maintenance and Lifecycle Management
- Maintain all data pipelines and knowledge systems in production: monitor health, manage dependencies, handle source system changes, and ensure nothing degrades silently over time.
- Run periodic data cleanup: identify and remove stale, duplicate, or degraded knowledge assets that would dilute AI system quality if left in place.
- Manage knowledge deprecation cleanly: when data becomes outdated or superseded, retire it from the system without breaking downstream pipelines or AI system behaviour.
AI Knowledge System Partnership
- Work in a closed loop with the engineering team: diagnose knowledge gaps when AI outputs fail, fix at the pipeline or schema level, and validate through the evaluation framework.
- Understand how RAG retrieval and context assembly work well enough to make data design decisions with AI consumption as the primary constraint (a context-assembly sketch follows this list).
- Proactively surface knowledge gaps by monitoring AI system failures and escalations, and address root causes before they become production issues.
- Extract specialist knowledge from domain experts and convert it into structured, pipeline-managed data assets the AI system can rely on.
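As a hedged illustration of context assembly, the sketch below packs ranked retrieval results into a bounded context; the character budget and separator are assumptions for illustration only.

```python
# Hypothetical context-assembly step: fit retrieval results into a bounded
# prompt context, keeping the highest-ranked chunks until the budget is spent.
def assemble_context(chunks: list[str], budget_chars: int = 8_000) -> str:
    parts: list[str] = []
    used = 0
    for chunk in chunks:  # chunks are assumed pre-sorted by retrieval relevance
        if used + len(chunk) > budget_chars:
            break  # dropping low-ranked chunks beats truncating mid-chunk
        parts.append(chunk)
        used += len(chunk)
    return "\n\n---\n\n".join(parts)


print(assemble_context(["chunk A", "chunk B"], budget_chars=10))  # keeps "chunk A"
```

The data-engineering consequence: chunk size and schema granularity determine what survives the budget, which is why AI consumption is treated as the primary design constraint.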
Data Infrastructure and Governance
- Select and manage data storage infrastructure appropriate to each knowledge type — relational databases, document stores, and vector databases (such as Pinecone, Weaviate, or pgvector) — with AI consumption as the deciding factor (see the retrieval sketch after this list).
- Implement full data lineage and audit trails: origin, transformation history, and change log for every knowledge asset.
- Enforce data versioning and change management across the knowledge estate, with rollback capability preserved at every step.
- Build a data estate that is legible and accessible across the organisation — laying the foundation for intelligence and analytics as the product portfolio grows.
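For one of the storage options named above, nearest-neighbour retrieval against a pgvector-backed store might look like the following sketch; the connection string, table, and column names are assumptions, not the posting's specification.

```python
# Hypothetical nearest-neighbour retrieval against a pgvector-backed table,
# assuming: CREATE TABLE knowledge_chunks (content text, embedding vector(1536)).
import psycopg  # psycopg 3


def retrieve(query_embedding: list[float], k: int = 5) -> list[str]:
    # pgvector accepts vectors as '[x1,x2,...]' literals; <=> is cosine distance.
    literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with psycopg.connect("dbname=knowledge") as conn:
        rows = conn.execute(
            "SELECT content FROM knowledge_chunks "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (literal, k),
        ).fetchall()
    return [content for (content,) in rows]
```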
Stakeholder Communication and Collaboration
- Translate between domain and engineering language in both directions — extracting knowledge requirements from non-technical stakeholders and communicating data decisions and tradeoffs back clearly.
- Work with domain experts to understand how data should be structured and what outputs need to look like — then convert that directly into schema and pipeline requirements.
- Keep engineering, product, and domain stakeholders appropriately informed on pipeline health, data quality, and knowledge gaps — right level of detail, right audience.
Requirements & Qualifications
- Proven track record building and owning production data pipelines end-to-end: source integration, transformation, validation, loading, observability, and ongoing maintenance.
- Strong data schema design skills: canonical data modelling, schema evolution management, downstream impact assessment, and decision documentation.
- Experience integrating with diverse source systems — APIs, databases, document repositories, event streams — with reliable, monitored ingestion pipelines.
- Hands-on experience with pipeline orchestration tooling (such as Apache Airflow, Prefect, or equivalent): scheduling, dependency management, failure handling, and pipeline observability.
- Data cleaning and transformation expertise: normalisation, conflict resolution, format standardisation, and quality validation at scale using data transformation tools (such as dbt or equivalent).
- Experience with vector databases (such as Pinecone, Weaviate, or pgvector) and relational or document stores: schema design, indexing, and production data management.
- Working knowledge of how RAG systems and context assembly operate — sufficient to make data design decisions with AI consumption as the primary constraint.
- Experience partnering with AI or ML engineers: translating knowledge requirements into pipeline tasks and closing feedback loops between AI output quality and data quality.
- Ability to interrogate and profile data across sources — querying, inspecting, and validating content using SQL and Python to identify anomalies, gaps, and quality issues before they reach AI systems.
- Cloud platform experience (AWS, Azure, or GCP): deploying and operating data pipelines and storage systems at production reliability standards.
Preferred / Nice-to-Have
- Experience in the gaming industry: game development pipelines, content management, or platform-specific data and metadata requirements.
- Familiarity with game engine data formats, asset pipelines, or platform SDK data structures (Xbox GDK, PlayStation SDK, or similar).
To Apply
Please submit your resume and a brief cover letter outlining your experience and interest in the role. If you have a portfolio, you are welcome to include a link to it.
EEO Statement
Atari is an equal opportunity employer and we are committed to providing a workplace free from harassment and discrimination. We are committed to equal employment regardless of race, religion or lack thereof, color, national origin, gender, sexual orientation, gender identity or expression, age, marital status, medical condition, veteran status, ancestry, disability status, pregnancy, parental status, genetic information, political affiliation, or any other status protected by the laws or regulations in the locations where we operate.

Skills
- Apache Airflow
- Prefect
- dbt
- Python
- SQL
- Pinecone
- Weaviate
- PostgreSQL
- AWS
- Azure
- Google Cloud Platform
- Great Expectations
- Vector Databases
- RAG
Possible Interview Questions
Testing experience with modern data management tooling for AI.
Tell us about your experience designing data schemas specifically for RAG systems. What key differences do you see compared with traditional analytical warehouses?
Assessing data quality assurance skills, which are critical for training and operating AI systems.
How do you organise real-time data validation using Great Expectations or similar tools?
Testing the ability to work with vector stores.
What criteria do you use when choosing between Pinecone, Weaviate, and pgvector for a given project?
Assessing problem-solving skills in complex pipelines.
Describe a case where data in an AI system became stale or corrupted. How did you detect the problem, and how did you change the pipeline so it would not happen again?
Testing communication skills and the ability to work with domain experts.
How do you approach eliciting knowledge from domain experts so it can be structured into a knowledge base?