Data Engineer at Docker
San Francisco, CA, US


Docker's IT Data Team is looking for a data engineer to help transform data generated from the Docker product and services ecosystem into actionable insights. You'll leverage both software engineering and analytics skills as part of the team responsible for managing data pipelines across the company: sales, marketing, finance, HR, customer support, and product development.

Based in our San Francisco office, the growth team is continually refining our data pipelines to be more scalable, reliable, maintainable, and better integrated with our data ecosystem. In this role you'll help design and implement event ingestion, data models, and ETL processes that support mission-critical reporting and analysis.

You'll work together with other data engineers, analysts, project managers, and subject matter experts to deliver impactful outcomes for the organization. You'll contribute to high-visibility projects and field occasional ad hoc questions from your internal customers. As the company grows, ensuring data flows reliably and accurately to business units and systems is a huge and exciting challenge. Come join a fast-moving team tasked with making Docker an even smarter, data-driven enterprise.

We're looking for someone who's familiar with:

  • Data warehousing concepts (including data model design and query optimization strategies)
  • Source system integration
  • Creating ETL scripts with Python/SQL/Docker
  • Automating business and reporting processes

Some of the tools/languages/systems you'll use are:

  • Docker
  • Python
  • Looker
  • Snowflake
  • AWS/Azure Infrastructure
  • Jenkins (CI/CD)
  • GitHub and JIRA

What's most important is that you're quick on your feet, meticulous, and excited to take on new challenges! You can tackle an ambiguous assignment and derive valuable insight. You use multiple tools and methods to find solutions, and couple that with intuition and quick tests to prioritize how to unravel complicated problems.

Day-to-day responsibilities:

  • Implement and oversee the Snowflake and ETL infrastructure
  • Maintain the integrity of data within our data pipeline and warehouse
  • Ensure quality of data and completeness of event logging across the Docker codebase
  • Integrate data from third-party services such as Marketo and Salesforce
  • Develop ETL jobs and tests to process, validate, transport, collate, aggregate, and distribute data
  • Transform raw event logs into higher-order tables to make existing analysis easier and new analysis possible
  • Champion a data-informed mindset within our culture
  • Create automated reports of weekly and monthly metrics and ROI for the executive management team and board