DevOps Lead (Platform Engineering Group)
At CommerceIQ, we help consumer brands accelerate their retail ecommerce market share growth and profitability through machine learning algorithms. We are building the world’s most complete and sophisticated Retail Ecommerce Management Platform, which connects and intelligently automates the management of retail ecommerce channels like Amazon, Walmart, and Instacart, across the entire ecommerce operational chain of retail media management, sales operations, supply chain, and digital shelf analytics.
We are in hyper growth mode, having recently raised our Series D funding at unicorn valuation (>$1B) and ended our third year of triple-digit revenue growth. Continued acceleration of our growth is fueled by landing new customers, expanding our platform through new products, managing new retail ecommerce platforms, and delivering exceptional customer service to unlock high net retention rates.
Top consumer brands such as Nestle, Kimberly-Clark, Nature’s Bounty, Johnson & Johnson, Mondelez, and Kellogg, to name a few, rely on the CommerceIQ suite of products to make efficient business decisions on a daily basis. It is critical to have services that run high-quality data and algorithms and drive business decisions for our customers in a timely manner. As a DevOps lead, you are responsible for building scalable, extensible, and secure cloud infrastructure for CommerceIQ’s application and data science teams.
In this role, you will work with a couple of Infra/DevOps SDEs and partner with engineering managers to achieve CommerceIQ’s QoQ goals. As a DevOps lead, you will play a crucial role in the upkeep of security, cost, and deployment infrastructure, and you will maintain and develop the services owned by the Infra/DevOps team.
A successful candidate will be obsessed with technology and relentlessly raise the bar on the architecture, design, and quality of code delivered while aggressively pursuing optimizations to meet cost and scale SLAs. The candidate should be capable of managing a fast-paced delivery schedule, of influencing and driving a high-level engineering strategy with the leadership, and of taking a hands-on approach to implementing that strategy.
Functional level Expectations
- You should possess advanced knowledge of AWS and of software design approaches so that you can guide the DevOps team in designing infrastructure that handles scale and concurrency. AWS certification is preferred.
- You should be able to work in situations where you can use your prior expertise and judgment to determine goals, identify constraints, and propose an actionable plan.
- Your work is typically focused on working with multiple teams’ architecture and product solutions.
- You should be able to lead designs and implementations that are extensible and scalable:
  - For continuous integration and deployment
  - For multi-cloud deployments
  - For DR of various services within CiQ infrastructure
- As a team lead, you should be able to influence management decisions and priorities, and actively mentor others to create force multipliers.
- You should drive teams to adhere to engineering best practices in SDLC like code coverage, acceptance testing, CI/CD and design patterns that ensure consistency and standardization of architecture
- You should proactively simplify code, identify bottlenecks and resolve team architecture deficiencies.
- You should be able to work alongside other SDEs on the team, build relationships with stakeholders (including customers, product managers, cross-functional partners, and external partners), and integrate their work for a cohesive launch.
- 6-8 years of technology experience, including 3+ years of working experience in cloud
- The ideal candidate will have handled operations, deployment, and security of multiple SaaS/B2C products
- Deep knowledge of Python or GoLang and of at least one public cloud
- Good knowledge of log collection designs like EFK and metric collection tools like New Relic, Prometheus/Grafana, or SignalFx
- Good knowledge of designing, building, and consuming RESTful APIs and microservice architectures on public clouds, preferably AWS
- Excellent analytical, communication, and coding skills are a must.
- Thorough orientation towards code reviews, coding/design standards and documentation
- Good knowledge of designing with messaging systems such as SQS and Kafka.
- Good knowledge of designing ETL pipelines using AWS Step Functions, Azkaban, or Airflow
- Good knowledge of any big data engine like Spark would be an advantage.
Below is a brief description of the charter for the Infra team (subject to product/business deliverables):
Prometheus / Grafana
K8s metrics and deployment metrics
We use New Relic (NR) for log collection and operational metrics. However, certain shortcomings of NR, together with problems already solved in Prometheus/Grafana, might force us to move away from NR. Apart from the Prometheus track, the lead will have to work with all stakeholders [apps/DS] to migrate from OCD to K8s, with all checks and balances in place.
Log collection and analysis is done via New Relic, and its cost will grow as we move more applications to K8s. We need to optimise our usage or explore options such as an EFK stack for logs or Sumo Logic.
We need to be proactive and do frequent security audits of our applications. We need a security center that holds all our documentation, including the docs and questionnaires we have filled out for our clients.
We need to audit all our services for disaster recovery and develop a BCP across the company.
The lead is expected to enforce Git branching conventions and set code coverage and style check guidelines across teams and projects. Examples include publishing guidelines for Java, NodeJS, and Python projects, establishing average code coverage metrics, and running static code analysis, FindBugs, and other security code scanning tools.
The Infra team also owns many services, including crucial ones like BSS (CSS). These infra services should set the benchmark for DevOps best practices in the org.
Azkaban is our ETL orchestrator. It is run in-house and has custom code deployed. Azkaban needs ongoing maintenance to keep pace with our growth. The lead is also expected to explore, and give guidance on, other orchestrators such as Airflow, Step Functions, and Astronomer.
AWS admin and access / SF admin and access / AWS/SF cost, unit cost per client
Constant efforts to identify and reduce costs
AWS partnership [case studies, blogs, trainings, tech conf talks]
Cost savings across all our infrastructure.
SOC2 and other compliance
Automate our SOC2 process with diligence. This becomes more important as we expand to other pillars of SOC2 and to ISO.