Overview

We’re growing and we’re looking for a dedicated Site Reliability Engineer to help us automate operations and build infrastructure for growth. We’ve found product-market fit, and now we’re working to ensure that our systems are available, performant, and visible as we develop architecture that is scalable, cost-effective, secure, and reliable. Your role will be to help us meet those goals and set even more ambitious ones, designing systems that will take us into the future while helping us become hyper-aware of what’s happening in our systems right now.

This is an opportunity to engage with cutting-edge technology and work on a real-world problem at global scale. In addition to competitive compensation and benefits, there is room for the right person to take on increased responsibilities. And it’s a lot of fun (although fast-paced and even chaotic at times) working as part of a small, passionate team.

Responsibilities

  • Take ownership of our infrastructure as code, which is currently in Terraform
  • Lead our DevOps culture, encouraging and enabling developer effectiveness through powerful and secure tooling
  • Keep us abreast of what’s happening with our systems and our customers up to the second — we have one tracing obsessive in the team and we’re all trying to be a bit more like him
  • Expand and improve our CI and CD systems
  • Help us develop and uphold SLIs and SLOs
  • Develop and maintain a (blameless) postmortem practice
  • Make our monitoring and alerting fire on early symptoms, not only once there is an outage

Qualifications

  • Excellent systems thinking: edge cases, failure modes, behaviours, specific implementations
  • Experience in a DevOps-oriented role
  • Experience with at least one major cloud provider (we use AWS)
  • Experience with infrastructure as code, especially Terraform
  • Familiarity with and interest in security best practice
  • Operational experience with containers — Kubernetes a plus
  • Strong programming and shell scripting skills
  • An obsession with documentation
  • Ability to thrive in a remote-first team