Lead Site Reliability Engineer (SRE)
As a Lead SRE (Site Reliability Engineer) for CTS you will work with customer GCP environments, identifying opportunities for optimisation and delivering continual improvement. You’ll help shape and grow the SRE department, and will line manage and mentor a small team of Support Engineers and SREs.
As the lead of the SRE function, you will be expected to have strong opinions about the future roadmap of the department, and to influence the business on how to maintain best practice across all our engagements.
You will be an empathetic leader who cares about the development and growth of your team. This involves supporting their personal development and enabling the successful delivery of their projects.
You will be responsible for the development and deployment of best-of-breed technologies to monitor, alert on and proactively manage our customers' GCP environments. You will lead the internal SRE tooling roadmap and help drive consistency and best practice across our growing engineering community (60+) in order to deliver high-quality services to our customers.
Exposure to a wide variety of technologies and software is a given. You will be expected to investigate the use of new technologies as they become available and provide recommendations to both leaders in the business and the engineering community.
*Please Note: Previous experience with GCP is not required; experience of other public cloud providers (AWS or Azure) would be sufficient.
For this role we are able to offer:
- Hybrid working - you can work from where you want, when you want.
- Flexible hours - we ask that you work during core hours (10-4) to help with collaboration, but outside of that you can work when it suits you.
We also encourage you to apply if you need:
- Condensed hours - working full-time hours over fewer days
- Identify optimisation opportunities across customer GCP environments and lead the implementation of those improvements.
- Lead a team of engineers working on continuous improvement/delivery of our services, with a focus on proactive improvements of customers’ services.
- Provide leadership and mentoring to the wider engineering team with a focus on SRE principles.
- Provide thought leadership on building and nurturing the SRE function
- Design, plan and implement highly available, global and cloud native improvements.
- Develop and manage the internal tooling for CTS Google Estate.
- Be a champion for highly available, reliable services which are continuously improved, rather than relying on reactive break-fix support.
- Champion blameless postmortems / root cause analysis and sharing of best practices so that service reliability is increased.
- On call roughly one week in four, paid in addition to salary.
- Extensive hands-on experience with at least one major cloud provider (GCP, AWS, Azure)
- Previous experience in an SRE or DevOps role
- Previous experience of leading teams and mentoring more junior team members.
- Kubernetes operational experience at scale
- Ability to automate common tasks in at least one programming language (i.e. reduce toil)
- A thorough understanding of Terraform and associated deployment methodologies
- Incident Management - able to triage incidents and effectively handle escalations.
- Good awareness of application development, delivery and infrastructure methodologies
- GCP / AWS professional certifications
- Understanding of Agile ways of working
- Certified Kubernetes Administrator
- Prior experience working in a customer-facing role.
- Solid knowledge of cloud architecture principles and highly available systems
Experience with some of the following technologies:
- Containerisation technologies (e.g. Docker)
- Configuration management tooling (e.g. Puppet, Chef, Ansible, Salt)
- Excellent understanding of the Linux operating system (e.g. Debian, CentOS)
- Continuous integration and deployment tool sets (e.g. Jenkins, ArgoCD, GoCD)
- Common DevOps technology stacks (e.g. ELK, NGINX, Apache, RabbitMQ, Elasticsearch, Redis, Consul)
- Site Reliability Engineering
- Database technologies including MySQL and PostgreSQL
- Serverless and microservice architectures
This role is primarily remote-based, with occasional travel to customer sites in the UK and to our offices in Manchester, Edinburgh and London. There may also be the possibility of travel to customer sites in the Netherlands.
What you’ll get:
In addition to the competitive salary, you’ll get private health insurance and a company-contributed pension as standard.
We have a multitude of other benefits, including market-leading parental leave policies, health & wellbeing initiatives and access to a discounts and rewards programme (including discounted gym membership).
You’ll be invited to the annual international company ‘kick off’ conferences, which are a great chance to meet colleagues you don’t see every day.
Fair Pay. Done Right.
We don't advertise salary brackets because we don't have salary brackets. We encourage conversation about your (and our) salary expectations from the off and throughout your time with us. We don't want to discourage anyone from applying because they are on significantly more or less than a "bracket".
Over the last 10+ years our open culture has been the backbone of our growth. We’ve nurtured an environment that empowers our people to get the job done. Meanwhile, we invest in their education and development, and - whether for family, mental health or other reasons - we give them the flexibility to maintain a real work-life balance.
We strive to progress our industry as a whole and are using the B Corporation framework to continuously build on how we can treat our employees, community and environment with respect. We’re excited to see the progress we can collectively make.
Over the coming years we want to become the number one dedicated Global Partner for Google Cloud. To get there we’re growing and building talented teams ready to change the world using Google technologies. So if you’re passionate, curious and keen to get stuck in, get in touch and join us for the ride!