Site Reliability Engineer


Company 

Nicholas Howard Ltd

Location 

London

Employment Hours 

Full Time

Employment Type 

Permanent

Salary 

Job Requirements/Description

Site Reliability Engineer

Are you a Site Reliability Engineer, Environment Manager, Platform Engineer, or a senior-level DevOps Engineer? Are you looking for an exciting role in a newly formed team that will drive innovation and create best-in-class development environments to support product innovation and delivery? Does a remote-first role sound good to you? If so, then this could be right up your street!

Nicholas Howard is delighted to be recruiting for a Site Reliability Engineer to join a leading systems integrator. Our client helps companies to establish, maintain and grow their IT services, and operate their critical technology in a more cost-effective manner. This is a brand-new role within the strategic engineering team, which sets and maintains design and development standards across IP development.

As a Site Reliability Engineer (SRE), you will ensure the reliability, availability, and performance of services, primarily utilising Microsoft Azure with a focus on containers, serverless, AI, analytics, and database services. You will work closely with development teams to build scalable and resilient systems and provide advisory support to our support teams. Although Azure will be our main cloud platform, experience with AWS would be desirable. Fundamentally, the post-holder will play a crucial role in building the environment for internal development capability.

This is a remote-first role, with time in the office in London once a month.

Key Responsibilities:

  • Collaborate with development teams to design scalable and resilient architectures in Azure.
  • Develop and implement monitoring and alerting solutions to ensure service reliability.
  • Automate operational processes and tasks using Infrastructure as Code (IaC) and scripting.
  • Manage and optimise Azure resources, focusing on:
    • Containers (e.g., Azure Kubernetes Service (AKS), Azure Container Apps).
    • Serverless computing (e.g., Azure Functions, Logic Apps).
    • AI and analytics (e.g., Azure Machine Learning, Synapse Analytics, Data Factory).
    • Database services (e.g., Cosmos DB, Azure SQL, PostgreSQL).
  • Perform root cause analysis for incidents and implement preventative measures.
  • Provide advisory support to platform support teams.
  • Work in a multi-cloud environment; while Azure is the primary focus, experience with AWS (e.g., ECS, Lambda, RDS) is beneficial.

Key Skills and Experience:

  • Proven experience as an SRE, or in a similar role.
  • Strong expertise in Azure services (containers, serverless, AI, analytics, databases).
  • Experience with implementing and utilising monitoring & logging tools (Azure Monitor, Application Insights, Datadog, Grafana).
  • Proficient in scripting & automation (Python, Bash, PowerShell).
  • Infrastructure as Code (IaC) experience (Terraform, Bicep, ARM Templates).
  • Experience making technical decisions and implementing solutions that align with best practices and business goals.
  • Excellent problem-solving and collaboration skills.
  • AWS knowledge and experience would be a plus.

The company offers a highly competitive salary, along with comprehensive benefits including flexible remote working, a generous company pension, health and dental insurance, life assurance, access to the Udemy training platform to support ongoing skills development and training, and a wide range of additional lifestyle perks.

Please register your interest by applying now!

