Senior Site Reliability Engineer (Remote)

Remote-EMEA
Posted 8 hours, 15 minutes ago
Engineering

About the role

Job summary

As a Senior Site Reliability Engineer, you will work autonomously on complex reliability and platform challenges, owning feature planning and execution within the SRE/Platform domain. You will strengthen the platform's architecture and reliability strategy, turning ambiguous requirements into effective, maintainable solutions while collaborating with product and security teams in a fully remote environment.

Qualifications

  • Extensive experience in Site Reliability Engineering, DevOps, or Platform Engineering.
  • Proficiency in Kubernetes, including operating and scaling production clusters, and in container tooling such as Docker.
  • Experience in building and managing cloud infrastructure, preferably on AWS or similar platforms.
  • Strong skills in infrastructure-as-code using Terraform.
  • Familiarity with reliability frameworks such as SLOs, SLIs, and alerting strategies.
  • Background in observability tools like OpenTelemetry and Grafana/Prometheus.
  • Proficiency in CI/CD tools (e.g., GitLab CI, GitHub Actions) and deployment automation.
  • Comfort programming in Go (Golang) and Bash; broader programming knowledge is a plus.
  • Practical experience using AI in infrastructure and operations work.
  • Excellent communication skills, particularly in an asynchronous global setting.
  • Proactive and curious, with a strong sense of ownership.
  • Ability to collaborate respectfully across diverse cultures and time zones.

Responsibilities

  • Lead the discovery and delivery of solutions for reliability and infrastructure challenges.
  • Contribute to the platform's architecture and tooling, influencing team priorities.
  • Define and operate reliability practices, taking responsibility for operational metrics.
  • Address cross-team requests and develop reusable solutions for recurring issues.
  • Integrate AI workflows into team operations, creating reusable prompts and tools.
  • Mentor junior engineers and participate in hiring and onboarding processes.
  • Collaborate with security teams on platform hardening and threat mitigation.
  • Engage in incident response and on-call rotations to maintain system reliability.

Skills

  • Strong experience with Kubernetes and cloud infrastructure management.
  • Proficiency in infrastructure-as-code and observability tools.
  • Familiarity with CI/CD practices and with programming languages such as Go and Bash.
  • Experience applying AI to infrastructure and operations work.

Education

  • A degree in a related field, or equivalent experience, is preferred.

Tools

  • Kubernetes, Docker, AWS, Terraform, OpenTelemetry, Grafana, Prometheus, GitLab CI, GitHub Actions.