Senior Site Reliability Engineer (Remote)

Remote-EMEA
Posted 2 hours, 47 minutes ago
Engineering

About the role

Job summary

This role owns operational excellence and infrastructure strategy for a platform that integrates AI agents across HR and Finance. The position is fully remote and focuses on reliability, performance, and security for customers.

Qualifications

  • Proven experience in Site Reliability Engineering, DevOps, or SysOps roles, with a track record of managing production systems at scale.
  • Deep hands-on experience with Kubernetes in production environments and solid AWS fundamentals.
  • Proficiency in infrastructure-as-code tools like Terraform.
  • Experience with CI/CD tools such as GitLab, GitHub Actions, or Jenkins.
  • Strong Bash scripting skills and experience troubleshooting Linux system-level issues.
  • Excellent communication skills for explaining complex infrastructure concepts to diverse audiences.

Responsibilities

  • Design, implement, and maintain infrastructure-as-code patterns using Terraform and Kubernetes.
  • Build and maintain monitoring, logging, and alerting systems; lead incident response and post-mortems.
  • Collaborate with the Security team to ensure compliance and security across infrastructure.
  • Optimize system performance and resource utilization while managing cloud costs.
  • Identify and eliminate manual operational tasks to enhance efficiency.
  • Partner with platform teams to ensure resilience and observability of APIs and tools.

Skills

  • Experience with backend programming languages (e.g., Elixir, Python, Go, Java, Node.js) is a plus.
  • Familiarity with observability tools (e.g., Datadog, Prometheus, ELK, Grafana) is advantageous.
  • Experience in consultancy settings and scaling multi-tenant platforms is a bonus.

Education

  • Relevant experience in a technical role is required; formal education details are not specified.

Tools

  • Terraform, Kubernetes, AWS, CI/CD tools (GitLab, GitHub Actions, Jenkins), and monitoring tools (Datadog, Prometheus, ELK, Grafana).