SRE Engineer

Remotive

Remote

12 hours ago

About

This description summarizes our understanding of the job posting. Click the 'Apply' button to find out more.

Role Description

This role is responsible for the reliability, performance, and scalability of our MarTech SaaS platform, which serves millions of users running thousands of marketing campaigns daily.

  • Monitor systems, respond to incidents, and implement automation to improve platform reliability.
  • Design, implement, and maintain comprehensive monitoring and alerting systems using tools such as Prometheus, Grafana, and Datadog.
  • Lead incident response efforts, conduct root cause analyses, and implement preventive measures.
  • Build and maintain automation tools and processes to reduce manual work and enhance system resilience.
  • Identify and implement reliability improvements across our platform.
  • Monitor system performance trends and plan for scaling needs.
  • Create and maintain runbooks, procedures, and system documentation.

Qualifications

  • 3+ years of hands-on experience in site reliability engineering, DevOps, or similar roles.
  • Strong knowledge of SRE best practices including SLIs/SLOs, error budgets, and reliability engineering principles.
  • Google Cloud Platform experience with services like Compute Engine, Kubernetes, Cloud SQL, and related infrastructure components.
  • Expertise with Datadog or a similar tool for monitoring, alerting, and observability.
  • Backend development experience with Java, PHP, and/or Node.js.
  • Incident management skills including on-call experience and troubleshooting under pressure.
  • Automation mindset with experience in scripting and Infrastructure as Code principles.

Requirements

  • SaaS platform experience, particularly in high-volume environments serving millions of users.
  • MarTech or AdTech industry background with understanding of campaign management systems.
  • Experience scaling systems that handle thousands of concurrent operations.
  • CI/CD pipeline experience and deployment automation.
  • Security best practices knowledge for cloud environments.

Benefits

  • Remote-first culture with flexible working arrangements.
  • High-impact role in a small, collaborative team.
  • Growth opportunities as we scale our platform and expand our engineering team.
  • Competitive compensation and benefits package.
  • Learning budget for professional development and certifications.
  • Modern tech stack with opportunities to work with cutting-edge solutions.