Senior Site Reliability Engineer

Remote: Full Remote

Offer summary

Qualifications:

  • 5+ years of experience in Site Reliability Engineering or a related field.
  • Hands-on experience with Kubernetes and AWS Cloud Platform.
  • Proficiency in scripting/programming languages such as Python or Go.
  • Strong understanding of Infrastructure as Code (IaC) using tools like Terraform.

Key responsibilities:

  • Ensure the reliability, availability, and performance of critical systems.
  • Develop and maintain automation scripts and monitoring solutions.
  • Lead incident response and conduct post-mortem analysis.
  • Participate in on-call rotations for 24/7 critical system support.

Turtle Trax S.A. Startup https://www.turtle-trax.com/
2 - 10 Employees

Job description

Job Title: Senior Site Reliability Engineer (SRE)

Experience: 5+ years

Location: Mexico/LATAM

Engagement Type: Full-time or contract, fully remote

Job Description:

We are seeking a skilled Senior Site Reliability Engineer (SRE) to join our offshore team. In this role, you will be responsible for ensuring the reliability, performance, and scalability of our critical systems. You'll develop automation, build monitoring solutions, lead incident response, and work closely with engineering teams to implement infrastructure as code, CI/CD, and cloud-native tools.

Job Responsibilities:

● Maintain the reliability, availability, and performance of critical systems

● Develop and maintain automation scripts and tools to streamline operations

● Develop and maintain monitoring dashboards and alerts

● Lead incident response, conduct post-mortem analysis, and implement preventative measures

● Optimize system performance and scalability

● Implement and maintain security best practices

● Create and maintain comprehensive system and process documentation

● Participate in on-call rotations for 24/7 critical system support

Must Haves:

● Kubernetes (hands-on experience) – managing and deploying workloads

● AWS Cloud Platform – deep understanding and production experience

● Infrastructure as Code (IaC) – using tools like Terraform (or CloudFormation/Ansible)

● Scripting/Programming – Proficiency in Python or Go

● Monitoring & Alerting – Experience with Prometheus, Grafana

● CI/CD Pipelines – Jenkins, GitLab CI, or similar

● Incident Management – Proven experience in responding to and analyzing outages

● Linux Systems & Networking – Strong fundamentals

Good to Haves:

● ArgoCD, Linkerd, Karpenter, or other Kubernetes-related tools

● Logging tools – Loki, ELK Stack

● Security best practices – Cloud and container security knowledge

● Leadership/Mentorship – Experience guiding junior engineers

● Post-mortem writing & RCA – Comfortable documenting incidents and learnings

● Experience in distributed systems or high-availability architectures

Recruitment Process:

● AI-based online screening test

● Assignment

● 2 client interviews

● CEO Discussion

● Offer: Successful candidates will receive an offer to join the team.

Soft Skills

● Excellent verbal and written communication skills in English (required)

● Strong problem-solving ability with a customer-first mindset

● Accountability – Takes ownership of reliability and incident outcomes.

● Demonstrated ability to work independently in high-pressure, multitasking environments

● Passion for supporting and helping others


Required profile

Spoken language(s):
English

Other Skills

  • Leadership
  • Accountability
  • Communication
  • Problem Solving
