Job Title: Senior Site Reliability Engineer (SRE)
Experience: 5+ years
Location: Mexico/LATAM
Engagement Type: Full-time or contract, fully remote
Job Description:
We are seeking a skilled Senior Site Reliability Engineer (SRE) to join our offshore team. In this role, you
will be responsible for ensuring the reliability, performance, and scalability of our critical systems. You'll
develop automation, build monitoring solutions, lead incident response, and work closely with
engineering teams to implement infrastructure as code, CI/CD, and cloud-native tools.
Job Responsibilities:
● Maintain the reliability, availability, and performance of critical systems
● Develop and maintain automation scripts and tools to streamline operations
● Develop and maintain monitoring dashboards and alerts
● Lead incident response, conduct post-mortem analysis, and implement preventative measures
● Optimize system performance and scalability
● Implement and maintain security best practices
● Create and maintain comprehensive system and process documentation
● Participate in on-call rotations providing 24/7 support for critical systems
Must Haves:
● Kubernetes (hands-on experience) – managing and deploying workloads
● AWS Cloud Platform – deep understanding and production experience
● Infrastructure as Code (IaC) – using tools such as Terraform, CloudFormation, or Ansible
● Scripting/Programming – proficiency in Python or Go
● Monitoring & Alerting – experience with Prometheus and Grafana
● CI/CD Pipelines – Jenkins, GitLab CI, or similar
● Incident Management – proven experience responding to and analyzing outages
● Linux Systems & Networking – strong fundamentals
Good to Haves:
● ArgoCD, Linkerd, Karpenter, or other Kubernetes-related tools
● Logging tools – Loki, ELK Stack
● Security best practices – knowledge of cloud and container security
● Leadership/Mentorship – experience guiding junior engineers
● Post-mortem writing & RCA – comfortable documenting incidents and lessons learned
● Experience in distributed systems or high-availability architectures
Recruitment Process:
● AI-based online screening test
● Assignment
● Two client interviews
● CEO discussion
● Offer: Successful candidates will receive an offer to join the team.
Soft Skills:
● Excellent verbal and written communication skills in English (required)
● Strong problem-solving ability with a customer-first mindset
● Accountability – takes ownership of reliability and incident outcomes
● Demonstrated ability to work independently in high-pressure, multitasking environments
● Passion for supporting and helping others