
Senior Site Reliability Engineer (L3)

Remote: Full Remote
Experience: Senior (5-10 years)

Offer summary

Qualifications:

3-5 years of experience managing software deployments; strong knowledge of DevOps principles; experience with cloud platforms (e.g., AWS); a Bachelor’s degree in Computer Science or InfoSec; strong programming skills, preferably in Python or Go.

Key responsibilities:

  • Review architecture and ensure best practices
  • Own SLOs and SLAs; monitor operational metrics
  • Manage security controls and compliance standards
  • Lead incident response and disaster recovery plans
  • Develop documentation and perform technical audits
Horizontal Talent (Human Resources, Staffing & Recruiting; SME): https://www.horizontaltalent.com/
201-500 employees

Job description

Your missions

Title: Senior Site Reliability Engineer (L3)

Team: Software Engineering

Working Arrangement: Remote

About Horizontal: Established in the US in 2003, Horizontal solves complex challenges across two distinct businesses: Horizontal Digital and Horizontal Talent. We are consistently recognized as a top workplace and as one of the fastest-growing private companies. Horizontal Talent specializes in staffing for the IT, Digital & Creative, and Business & Strategy markets. We have offices in the US, the UAE, India, Malaysia and Australia.

What you'll be doing:

  • System Architecture: Review architecture and software components with software engineers and architects. Ensure best practices are consistent across all teams.
  • Operational Excellence: Own SLOs and SLAs and ensure they are met. Monitor operational metrics and lead improvement plans. Develop tools, including infrastructure-as-code resources, to scale operations and allow other teams to work autonomously.
  • Security and Compliance: Manage and audit security controls to meet enterprise requirements. Implement and maintain best practices and compliance standards. Collaborate with the legal and compliance teams to assess overall risk.
  • Release Planning: Conduct performance tests for large-scale events or critical releases.
  • Disaster Recovery: Develop and implement DR plans and procedures, including data recovery and fault injection simulations on production replicas.
  • Incident Management: Lead incident response and post-mortems to resolve production issues, identify root causes and prevent future occurrences.
  • Documentation: Develop runbooks and other technical assets. Complete periodic technical audits as required.
  • Daily Operations: Perform and improve day-to-day tasks, including access onboarding and offboarding, configuration management and patch management.
  • Sharpen the Saw: Stay up to date with emerging trends, threats and technologies, and propose improvements and proofs of concept for technical roadmaps.
  • Team Player: Collaborate with cross-functional teams to ensure smooth deployment and operation of software releases. Answer technical questions from other teams and from outside the organization.
  • Coaching: Provide feedback on the performance of junior staff and participate in people development initiatives.
  • Support any ad hoc tasks as required by the company.

What We Look For In You

  • Proven Track Record: 3 to 5 years of managing software deployments and instrumentation in production environments with defined SLAs and SLOs. Strong knowledge of software delivery and DevOps principles.
  • Cloud Operations: Experience with cloud platforms (e.g., AWS, Cloudflare, GCP) and infrastructure-as-code tools (e.g., Terraform, CloudFormation). Strong programming and scripting skills, preferably in languages such as Python, Go, or Ruby.
  • Accreditation: Bachelor’s degree in Computer Science, InfoSec or a similar field, or professional certifications (e.g., Certified DevOps Professional or Certified Solutions Architect Professional on AWS or GCP).
  • Scope of Work: Fully capable of taking substantial features from concept to shipping as a sole contributor. Works effectively on open-ended projects and is self-sufficient enough to dive deep and evaluate multiple solutions to a problem.
  • Problem Solving: Solves hard problems with many constraints, using sound judgment to assess risks and presenting arguments in a well-structured, data-backed written narrative. Has passion, creativity and empathy for users.
  • Quick Thinking: Able to derive information, think critically and make snap judgements based on measured data in high pressure situations.
  • People Skills: Strong communicator who is able to build positive working relationships between teams and form relationships with key customers.

Nice To Have

  • Experience working in an early- to growth-stage startup.
  • Experience building applications in different tech stacks.
  • Keen interest in decentralized technologies and their applications, including cryptocurrencies.

Some Of The Perks

  • Remote Work Flexibility: Work wherever you feel most productive. We also provide office space in 1Powerhouse (Malaysia) and WeWork (Singapore) if you ever feel like meeting your colleagues in person.
  • Flexible Working Hours: No 9-5 structure, work the hours you need to get your tasks done.
  • Comprehensive Insurance Coverage: We provide life, medical, and critical illness insurance.
  • Virtual Share Options: You'll be entitled to virtual options, with terms and conditions.
  • Bonus: You’ll be entitled to a bonus, with terms and conditions.
  • Parking Allowance: You will be given a fixed monthly allowance of RM150.
  • Meal Allowance: You will be given a fixed monthly allowance of RM600.
  • Learning Allowance:
  • Social Activity Allowance: Want to set a date to watch a movie or play futsal with your colleagues? Get it organized and we subsidize a portion (claim basis) of the cost.
  • Annual Company Offsite: We gather once a year to meet each other in person, reflect on the year, and partake in social activities!

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry: Human Resources, Staffing & Recruiting
Spoken language(s): See the job description for which languages are mandatory.

Soft Skills

  • Verbal Communication Skills
  • Problem Solving
  • Coaching
  • Operations
  • Critical Thinking
