Site Reliability Engineer Lead - Catalyst

Remote: Full Remote

Offer summary

Qualifications:

  • Bachelor’s Degree or higher in Computer Science, Software Engineering, or related technical field, or equivalent practical experience.
  • 5+ years of professional experience in SRE, DevOps, Platform Engineering, or Infrastructure roles.
  • Strong scripting and programming skills in languages such as Bash, Python, Go, or Rust.
  • Experience with CI/CD systems and cloud platforms like AWS, GCP, or Azure.

Key responsibilities:

  • Lead the Service Reliability team to ensure high-quality, stable environments for customers.
  • Support build, deployment, and configuration management for multi-tier applications.
  • Collaborate with agile teams to establish automated regression suite infrastructure and performance testing.
  • Develop tooling for internal and external users to monitor and maintain production systems.

Input Output (IOHK) Information Technology & Services Scaleup https://iohk.io/
201 - 500 Employees

Job description

Who are we?

IOG is a technology company focused on blockchain research and development. We are renowned for our scientific approach to blockchain development, emphasizing peer-reviewed research and formal methods to ensure security, scalability, and sustainability. Our projects include decentralized finance (DeFi), governance, and identity management, aiming to advance the capabilities and adoption of blockchain technology globally.

We invest in the unknown, applying our curiosity and desire for positive change to everything we do. By fueling creativity, innovation, and progress within our teams, our products and services are designed for people to be fearless, to be changemakers.

About Catalyst:

IOG’s Catalyst Tribe is a pioneering innovation platform for the Cardano blockchain ecosystem. It enables decentralized communities to propose, evaluate, and fund projects, fostering innovation through efficient grant allocation, verifiable decision-making, and voter privacy. With core products including the Catalyst Voting application and upcoming advances in idea incubation and distributed decision-making, Catalyst solves critical challenges in ecosystem growth hacking and decentralized governance while exploring network effect-led monetization opportunities.

What the role involves:

As Site Reliability Engineer Lead at IOG, you will have strong functional programming and operations skills. As part of our Service Reliability team, you will work closely with geographically diverse experts and the Research & Development teams to ensure high-quality, stable environments for our customers.

  • Working on ‘build and deployment cycles’ across all development environments
  • Supporting the build, deployment, and configuration management for multi-tier applications
  • Participating in the building of tools and processes to support the infrastructure
  • Improving and maintaining tooling and scripts for automation purposes
  • Developing tooling for internal and external users to monitor and maintain production systems
  • Supporting our teams to write software that is simple and flexible to configure and deploy
  • Collaborating with agile teams to establish and maintain automated regression suite infrastructure and performance testing infrastructure
  • Building capabilities to allow development teams to be self-sufficient

Leadership

As leaders, it is our responsibility to motivate, develop, and progress our fellow team members. A leader needs to communicate openly with all members of the team, address any issues head-on, and not shy away from difficult conversations.

Empowering your team to provide the best results by organizing clear processes and coordinating team efforts should be your top priority.

Please read our Leadership at IO Global document for more information on your duties and responsibilities as a leader at IOG.

Requirements

Who you are:

  • Bachelor’s Degree or higher in Computer Science, Software Engineering, or related technical field, or equivalent practical experience
  • 5+ years of professional experience in SRE, DevOps, Platform Engineering, or Infrastructure roles
  • 2+ years in a technical leadership or senior engineering capacity
  • Proven track record of building and operating highly available, distributed, fault-tolerant systems
  • Strong foundation in Linux system internals, networking (TCP/IP, DNS, HTTP), and systems programming
  • Demonstrated experience in open-source contribution is highly desirable
  • Experience leading incident responses, writing post-mortems, and driving reliability improvements
  • Experience working with Agile, Kanban, or similar development methodologies
  • You work well both on your own and with a team
  • You value cooperation and collaboration above all, and are not afraid to ask for clarification or help when needed
  • You are kind and respectful of others’ opinions, and you are open and act with integrity when engaging in academic or technical discussions
  • Strong scripting and programming skills: Bash, Python, Go, or Rust preferred
  • Extensive experience with Git: branching strategies, GitOps workflows, code review best practices
  • Experience with CI/CD systems, such as GitHub Actions, GitLab CI, Jenkins, Buildkite, or equivalent
  • Cloud platform proficiency: AWS, GCP, Azure — including compute, storage, networking, and IAM
  • Containerization and orchestration: deep experience with Docker and Kubernetes (k8s), Helm
  • Infrastructure as Code (IaC): using Terraform, Pulumi, or similar tools
  • Configuration management: Ansible, Chef, or SaltStack (with preference for declarative approaches)
  • Monitoring, logging, and observability: Prometheus, Grafana, Loki, OpenTelemetry, Datadog, or similar
  • Security best practices: secrets management (Vault, SOPS), least privilege, security incident handling
  • Incident Management and Root Cause Analysis (RCA): strong ownership in production reliability
  • Automated testing and validation: unit testing, integration testing, chaos engineering exposure
  • Experience managing large-scale Linux-based systems: operational excellence in Ubuntu, Debian, or NixOS environments
  • Advocate of DevOps/SRE culture: focus on reducing toil, Service Level Objectives (SLOs), error budgets
  • Strong communication skills: written and verbal, capable of collaborating across distributed teams

Are you an IOGer?

Do you find yourself questioning the status quo? Do you tinker with ideas and long to turn those ideas into solutions? Are you able to spark thoughtful debates, bringing out the inquisitiveness in others? Does the promise of continuously growing excite you? Then get ready to reimagine everything you thought wasn’t possible because that’s what it means to be an IOGer - we don’t set limits, we break them. 

Location

IOG is a fully distributed organization, but due to team distribution we require the successful candidate to be based in the United States.

The base salary for this position has a range of $150k up to $175k per year at the commencement of employment. Any offer is determined by overall experience and performance during the interview process. This is only part of the total compensation package.

Benefits

All colleagues

  • Remote work
  • Laptop reimbursement
  • New starter package to buy hardware essentials (headphones, monitor, etc.)
  • Learning & Development opportunities
  • Competitive PTO and Sick Leave plan

US Employees

  • Medical, Dental, and Vision benefits coverage for the employee and dependents
  • 401k
  • Health Savings Account
  • Life Insurance


At IOG, we value diversity and always treat all employees and job applicants based on merit, qualifications, competence, and talent. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Required profile

Industry: Information Technology & Services
Spoken language(s): English

Other Skills

  • Collaboration
  • Communication
  • Problem Solving
