
Site Reliability Engineer

Remote: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

  • 2+ years of experience with AWS
  • Extensive experience using AWS EKS
  • At least 1 year as a technical lead
  • Proficiency in Python, Ruby, Elixir, Go, JavaScript, or Rust
  • Understanding of container and hypervisor fundamentals

Key responsibilities:

  • Maintain 24/7 production environment on Kubernetes
  • Implement DevOps methodologies
  • Perform proactive system monitoring and configuration
  • Respond to incidents
TableCheck SME https://www.tablecheck.com/
51 - 200 Employees

Job description

TableCheck, Japan's leading restaurant reservation management platform, is seeking a Site Reliability Engineer. As a member of our SRE team you will own the technology stack and help support our demanding business and developer needs.
We run a robust and fault-tolerant infrastructure built on Amazon Web Services (AWS) with Terraform, Kubernetes, Helm, and an array of tools for CI/CD, logging, monitoring, and so on. We emphasize DevOps best practices such as agile, scrum, automation, and customer-centric improvements.
TableCheck has embraced remote work, so communication and documentation are in our blood. We look for and write about signals in the noise, which lets us constantly learn from mistakes and adapt. We expect members of our teams to follow up with questions and updates to keep everyone in the loop.
You can read more about Working at TableCheck as an SRE.


Responsibilities include

  • Following SRE principles to maintain a 24/7 production environment running on Kubernetes
  • Implementing DevOps methodologies to improve the IT team's quality of life
  • Proactive system monitoring and configuration
  • Incident response

Mandatory Skills

  • Must have at least 2 years of experience with Amazon Web Services (AWS), with particular focus on EKS, EC2, RDS, Fargate, CloudFront, Lambda, and S3
  • Must have extensive experience using AWS EKS
  • Must have experience in direct software engineering following DevOps / SRE practices with at least 1 year as a technical lead
  • Current proficiency in at least one of the following languages: Python, Ruby, Elixir, Go, JavaScript, Rust
  • Must understand container and hypervisor fundamentals
  • Configuration management (YAML / Bash); experience with Helm and Terraform preferred
  • Experience running production systems at large scale, and an understanding of the kinds of problems that can occur along with likely solutions

Recommended Skills 

  • Previous startup experience is highly desired
  • Terraform, Pulumi
  • ArgoCD
  • Prometheus
  • Grafana
  • PostgreSQL
  • MongoDB
  • Kafka
  • Security, PCI-DSS, GDPR, forensics, etc.

Language Skills

  • A native level of English is required. (No Japanese skill is required for this role.)

Evaluation Criteria
We will evaluate candidates based on the following stages:

  • Initial interview - a one-on-one, 30-minute chat over Google Meet to see if we're the right fit
  • Technical interview - (virtually) meet the SRE team at TableCheck to evaluate your skills (no whiteboard or materials required)
  • Take-home project - we will provide you with a 30-60 minute project, which will evaluate your dev and ops skills

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s): English

Other Skills

  • Communication
