Site Reliability Engineer (SRE)

extra holidays - extra parental leave
Remote: Full Remote
Experience: Mid-level (2-5 years)

Offer summary

Qualifications:

3 years of experience in SRE or DevOps; strong knowledge of cloud providers such as AWS or GCP; hands-on experience with Kubernetes and Docker; proficiency in scripting languages such as Python or Bash.

Key responsibilities:

  • Ensure high availability and scalability of infrastructure
  • Manage and optimize cloud-based workloads
Lucidya | لوسيديا Scaleup https://www.lucidya.com/
51 - 200 Employees

Job description

We are looking for a Site Reliability Engineer (SRE) to join the Lucidya Cloud Engineering team and help improve the reliability, scalability, and automation of our cloud-based infrastructure. The ideal candidate will have hands-on experience with cloud environments, containerized workloads, automation tools, and monitoring systems, as well as a proactive approach to improving system availability and performance.

Key Responsibilities:
  1. Infrastructure Reliability

    • Ensure high availability (HA) and scalability of critical infrastructure components (e.g., Redis, RabbitMQ, Kubernetes clusters).
    • Proactively identify and eliminate single points of failure across the cloud environment.
    • Linux Systems Administration: Handle infrastructure management tasks such as patching, performance tuning, and monitoring of Linux-based systems.
  2. Cloud Operations

    • Manage and optimize cloud-based workloads across AWS, GCP, or Azure.
    • Automate provisioning, scaling, and maintenance tasks using Infrastructure as Code (IaC) tools such as Terraform, AWS CloudFormation, or similar.
  3. Kubernetes Clusters

    • Manage the day-to-day operations of Kubernetes clusters, including deployment, scaling, upgrades, and troubleshooting.
  4. Monitoring and Incident Response

    • Implement and standardize monitoring solutions using tools like Datadog, Prometheus, or Grafana to track golden metrics and improve alerting systems.
    • Participate in on-call rotations, troubleshoot incidents, and drive post-incident reviews to implement lasting solutions.
  5. Automation and Scripting

    • Develop and maintain automation scripts for routine operational tasks to reduce manual effort and increase efficiency.
    • Advocate for AWX/Ansible adoption to automate configurations and deployments.
  6. Collaboration and Best Practices

    • Work closely with DevOps and Engineering teams to identify and resolve performance bottlenecks.
    • Contribute to the establishment of best practices for infrastructure and application reliability.
Key Requirements:
  1. Experience and Knowledge

    • Approximately 3 years of experience in a similar SRE, DevOps, or Infrastructure Engineer role.
    • Strong experience with at least one major cloud provider (AWS, GCP, or Azure).
    • Hands-on experience with Kubernetes and containerization (e.g., Docker).
  2. Technical Skills

    • Proficient in scripting languages such as Python, Bash, or similar for automation.
    • Familiarity with Infrastructure as Code (IaC) tools like Terraform, Pulumi, or AWS CloudFormation.
    • Strong understanding of load balancers, networking (IP management, subnetting), and HA architecture.
    • Experience with CI/CD tools (e.g., Bitbucket Pipelines, Jenkins, GitHub Actions).
  3. Monitoring and Observability

    • Experience with modern monitoring and observability tools (e.g., Datadog, ELK, Grafana).
    • Ability to define and track golden metrics and establish meaningful alerting thresholds.
  4. Problem Solving and Troubleshooting

    • Strong analytical skills and ability to resolve complex technical issues.
    • Proven track record in root cause analysis and incident management.
  5. Soft Skills

    • Excellent communication and collaboration skills to work across teams.
    • Self-motivated and proactive in improving systems and processes.

Required profile

Experience

Level of experience: Mid-level (2-5 years)
Spoken language(s): English