Senior SRE Engineer

Remote: Full Remote
Experience: Expert & Leadership (>10 years)

Offer summary

Qualifications:

10-12 years of SRE experience, expertise in AWS cloud technologies, proficiency with Docker/Kubernetes, experience with Linux and GitLab CI/CD, and hands-on experience with monitoring tools like Splunk.

Key responsibilities:

  • Design scalable systems focused on performance
  • Evangelize SRE practices within IT operations
  • Collaborate with teams for operational reliability
  • Implement monitoring to optimize applications
  • Guide and mentor SRE teams for skill enhancement
Damco Solutions Large https://www.damcogroup.com/
1001 - 5000 Employees

Job description

We are seeking a highly skilled Senior SRE Engineer with solid experience to help lead transformational initiatives within IT operations, including development. In this role, you will help design and implement cutting-edge SRE solutions and drive the transformation of IT operations organizations toward an engineering-centric approach.

Responsibilities:

  • Participate in the design and architecture of reliable, scalable, high-performance systems and services, with a focus on operational excellence, availability, and performance.
  • Evangelize SRE evolution within IT operations and promote a culture of engineering excellence and best practices.
  • Define best practices and principles for SRE, including incident management, monitoring, alerting, and automation.
  • Collaborate with development teams on resiliency to ensure that services and applications are designed with operational reliability in mind.
  • Implement monitoring systems to assess the performance of applications and infrastructure, and proactively identify areas for optimization.
  • Understand the incident and problem management process, including post-mortems, and drive improvements to prevent future incidents.
  • Analyze resource utilization patterns and forecast future capacity needs to ensure optimal performance and cost-efficiency.
  • Ensure that SRE practices align with security and compliance requirements, and implement measures to protect systems and data.
  • Pursue operational excellence with a focus on automation, developing tools to streamline operational tasks and increase efficiency.
  • Provide guidance and mentorship to SRE teams, fostering skill development and building a strong and capable SRE practice.
  • Develop close relationships with other operational teams to integrate SRE practices and drive overall operational improvements across the enterprise.
  • Stay up to date on industry trends, new technologies, and best practices in SRE, and apply relevant advancements to the organization.

Qualifications:

  • Around 10-12 years of hands-on SRE experience with cloud technologies, development, SRE toolsets, and automation
  • Strong hands-on experience with a cloud technology (AWS): Control Tower, project setup, account creation, RDS, SSO
  • Solid understanding of and hands-on experience with Docker/Kubernetes
  • Good experience with Linux commands, GitLab CI/CD setup, and Terraform (state management, etc.)
  • Monitoring and alerting setup experience with Splunk, Prometheus, Grafana, Kibana, ELK, etc.
  • Hands-on experience with APM tools, preferably Datadog, AppDynamics, or Dynatrace
  • Good understanding of observability frameworks that leverage programmatic SLI/SLO blueprints to standardize the collection of golden signals
  • Automation experience (data refresh, releases, DB snapshots) using Ansible or scripting languages
  • Experience with the following languages: Groovy DSL, Java, Python, and YAML, as well as with microservices architecture
  • Good understanding of and hands-on experience with MQ and Kafka
  • Experience with Databases (Oracle, MySQL)

Good to have:

  • Any of the relevant professional certifications: Certified Site Reliability Engineer (CSRE), Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer Professional, Google Cloud Professional DevOps Engineer

Required profile

Experience

Level of experience: Expert & Leadership (>10 years)
Spoken language(s): English
