Offer summary

Qualifications:

Bachelor’s degree in Computer Science, Information Technology, or equivalent experience.

8+ years of experience in cloud operations, reliability engineering, or infrastructure management.

Certifications such as GCP Professional Cloud Architect or GCP Professional DevOps Engineer are preferred.

Expertise in Google Cloud services, Infrastructure as Code tools like Terraform and Ansible, and strong knowledge of SRE principles.

Key responsibilities:

Manage and maintain GCP infrastructure to ensure high availability and reliability.

Monitor resource utilization and performance trends for capacity planning and cost optimization.

Design and implement resilient cloud architectures using Infrastructure as Code tools.

Collaborate with DevOps teams to streamline deployment processes and maintain documentation.

Job description

We are seeking an experienced Google Cloud Platform (GCP) Site Reliability Engineer (SRE) to manage daily operational workloads, ensuring the reliability, scalability, and cost efficiency of cloud infrastructure. The ideal candidate will have deep expertise in capacity planning, performance optimization, infrastructure design, and FinOps best practices to maintain an efficient and cost-effective GCP environment.

Key Responsibilities:

• Operations & Reliability: Manage and maintain GCP infrastructure, ensuring high availability, scalability, and system reliability.

• Capacity Planning & Optimization: Monitor and forecast resource utilization, performance trends, and infrastructure scaling needs to optimize cloud costs and efficiency.

• Infrastructure Design & Automation: Design and implement highly available, fault-tolerant, and resilient cloud architectures, leveraging Infrastructure as Code (IaC) tools such as Terraform and Ansible.

• Performance Monitoring & Incident Response: Utilize Google Cloud Monitoring, Cloud Logging, and third-party tools to proactively detect and resolve performance issues.

• FinOps & Cost Management: Analyze and optimize cloud spending, implement cost controls, recommend rightsizing strategies, and ensure efficient resource allocation.

• Security & Compliance: Implement best practices for IAM, network security, encryption, and compliance frameworks (SOC2, ISO 27001, NIST).

• CI/CD & DevOps Integration: Collaborate with DevOps teams to streamline deployment processes, automate workflows, and optimize application performance.

• Disaster Recovery & High Availability: Design and implement disaster recovery (DR) plans, backup strategies, and failover mechanisms to ensure business continuity.

• Documentation & Collaboration: Maintain comprehensive documentation of infrastructure, best practices, and optimization strategies while working closely with cross-functional teams.

Requirements

Qualifications:

• Education: Bachelor’s degree in Computer Science, Information Technology, or equivalent experience.

• Experience: 8+ years of experience in cloud operations, reliability engineering, or infrastructure management.

• Certifications: GCP Professional Cloud Architect, GCP Professional DevOps Engineer, or an equivalent certification is preferred.

• Technical Proficiency:

  • Expertise in Google Cloud networking, Compute Engine, Kubernetes (GKE), Cloud Functions, and Cloud Storage.

  • Strong knowledge of Terraform, Ansible, or other Infrastructure as Code (IaC) tools.

  • Experience with Google Kubernetes Engine (GKE), microservices, and container orchestration.

  • Hands-on experience with FinOps tools and cost optimization strategies in cloud environments.

  • Familiarity with monitoring and logging solutions such as Google Operations Suite (formerly Stackdriver), Prometheus, and Grafana.

  • Experience with CI/CD pipelines, automation, and GitOps best practices.

  • Strong understanding of SRE principles, SLAs, SLOs, and error budgets.

Preferred Qualifications:

• Experience with multi-cloud or hybrid cloud environments.

• Knowledge of serverless computing and cloud-native application design.

• Understanding of ITIL frameworks for incident, problem, and change management.

Expert Site Reliability Engineer - GCP