
Site Reliability Engineer (SRE) III_USA

Remote: Hybrid
Contract:
Experience: Expert & Leadership (>10 years)
Work from: Phoenix (US)

Offer summary

Qualifications:

4-year degree in Computer Science or a related field; 5+ years of technical lead experience; 8+ years of development experience; 10+ years in integration engineering for Observability; experience with Microsoft Azure or GCP.

Key responsibilities:

  • Lead Observability initiatives as Lead Engineer
  • Develop and implement build release pipelines
  • Design solutions for observability applications
  • Provide technical leadership in design and testing
  • Support and guide less experienced staff
Metasys Technologies SME https://www.metasysinc.com/
201 - 500 Employees

Job description

Site Reliability Engineer (SRE) II
Phoenix, AZ (Hybrid, 3 days per week)
6+ Month Contract

Main responsibilities

  • Lead Observability initiatives as Lead Engineer.
  • Develop and implement build release pipelines; manage deployment schedules, issues, risks, and impediments.
  • Participate in Agile development with team accountability for commitment and delivery each sprint.
  • Ensure implementations of observability meet IT Services requirements through approved processes and methodologies.
  • Design solutions for observability applications and system integration with internal and external vendors.
  • Provide technical leadership in design, development, and testing of solutions.
  • Track infrastructure delivery and dependencies to implementation.
  • Prepare and present technical solutions; advise teams on approaches and tradeoffs.
  • Define system structures, interfaces, and guiding principles for organization, software design, and implementation.
  • Support reusable application components from a business and technology perspective.
  • Provide coding and technical direction to less experienced staff or develop complex original code.

Qualifications

  • Experience in gathering and organizing large volumes of data for Enterprise Observability solutions.
  • Experience recommending baseline monitoring thresholds, performance monitoring KPIs, and SLAs.
  • Proficient with agent and forwarder installation, APIs, performance monitoring alerts, dashboards, and data trend analysis.
  • Strong knowledge of Azure foundation components (e.g., App GW, APIM, Virtual Network, NSG, Load Balancer, Azure VM).

Top responsibilities

  • Lead the Observability Ingestion team.
  • Provide technical solutions on a day-to-day basis.
  • Ensure technical delivery of the team.
  • Resolve any technical blockers.
  • Collaborate with Architects on solution options; run proofs of concept (POCs) and learn new technologies.

Experience

  • Proficiency in Java is required; Python, Go, C, or C++ is desired.
  • Experience with databases: Azure SQL, PostgreSQL, MySQL, MongoDB, TSDB, or similar.
  • Experience with at least one of the following cloud platforms is required: Microsoft Azure or GCP.
  • Experience with PCF, Docker, Kubernetes is required.
  • Familiarity with DevOps and CI/CD tools and processes is required.
  • Preferred experience in high-performance and high-frequency data streaming (e.g., using Kafka) and handling large batch data.
  • Required experience with Agile/Scrum methodologies.

Requirements

  • Education: 4-year degree in Computer Science, Information Systems, or related field; or equivalent combination of education and experience.
  • Experience:
    • 5+ years of tech lead experience.
    • 8+ years of development experience (GCP experience is a plus).
    • 10+ years of experience in integration engineering related to Observability/Monitoring frameworks; experience with two or more APM Tools (e.g., AppDynamics, Datadog, Splunk, Dynatrace, Kibana, Elastic).
    • 5+ years of experience as a Site Reliability Engineer.
    • Hands-on experience with tools and technology preferred.
    • Experience with open-source platforms (e.g., Grafana) and OpenTelemetry libraries preferred.

Ideal candidate skills

  • SRE skills
  • Observability development skills
  • Technical lead experience
  • Performance Monitoring
  • Problem Solving
  • Familiarity with Grafana, Prometheus, Cortex, Loki, Tempo, Mimir
  • GCP experience is a critical need.

Education/Certifications: Bachelor’s in Engineering, BTECH, BE

Required profile

Experience

Level of experience: Expert & Leadership (>10 years)
Spoken language(s): English

Other Skills

  • Problem Solving
