Site Reliability Engineer - remote

Remote: Full Remote

Offer summary

Qualifications:

  • Bachelor's degree with 5+ years of related experience in SRE/DevOps.
  • Experience with AWS and/or GCP, and infrastructure-as-code tools like Terraform.
  • Proficiency in containerization (Docker) and orchestration (Kubernetes).
  • Strong communication skills and ability to learn new technologies.

Key responsibilities:

  • Design, implement, and operate cloud infrastructure for a SaaS platform.
  • Collaborate with engineering teams to evolve product architecture and migrate services.
  • Build and maintain monitoring strategies and automate processes.
  • Participate in incident management and contribute to achieving high availability.

Broadcom Inc. XLarge http://www.broadcom.com
10001 Employees

Job description

Please Note:

1. If you are a first-time user, please create your candidate login account before you apply for a job. (Click Sign In > Create Account)

2. If you already have a candidate account, please sign in before you apply.

Job Description:

As a Site Reliability Engineer, you will be responsible for implementing and operating cloud infrastructure for a SaaS-based network monitoring solution. In this role, you will:

  • Participate in the design, implementation, and operation of our SaaS platform, addressing concerns such as continuous integration, cloud infrastructure, solution deployment, and monitoring & alerting.

  • Partner closely with our other engineering teams to evolve product/service architecture.

  • Migrate services into our freshly-minted platform, and collaborate with our dev teams to ensure that new services are designed with operability and observability in mind.

  • Build out, deploy, and maintain our monitoring strategy and technology stack.

  • Automate all the things, freeing yourself and others from the tyranny of manual tasks.

  • Contribute to the achievement of our 99.99% monthly availability by participating in our incident management process and quiet on-call rotation.

  • Practice sustainable incident response and coordinate blameless postmortems.

  • Assist in the definition, prioritization, and planning of work through backlog maintenance and collaboration on the product delivery roadmap.
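For a sense of scale, the 99.99% monthly availability target mentioned above can be translated into a downtime budget. The sketch below is only an illustration: it assumes availability is measured as uptime over all wall-clock minutes in a calendar month, which may differ from how the team actually defines it, and the function name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Return the downtime budget, in minutes, for a month of the given length.

    Assumption: availability = uptime / total wall-clock time in the month.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare the budget across the possible month lengths.
    for days in (28, 30, 31):
        budget = allowed_downtime_minutes(99.99, days)
        print(f"{days}-day month: {budget:.2f} minutes of downtime allowed")
```

Under these assumptions, a 30-day month leaves a budget of roughly 4.3 minutes, which is why automation, observability, and a well-practiced incident process feature so heavily in this role.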


 

Required Education and Experience

  • SRE/DevOps experience in building and operating cloud-based SaaS platforms

  • Familiarity and experience with:

    • AWS and/or GCP

    • Infrastructure-as-code tooling (e.g. Terraform)

    • Containerization (Docker) and orchestration (Kubernetes, Helm)

    • CI/CD pipelines, either self-hosted (e.g. Jenkins, TeamCity), or managed (e.g. GitHub Actions, GitLab)

    • Configuration management (Chef, Ansible, Puppet)

    • At least one programming language (Python preferred)

    • Monitoring solutions (e.g. Prometheus, Grafana, CloudWatch, Stackdriver, ELK)

    • Linux systems, automation, package management

  • Demonstrable aptitude to learn new technologies, and apply that knowledge to solve real problems

  • Strong interpersonal communication skills (listening, speaking, and writing)

  • Experience operating large-scale, distributed systems on top of cloud infrastructure

Bachelor's degree plus 5+ years of related experience.

Broadcom Software - Agile Operations Division

Join Broadcom Software (#BroadcomSW), a world leader in business-critical software that modernizes, optimizes, and protects the world’s most complex hybrid environments. With our engineering-centered culture, we are building an extensive portfolio of industry-leading infrastructure and security software. Together, we solve big customer problems with some of the top technical talent in the industry.

In the Agile Operations Division, we offer business-critical software solutions that help the world’s leading companies transform their operating model to be more agile. Our ValueOps, NetOps, and Automation solutions help these organizations drive innovation and achieve operational excellence to realize better business outcomes – and better experiences for their customers.

Our industry success is built on a decades-long track record of delivering transformational solutions to teams who plan, build, test, and operate mission-critical software for the world’s largest and most complex businesses. To do this, we respond quickly and thoughtfully, innovate in the context of customer needs, and collaborate inclusively with customers and internal partners. Our business will nurture your intellect and give you opportunities to expand your skills even further. 

Additional Job Description:

Compensation and Benefits

The annual base salary range for this position is $91,000 to $146,000.

This position is also eligible for a discretionary annual bonus in accordance with relevant plan documents, and equity in accordance with equity plan documents and equity award agreements.

Broadcom offers a competitive and comprehensive benefits package: medical, dental, and vision plans; 401(k) participation including company matching; Employee Stock Purchase Program (ESPP); Employee Assistance Program (EAP); company-paid holidays; and paid sick leave and vacation time. The company follows all applicable laws for Paid Family Leave and other leaves of absence.

Broadcom is proud to be an equal opportunity employer. We will consider qualified applicants without regard to race, color, creed, religion, sex, sexual orientation, gender identity, national origin, citizenship, disability status, medical condition, pregnancy, protected veteran status or any other characteristic protected by federal, state, or local law. We will also consider qualified applicants with arrest and conviction records consistent with local law.

If you are located outside the USA, please be sure to fill out a home address, as this will be used for future correspondence.

Required profile

Spoken language(s):
English

Other Skills

  • Communication
  • Problem Solving
