Senior Site Reliability Engineer

UNLIMITED HOLIDAYS - EXTRA HOLIDAYS - EXTRA PARENTAL LEAVE - LONG REMOTE PERIOD ALLOWED

Remote:

Full Remote

Contract:

Full time

Salary:

110 - 135K yearly

Experience:

Senior (5-10 years)

Work from:

United States

Offer summary

Qualifications:

5+ years of AWS experience
Deep Linux OS knowledge
Familiarity with SRE tools and technologies
Proficiency in infrastructure as code tools
Strong networking understanding

Key responsibilities:

Design and maintain AWS infrastructure
Develop monitoring and alerting solutions
Automate infrastructure provisioning and deployment
Manage and resolve production incidents
Ensure security compliance across systems

Procare Solutions Computer Software / SaaS SME https://www.procaresolutions.com/

201 - 500 Employees

Job description

Your missions

About Procare

Our mission is to simplify childcare operations and create meaningful connections by providing technology, expertise, and unparalleled service.

Procare Solutions is the #1 name in childcare software – used by more than 35,000 childcare businesses across the country. For over 30 years, childcare professionals have looked to Procare to provide real-time information for making critical decisions, maintaining compliance with local and state regulations, and adhering to business best practices.

We make childcare management run smoothly, so that our customers can spend more time focusing on the kiddos, not on back-office administrative duties.

A Little About the Role

We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding and extensive experience working with AWS, a thorough knowledge of the Linux operating system, and a robust background in managing and optimizing infrastructure and services in a cloud environment. As an SRE, you will be responsible for maintaining the reliability, availability, and performance of our applications and infrastructure.

What You Will Do

Infrastructure Management: Design, implement, and maintain scalable, reliable, and secure AWS infrastructure using best practices.
Monitoring & Alerting: Develop and maintain monitoring, logging, and alerting solutions to ensure the health and performance of our systems. Utilize tools such as New Relic, AWS CloudWatch, Prometheus, Grafana, and ELK stack.
Automation & Scripting: Automate infrastructure provisioning, configuration, and deployment processes using tools like Terraform, CloudFormation, and Ansible.
Incident Management: Respond to and resolve production incidents, conduct root cause analysis, and implement corrective measures to prevent recurrence.
Performance Optimization: Continuously analyze system performance and implement tuning improvements to enhance the overall efficiency and scalability of the infrastructure.
Security Compliance: Ensure all systems and infrastructure comply with security best practices and policies. Implement and manage IAM roles and policies, VPC configurations, and security groups.
Collaboration: Work closely with development teams to integrate reliability into the software development lifecycle, including CI/CD pipeline management using tools such as Jenkins or AWS CodePipeline.
Documentation: Maintain comprehensive documentation of infrastructure, processes, and incident reports to ensure knowledge sharing and transparency.

Our Ideal Candidate Will Have

AWS Expertise: Minimum 5 years of hands-on experience with AWS services including EC2, S3, RDS, Lambda, ECS/EKS, CloudFormation, CloudWatch, VPC, and IAM.
Linux Expertise: Deep knowledge and extensive experience with Linux operating systems, including system administration, shell scripting, and troubleshooting.
SRE Tools & Technologies: Familiarity with common SRE-related services and tools such as Kubernetes, Docker, Prometheus, Grafana, Elasticsearch, Logstash, Kibana (ELK), and Splunk.
Automation & Configuration Management: Proficiency in infrastructure as code (IaC) tools like Terraform, Ansible, and CloudFormation.
Monitoring & Logging: Experience with monitoring and logging solutions, including setting up metrics, creating dashboards, and configuring alerts.
Networking: Strong understanding of networking concepts, including DNS, load balancing, VPN, firewalls, and network security.
Programming & Scripting: Proficiency in at least one programming/scripting language such as Python, Go, or Bash.
Problem-Solving: Excellent problem-solving skills with a proactive and analytical approach to resolving issues.
Communication: Strong written and verbal communication skills, with the ability to collaborate effectively with cross-functional teams.
Certifications: AWS Certified Solutions Architect – Professional, AWS Certified DevOps Engineer, or similar certifications.
DevOps Engineering Background: Experience in DevOps engineering, including continuous integration and continuous deployment (CI/CD) practices and tools.
Experience: Previous experience in a similar SRE role within a large-scale, complex environment.

Why Procare?

Excellent, comprehensive benefits packages, including medical, dental, and vision plans
HSA option with employer contributions
Vacation time, holidays, sick days, volunteer & personal days
401K Plan with employer match and immediate vesting
Employee Stock Purchase Plan
Employee Discount Program
Medical, Dependent Care, and Transportation FSA Plans
Company-paid short- and long-term disability and life insurance
RTD EcoPass for all Denver employees
Tuition Reimbursement and continued Professional Development
Fast-paced, high-energy workplace environment in prime downtown location
Regular company-provided meals

Salary

$110,000-$135,000/year DOE

Location

While our preference is a candidate located in Denver, CO, this role is open to remote candidates in the following states: AL, AZ, CA, CO, CT, FL, GA, ID, IL, IN, IA, KY, ME, MD, MA, MI, MN, MO, NV, NJ, NY, NC, OH, OR, PA, TN, TX, VA, WA, WI.