
Senior/Lead Site Reliability/DevOps/Platform Engineer

Remote: Full Remote
Contract:
Experience: Senior (5-10 years)
Work from: United States

Offer summary

Qualifications:

Bachelor’s degree with 5+ years of experience OR Associate degree with 7+ years of experience for senior roles; minimum 4 years of AWS Cloud Platform experience in production with various technologies.

Key responsibilities:

  • Maintain high availability, reliability, and performance of application services through monitoring and automation
  • Provide technical guidance and suggestions for improvement in deployment and operational processes
Horizontal Talent (Human Resources, Staffing & Recruiting; SME): https://www.horizontaltalent.com/
201-500 employees

Job description

Essential Accountabilities

  • Ensures that application services are highly available, reliable, and performant through monitoring and alerting.
  • Serves as the primary subject matter expert for application services, both preventing (proactive) and troubleshooting and mitigating (reactive) service availability and performance issues.
  • Develops tools or automation to improve our ability to effectively monitor application services in a large-scale, complex environment. Evaluates and implements improvements to existing tools and monitoring thresholds.
  • Provides technical assistance and operational guidelines for business operations and application development to ensure applications are running optimally in production, test, and development environments.
  • Designs, implements, and maintains SRE dashboards, bots, and other automation based on current operational needs and release changes. Evaluates and suggests improvements to the dashboards, bots, and other automation.
  • Identifies repetitive, manual, and scalable tasks and automates them using scripting/programming languages or tools.
  • Identifies key operational metrics and follows through by defining and designing methods to programmatically capture the data needed to produce them.
  • Functions as the subject matter expert for coordinating and managing the deployment process and support of the full lifecycle of applications in Amazon Web Services.
  • Understands and evaluates current application release changes to identify any additions or modifications needed in the current SRE program.
  • Serves as a technical resource to internal and external IT groups. May provide subject matter expertise for third party products and utilities used to support enterprise-wide applications.
  • Consults with developers on issues related to the impact of development on the infrastructure, works with system engineers and developers to define server configuration settings, leads the migration of code through staging environments to production, and provides assistance to software quality assurance technicians during system acceptance testing.
  • Influences new application and infrastructure designs and architectures, and creates standards and guidelines for large-scale distributed systems with a focus on operability.
  • Creates and maintains cloud operations processes and technical documentation.
  • Provides technical mentorship and training to team members.
  • Performs other duties as assigned or requested.
  • Adheres to the Bank's attendance policies through regular and prompt attendance.

Problem Solving Skills

  • Logical analysis: Requires thinking through and solving problems step by step, completing root cause analysis, often looking beyond the obvious solution to problems and digging deeper for the best solutions.
  • Requires following vaguely defined procedures. Decisions are consistently made within reason and affect the work group or department.
  • Working in a group environment: Requires working as part of a group to solve issues and problems.

Qualifications

  • Bachelor’s degree and 5+ years of experience OR Associate degree and/or Technical Bootcamp Certificate with 7+ years of experience for Sr. or Lead Site Reliability Engineer, DevOps Engineer, or Platform Engineer
  • 4+ years of working experience with AWS Cloud Platform technologies, infrastructure, and practices in a production environment, including CloudWatch, ECS, Lambda, Canaries, DynamoDB, RDS, PostgreSQL, S3, API Gateway, Elastic Load Balancer, Athena, AWS X-Ray, and SQS
  • 2+ years of working experience creating automation or developing solutions
  • 2+ years of working experience with GitLab, CDK, Terraform, and CI/CD pipelines
  • 2+ years of working experience with cloud technologies, including Grafana, OpenSearch, and Docker
  • 2+ years of working experience with Infrastructure as Code, Configuration as Code, and Alerts and Monitoring as Code
  • Ability to read, comprehend, and create complex technical documentation.
  • Ability to comprehend business operational requirements.
  • Demonstrated ability to analyze complex problems and communicate complex technical analysis to technical and non-technical audiences.
  • Strong verbal and written communication skills. Ability to articulate clear and concise instructions and resolutions.
  • Excellent problem solving, organizational and analytical skills

Knowledge Areas Preferred

  • Traditional and Cloud infrastructure components and techniques in Production and Lower environments, including virtualization, elasticity, networking, and load balancing
  • Development, QA, and Production deployment patterns and version control (e.g., zero-downtime deployments, blue/green deployments, canary releases)
  • Cloud Operating Console commands, administration, and configuration
  • Experience with programming languages such as Python, TypeScript, Node.js, .NET, and Java
  • Understanding of Agile and DevOps practices
  • Familiarity with the ITIL framework
  • Familiarity with Chaos Engineering

Required profile

Experience

Level of experience: Senior (5-10 years)
Industry:
Human Resources, Staffing & Recruiting
Spoken language(s):
English

Other Skills

  • Teamwork
