Site Reliability Engineer - SRE Toulouse

Remote:

Hybrid

Contract:

Full time

Experience:

Mid-level (2-5 years)

Work from:

Lille (FR), Paris (FR), Toulouse (FR)

Offer summary

Qualifications:

Experience in Go, Python, or Rust; strong system administration background; troubleshooting production systems; understanding of cloud architectures and networks.

Key responsibilities:

Optimize tools for incident remediation
Collaborate with engineering teams
On-call support and real-time issue resolution
Implement best practices for system performance

Scaleway Scaleup https://www.scaleway.com/

201 - 500 Employees

HQ: Paris

Job description

Founded in 1999, Scaleway is the cloud subsidiary of the Iliad Group, one of Europe's leading telecommunications providers. Our mission is to foster a more responsible digital industry by helping developers and businesses build, deploy, and scale applications on any infrastructure.

From our offices in Paris and Lille, we refine the Scaleway cloud ecosystem every day, and we are its first users.

Our roughly 25,000 customers choose us for our multi-AZ redundancy, smooth user experience, carbon-neutral data centers, and native tools for managing multi-cloud architectures. Our products include fully managed solutions for bare metal, containerization, and serverless architectures, offering a responsible choice in cloud computing.

Join our dynamic team of nearly 600 employees from diverse backgrounds, in a stimulating, international environment that combines technical excellence, creativity, and sharing.

About the job

Scaleway is looking for a Site Reliability Engineer to join our teams.

Reporting to a Lead SRE, you will be responsible for ensuring that we can reliably serve our products to users around the world. We expect you to have a strong background in development and system administration. Our systems evolve constantly, and the tools used to observe them and keep them resilient must evolve accordingly.

Minimum qualifications

Previous experience as a developer in Go, Python or Rust

Experience in system programming with common scripting languages (Bash, Python)

Demonstrated ability to troubleshoot production system failures

A great attitude and desire to work with a team

Passion for incremental improvements to tooling and a love of all things automation

Experience with Linux systems (Ubuntu/Debian)

Experience with cloud environment architecture (bare metal, virtual machines, containers, orchestrators)

Good understanding of computer networks: TCP/IP, DNS, load-balancing, IPv6, BGP and network virtualisation

Understanding of written and spoken English, with the ability to write technical documentation in English and to speak English when needed

Preferred qualifications

Experience with infrastructure as code and continuous deployment

Experience dealing with physical hardware automation

Experience with monitoring & logging systems

Experience administering relational databases

Knowledge of one cloud platform and related use-cases

Initiative to propose new solutions and defend them

Team player, willing to share knowledge, opinions, and participate in regular team rituals

Good communication and coaching skills

Responsibilities

Create or optimize existing tools & documentation that will help identify, diagnose and remediate production incidents, automating as much as possible

Troubleshoot high-impact issues working with multiple engineering teams

Take on-call responsibilities, mitigate issues encountered in production and provide the best real-time response to our customers

Ensure a high quality of service for our customers by leveraging observability and monitoring technologies

Manage the lifecycle of products in production

Help implement best practices in stability, resiliency, scalability, security and performance across our systems

Technical Stack

Python, Go, Rust

RabbitMQ

PostgreSQL

HAProxy, Nginx, REST APIs / Flask

S3 API

Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana

Ansible, AWX, Foreman, Salt

GitLab, Nexus

Ubuntu, Debian, CentOS

Jira, Confluence, Slack, GSuite

Location

This position is based in our offices in Paris, Toulouse or Lille (France).

If you don't tick every box, don't hesitate to apply anyway. Don't limit yourself to a job description; you never know!