Offer summary

Qualifications:

Experience in Data Engineering.

Proficient in Python and Java-based frameworks.

Experience with distributed systems like Hadoop and GCP.

Ability to build high-quality datasets.

Key responsibilities:

Build large-scale speech and audio data pipelines.

Collaborate with teams for machine learning projects.

Job description

The Speak team is Spotify's in-house text-to-speech (TTS) team, supporting products like DJ and AI Voice Translation, as well as the development of exciting new unreleased products. We focus on building world-class speech technologies that can power the next generation of personalized generative voice products at scale.

What You'll Do

Build large-scale speech and audio data pipelines using technologies like Google Cloud Platform and Apache Beam

Work on machine learning projects powering new generative AI experiences and helping to build state-of-the-art text-to-speech models

Learn and contribute to the team's best practices and techniques for building data pipelines for large-scale generative models, including cleaning, filtering, classifying and labelling

Collaborate with other engineers, researchers, product managers and stakeholders, taking on learning and leadership opportunities that arise

Deliver scalable, testable, maintainable, and high-quality code

Share knowledge and promote standard methodologies, making your team the best version of itself through mentorship and constructive accountability

Who You Are

You have Data Engineering experience and you know how to work with high-volume, heterogeneous data, preferably with distributed systems such as Hadoop, Bigtable, Cassandra, GCP, or AWS

You have experience building clean, high-quality datasets for training large-scale machine learning models; a focus on audio data is preferred

You have experience with one or more higher-level Python- or Java-based data processing frameworks such as Beam, Dataflow, Crunch, Scalding, Storm, or Spark

You have strong Python programming abilities. You might have worked with Docker as well as Luigi, Airflow, or similar tools

You care about quality and you know what it means to ship high-quality code

You have experience managing data retention policies

You care about agile software processes, data-driven development, reliability, and responsible experimentation

You understand the value of collaboration and partnership within teams

Where You'll Be

This role is located in London, UK or Stockholm, Sweden

Required profile