Our mission is to unlock the potential of human creativity—by giving a million creative artists the opportunity to live off their art and billions of fans the opportunity to enjoy and be inspired by it.
Spotify transformed music listening forever when it launched in Sweden in 2008. Discover, manage and share over 70m tracks for free, or upgrade to Spotify Premium to access exclusive features including offline mode, improved sound quality, and an ad-free music listening experience.
Today, Spotify is the most popular global audio streaming service with 365m users, including 165m subscribers across 178 markets. We are the largest driver of revenue to the music business today.
The Speak team is Spotify's in-house text-to-speech (TTS) team, supporting products like DJ and AI Voice Translation, as well as the development of exciting new unreleased products. We focus on building world-class speech technologies that can power the next generation of personalized generative voice products at scale.
What You'll Do
Build large-scale speech and audio data pipelines using tools like Google Cloud Platform and Apache Beam
Work on machine learning projects powering new generative AI experiences and helping to build state-of-the-art text-to-speech models
Learn and contribute to the team's best practices and techniques for building data pipelines for large-scale generative models, including cleaning, filtering, classifying and labelling
Collaborate with other engineers, researchers, product managers and stakeholders, taking on learning and leadership opportunities that arise
Deliver scalable, testable, maintainable, and high-quality code
Share knowledge and promote standard methodologies, making your team the best version of itself through mentorship and constructive accountability
Who You Are
You have data engineering experience and you know how to work with high-volume, heterogeneous data, preferably with distributed systems and platforms such as Hadoop, Bigtable, Cassandra, GCP, or AWS
You have experience building clean, high-quality datasets for training large-scale machine learning models; a focus on audio data is preferred
You have experience with one or more higher-level Python- or Java-based data processing frameworks such as Beam, Dataflow, Crunch, Scalding, Storm, Spark, etc.
You have strong Python programming abilities. You might have worked with Docker as well as Luigi, Airflow, or similar tools
You care about quality and you know what it means to ship high-quality code
You have experience managing data retention policies
You care about agile software processes, data-driven development, reliability, and responsible experimentation
You understand the value of collaboration and partnership within teams
Where You'll Be
This role is located in London, UK or Stockholm, Sweden
Required profile
Experience
Level of experience: Senior (5-10 years)
Industry: Music
Spoken language(s): English