Robusta is a tech agency working with a diverse client base across sectors and industries to implement digital transformation programs. Engagements typically focus on the digitization of existing operations and processes and/or the activation of digital customer engagement channels. With a team of 100+ tech and market consultants, robusta maintains an impactful footprint across EMEA and engages with its clients through its two key operations hubs in Egypt and Germany.
Octopus by RTG is on a mission to connect top-notch organizations around the globe with top-notch talent. We are currently looking for a Senior Data Engineer.
Responsibilities:
Design, develop, and maintain robust data pipelines to support machine learning workflows and GenAI applications.
Implement data ingestion, transformation, and storage solutions for structured and unstructured data (a minimal batch ETL example is sketched after this list).
Ensure data quality, integrity, and consistency across the entire pipeline.
Optimize data infrastructure for scalability, performance, and cost-efficiency.
Implement real-time data processing workflows.
Collaborate with ML engineers and data scientists to ensure seamless integration of data pipelines with models and applications.
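To make the pipeline responsibilities concrete, here is a minimal batch ETL sketch in PySpark; the bucket paths, column names, and schema are hypothetical, and a production pipeline would add schema enforcement, data quality checks, and orchestration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Extract: read raw, semi-structured events (hypothetical S3 path).
raw = spark.read.json("s3://raw-bucket/events/")

# Transform: drop malformed rows, parse timestamps, derive a partition key.
clean = (
    raw.filter(F.col("event_id").isNotNull())
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write partitioned Parquet for downstream ML and analytics jobs.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://curated-bucket/events/"
)
```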
Requirements
Proficiency in programming languages for data processing (e.g., Python, Scala, Java).
Strong experience with big data technologies (e.g., Hadoop, Spark) and ETL tools.
Familiarity with data storage systems (e.g., SQL databases, NoSQL databases, data lakes).
Strong experience with vector databases and embedding stores (see the embedding-index sketch after this list).
Experience with cloud platforms and data services (e.g., AWS Redshift, Google BigQuery, Azure Data Factory).
Knowledge of data modeling, warehousing, and real-time processing frameworks such as Kafka and Flink (a minimal consumer sketch follows this list).
Strong problem-solving skills and ability to work in cross-functional teams.
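On the vector-database requirement: the sketch below uses FAISS, an in-memory similarity-search library, as a stand-in for an embedding store; the dimensionality and random vectors are placeholders for real model embeddings, and a managed vector database would add persistence, metadata filtering, and horizontal scaling.

```python
import numpy as np
import faiss

dim = 384  # hypothetical embedding size (e.g. a small sentence encoder)

# Build a flat L2 index and load a batch of placeholder embeddings.
index = faiss.IndexFlatL2(dim)
embeddings = np.random.rand(10_000, dim).astype("float32")
index.add(embeddings)

# Retrieve the 5 nearest stored vectors for a query embedding.
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)
print(ids[0], distances[0])
```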
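For the real-time processing requirement, here is a minimal Kafka consumer sketch using the kafka-python client; the topic name, broker address, and validation rule are assumptions, and a framework such as Flink or Spark Structured Streaming would handle stateful stream processing at scale.

```python
import json
from kafka import KafkaConsumer

# Hypothetical topic and broker; messages are JSON-encoded events.
consumer = KafkaConsumer(
    "raw-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # Simple quality gate; a real pipeline would route bad records to a
    # dead-letter topic rather than silently dropping them.
    if event.get("event_id") is None:
        continue
    print(event["event_id"])  # placeholder for the real sink
```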
Required profile
Industry: Information Technology & Services
Spoken language(s): English