Company Overview:
At Codvo, software and people transformations go together. We are a global, empathy-led technology services company with a core DNA of product innovation and mature software engineering. We uphold the values of Respect, Fairness, Growth, Agility, and Inclusiveness in everything we do.
Data Engineer - Optimization Solutions
About the Role:
Responsibilities:
- Design, build, and maintain robust and scalable data pipelines to support the development and deployment of mathematical optimization models.
- Collaborate closely with data scientists to understand the data requirements for optimization models. This includes:
  - Data preprocessing and cleaning
  - Feature engineering and transformation
  - Data validation and quality assurance
- Develop and implement comprehensive data quality checks and monitoring to ensure the accuracy and reliability of the data used in our optimization solutions.
- Optimize data storage and retrieval processes for efficient model training and execution.
- Work effectively with large-scale datasets, leveraging distributed computing frameworks where necessary to handle data volume and complexity.
- Stay up to date with industry best practices and emerging technologies in data engineering, particularly in optimization and machine learning.
Experience:
- 3+ years of demonstrable experience as a data engineer, focused on building and maintaining complex data pipelines.
- Proven track record of working successfully with large-scale datasets, ideally in environments that use distributed systems.
Technical Skills - Essential:
- Programming: Strong proficiency in Python; experience with additional scripting languages (e.g., Bash) is beneficial.
- Databases: Extensive experience with SQL and relational database systems (PostgreSQL, MySQL, or similar). You should be very comfortable with:
  - Writing complex and efficient SQL queries
  - Understanding database performance optimization techniques
  - Applying schema design principles
- Data Pipelines: Solid understanding and practical experience building and maintaining data pipelines using modern tools and frameworks. Experience with the following is highly desirable:
  - Workflow management tools such as Apache Airflow
  - Data streaming systems such as Apache Kafka
- Cloud Platforms: Hands-on experience with a major cloud platform (AWS, Azure, or GCP), including a strong understanding of:
  - Cloud-based data storage solutions (Amazon S3, Azure Blob Storage, Google Cloud Storage)
  - Cloud compute services
  - Cloud-based data warehousing solutions (Amazon Redshift, Google BigQuery, Snowflake)
Technical Skills - Advantageous (not required, but highly beneficial):
- NoSQL Databases: Familiarity with NoSQL databases like MongoDB, Cassandra, and DynamoDB, along with an understanding of their common use cases.
- Containerization: Understanding of containerization technologies such as Docker and container orchestration platforms like Kubernetes.
- Infrastructure as Code (IaC): Experience using IaC tools such as Terraform or CloudFormation.
- Version Control: Proficiency with Git or similar version control systems.
Additional Considerations:
Industry Experience: While not a strict requirement, experience working in industries with a focus on optimization, logistics, supply chain management, or similar domains would be highly valuable.
Machine Learning Operations (MLOps): Familiarity with MLOps concepts and tools is increasingly important for data engineers in machine learning-focused environments.