Job description

We are seeking a skilled Data Engineer to design, build, and optimize data pipelines and analytics platforms. This role will focus on developing scalable, efficient, and high-performance data solutions that support advanced analytics, business intelligence, and machine learning applications. The ideal candidate is passionate about data engineering, cloud-based data architectures, and modern data processing frameworks.

Key Responsibilities:

Data Architecture & Engineering:
- Design and implement scalable data architectures leveraging BigQuery, Apache Iceberg, Starburst, and Trino.
- Develop robust, high-performance ETL/ELT pipelines to process structured and unstructured data.
- Optimize SQL queries and data processing workflows for efficient analytics and reporting.
Cloud & Big Data Infrastructure:
- Build and maintain data pipelines and storage solutions using Google Cloud Platform (GCP) and BigQuery.
- Implement best practices for data governance, security, and compliance within cloud-based environments.
- Optimize data ingestion, storage, and query performance for high-volume and high-velocity datasets.
Data Processing & Analytics:
- Leverage Apache Iceberg for large-scale data lake management and transactional processing.
- Utilize Starburst and Trino for distributed query processing and federated data access.
- Develop strategies for data partitioning, indexing, and caching to enhance performance.
Collaboration & Integration:
- Work closely with data scientists, analysts, and business stakeholders to understand data needs and requirements.
- Collaborate with DevOps and platform engineering teams to implement CI/CD pipelines and infrastructure-as-code for data workflows.
- Integrate data from multiple sources, ensuring data integrity and accuracy across systems.
Performance Optimization & Monitoring:
- Monitor, troubleshoot, and optimize data pipelines for efficiency, scalability, and reliability.
- Implement data quality frameworks and automated validation checks to ensure consistency.
- Utilize monitoring tools and performance metrics to proactively identify bottlenecks and optimize queries.

Qualifications:

Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
Experience:
- 4+ years of experience in data engineering, with expertise in SQL, BigQuery, and GCP.
- Strong experience with Apache Iceberg, Starburst, and Trino for large-scale data processing.
- Proven track record of designing and optimizing ETL/ELT pipelines and cloud-based data workflows.
Technical Skills:
- Proficiency in SQL, including query optimization and performance tuning.
- Experience working with BigQuery, Google Cloud Storage (GCS), and GCP data services.
- Knowledge of data lakehouse architectures, data warehousing, and distributed query engines.
- Hands-on experience with Apache Iceberg for managing large-scale transactional datasets.
- Expertise in Starburst and Trino for federated queries and cross-platform data access.
- Familiarity with Python, Java, or Scala for data pipeline development.
- Experience with Terraform, Kubernetes, or Airflow for data pipeline automation and orchestration.

Preferred Skills:

- Understanding of machine learning data pipelines and real-time data processing.
- Experience with data governance, security, and compliance best practices.
- Exposure to Kafka, Pub/Sub, or other streaming data technologies.
- Familiarity with CI/CD pipelines for data workflows and infrastructure-as-code.

We are an Equal Opportunity Employer, including for individuals with disabilities and veterans.
