Offer summary

Qualifications:

Bachelor’s or Master’s degree in Computer Science

5+ years of experience in data engineering

Proficiency in NoSQL and graph databases

Experience with cloud platforms (AWS, GCP, Azure)

Strong experience with Hadoop and Spark

Key responsibilities:

Design, build, and maintain scalable data pipelines

Implement data lake solutions on cloud platforms

Ensure data quality, integrity, and security

Collaborate with data scientists for machine learning initiatives

Continuously monitor and optimize data pipelines

Job description

Who are we?

We are a globally expanding software technology company that helps brands communicate more effectively with their audiences. We look forward to growing our team's capabilities, developing high-end solutions that go beyond existing boundaries, and establishing our brand as a global powerhouse.

We are free to work from wherever we want and go to the office whenever we like!

What is the role?

We are looking for a highly skilled and motivated Senior Data Engineer to join our dynamic team. The ideal candidate will have extensive experience in building and managing data pipelines, NoSQL databases, and cloud-based data platforms. You will work closely with data scientists and other engineers to design and implement scalable data solutions.

Key Responsibilities:

Design, build, and maintain scalable data pipelines and architectures.
Implement data lake solutions on cloud platforms.
Develop and manage NoSQL databases (e.g., MongoDB, Cassandra).
Work with graph databases (e.g., Neo4j) and big data technologies (e.g., Hadoop, Spark).
Utilize AWS cloud services (e.g., S3, Redshift, Lambda, Kinesis, EMR, SQS, SNS).
Ensure data quality, integrity, and security.
Collaborate with data scientists to support machine learning and AI initiatives.
Optimize and tune data processing workflows for performance and scalability.
Stay up-to-date with the latest data engineering trends and technologies.

Detailed Responsibilities and Skills:

Business Objectives and Requirements:

Engage with business, IT, and data science teams to understand their needs and expectations from the data lake.
Define real-time analytics use cases and expected outcomes.
Establish data governance policies for data access, usage, and quality maintenance.

Technology Stack:

Real-time data ingestion using Apache Kafka or Amazon Kinesis.
Scalable storage solutions such as Amazon S3, Google Cloud Storage, or Hadoop Distributed File System (HDFS).
Real-time data processing using Apache Spark or Apache Flink.
NoSQL databases like Cassandra or MongoDB, and specialized time-series databases like InfluxDB.

Data Ingestion and Integration:

Set up data producers for real-time data streams.
Integrate batch data processes to merge with real-time data for comprehensive analytics.
Implement data quality checks during ingestion.

Data Processing and Management:

Utilize Spark Streaming or Flink for real-time data processing.
Enrich clickstream data by integrating with other data sources.
Organize data into partitions based on time or user attributes.

Data Lake Storage and Architecture:

Implement a multi-layered storage approach (raw, processed, and aggregated layers).
Use metadata repositories to manage data schemas and track data lineage.

Security and Compliance:

Implement fine-grained access controls.
Encrypt data in transit and at rest.
Maintain logs of data access and changes for compliance.

Monitoring and Maintenance:

Continuously monitor the performance of data pipelines.
Implement robust error handling and recovery mechanisms.
Monitor and optimize costs associated with storage and processing.

Continuous Improvement and Scalability:

Establish feedback mechanisms to improve data applications.
Design the architecture to scale horizontally.

Qualifications:

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
5+ years of experience in data engineering or related roles.
Proficiency in NoSQL databases (e.g., MongoDB, Cassandra) and graph databases (e.g., Neo4j).
Strong experience with cloud platforms (e.g., AWS, GCP, Azure).
Hands-on experience with big data technologies (e.g., Hadoop, Spark).
Proficiency in Python and data processing frameworks.
Experience with Kafka, ClickHouse, and Redshift.
Knowledge of ETL processes and data integration.
Familiarity with AI and ML algorithms, including neural networks.
Strong problem-solving skills and attention to detail.
Excellent communication and teamwork skills.
Entrepreneurial spirit and a passion for continuous learning.