Responsibilities:
• Design, develop, and maintain scalable, reliable, and secure data pipelines to process large volumes of structured and unstructured healthcare data using PySpark and cloud-based databases.
• Collaborate with data architects, data scientists, and analysts to understand data requirements and implement solutions that meet business and technical objectives.
• Leverage AWS or Azure cloud services for data storage, processing, and analytics, optimizing for cost and performance.
• Use tools such as Airflow for workflow management and Kubernetes for container orchestration to ensure seamless deployment, scaling, and management of data processing applications.
• Develop and implement data ingestion, transformation, and validation processes to ensure data quality, consistency, and reliability across healthcare datasets.
• Monitor and troubleshoot data pipelines, proactively identifying and resolving issues to minimize downtime and ensure optimal performance.
• Establish and enforce data engineering best practices, ensuring compliance with data privacy and security regulations specific to the healthcare industry.
• Continuously evaluate and adopt new tools, technologies, and frameworks to improve the data infrastructure and drive innovation.
• Mentor and guide junior data engineers, fostering a culture of collaboration, learning, and growth within the team.
• Collaborate with cross-functional teams to align data engineering efforts with broader organizational goals and strategies.
• Maintain familiarity with SOC 2 compliance and its impact on company policies and processes.
• Understand the importance of adhering to SOC 2 requirements and make an ongoing effort to meet them.
Requirements:
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• 3+ years of experience in data engineering, with a strong background in Apache Spark and cloud-based databases such as Snowflake.
• Strong knowledge of big data technologies, including PySpark, and proficiency in one or more programming languages such as Python.
• Proven experience with AWS or Azure cloud services for data storage, processing, and analytics.
• Expertise in workflow management tools such as Airflow and container orchestration systems such as Kubernetes.
• Strong knowledge of SQL and NoSQL databases, as well as data modeling and schema design principles.
• Familiarity with healthcare data standards, terminologies, and regulations, such as HIPAA and GDPR, is highly desirable.
• Excellent problem-solving, communication, and collaboration skills, with the ability to work effectively in cross-functional teams.
• Demonstrated ability to manage multiple projects, prioritize tasks, and meet deadlines in a fast-paced environment.
• A strong desire to learn, adapt, and contribute to a rapidly evolving data landscape.
Required profile
Experience
Level of experience: Mid-level (2-5 years)
Spoken language(s): English