Python Gen PySpark API and SQL

Remote: Full Remote
Contract:
Work from:

Offer summary

Qualifications:

  • Any graduation with 3-10 years of experience.
  • Proficiency in Python, especially with data libraries like Pandas and NumPy.
  • Solid hands-on experience with PySpark and Spark SQL for distributed data processing.
  • Expertise in writing complex SQL queries and designing scalable data pipelines.

Key responsibilities:

  • Develop and optimize PySpark jobs for big data processing.
  • Build scalable batch or streaming data pipelines using PySpark.
  • Develop REST APIs for data access and automation using frameworks like FastAPI or Flask.
  • Automate ETL workflows and integrate them into orchestration tools like Airflow.

Black and White Business Solutions Private Ltd | Human Resources, Staffing & Recruiting | SME | https://www.blackwhite.in
51 - 200 Employees

Job description

Company Name :

Black and White Business Solutions Private Ltd

Job Title :

Python Gen PySpark API and SQL

Qualification :

Any graduation

Experience :

3-10 YEARS

Must Have Skills :

  • Python (Core + Data Libraries) – Proficiency in Python, especially for data manipulation using Pandas, NumPy, etc.

  • PySpark – Solid hands-on experience with distributed data processing using PySpark and Spark SQL (see the illustrative sketch after this list).

  • RESTful API Development – Ability to build and consume APIs using Flask, FastAPI, or Django.

  • SQL (Advanced) – Expertise in writing complex SQL queries, tuning, and data modeling.

  • ETL and Data Pipelines – Experience in designing and implementing scalable data pipelines in distributed environments.
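
Because the items above lean heavily on PySpark and Spark SQL, here is a minimal, illustrative sketch of that kind of distributed aggregation; the input path, table, and column names are assumptions for the example, not details from this posting.

from pyspark.sql import SparkSession, functions as F

# Start a Spark session (cluster-specific configuration would differ in production).
spark = SparkSession.builder.appName("daily_revenue").getOrCreate()

# Hypothetical input: an orders dataset with order_date, region, status, and amount columns.
orders = spark.read.parquet("s3://example-bucket/orders/")

# DataFrame API: daily revenue per region for completed orders.
daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("revenue"))
)

# The same aggregation expressed in Spark SQL via a temporary view.
orders.createOrReplaceTempView("orders")
daily_revenue_sql = spark.sql("""
    SELECT order_date, region, SUM(amount) AS revenue
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY order_date, region
""")

# Persist the result, partitioned by date, for downstream reporting.
daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/reports/daily_revenue/"
)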


Good to Have Skills :

  • Apache Airflow or Other Workflow Orchestration Tools – Knowledge of scheduling and monitoring data pipelines (a brief illustrative DAG sketch follows this list).

  • Delta Lake / Apache Hudi / Data Lakehouse Architecture – Familiarity with modern data storage formats.

  • Cloud Platforms (AWS/GCP/Azure) – Experience working with cloud-based data services like AWS EMR, Azure Synapse, or GCP Dataproc.

  • Data Quality and Validation Frameworks – Use of tools like Great Expectations or custom validations.

  • Containerization (Docker/Kubernetes) – Understanding of containerizing Spark or API services for deployment.
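
For the Airflow item above, a minimal DAG sketch showing how a daily PySpark ETL step could be scheduled; the DAG id, schedule, and callable are hypothetical and only illustrate the orchestration pattern.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def run_daily_etl(**context):
    # Placeholder: in practice this step would submit the PySpark job,
    # for example via spark-submit or a managed cluster API.
    print(f"Running ETL for logical date {context['ds']}")

with DAG(
    dag_id="daily_orders_etl",        # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="run_daily_etl",
        python_callable=run_daily_etl,
    )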


Roles and Responsibilities :

  • Develop and Optimize PySpark Jobs
    Build scalable batch or streaming data pipelines using PySpark for big data processing.

  • API Design and Integration
    Develop REST APIs for data access and automation using Python frameworks like FastAPI or Flask (see the sketch after this list).

  • SQL Development and Tuning
    Write, optimize, and maintain complex SQL queries for data extraction, transformation, and reporting.

  • Data Pipeline Automation
    Build automated ETL workflows and integrate them into orchestration tools like Airflow or cloud-native solutions.

  • Collaboration and Documentation
    Work closely with data engineers, analysts, and business stakeholders; maintain clear documentation for code, APIs, and processes.
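
To make the API responsibility concrete, a minimal FastAPI sketch that exposes aggregated data over REST; the endpoint path, response model, and in-memory sample data are assumptions standing in for a real warehouse query layer.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="Data Access API")  # hypothetical service name

class RevenueRecord(BaseModel):
    order_date: str
    region: str
    revenue: float

# A small in-memory sample stands in for the warehouse or cached table a real service would query.
SAMPLE_DATA = [
    RevenueRecord(order_date="2024-01-01", region="south", revenue=120000.0),
    RevenueRecord(order_date="2024-01-01", region="north", revenue=95000.0),
]

@app.get("/revenue/{region}", response_model=list[RevenueRecord])
def get_revenue(region: str):
    # Filter the sample data by region and return 404 when nothing matches.
    records = [r for r in SAMPLE_DATA if r.region == region]
    if not records:
        raise HTTPException(status_code=404, detail="No data for region")
    return records

Such a service could be run locally with, for example, uvicorn; Flask would serve the same purpose with a slightly different routing style.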


Location :

Hyderabad, Bangalore, and Chennai

CTC Range :

20-30 LPA

Notice period :

Immediate

Shift Timings :


Mode of Interview :

VIRTUAL

Mode of Work :


Mode of Hire :


Note :




Required profile

Experience

Industry :
Human Resources, Staffing & Recruiting
Spoken language(s):
English

Other Skills

  • Collaboration
