

Data Engineer
Location: Pune
Total Views: 77
Member since: 30+ days ago
Contact Details
Phone: {{contact.cdata.phone}}
Email: {{contact.cdata.email}}
Candidate Information
  • Experience: 6 years
  • Hourly Rate: $4
  • Work From: Offsite
  • Category: Engineering & Design
  • Last Active On: June 17, 2024
Key Skills

(Computer Science)



ADI (Aboitiz Data Innovation)


Project          SAS Script Migration to PySpark (CITI – Union Bank)

                      I.   CVI
                     II.   Speed Cash
                    III.   Cross Sell to CITI Gold
                     IV.   Portfolio Segmentation

Client           Aboitiz Data Innovation, Singapore

Environment      PySpark, Spark SQL, Python, AWS EMR, S3, Glue, Athena, Airflow, Oracle, CML

Role             Sr. Data Engineer


Aboitiz Data Innovation provides top-class transformative AI consulting and data-driven IoT and sustainability solutions to businesses across diverse sectors.


The objective of this project is to migrate CITI Bank's entire data platform from SAS to Union Bank's PySpark platform, applying updated transformations and building AWS data pipelines to optimize data utilization and support informed decision-making.
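The core of such a migration is translating SAS DATA-step logic into equivalent DataFrame transformations. A minimal sketch of one such step (the SAS logic, column names, and thresholds are all hypothetical; plain Python dicts stand in for Spark rows so the example is self-contained and runnable without a Spark cluster):

```python
# Hypothetical SAS data step of the kind being migrated:
#
#   data gold_eligible;
#       set customers;
#       if balance >= 50000 and tenure_months >= 12;
#       segment = "CITI_GOLD";
#   run;
#
# In PySpark this subsetting IF plus derived column would become roughly
#   customers.filter((col("balance") >= 50000) & (col("tenure_months") >= 12))
#            .withColumn("segment", lit("CITI_GOLD"))
# Here the same logic runs over plain dicts to keep the sketch self-contained.

def migrate_gold_eligible(customers):
    """Filter rows and add a derived 'segment' column, mirroring the SAS step."""
    return [
        {**row, "segment": "CITI_GOLD"}
        for row in customers
        if row["balance"] >= 50000 and row["tenure_months"] >= 12
    ]

rows = [
    {"id": "c1", "balance": 60000, "tenure_months": 24},
    {"id": "c2", "balance": 10000, "tenure_months": 36},
    {"id": "c3", "balance": 75000, "tenure_months": 6},
]
eligible = migrate_gold_eligible(rows)
# Only "c1" satisfies both conditions and is tagged with the segment.
```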


Roles & Responsibilities:

·    Performed in-depth code analysis to plan the migration of SAS scripts to PySpark, and actively participated in client calls to gather business requirements and flag technical dependencies in advance.

·    Collected data from various sources into S3.

·    Created Spark scripts in Python based on optimized logic and requirements.

·    Created and modified data pipelines and deployed them as required in CML.

·    Created documentation for processes, coding best practices, and code review guidelines.

·    Performed code reviews before merging pull requests (PRs).

·    Created DAGs and executed the scripts in Airflow.

·    Created and performed data quality checks using PySpark, SQL, and Hive queries.

·    Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, broadcast joins, and other effective and efficient joins and transformations.

·    Collaborated with the infrastructure, network, database, application, and data governance teams to ensure data quality and availability.

Copyright © Cosette Network Private Limited. All Rights Reserved.