Srilatha

  • Data Engineer
  • Pune

Rate

$4.00 (Hourly)

Experience

6 Years

Availability

Immediate

Work From

Offsite

Skills

Python, AWS

Description

PROJECT DETAILS

 

ADI (Aboitiz Data Innovation)

 

Project          SAS Script Migration to PySpark (CITI – Union Bank)

                 I.   CVI
                 II.  Speed Cash
                 III. Cross Sell to CITI Gold
                 IV.  Portfolio Segmentation

Client           Aboitiz Data Innovation, Singapore

Environment      PySpark, Spark SQL, Python, AWS EMR, S3, Glue, Athena, Airflow, Oracle, CML

Role             Sr. Data Engineer

 

Aboitiz Data Innovation provides top-class transformative AI consulting and data-driven IoT and sustainability solutions to businesses across diverse sectors.

 

The objective of this project is to migrate CITI Bank's entire data platform from SAS to Union Bank's PySpark platform, applying updated transformations and creating AWS data pipelines to optimize data utilization and support informed decision making.

 

Roles & Responsibilities:

·    Performed detailed code analysis to begin the migration of SAS scripts to PySpark, and actively participated in client calls to gather business requirements and flag technical dependencies in advance.

·    Collected data from various sources into S3.

·    Created Spark scripts in Python based on the optimized logic and requirements.

·    Created and modified data pipelines and deployed them in CML as required.

·    Created documentation for processes, coding best practices, and code review guidelines.

·    Performed code reviews before merging pull requests (PRs).

·    Created DAGs and executed the scripts in Airflow (see the first sketch after this list).

·    Created and performed data quality checks using PySpark, SQL, and Hive queries.

·    Handled large datasets during the ingestion process itself using partitioning, Spark in-memory capabilities, broadcast joins, and other efficient joins and transformations (see the second sketch after this list).

·    Collaborated with the infrastructure, network, database, application, and Data Governance teams to ensure data quality and availability.
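For illustration, a minimal sketch of the kind of Airflow DAG used to schedule a migrated PySpark job. The DAG id, owner, schedule, and script path are hypothetical placeholders, not the actual project configuration.

```python
# Minimal Airflow DAG sketch (Airflow 2.x style); all names/paths are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",        # hypothetical owner
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="sas_to_pyspark_migration",  # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit the migrated PySpark script; spark-submit path and script are illustrative.
    run_migration = BashOperator(
        task_id="run_migration_script",
        bash_command="spark-submit /opt/jobs/cvi_migration.py",
    )
```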
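And a minimal sketch, assuming hypothetical table, column, and S3 path names, of ingestion-time PySpark logic combining a broadcast join with simple data quality checks and partitioned output.

```python
# PySpark sketch: broadcast join + basic data quality checks; all names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingestion_quality_checks").getOrCreate()

# Hypothetical inputs: a large transactions extract and a small branch dimension.
transactions = spark.read.parquet("s3://bucket/transactions/")   # placeholder S3 path
branches = spark.read.parquet("s3://bucket/dim_branch/")         # placeholder S3 path

# Broadcast the small dimension so the join avoids shuffling the large table.
enriched = transactions.join(F.broadcast(branches), on="branch_id", how="left")

# Simple quality checks: null business keys and duplicate primary keys.
null_keys = enriched.filter(F.col("account_id").isNull()).count()
dupes = (
    enriched.groupBy("txn_id")
    .count()
    .filter(F.col("count") > 1)
    .count()
)

if null_keys or dupes:
    raise ValueError(f"DQ check failed: {null_keys} null account_ids, {dupes} duplicate txn_ids")

# Write partitioned by date so downstream reads can prune partitions.
enriched.write.mode("overwrite").partitionBy("txn_date").parquet(
    "s3://bucket/curated/transactions/"
)
```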
