ADI (Aboitiz Data Innovation)
Project SAS Script Migration to PySpark (CITI – Union Bank)
I. CVI
II. Speed Cash
III. Cross Sell to CITI Gold
IV. Portfolio Segmentation
Client Aboitiz Data Innovation, Singapore
Environment PySpark, Spark SQL, Python, AWS EMR, S3, Glue, Athena, Airflow, Oracle, CML
Role Sr. Data Engineer
Aboitiz Data Innovation provides transformative AI consulting and data-driven IoT and sustainability solutions to businesses across diverse sectors.
The objective of this project is to migrate CITI Bank’s entire data platform from SAS to Union Bank’s PySpark platform, applying updated transformations and building AWS data pipelines to optimize data utilization and support informed decision-making.
Roles & Responsibilities:
· Performed detailed code analysis ahead of migrating the SAS scripts to PySpark, and actively participated in client calls to collect business requirements and flag technical dependencies in advance.
· Collected data from various sources into S3.
· Created Spark scripts in Python based on the optimized logic and requirements.
· Created and modified data pipelines and deployed them in CML as required.
· Created documentation for processes, coding best practices, and code review guidelines.
· Performed Code reviews before merging Pull requests (PRs).
· Created DAGs and executed the scripts in Airflow.
· Created and ran data quality checks using PySpark, SQL, and Hive queries.
· Handled large datasets during the ingestion process itself using partitioning, Spark's in-memory capabilities, Spark broadcasts, efficient joins, and other transformations.
· Collaborated with the infrastructure, network, database, application, and Data Governance teams to ensure data quality and availability.
Copyright© Cosette Network Private Limited All Rights Reserved