Project Experience
Bank of America / Data Engineer May 2022 - Present, Gurgaon
• Working with the Spark SQL engine to populate Hive tables.
• Analyzed and optimized cluster resources for Spark applications to improve performance.
• Upgrading existing Spark applications to newer Spark versions.
Capgemini / Big Data Developer September 2020 - May 2022, Mumbai
• Wrote PySpark scripts to import data from a third-party system into the landing zone.
• Worked on the Enrich Layer, cleaning data with PySpark.
• Deployed Spark jobs to an AWS EMR cluster and stored the results in Amazon S3.
• Extracted data from MySQL into HDFS using Sqoop.
• Created and maintained Sqoop jobs with incremental load to populate Hive external tables.
• Designed queries using partitioning and bucketing, improving performance by 70%.
• Used Spark to load data, create a schema RDD, and write the data into Hive tables.
Gemini Solutions Pvt Ltd. / Data Engineer / Data Analyst Feb 2020 - September 2020, Gurgaon
• Used Spark for interactive queries and for processing streaming data.
• Cleaned and processed unstructured data using Spark and Scala.
• Implemented Spark jobs using Scala and Spark SQL for faster testing and data processing, improving performance by 65%.
• Extracted data from different sources, built an ETL pipeline in Python, and visualized the data in reports for management.
Wipro Technologies / Trainee (Database Administrator / Python Developer) July 2015 - February 2020, Pune
• Worked as a Python developer and database administrator.