Projects:
● Data Pipeline and ETL (December 2015 – May 2018)
● Designation – Data Engineer
● Organization – Tecblic
● Client – Technology Company, Portland, ME
o Designed and implemented a data pipeline to process semi-structured data from 100 million raw records across 14 data sources.
o Integrated data from third-party APIs to customize landing pages, resulting in a 6% improvement in paid conversion rate.
o Utilized GCP (Google Cloud Platform) services such as Google Cloud Storage and BigQuery for data processing and storage.
o Architected the data pipeline for a new product, supporting rapid scaling to 60,000 daily users.
o Led the migration from Oracle to Redshift, achieving a performance increase of 14% and saving $750,000 in 2017.
● Data Modeling and ETL Pipeline creation (June 2018 – August 2019)
● Designation – Data Engineer
● Organization – Tecblic
● Client – Health Care Company, NY
o Enhanced and integrated web-based Electronic Health Records (EHR) for a health care company in NY.
o Collected data from SQL RDBMS, document files, webpages, and APIs.
o Utilized PySpark to manipulate and process data in parallel.
o Deployed data for visualization, analytics, and front-end development, using Airflow for workflow orchestration.
● ETL Process (August 2019 – February 2020)
● Designation – Data Engineer
● Organization – Tecblic
● Client – Payment Processing Company, CA
o The client is an Irish-American financial services and software-as-a-service (SaaS) company.
o Ingested streaming and transactional data across 9 diverse primary data sources using Spark, Redshift, S3, and Python.
o Created a Python library to parse and reformat data from external vendors, reducing the data pipeline's error rate by 12%.
o Automated ETL processes across billions of rows of data, saving 45 hours of manual work per month.
o Built tools to provide real-time data on international currency exchange, reducing latency by 15%.