
Shivani (RID : yb8plkl669a5)

Designation : Data Engineer

Location : Indore, India

Experience : 3.5 Years

Rate : $7 / Hour

Availability : 1 Week

Work From : Offsite

Category : Information Technology & Services

Shortlisted : 0
Total Views : 73
Key Skills
Python PySpark Pandas NumPy Django AWS services MySQL PostgreSQL RDS Oracle MongoDB AWS Data Pipeline Kinesis
Description
  • 3.5+ years of extensive hands-on experience in the IT industry with Python, MySQL, Oracle, AWS services, Redshift, ETL pipelines, machine learning algorithms, deployment, Kafka, and Docker.
  • Familiarity with cloud technologies such as AWS Lambda, Glue, Athena, Step Functions, EC2, S3, Kinesis, AWS ECS, CloudFormation, Fargate, DMS, and Elastic Beanstalk.
  • Designed and developed data pipelines using technologies such as Python, Redis, Postgres, MySQL, BigQuery, ETL pipelines, Snowflake, Redshift, AWS services, Docker, and Kubernetes.
  • Ensured scalability, reliability, and security of the back-end system using tools such as Redis, Celery, Docker, and Kubernetes.
  • Expertise in query optimization for Redshift and Postgres.
  • Hands-on experience designing and building data models and data pipelines with a focus on data warehouses and data lakes.
  • Experience working with Redshift for running high-volume data pipelines.
  • Good experience creating real-time data streaming solutions using Spark Streaming and Kafka.
  • Experience with MVC frameworks such as Django and Flask.
  • Experience working with Sequence files and ORC, Avro, Parquet, CSV, fixed-width, and XML formats.
  • Good experience using Python for data manipulation, data wrangling, and data analysis with libraries such as Pandas, NumPy, Scikit-Learn, and Matplotlib.
  • Knowledge of databases such as MySQL, PostgreSQL, Oracle, and AWS Redshift.
  • Expertise in creating data pipelines from S3 to Redshift using AWS Data Pipeline for LinkedIn social media data.
  • Expertise in using PySpark SQL to split huge files into smaller files with transformations and process them using warehouse databases (see the sketch after this list).
  • Expertise in optimizing long-running queries across different databases to achieve better performance.
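
A minimal PySpark sketch of the file-splitting idea referenced above; the paths, column name, and partition count are hypothetical placeholders, not taken from any actual project:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("split-large-file").getOrCreate()

# Read one huge CSV; Spark parallelizes the scan across executors.
df = spark.read.csv("s3a://example-bucket/raw/huge_file.csv", header=True)

# An example transformation before the split: drop rows missing a key field.
clean = df.filter(df["record_id"].isNotNull())

# Repartitioning controls how many smaller output files get written.
clean.repartition(64).write.mode("overwrite").parquet("s3a://example-bucket/staged/")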
     
Project Details
Title : Get Email
Duration : 18 Months
Role and Responsibilities :

Tech Stack

Python, PySpark; AWS services: S3, Lambda, Glue, Redshift, RDS, Step Functions, CloudWatch, Redis, SQS, SNS, Kinesis, QuickSight, SAM, EC2, DynamoDB, Redash.

Responsibilities

  • Wrote Python scripts to extract gzip-format files, convert them to CSV or JSON, and read and load the data into Redshift (see the sketch after this list).
  • Wrote scripts for regular ETL jobs using Lambda, Glue, and crawlers to process millions of unstructured records received from social media.
  • Developed a framework for cleaning source data using Python, SQL, Glue, and crawlers.
  • Transformed data from one source to another based on specific scenarios.
  • Deployed AWS services using SAM.
  • Developed and migrated the system to Python, RDS PostgreSQL, AWS Redshift, AWS Lambda, and Glue, with job scheduling on Redis and Step Functions.
  • Improved file-processing performance for 15 million records from 1 day to 5-6 hours.
  • Provided parallel file-processing ability to the extent possible using the new framework.
  • Optimized long-running queries to achieve better performance.
  • Performed data analysis and wrote SQL scripts for data refinement.
  • Built data visualizations using Redash and AWS QuickSight.
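
A minimal sketch of the first responsibility above, assuming a local .gz file, a staging S3 bucket, and a Redshift table named raw_events; all hostnames, credentials, and the IAM role ARN are hypothetical placeholders:

import gzip
import shutil

import boto3
import psycopg2  # one common driver for connecting to Redshift

def gz_to_csv(src: str, dst: str) -> None:
    # Decompress a .gz archive into a plain CSV file on local disk.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)

def load_to_redshift(bucket: str, key: str, csv_path: str) -> None:
    # Stage the CSV on S3, then let Redshift COPY it in parallel.
    boto3.client("s3").upload_file(csv_path, bucket, key)
    conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                            port=5439, dbname="dev", user="etl", password="...")
    with conn, conn.cursor() as cur:
        cur.execute(f"""
            COPY raw_events
            FROM 's3://{bucket}/{key}'
            IAM_ROLE 'arn:aws:iam::000000000000:role/redshift-copy'
            FORMAT AS CSV IGNOREHEADER 1;
        """)

gz_to_csv("events.csv.gz", "events.csv")
load_to_redshift("example-etl-bucket", "staged/events.csv", "events.csv")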

Team Size

8 Members

Description :

The tool finds the email address of anyone on Earth using the first name, the last name, and the domain name of their company. We "scanned" a large number of websites to find the general email formats of millions of companies, then recreate the professional email address of the prospect you would like to get in touch with. The method is extremely simple, but the time saving is huge: finding an email manually takes around 5 minutes, while GetEmail takes only a few seconds. If you search 20 email addresses per day, GetEmail saves 2 hours of your precious time!
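
A hypothetical illustration of the core idea described above: once a company's general email format is known, a prospect's address can be reconstructed from their name and the company domain. The patterns and domains below are made up:

# Format strings learned by "scanning" each company's public pages (made up here).
PATTERNS = {
    "example.com": "{first}.{last}",
    "acme.example": "{f}{last}",
}

def guess_email(first: str, last: str, domain: str):
    # Apply the company's known pattern; return None if the domain is unknown.
    pattern = PATTERNS.get(domain)
    if pattern is None:
        return None
    local = pattern.format(first=first.lower(), last=last.lower(), f=first[0].lower())
    return f"{local}@{domain}"

print(guess_email("Ada", "Lovelace", "example.com"))  # ada.lovelace@example.com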


Title : Upstream
Duration : 12 Months
Role and Responsibilities :

Tech Stack

Python, PySpark; AWS services: S3, Lambda, Glue, Redshift, RDS PostgreSQL, Step Functions, CloudWatch, Redis, SQS, SNS, Kinesis, QuickSight, SAM, EC2, DynamoDB.

Responsibilities

  • Wrote Python scripts to extract gzip-format files, convert them to CSV or JSON, and read and load the data into Redshift.
  • Analyzed user specifications and requirements.
  • Developed ETL pipelines to process data from AWS RDS Oracle, SAP, and CRM source systems into a data lake.
  • Extracted data from multiple sources, transformed it, and loaded it into a database.
  • Tested code for base and incremental loads.
  • Developed the full back-end business logic and required database operations for this project.
  • Developed and executed a migration strategy to move the data warehouse from an Oracle platform to AWS Redshift.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in AWS S3 using Python (see the sketch after this list).
  • Developed custom aggregate functions using Spark and AWS Lambda and performed interactive analysis.
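
A minimal sketch of the Kafka-to-S3 responsibility above, using Spark Structured Streaming's Kafka source; the broker address, topic, and S3 paths are hypothetical placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-s3").getOrCreate()

# Subscribe to a Kafka topic as an unbounded streaming DataFrame.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker.example:9092")
          .option("subscribe", "upstream-events")
          .load())

# Kafka delivers key/value as binary; cast to strings before writing out.
query = (events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
         .writeStream
         .format("json")
         .option("path", "s3a://example-bucket/stream/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/")
         .start())

query.awaitTermination()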

Team Size

4 Members

 

Description :

Primary expertise in IBM InfoSphere DataStage: identifying different business scenarios and deriving potential solutions, while transitioning towards next-generation products such as Talend, GoldenGate, Python, AWS, and Azure.


Title : Foresightee
Duration : 6 Months
Role and Responsibilities :

Tech Stack

ETL pipeline development using Python, Django, AWS Glue, AWS Lambda, AWS S3, QuickSight, Terraform, Redshift, SQS, MySQL.

Responsibilities

  • Maintained the MySQL database of daily sales data for various stores.
  • Automated the entire process from acquiring sales data to maintaining it in the MySQL database.
  • Monitored the daily AWS Glue jobs running for the Sales, Promotion, and Prediction data.
  • Deployed an email-parser script in AWS Lambda using Terraform, through which the Sales and Promotion files are received on a daily basis (see the sketch after this list).
  • Also deployed an SFTP parser, through which the prediction file is acquired, as a Docker image via Terraform.
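
A minimal sketch of what the email-parser Lambda might look like, assuming inbound mail is written to S3 and the function is triggered by an S3 put notification; the bucket layout and prefix are hypothetical:

import email

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Assumed trigger: an S3 put notification for the raw inbound email object.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    msg = email.message_from_bytes(raw)

    # Save each attachment (e.g. the daily Sales or Promotion CSV) for the Glue job.
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            s3.put_object(Bucket=bucket,
                          Key=f"incoming/{filename}",  # hypothetical prefix
                          Body=part.get_payload(decode=True))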

Team Size

4 Members

Description :

Foresightee has multiple channels (email, FTP, and Amazon S3) for sending unprocessed data to the pipeline, which gives the client flexibility and ensures a continuous feed of data to the ETL pipelines. The processed data was transferred to the forecasting team for developing forecasting models, with a backup kept on AWS S3. The forecast results received from the forecasting team were processed using the developed ETL pipelines as required by the client, and both the forecast results and the processed data were backed up in AWS S3. Amazon QuickSight was used to let the user create visualizations and perform data analysis on the processed data.


Title : TicketBird
Duration : 4 Months
Role and Responsibilities :

Tech Stack

Python, Django, SQLite, Django-CMS, South, Pillow, PostgreSQL, Elastic Beanstalk.

Responsibilities

  • Developed code as per the requirements.
  • Developed the full back-end business logic and required database operations for this project.
  • Redshift uses a columnar storage technique, where data is stored column-wise rather than row-wise.
  • This allows for efficient compression, faster query performance, and reduced I/O operations, resulting in improved overall query performance (see the illustration after this list).
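
An illustrative toy contrast of the two layouts described above, in plain Python; the data is made up:

# Row-wise: a query that needs one column still touches every full record.
rows = [
    {"id": 1, "region": "APAC", "sales": 120},
    {"id": 2, "region": "APAC", "sales": 90},
    {"id": 3, "region": "EMEA", "sales": 200},
]
total_row_wise = sum(r["sales"] for r in rows)

# Column-wise: each column is stored contiguously, so a scan of "sales" reads
# only that column, and runs of repeated values ("APAC", "APAC") compress well.
columns = {
    "id": [1, 2, 3],
    "region": ["APAC", "APAC", "EMEA"],
    "sales": [120, 90, 200],
}
total_columnar = sum(columns["sales"])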

Team Size

4 Members

Description :

This is an app that lets users query the tickets they have raised and quickly find answers at any time, either on their own or through responses provided by support agents.


 