
SUJEET SINGH (RID : yb8plkl4vet6)

Designation: Senior Data Engineer

Location: Indore, India

Experience: 5 Years

Rate: $9 / Hour

Availability: 1 Week

Work From: Offsite

Category: Information Technology & Services

Key Skills
Python, PySpark, Hadoop, Pandas, NumPy, SAP, AWS Lambda, AWS Data Pipeline, MySQL, PostgreSQL, AWS Redshift, RDS PostgreSQL, CSS3, Bootstrap
Description
  • Over 5 years of extensive hands-on experience in the IT industry with Python, PySpark, MySQL, SQL Server, AWS services, Redshift, ETL pipelines, machine learning algorithms, deployment, Kafka, and Docker.
  • Designed and developed data pipelines using Python, PySpark, Redis, PostgreSQL, MySQL, Redshift, AWS Lambda, AWS Glue and crawlers, Docker, and Kubernetes.
  • Familiar with AWS cloud services such as EMR, EKS, Lambda, Glue, Kinesis, Fargate, DMS, EC2, S3, Elastic Beanstalk, ECS, CloudFormation, Athena, and Step Functions.
  • Built data pipelines from S3 to Redshift using AWS Data Pipeline for LinkedIn social media data.
  • Used PySpark SQL to split huge files into smaller files, apply transformations, and process them in warehouse databases.
  • Experience working with Sequence files and ORC, Avro, Parquet, CSV, fixed-width, and XML formats.
  • Optimized long-running queries across different databases to achieve better performance.
  • Ensured scalability, reliability, and security of back-end systems using Redis, Celery, Docker, and Kubernetes.
  • Hands-on experience designing and building data models and data pipelines for data warehouses and data lakes.
  • Experience running high-volume data pipelines on Redshift.
  • Good experience with SAP and CRM systems.
  • Good experience building real-time data streaming solutions using Spark Streaming, Kafka, and Hadoop.
  • Good experience with Python for data manipulation, data wrangling, and data analysis using libraries like Pandas and NumPy.
  • Knowledge of databases including MySQL, PostgreSQL, Oracle, and AWS Redshift.
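The file-splitting technique mentioned above can be illustrated without a Spark cluster. The sketch below is illustrative only (function names are mine, not from any project): it splits a large CSV into smaller header-preserving chunk files so downstream jobs can process them in parallel.

```python
import csv
import os


def split_csv(path, rows_per_chunk, out_dir):
    """Split a large CSV into smaller chunk files, repeating the header in each."""
    chunk_paths = []
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, idx = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                chunk_paths.append(_write_chunk(out_dir, idx, header, chunk))
                chunk, idx = [], idx + 1
        if chunk:  # flush the final partial chunk
            chunk_paths.append(_write_chunk(out_dir, idx, header, chunk))
    return chunk_paths


def _write_chunk(out_dir, idx, header, rows):
    out_path = os.path.join(out_dir, f"chunk_{idx:05d}.csv")
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```

In a real PySpark job the same effect is usually achieved by repartitioning before writing, but the chunking logic is the same idea.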
Project Details
Title : Get Email
Duration : 36 (Month)

Technologies: Python, PySpark. AWS services: S3, Lambda, Glue, Redshift, RDS PostgreSQL, Step Functions, CloudWatch, Redis, EMR, SQS, SNS, Kinesis, QuickSight, SAM, EC2, DynamoDB, Redash.

Role & Responsibilities:

  • Wrote Python scripts to extract .gz files, convert them to CSV or JSON, and load the data into Redshift.
  • Wrote scripts for regular ETL jobs using Lambda, Glue, and crawlers to process millions of unstructured records received from social media.
  • Handled data transformation and processing tasks efficiently using PySpark on EMR.
  • Developed a framework for cleaning source data using Python, SQL, Glue, and crawlers.
  • Transformed data from one source to another based on specific scenarios.
  • Deployed AWS services using CloudFormation, SAM, and Terraform.
  • Developed and migrated the system to Python, RDS PostgreSQL, AWS Redshift, AWS Lambda, and Glue, with job scheduling on Redis and Step Functions.
  • Provisioned, configured, and monitored EMR clusters to ensure optimal performance, scalability, and cost efficiency.
  • Improved file processing of 15 million records from 1 day down to 5-6 hours.
  • Enabled parallel file processing wherever possible using the new framework.
  • Optimized long-running queries to achieve better performance.
  • Performed data analysis and wrote SQL scripts for data refinement.
  • Built data visualizations using Redash and AWS QuickSight.
  • Performed crucial transformations and queried the loaded data using Hive and Spark SQL to build reporting tables.
  • Maintained a regular ETL dashboard and resolved bugs.
  • Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
  • Performed text cleansing by applying transformations to Spark DataFrames and RDDs.
  • Gathered business requirements; designed and developed the data ingestion and presentation layers.
  • Highly motivated and versatile team player, able to work independently.
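The recurring "extract .gz, produce CSV or JSON, load into Redshift" step can be sketched with the standard library alone. This is a hedged illustration (function names are mine): it converts a gzipped CSV export into JSON Lines, a shape Redshift's COPY command can ingest; the COPY step itself is assumed to happen separately.

```python
import csv
import gzip
import json


def gz_to_json_lines(gz_path, out_path):
    """Decompress a gzipped CSV export and rewrite it as JSON Lines.

    Each CSV row becomes one JSON object per line; the resulting file
    can be staged to S3 and loaded via Redshift COPY (not shown here).
    """
    with gzip.open(gz_path, "rt", newline="") as src, open(out_path, "w") as dst:
        for record in csv.DictReader(src):
            dst.write(json.dumps(record) + "\n")
```

In the real pipeline this kind of conversion would typically run inside a Lambda or Glue job triggered by the file's arrival in S3.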

Total Team: 8 Members

Description :

The tool finds the email address of anyone on Earth using the first name, the last name, and the domain name of their company. We "scanned" a large number of websites to find the general email format of millions of companies, then recreate the professional email address of the prospect you would like to get in touch with. The method is extremely simple, but the time saving is huge: manually it would take around 5 minutes to find an email, while GetEmail takes only a few seconds. If you search 20 email addresses per day, GetEmail saves 2 hours of your precious time!


Title : Up Stream
Duration : 18 (Month)

Technologies: Python, PySpark. AWS services: S3, Lambda, Glue, Redshift, RDS PostgreSQL, Step Functions, CloudWatch, Redis, SQS, SNS, Kinesis, QuickSight, SAM, EC2, DynamoDB.

Role & Responsibilities:

  • Wrote Python scripts to extract .gz files, convert them to CSV or JSON, and load the data into Redshift.
  • Analyzed user specifications and requirements.
  • Developed ETL pipelines to move data from sources (AWS RDS Oracle, SAP, and CRM systems) into the data lake.
  • Extracted data from multiple sources, transformed it, and loaded it into a database.
  • Tested code for base and incremental loads.
  • Developed and maintained PySpark scripts to process, transform, and analyze data on EMR clusters.
  • Developed the full back-end business logic and required database operations for this project.
  • Developed and executed a migration strategy to move the data warehouse from an Oracle platform to AWS Redshift.
  • Implemented security best practices to protect data and restrict access to EMR clusters based on IAM (Identity and Access Management) policies.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in AWS S3 using Python.
  • Developed custom aggregate functions using Spark and AWS Lambda.
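The Lambda-based steps above can be sketched as an S3-triggered handler skeleton. This is an assumed shape, not the project's actual code; the event layout is AWS's standard S3 notification format, and the downstream ETL call is only indicated by a comment.

```python
import json
import urllib.parse


def lambda_handler(event, context):
    """Minimal S3-triggered Lambda skeleton: extract the bucket/key pairs
    from the S3 event payload so downstream ETL code knows what to process."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    # Real processing (Glue job trigger, Redshift COPY, etc.) would go here.
    return {"statusCode": 200, "body": json.dumps(objects)}
```

Decoding the key with `unquote_plus` matters in practice: S3 notifications encode spaces as `+`, so a raw key like `raw/file 1.csv.gz` would otherwise be looked up under the wrong name.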
Description :

Primary expertise in IBM InfoSphere DataStage; identified different business scenarios and derived potential solutions. Transitioning towards next-generation products such as Talend, GoldenGate, Python, AWS, and Azure.


Title : Schlumberger
Duration : 6 (Month)

Technologies: Python, Django, PostgreSQL, Windows Server, cron jobs, PySpark, SQLAlchemy.

Role & Responsibilities:

  • Analyzed user specifications and requirements.
  • Developed code per the requirements.
  • Performed unit testing.

Total Team: 3 Members

Description :

A Django app with custom modules that process content received through mail from assets into PostgreSQL, later making it available for analysis purposes.


Title : NoshList (http://www.noshlist.com)
Duration : 6 (Month)

Technologies: Python, Django, DR, BigQuery, webapp2, JavaScript, iOS push notifications, MySQL, HTML5, CSS3, Bootstrap.

Role & Responsibilities:

  • Wrote Python scripts to extract .gz files, convert them to CSV or JSON, and load the data into Redshift.
  • Analyzed user specifications and requirements.
  • Developed code per the requirements.

Total Team: 4 Members


Description :

This is a restaurant waitlist management app: easily add customers to a waitlist and manage it from an iPad or smartphone, and send free text messages to customers' phones to let them know their tables are ready.


Title : TicketBird (http://www.ticketbird.com)
Duration : 6 (Month)

Technologies Used: Python, Django, South, Pillow, PostgreSQL, Elastic Beanstalk.

Roles and Responsibilities:

  • Analyzed user specifications and requirements.
  • Developed code per the requirements.
  • Developed business logic and required database operations for this project.

 

Description :

An app that lets users query the tickets they have raised and quickly find answers at any time, either on their own or from support agents.


 