
SUJEET SINGH (RID : yb8plkl4vet6)

Designation: Senior Data Engineer

Location: Indore, India

Experience: 5 Years

Rate: $9 / Hour

Availability: 1 Week

Work From: Offsite

Category: Information Technology & Services

Key Skills
Python, PySpark, Hadoop, Pandas, NumPy, SAP, AWS Lambda, AWS Data Pipeline, MySQL, PostgreSQL, AWS Redshift, RDS PostgreSQL, CSS3, Bootstrap
Description
  • Over 5 years of extensive hands-on experience in the IT industry with Python, PySpark, MySQL, SQL Server, AWS services, Redshift, ETL pipelines, machine learning algorithms, deployment, Kafka, and Docker.
  • Designed and developed data pipelines using Python, PySpark, Redis, PostgreSQL, MySQL, Redshift, AWS Lambda, AWS Glue and crawlers, Docker, and Kubernetes.
  • Familiar with AWS cloud services such as EMR, EKS, Lambda, Glue, Kinesis, Fargate, DMS, EC2, S3, Elastic Beanstalk, ECS, CloudFormation, Athena, and Step Functions.
  • Built data pipelines from S3 to Redshift using AWS Data Pipeline for LinkedIn social media data.
  • Used PySpark SQL to split huge files into smaller files, apply transformations, and process them in warehouse databases.
  • Experience working with Sequence files and ORC, Avro, Parquet, CSV, fixed-width, and XML formats.
  • Optimized long-running queries across different databases to achieve better performance.
  • Ensured scalability, reliability, and security of back-end systems using Redis, Celery, Docker, and Kubernetes.
  • Hands-on experience designing and building data models and data pipelines for data warehouses and data lakes.
  • Experience running high-volume data pipelines on Redshift.
  • Good experience with SAP and CRM systems.
  • Good experience building real-time data streaming solutions using Spark Streaming, Kafka, and Hadoop.
  • Good experience with Python for data manipulation, data wrangling, and data analysis using libraries like Pandas and NumPy.
  • Knowledge of databases including MySQL, PostgreSQL, Oracle, and AWS Redshift.
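The file-splitting technique mentioned above can be illustrated without a Spark cluster. The sketch below is illustrative only (function names are mine, not from any project): it splits a large CSV into smaller header-preserving chunk files so downstream jobs can process them in parallel.

```python
import csv
import os


def split_csv(path, rows_per_chunk, out_dir):
    """Split a large CSV into smaller chunk files, repeating the header in each."""
    chunk_paths = []
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, idx = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                chunk_paths.append(_write_chunk(out_dir, idx, header, chunk))
                chunk, idx = [], idx + 1
        if chunk:  # flush the final partial chunk
            chunk_paths.append(_write_chunk(out_dir, idx, header, chunk))
    return chunk_paths


def _write_chunk(out_dir, idx, header, rows):
    out_path = os.path.join(out_dir, f"chunk_{idx:05d}.csv")
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```

In a real PySpark job the same effect is usually achieved by repartitioning before writing, but the chunking logic is the same idea.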
Project Details
Title : Get Email
Duration : 36 (Month)

Technologies: Python, PySpark. AWS services: S3, Lambda, Glue, Redshift, RDS PostgreSQL, Step Functions, CloudWatch, Redis, EMR, SQS, SNS, Kinesis, QuickSight, SAM, EC2, DynamoDB, Redash.

Role & Responsibilities:

  • Wrote Python scripts to extract .gz files, convert them to CSV or JSON, and load the data into Redshift.
  • Wrote scripts for regular ETL jobs using Lambda, Glue, and crawlers to process millions of unstructured records received from social media.
  • Handled data transformation and processing tasks efficiently using PySpark on EMR.
  • Developed a framework for cleaning source data using Python, SQL, Glue, and crawlers.
  • Transformed data from one source to another based on specific scenarios.
  • Deployed AWS services using CloudFormation, SAM, and Terraform.
  • Developed and migrated the system to Python, RDS PostgreSQL, AWS Redshift, AWS Lambda, and Glue, with job scheduling on Redis and Step Functions.
  • Provisioned, configured, and monitored EMR clusters to ensure optimal performance, scalability, and cost efficiency.
  • Improved file processing of 15 million records from 1 day down to 5-6 hours.
  • Enabled parallel file processing wherever possible using the new framework.
  • Optimized long-running queries to achieve better performance.
  • Performed data analysis and wrote SQL scripts for data refinement.
  • Built data visualizations using Redash and AWS QuickSight.
  • Performed crucial transformations and queried the loaded data using Hive and Spark SQL to build reporting tables.
  • Maintained a regular ETL dashboard and resolved bugs.
  • Coordinated with business customers to gather business requirements and interacted with technical peers to derive technical requirements.
  • Performed text cleansing by applying transformations to Spark DataFrames and RDDs.
  • Gathered business requirements; designed and developed the data ingestion and presentation layers.
  • Highly motivated and versatile team player, able to work independently.
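The recurring "extract .gz, produce CSV or JSON, load into Redshift" step can be sketched with the standard library alone. This is a hedged illustration (function names are mine): it converts a gzipped CSV export into JSON Lines, a shape Redshift's COPY command can ingest; the COPY step itself is assumed to happen separately.

```python
import csv
import gzip
import json


def gz_to_json_lines(gz_path, out_path):
    """Decompress a gzipped CSV export and rewrite it as JSON Lines.

    Each CSV row becomes one JSON object per line; the resulting file
    can be staged to S3 and loaded via Redshift COPY (not shown here).
    """
    with gzip.open(gz_path, "rt", newline="") as src, open(out_path, "w") as dst:
        for record in csv.DictReader(src):
            dst.write(json.dumps(record) + "\n")
```

In the real pipeline this kind of conversion would typically run inside a Lambda or Glue job triggered by the file's arrival in S3.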

Total Team: 8 Members

Description :

The tool finds the email address of anyone on Earth using the first name, the last name, and the domain name of their company. We "scanned" a large number of websites to find the general email format of millions of companies, then recreate the professional email address of the prospect you would like to get in touch with. The method is extremely simple, but the time saving is huge: manually it would take around 5 minutes to find an email, while GetEmail takes only a few seconds. If you search 20 email addresses per day, GetEmail saves 2 hours of your precious time!


Title : Up Stream
Duration : 18 (Month)

Technologies: Python, PySpark. AWS services: S3, Lambda, Glue, Redshift, RDS PostgreSQL, Step Functions, CloudWatch, Redis, SQS, SNS, Kinesis, QuickSight, SAM, EC2, DynamoDB.

Role & Responsibilities:

  • Wrote Python scripts to extract .gz files, convert them to CSV or JSON, and load the data into Redshift.
  • Analyzed user specifications and requirements.
  • Developed ETL pipelines to move data from sources (AWS RDS Oracle, SAP, and CRM systems) into the data lake.
  • Extracted data from multiple sources, transformed it, and loaded it into a database.
  • Tested code for base and incremental loads.
  • Developed and maintained PySpark scripts to process, transform, and analyze data on EMR clusters.
  • Developed the full back-end business logic and required database operations for this project.
  • Developed and executed a migration strategy to move the data warehouse from an Oracle platform to AWS Redshift.
  • Implemented security best practices to protect data and restrict access to EMR clusters based on IAM (Identity and Access Management) policies.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in AWS S3 using Python.
  • Developed custom aggregate functions using Spark and AWS Lambda.
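The Lambda-based steps above can be sketched as an S3-triggered handler skeleton. This is an assumed shape, not the project's actual code; the event layout is AWS's standard S3 notification format, and the downstream ETL call is only indicated by a comment.

```python
import json
import urllib.parse


def lambda_handler(event, context):
    """Minimal S3-triggered Lambda skeleton: extract the bucket/key pairs
    from the S3 event payload so downstream ETL code knows what to process."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    # Real processing (Glue job trigger, Redshift COPY, etc.) would go here.
    return {"statusCode": 200, "body": json.dumps(objects)}
```

Decoding the key with `unquote_plus` matters in practice: S3 notifications encode spaces as `+`, so a raw key like `raw/file 1.csv.gz` would otherwise be looked up under the wrong name.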
Description :

Primary expertise in IBM InfoSphere DataStage; identified different business scenarios and derived potential solutions. Transitioning towards next-generation products such as Talend, GoldenGate, Python, AWS, and Azure.


Title : Schlumberger
Duration : 6 (Month)

Technologies: Python, Django, PostgreSQL, Windows Server, cron jobs, PySpark, SQLAlchemy.

Role & Responsibilities:

  • Analyzed user specifications and requirements.
  • Developed code per the requirements.
  • Performed unit testing.

Total Team: 3 Members

Description :

A Django app with custom modules that process content received through mail from assets into PostgreSQL, later making it available for analysis purposes.


Title : NoshList (http://www.noshlist.com)
Duration : 6 (Month)

Technologies: Python, Django, DR, BigQuery, webapp2, JavaScript, iOS push notifications, MySQL, HTML5, CSS3, Bootstrap.

Role & Responsibilities:

  • Wrote Python scripts to extract .gz files, convert them to CSV or JSON, and load the data into Redshift.
  • Analyzed user specifications and requirements.
  • Developed code per the requirements.

Total Team: 4 Members


Description :

This is a restaurant waitlist management app: easily add customers to a waitlist and manage it from an iPad or smartphone, and send free text messages to customers' phones to let them know their tables are ready.


Title : TicketBird (http://www.ticketbird.com)
Duration : 6 (Month)

Technologies Used: Python, Django, South, Pillow, PostgreSQL, Elastic Beanstalk.

Roles and Responsibilities:

  • Analyzed user specifications and requirements.
  • Developed code per the requirements.
  • Developed business logic and required database operations for this project.

 

Description :

An app that lets users query the tickets they have raised and quickly find answers at any time, either on their own or from support agents.


 