
Amanpreet (RID: g8qolht2o3s1)

Designation: Data Engineer

Location: Delhi, India

Experience: 8 Years

Rate: $16 / Hourly

Availability: Immediate

Work From: Any

Category: Information Technology & Services

Key Skills
Spark, Azure, Hive, Hadoop, PySpark, Jira, Databricks, MySQL, Jenkins
Description

PROFESSIONAL SUMMARY

  • Total 7 years of IT experience.
  • 4+ years of experience with big data technologies such as Hadoop, Spark, Scala, Hive, and HBase.
  • 2 years of experience with Azure.
  • Worked with Spark, Hive, and HBase to process millions of records each day.
  • Experience handling the complex Avro data format.
  • Exposure to Hadoop 2.x on the Cloudera distribution.
  • Sound knowledge of HDFS architecture and distributed data processing.
  • Capable of processing large sets of structured, semi-structured, and unstructured data.
  • Capable of writing Spark jobs in Scala.
  • Improved the performance of Spark-HBase jobs.
  • Developed Spark/Scala code for validating semi-structured data and loading it into the system.
  • Created Hive tables on complex HBase data for analysis.
  • Developed and orchestrated data pipelines using ADF.
  • Created a CI/CD pipeline for ARM templates.
  • Created CI/CD for Databricks notebooks in Azure DevOps.
  • Used multiple Azure nodes to run Spark code.
  • Hands-on knowledge of Azure, ADLS, ADF, File Share, and Blob Storage.
  • Hands-on knowledge of PySpark.
  • Hands-on knowledge of Scala.
  • Hands-on knowledge of Databricks.
  • Improved job performance with iterative broadcast joins (see the sketch after this list).
  • Knowledge of Kafka, Docker, and Kubernetes.
  • Knowledge of AWS: EMR, CloudWatch, SageMaker, Lambda.
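
The iterative broadcast point above refers to joining a large table against a mid-sized one in chunks, so that each chunk fits under the broadcast size limit. A minimal Spark/Scala sketch of the pattern (the data, chunk count, and names are illustrative, not taken from the actual jobs):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object IterativeBroadcastJoin {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("IterativeBroadcastJoin")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Illustrative data: a large fact table and a mid-sized dimension
        // table that is too big to broadcast in one piece.
        val large  = (1 to 1000000).toDF("key").withColumn("value", rand())
        val medium = (1 to 100000).toDF("key").withColumn("attr", lit("x"))

        // Tag each dimension row with a chunk id, then broadcast-join one
        // chunk at a time and union the partial results.
        val numChunks = 4
        val chunked = medium.withColumn("chunk", pmod(col("key"), lit(numChunks)))

        val joined = (0 until numChunks)
          .map { i =>
            large.join(broadcast(chunked.filter(col("chunk") === i).drop("chunk")), Seq("key"))
          }
          .reduce(_ union _)

        println(joined.count())
        spark.stop()
      }
    }

Because the chunks partition the dimension table, the union of the chunk-wise inner joins equals the full join, while each broadcast stays small.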

TECHNICAL SKILLS

Languages

Scala

Tools

Spark, Databricks, Hadoop, MapReduce, Jenkins, Jira, Hive, Maven, Dremio, NiFi, Azure ADF, Azure ADLS

Scripting

Python

Operating Systems

Linux, Unix and Windows

IDEs

Eclipse, IntelliJ, PyCharm

Database

MySQL, HBase

Projects Undertaken at Deltacubes Technology Pvt. Ltd.

Project Name: Depletions

 

Duration: 3 Years (April 2019 – Present)

 

Description: Diageo is a worldwide liquor manufacturer and distributor. The data lake team helps the Diageo business get data from all customers and distributors across the world.

The data lake team collects data from different sources, then cleanses, standardizes, and harmonizes it into meaningful data for the business analytics team.

Responsibilities:

1. Cleansing and standardizing raw data from multiple file formats (XLSX, CSV, JSON) using Spark/Scala (see the sketch after this list).
2. Generating Parquet/CSV files after harmonizing data for the business analytics team.
3. Generating CSV files with EU calculations in Blob Storage for the Anaplan team.
4. Generating Parquet files for the Sellout team.
5. Orchestrating data pipelines using ADF.
6. Creating views in Dremio on top of ADLS.
7. Parsing JSON to get the rules for standardizing raw data.
8. Triggering pipelines from ADF.
9. Creating a CI/CD pipeline for ARM templates.
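
A minimal Spark/Scala sketch of the cleanse-and-standardize step in item 1 (the ADLS paths, column names, and date format below are hypothetical, not the real Depletions schema):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DepletionsCleansing {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DepletionsCleansing")
          .getOrCreate()

        // Hypothetical ADLS Gen2 paths; the real container layout is not
        // part of this profile.
        val rawPath     = "abfss://raw@datalake.dfs.core.windows.net/depletions/*.csv"
        val curatedPath = "abfss://curated@datalake.dfs.core.windows.net/depletions/"

        val raw = spark.read
          .option("header", "true")
          .csv(rawPath)

        // Cleanse: drop fully empty rows, trim string fields, and
        // standardize a date column before harmonization.
        val cleansed = raw
          .na.drop("all")
          .withColumn("distributor", trim(col("distributor")))
          .withColumn("sale_date", to_date(col("sale_date"), "dd-MM-yyyy"))

        // Harmonized output as Parquet for the business analytics team.
        cleansed.write.mode("overwrite").parquet(curatedPath)
        spark.stop()
      }
    }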

Technology: Spark, Scala, Dremio, Azure (ADF, ADLS, Blob Storage), Databricks

Projects Undertaken at Tavant Technologies

Project Name: Experian BIS SALT

 

Duration: 2 Years (Jan 2017 – Apr 2019)

 

Description:

Experian is a consumer credit reporting agency. It collects and aggregates information on over one billion people and businesses, and is one of the ‘Big Three’ credit reporting agencies.

This project imports the SBFE data that Experian recently acquired into the big data system and makes it available to internal and external customers for analysis and credit-score modelling for any business.

Responsibilities:

1. Developing Spark/Scala code for validating semi-structured data and loading it into the system.
2. Generating Avro files from CSV files for integration with an external system (One Search) using Spark/Scala.
3. Creating Hive tables for the Data Management team.
4. Making data available to the Commercial Data Sciences team for analysis in SAS.
5. Running validation jobs on historical data and making the data ready for use.
6. Supporting functional testing and bug fixing in Spark code.
7. Writing Spark DataFrame code to implement product-view rules on processed data.
8. Writing Spark DataFrame code to read and analyze nested, complex Avro data (see the sketch after this list).
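
A minimal Spark/Scala sketch of reading and flattening nested Avro as in item 8 (requires the spark-avro package; the path, field names, and validation rule are hypothetical, since the real SBFE schema is not shown here):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object AvroValidation {
      def main(args: Array[String]): Unit = {
        // Run with e.g. --packages org.apache.spark:spark-avro_2.12:3.3.0
        val spark = SparkSession.builder()
          .appName("AvroValidation")
          .getOrCreate()

        val records = spark.read
          .format("avro")
          .load("/data/sbfe/accounts.avro")

        // Flatten one level of nesting: explode a hypothetical array of
        // contact structs and pull out a scalar field.
        val flattened = records
          .withColumn("contact", explode(col("contacts")))
          .select(col("business_id"), col("contact.phone").as("phone"))

        // Simple validation rule: records missing a business id are
        // routed to a reject output for later inspection.
        val valid    = flattened.filter(col("business_id").isNotNull)
        val rejected = flattened.filter(col("business_id").isNull)

        valid.write.mode("overwrite").parquet("/data/sbfe/valid")
        rejected.write.mode("overwrite").parquet("/data/sbfe/rejects")
        spark.stop()
      }
    }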

 

Technology: Spark, Scala, HBase, Hive

Projects Undertaken at AMD India Pvt. Ltd. (Contingent Worker through Magna Infotech)

Project Name: Scan-view

 

Duration: 11 Months (Feb 20

 