AWS Data Engineer
Location : Jaipur, India
Experience : 8 Years
Rate : $18 / Hourly
Availability : Immediate
Work From : Offsite
Category : Information Technology & Services
8.6 years of experience implementing complete Big Data solutions, including data acquisition, storage, transformation, and analytics, using Big Data technologies including Hadoop, Hive, Spark, Python, Sqoop, PL/SQL, and Informatica.
Built complete data ingestion (ETL) pipelines from traditional databases and file systems into Hadoop using Hive, Spark, Python, PySpark, Sqoop, and SFTP/SCP (a minimal PySpark sketch follows this list).
Experienced with different relational databases such as Oracle and SQL Server.
Good experience in data modeling.
Involved in Spark query tuning and performance optimization.
Experienced in writing UNIX shell scripts for batch jobs.
Experienced in developing business reports by writing complex SQL queries using views, volatile tables, and global temporary tables.
Identified long-running queries, scripts, and spool space issues, and implemented appropriate tuning methods.
Reported errors captured in error tables to clients, rectified known errors, and reran the scripts.
Followed the given standard approaches for job restarts and error handling.
Worked with the EXPLAIN command to identify join strategies, issues, and bottlenecks.
Wrote unit test cases and submitted unit test results as per the quality process.
Strong problem-solving and communication skills, with the ability to handle multiple projects and work in a team or individually.
Cloud exposure to AWS, Azure, and GCP.
Experienced in creating dashboards using Tableau.
Experienced with AWS Glue.
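As a minimal illustration of the ingestion pipelines mentioned above, the sketch below reads a table from a relational source over JDBC with PySpark and lands it in a partitioned Hive table. The connection URL, credentials, table names, and partition column are hypothetical placeholders, and the appropriate JDBC driver is assumed to be on the Spark classpath.

# Minimal PySpark ingestion sketch: relational database -> Hive.
# Connection details, table names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rdbms_to_hive_ingest")
    .enableHiveSupport()
    .getOrCreate()
)

# Read the source table over JDBC (Oracle URL shown; SQL Server works the
# same way with its own driver and URL format).
src_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB")  # placeholder
    .option("dbtable", "sales.orders")                          # placeholder
    .option("user", "etl_user")                                 # placeholder
    .option("password", "etl_password")                         # placeholder
    .option("fetchsize", "10000")
    .load()
)

# Light cleanup before landing in the warehouse.
clean_df = src_df.dropDuplicates(["order_id"]).filter("order_date IS NOT NULL")

# Append into a date-partitioned Hive table.
(
    clean_df.write
    .mode("append")
    .partitionBy("order_date")
    .saveAsTable("edw.orders")  # placeholder target table
)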
Requirement:
ACOE Analytics is responsible for maintaining the end-to-end (E2E) data flow from various sources to AWS EDGE Redshift.
This centralized data is used for business use cases, building dashboards and analytics.
The StitcherX project mainly ingests data from BigQuery into Redshift using Talend jobs. Once the data is available in Redshift, it is curated per the business requirements. All services run on AWS, and Airflow is used for scheduling the jobs (a minimal DAG sketch follows).
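The sketch below shows the kind of Airflow scheduling described above; the DAG id, owner, schedule, and the shell launcher path for the Talend job are hypothetical placeholders (Talend jobs are commonly exported as shell launchers).

# Minimal Airflow 2.x DAG sketch for scheduling a Talend ingestion job
# (BigQuery -> Redshift). All names and paths are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "acoe_analytics",  # placeholder
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="stitcherx_bq_to_redshift",  # placeholder
    default_args=default_args,
    schedule_interval="0 2 * * *",      # daily at 02:00, placeholder
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    # The trailing space stops Airflow from treating the .sh path as a
    # Jinja template file. The launcher path is a placeholder.
    run_talend_ingest = BashOperator(
        task_id="run_talend_ingest",
        bash_command="/opt/talend/jobs/stitcherx_ingest/run.sh ",
    )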
The Nitro project ingests various types of source data into the data lake. The data in the warehouse can be used to build data marts, feed downstream systems, and develop reports and analytical models (a sketch of the raw-zone landing step follows).
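As an illustration of the data-lake ingestion step, the sketch below lands a raw extract under a date-partitioned S3 prefix with boto3; the bucket name, prefix layout, and source name are hypothetical placeholders.

# Minimal sketch: land a raw source file in the data lake's raw zone
# under a date-partitioned S3 prefix. Bucket/prefix names are placeholders.
from datetime import date

import boto3

def land_raw_file(local_path: str, source: str) -> str:
    """Upload a raw extract into the raw zone and return its S3 key."""
    s3 = boto3.client("s3")
    today = date.today()
    key = (
        f"raw/{source}/year={today.year}/month={today.month:02d}/"
        f"day={today.day:02d}/{local_path.rsplit('/', 1)[-1]}"
    )
    s3.upload_file(local_path, "nitro-data-lake", key)  # bucket is a placeholder
    return key

# Example usage:
# land_raw_file("/data/exports/customers.csv", "crm")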
The Astrid project provides asset servicing to multiple banking clients. We process various MT messages and provide data to downstream systems; the data is used for developing reports and analytics. The complete process runs in Oracle, and the curated data is consumed in the front end (a minimal parsing sketch follows).
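The production processing above was done in Oracle; purely as an illustration of the MT message structure, the Python sketch below splits block 4 of a sample SWIFT MT message into tag/value pairs. The sample message and field tags are illustrative only.

# Minimal sketch: extract tag/value pairs from block 4 of a SWIFT MT message.
# Sample message and tags are illustrative; production logic lived in Oracle.
import re

SAMPLE_MT = (
    "{1:F01BANKBEBBAXXX0000000000}{2:I103BANKDEFFXXXXN}"
    "{4:\n:20:REF12345\n:32A:240131EUR1000,00\n:50K:ORDERING CUSTOMER\n-}"
)

def parse_block4(message: str) -> dict:
    """Return {tag: value} for the fields in block 4 of an MT message."""
    block4 = re.search(r"\{4:\n(.*?)\n-\}", message, re.DOTALL)
    if not block4:
        return {}
    fields = {}
    # Each field starts on a line beginning with ':TAG:'.
    for m in re.finditer(r"^:([0-9A-Z]+):(.*?)(?=\n:|\Z)",
                         block4.group(1), re.DOTALL | re.MULTILINE):
        fields[m.group(1)] = m.group(2).strip()
    return fields

print(parse_block4(SAMPLE_MT))
# -> {'20': 'REF12345', '32A': '240131EUR1000,00', '50K': 'ORDERING CUSTOMER'}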
The purpose of this project is to build various applications for different clients and ensure the code remains reusable for future clients as well.