Data Engineer
Location: Jaipur, India
Experience: 9 Years
Rate: $20 / Hour
Availability: 1 Week
Work From: Any
Category : Information Technology & Services
Projects
Project Name: StitcherX
Environment: Spark SQL, Python, PySpark, Talend, ELK, Airflow
Role: Data Engineer
Duration: Sep 2021 to present
Brief description of the project:
The StitcherX project ingests data from BigQuery into Redshift using Talend jobs. Once the data is available in Redshift, it is curated to meet the business requirements. All services run on AWS, and Airflow is used to schedule the jobs.
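Below is a minimal Airflow sketch of how such a pipeline could be scheduled, assuming the Talend ingestion job is exported as a shell script and the Redshift curation step is wrapped in its own script; the DAG id, schedule, and paths are hypothetical placeholders, not the actual project configuration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Illustrative DAG: ingest BigQuery tables into Redshift via an exported
# Talend job, then curate the data in Redshift. All names and paths are
# placeholders.
with DAG(
    dag_id="stitcherx_bq_to_redshift",      # hypothetical DAG id
    start_date=datetime(2021, 9, 1),
    schedule_interval="@daily",             # assumed daily schedule
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="talend_ingest_bq_to_redshift",
        # Trailing space keeps Airflow from treating the .sh path as a Jinja template file.
        bash_command="/opt/talend/jobs/bq_to_redshift/run.sh ",
    )

    curate = BashOperator(
        task_id="curate_redshift_tables",
        bash_command="/opt/etl/run_redshift_curation.sh ",
    )

    # Curation runs only after the ingestion task succeeds.
    ingest >> curate
```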
Responsible for:
Created Talend jobs to ingest BigQuery table data into Redshift.
Created Spark jobs for data transformation and aggregation using Python in AWS Glue (see the sketch after this list).
Built Airflow DAGs to schedule the jobs.
Created DataFrames using Spark.
Understood the specifications and analyzed data according to client requirements.
Involved in unit testing and preparing test cases.
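The Glue transformation and aggregation work above can be pictured with a minimal sketch like the following; the S3 paths, column names, and aggregation logic are hypothetical placeholders rather than the project's real schema.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job boilerplate.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the ingested data (placeholder S3 location).
orders = spark.read.parquet("s3://example-bucket/stitcherx/orders/")

# Transformation + aggregation: daily totals per customer (illustrative).
daily_totals = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("customer_id", "order_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("order_count"),
    )
)

# Write the curated output back to S3, partitioned by date (placeholder path).
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/stitcherx/curated/daily_totals/"
)

job.commit()
```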
Project Name: Nitro Advanced Analytics
Environment: Spark SQL, Python, Hive, PySpark, Oracle, and SQL Server
Role: Big Data Developer
Subject Area: Banking & Finance
Duration: Jul 2020 to Aug 2021
Brief description of the project:
The Nitro project ingests data from various types of sources into the data lake. The data in the warehouse is used to build data marts, feed downstream systems, and develop reports and analytical models.
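A minimal PySpark sketch of one such ingestion step follows, assuming a hypothetical Oracle source read over JDBC and a placeholder data-lake landing path; connection details, table names, and paths are illustrative only.

```python
from pyspark.sql import SparkSession

# Illustrative ingestion of an RDBMS table into the data lake.
# JDBC URL, credentials, table, and target path are placeholders.
spark = (
    SparkSession.builder
    .appName("nitro_ingest_example")
    .enableHiveSupport()
    .getOrCreate()
)

accounts = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1")  # placeholder
    .option("dbtable", "CORE.ACCOUNTS")                          # placeholder
    .option("user", "etl_user")                                  # placeholder
    .option("password", "etl_password")                          # placeholder
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# Land the raw extract in the data lake as Parquet for downstream curation.
accounts.write.mode("overwrite").parquet("/datalake/raw/core/accounts")
```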
Responsible for:
Created Spark jobs for data transformation and aggregation using Python.
Worked with different file formats such as Parquet, JSON, CSV, XML, and TXT.
Created RDDs, Datasets, and DataFrames using Spark.
Developed Hive analytics for business-specific use cases by designing tables with appropriate formats, partitions, and buckets for efficient query behavior (see the sketch after this list).
Understood the specifications and analyzed data according to client requirements.
Developed various distributed applications using Spark.
Worked extensively with Apache Sqoop to import and export various types of data from RDBMS sources such as Oracle, SQL Server, and PostgreSQL.
Involved in unit testing and preparing test cases.
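The partitioned, bucketed Hive table design mentioned above can be sketched as Spark SQL issued from PySpark; the database, table, columns, and bucket count are hypothetical, and the sample query simply shows how partition pruning keeps scans small.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical Hive table: ORC format, partitioned by load date and
# bucketed by customer_id so joins and point lookups prune efficiently.
# The analytics database is assumed to already exist.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.transactions (
        txn_id      BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(18, 2),
        txn_type    STRING
    )
    PARTITIONED BY (load_date DATE)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Example analytical query; filtering on load_date lets Hive/Spark prune
# partitions instead of scanning the whole table.
daily_totals = spark.sql("""
    SELECT txn_type, SUM(amount) AS total_amount
    FROM analytics.transactions
    WHERE load_date = DATE '2021-01-15'
    GROUP BY txn_type
""")
daily_totals.show()
```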