
Swapnil (RID : a49xldv8vi0f)

Designation : Data Engineer

Location : Jaipur

Experience : 7 Years

Rate : $26 / Hourly

Availability : Immediate

Work From : Offsite

Category : Information Technology & Services

Key Skills
Hadoop, Azure Data Engineer, Big Data Ecosystems, Python, Informatica, Power BI





Professional with 7+ years of experience in Big Data ecosystems (HDFS, YARN, Hive, Impala, Spark), cloud technologies (Azure Databricks, Azure Data Factory, Azure Data Lake Storage, Azure Synapse), Python, Informatica, SQL, and Power BI.

  • Installation, configuration, and administration of Hadoop distributions such as Cloudera (CDH).
  • Experience in cluster deployment, performance tuning, administration, and monitoring of the Hadoop ecosystem.
  • Knowledge of the Hadoop ecosystem: HDFS, YARN, MapReduce, Hive, Hue, Sentry, Impala, ZooKeeper, Spark.
  • Hands-on experience with cloud technologies such as Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Azure Blob Storage, and Azure Data Lake Storage.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
  • Hands-on experience in Big Data application phases such as data ingestion and data analytics.
  • Expertise in using Spark SQL with data sources such as JSON, Parquet, and Hive.
  • Experience with Hadoop-based platforms such as Cloudera and Azure Databricks.
  • Experience in transferring data from RDBMS to HDFS and Hive tables using Azure Data Factory.
  • Experience in creating tables, partitioning, bucketing, and loading and aggregating data using Hive/Impala.
  • Uploaded and processed terabytes of data from various structured and semi-structured sources into HDFS.
  • Worked on Informatica Designer components; created tasks, sessions, and workflows using Workflow Manager, and monitored the workflows with Workflow Monitor.
  • Extensive experience in extraction, transformation, and loading of data directly from heterogeneous source systems such as flat files, Oracle, and Netezza.
  • Good knowledge of Star and Snowflake schemas to fit reporting-query and business-analysis requirements.
  • Developed and supported Informatica mappings with transformations such as Filter, Router, Expression, Joiner, Aggregator, Lookup, Union, and Sequence Generator.
  • Experience with Slowly Changing Dimensions, including SCD1 and SCD2.
  • Involved in technical and business meetings with internal teams and senior management.
  • Hands-on experience in the Analysis, Design, Coding, and Testing phases of the Software Development Life Cycle (SDLC).
  • Good knowledge of SQL.
  • Worked extensively in team-oriented environments; strong analytical, interpersonal, and communication skills.
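The SCD2 handling mentioned above can be sketched in plain Python. This is an illustrative sketch only: the function name `apply_scd2` and the row layout are hypothetical, and in practice this logic would run as an Informatica mapping or a Spark job rather than as in-memory Python.

```python
from datetime import date

def apply_scd2(dimension, key, new_attrs, today):
    """Type-2 slowly changing dimension (hypothetical sketch):
    close the current row for `key` and append a new current row
    carrying `new_attrs`, preserving the full change history."""
    for row in dimension:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return dimension  # no attribute change: keep history as-is
            row["is_current"] = False  # expire the old version
            row["end_date"] = today
    dimension.append({
        "key": key,
        "attrs": new_attrs,
        "start_date": today,
        "end_date": None,
        "is_current": True,
    })
    return dimension

# Usage: a customer moves city; the old row is closed and a new
# current row is opened, so both versions remain queryable.
dim = [{"key": 1, "attrs": {"city": "Jaipur"}, "start_date": date(2020, 1, 1),
        "end_date": None, "is_current": True}]
apply_scd2(dim, 1, {"city": "Pune"}, date(2023, 6, 1))
```

SCD1, by contrast, would simply overwrite `attrs` in place and keep no history.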









Project: Barclays Collect

Roles and Responsibilities:

  • Worked on requirement gathering and analysis, translating business requirements into technical designs on the Hadoop ecosystem.
  • Extensively used Hive and Spark optimization techniques such as partitioning, bucketing, map joins, parallel execution, broadcast joins, and repartitioning.
  • Created partitioned, bucketed Hive tables and loaded data into the respective partitions at runtime for quick downstream access.
  • Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself.
  • Analyzed the asset matrix (mapping document) and used it to develop PySpark projects in PyCharm.
  • Built Python modules in PyCharm containing the PySpark logic that implements the business requirements.
  • Used Tivoli to schedule Spark jobs, and triggered Spark jobs in both client and cluster mode in lower environments.
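The broadcast-join optimization listed above can be illustrated in plain Python. This is a conceptual sketch, not the project's actual code: the small side of the join is copied into an in-memory dict (the role Spark's `broadcast()` hint plays for a small DataFrame), so each large-side row is matched by a hash lookup instead of a cluster-wide shuffle.

```python
def broadcast_join(large_rows, small_rows, key):
    """Conceptual broadcast (map-side) join: build a hash map from the
    small table once, then probe it per large-side row. In Spark this
    avoids shuffling the large table across the network."""
    lookup = {row[key]: row for row in small_rows}  # the 'broadcast' copy
    joined = []
    for row in large_rows:
        match = lookup.get(row[key])
        if match is not None:  # inner join: drop unmatched rows
            joined.append({**row, **match})
    return joined

# Usage: join a large transactions table to a small country dimension.
transactions = [{"country_id": 1, "amount": 100},
                {"country_id": 2, "amount": 250},
                {"country_id": 9, "amount": 75}]   # no matching dim row
countries = [{"country_id": 1, "name": "IN"},
             {"country_id": 2, "name": "UK"}]
result = broadcast_join(transactions, countries, "country_id")
```

The trade-off is that the broadcast side must fit in each executor's memory, which is why the technique applies only to small dimension tables.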

Project: SOLAR

Roles and Responsibilities:

  • Worked on multiple Hadoop clusters with 150 nodes on Cloudera distributions 5.x and 6.x.
  • Currently working as a Hadoop administrator, responsible for everything related to clusters totaling 100+ nodes, ranging from non-PROD to PROD.
  • Worked on Kerberized Hadoop clusters.
  • Involved in upgrading Cloudera Manager (CM) and CDH.
  • Worked on commissioning and decommissioning DataNodes on the cluster.
  • Communicated with the Cloudera team on cluster tuning.
  • Monitored the Hadoop cluster using Cloudera Manager, ensuring that all services are up and running.
  • Resolved day-to-day incidents and implemented change-related activities, following the proper ITIL process.
  • Set up data authorization roles for Hive and Impala using Sentry.
  • User management, including user creation, granting users permissions on various tables and databases, and assigning group permissions.
  • Managed and reviewed Hadoop log files.