Azure Data Engineer
Location : Noida
Experience : 6.5 Years
Rate : $16 / Hour
Availability : Immediate
Work From : Offsite
Category : Information Technology & Services
WORK EXPERIENCE:
Sr. Data Engineer at Confidential (October 2022 - Present) :
Currently working as a Sr. Data Engineer at Confidential.
Environment:
Hadoop, PySpark, MapReduce, Azure, DBT, ETL, SQL
Responsibilities:
With a strong foundation in Azure cloud technologies, I am well-equipped to design and develop robust, scalable, and secure data solutions on the Azure platform.
Built complete end-to-end data pipelines in Azure.
With 3+ years of hands-on experience in Azure Data Factory, Azure Databricks, Azure Data Lake, and Azure Data Warehouse, I've contributed to the seamless integration of structured and unstructured data from various sources, both on-premises and in the cloud. My proficiency in data warehousing, data modeling, and ETL processes has been instrumental in designing efficient data pipelines, optimizing data storage using Azure Data Lake Storage, and orchestrating complex data workflows.
Wrote multiple PySpark scripts to achieve the desired output as per the requirements.
I've leveraged programming languages like SQL and Python to perform data transformations and analytics, extracting valuable insights for data-driven decision-making. My familiarity with data visualization tools such as Power BI and Tableau enhances my ability to present actionable insights to stakeholders.
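The transformation and analytics work described above can be sketched as follows. This is a minimal illustration in plain Python (PySpark itself is omitted to keep the snippet self-contained); the record fields and the revenue-by-region aggregation are hypothetical, standing in for the kind of groupBy-style transformation such scripts perform:

```python
from collections import defaultdict

# Hypothetical raw records, standing in for rows read from a data lake.
records = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 80.0},
    {"region": "north", "revenue": 50.0},
]

def total_revenue_by_region(rows):
    """Aggregate revenue per region -- the same shape of logic a
    PySpark groupBy().agg(sum(...)) transformation would express."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["revenue"]
    return dict(totals)

print(total_revenue_by_region(records))  # {'north': 170.0, 'south': 80.0}
```

In a real pipeline the aggregated result would then feed a Power BI or Tableau dashboard rather than a print statement.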
Sr. Data Engineer at Impetus (December 2021 - September 2022) :
Worked as a Big Data Developer with Impetus Technologies.
Environment:
PySpark, Hive, Splunk, Java-Spark, Spark-SQL
Responsibilities:
Developed an observability utility for Spark for the client and made it available as open-source code.
The utility helps developers view log files as graphical visualizations, making the debugging process much easier.
Used Splunk to visualize the complete tracing of a data pipeline or Spark/Hive application.
Every granular detail is visualized, including memory utilization, resource allocation, tracing, failed jobs, and failure reasons.
Wrote multiple PySpark/Hive scripts to test the observability utility's output against the requirements.
Monitored and troubleshot data loading from the data lake into Hive using Spark (Java).
Used AWS cloud services for PySpark development, including AWS EMR, EC2, S3, RDS MySQL, Athena, Glue, Redshift, and Lambda.
Optimized data processing workflows, improving efficiency by 30% and reducing latency by 20%.
Collaborated with cross-functional teams to integrate AWS Glue with other AWS services such as Lambda, S3, and Redshift for end-to-end data processing and analytics.
Wrote multiple PySpark scripts to achieve the desired output as per the requirements.
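The observability idea above, surfacing memory utilization and failure reasons from raw Spark logs so they can be charted or forwarded to a tool such as Splunk, can be sketched with a small parser. The log format and field names here are hypothetical (real Spark executor logs vary by version and configuration):

```python
import re

# Hypothetical Spark executor log lines; real formats vary by Spark version.
LOG = """\
INFO executor 1: memory used=512MB
ERROR job 42 failed: OutOfMemoryError
INFO executor 2: memory used=640MB
"""

def summarize(log_text):
    """Extract memory readings and failure reasons from raw log text,
    the kind of granular detail the utility visualizes."""
    memory = [int(m) for m in re.findall(r"memory used=(\d+)MB", log_text)]
    failures = re.findall(r"ERROR .*failed: (\w+)", log_text)
    return {"memory_mb": memory, "failures": failures}

print(summarize(LOG))  # {'memory_mb': [512, 640], 'failures': ['OutOfMemoryError']}
```

A production utility would stream such summaries into a dashboard or Splunk index rather than returning a dict.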