
Shashikumar

CLOUD DATA ENGINEER
Location: Bangalore
Candidate Information
  • Experience: 10 years
  • Hourly Rate: $11
  • Availability: Immediate
  • Work From: Offsite
  • Category: Information Technology & Services
  • Last Active On: May 19, 2024
Key Skills
PySpark, AWS, GCP, Glue, Airflow, Lambda, DynamoDB, Redshift, SQL, Shell script, BigQuery
Summary

Project Experience

Senior Cloud Consultant | Pythian Feb 2022-Present

✔ Worked on a Booking.com project.
  o Gathered requirements and implemented a streaming data pipeline.
  o Created Kafka topics and deployed the Kafka S3 sink connector.
  o Developed a Glue job in PySpark that compacts small files to a required target size; the job also handled the outage.
  o Developed a Python application to automate data validation testing.
  o Optimized and tuned the Glue job and the S3 sink connectors.
  o Used DynamoDB for auditing in the Glue job.
  o Wrote documentation and provided knowledge transfer (KT) to other teams.

✔ Worked on an EDP project.
  o Used Airflow on Cloud Composer to create tasks and schedule jobs.
  o Developed an application that transfers data from an S3 bucket to GCS using Cloud Run, a Docker template, and a Flask app exposing a Python POST method.
  o Developed a Dataflow job using an Apache Beam pipeline to process data and write the output to BigQuery.
  o Developed a data pipeline for the Retail project that moves data from the source (GCS) to BigQuery tables, using reference PostgreSQL tables, Dataproc for creating Spark clusters, Airflow, Cloud Functions triggers, and Pub/Sub.
  o In this pipeline, the source system (Kafka) pushes data to GCS; when data lands, a Cloud Function publishes the event to Pub/Sub, and the data is loaded into the BigQuery table according to the metadata table.

Senior Production Software Engineer | Cerner Jan 2021-Feb 2022

✔ Developed a data pipeline for the synapse project using the Hadoop ecosystem (Hive, Oozie, Spark), Python, Git, Jenkins, and AWS.

✔ Refactored code, analyzed data, and implemented code per the catalog or specific client requests using Spark, AWS, and Python.

✔ Scheduled jobs using Oozie, investigated counter drops for specific pipelines, and provided justifications to the consultants.

✔ Created Splunk alerts to monitor production runs (batch, incremental, and readmission jobs).

✔ Created a Splunk dashboard for the RES team to monitor data staleness and for the ops team to monitor the on-prem clusters.

Software Development Specialist | NTT DATA Services Feb 2019-Dec 2020

✔ Developed a data pipeline for the multi-layer DNA project for the client BCBSNC (healthcare domain) using the Hadoop ecosystem (Hadoop, Hive, Spark), Python, Scala, Dataiku, Unix, SQL, Teradata, and AWS.

✔ Migrated existing projects to cloud-based solutions to optimize them, reduce cost, and handle large data volumes.

✔ Integrated Dataiku and Teradata for Control-M job scheduling.

✔ Worked on several projects, including Underwriter, Chub, VBC, and Automation of Datalake Dashboard.

✔ Created a Python package for SFTP transfer and alias file creation.

✔ Created a Python framework for data pipelines and data ingestion.

Copyright© Cosette Network Private Limited All Rights Reserved