Ø More than 10 years of experience in the IT industry as a Senior Data Analyst, with extensive work in big data analytics, the Hadoop ecosystem, PySpark, estimation, requirements analysis, architecture, design, development, and data extraction, storage, processing, and querying.
Ø Hadoop Architect/Developer with 8+ years of hands-on experience with Hadoop ecosystem components, including Hadoop, Spark, Spark SQL, Hive, Hue, Sqoop, Spark Streaming, Kafka, and Impala, as well as the Alteryx data analytics tool.
Ø Built a machine learning workbench for classification problems; compared model accuracy across SVM, Naïve Bayes (NVB), KNN, and Logistic Regression.
Ø Developed algorithms in PySpark (Linear Regression, 3-Sigma, Seasonality, and Detrending) and implemented them as a single framework.
Ø Experience in designing and developing applications in Spark using Scala.
Ø Working knowledge of GCP and AWS environments.
Ø Applied performance optimization techniques to Hive, Sqoop, and Spark.
Ø Designed, developed, and validated ETL processes in Spark using Scala with the EZFLOW framework.
Ø Built predictive machine learning models from document data.
Ø Pre-processed data with Python pandas and NumPy; tuned parameters using linear and polynomial kernels.
Ø Designed and developed fact and dimension tables based on client requirements.
Ø Good understanding of NoSQL databases, with hands-on experience writing applications on NoSQL databases such as HBase.
Ø Experience importing and exporting data using the Kafka stream processing platform.
Ø Data cleansing for validation/verification and report generation using Alteryx.
Ø Currently working in an Agile software development process.
Ø Experience with the Alteryx data analytics tool, including implementing macros and apps.
Ø Developed Alteryx applications for most of the requirements.
Ø Agility and ability to adapt quickly to changing requirements, scope, and priorities.
Ø Ability to quickly learn new technologies in a dynamic environment.
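The 3-Sigma check named among the PySpark algorithms above can be illustrated with a minimal sketch. This is plain Python using only the standard library, not the PySpark framework described in this resume; the function name `three_sigma_outliers`, the parameter `k`, and the sample readings are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values, k=3.0):
    """Return (index, value) pairs more than k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Hypothetical sample: twenty steady readings and one spike.
readings = [10.0] * 20 + [100.0]
print(three_sigma_outliers(readings))
```

In a small sample a single extreme value inflates the standard deviation itself, so the rule only becomes useful once there are enough observations.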
Sep 2017 – Present – Hadoop Architect/Developer (Lead) – Banking Project
Environment: Hadoop, HDFS, Hive, PySpark, Spark SQL, Scala, Hue, AutoSys, JIRA, Agile model, Regression, 3-Sigma, Seasonality, and Detrending.
Ø Requirements Gathering: Contributed to the blueprint document, effort estimates, and sprint planning; identified gaps in the system and discussed them at the kick-off meeting; and explained insights that help expand business opportunities.
Ø Customer Meetings: Held frequent/bi-weekly calls with key stakeholders to ensure business requirements were captured correctly and delivery was scheduled as per plan.
Ø Exploratory Data Analysis: Analysed the input datasets/features and implemented multiple plots/subplots using Python. For missing and categorical data, applied Imputer, label encoder, one-hot encoder, and get_dummies to prepare the data.
Ø Problem Identification: Determined whether each problem was a classification or regression problem by analysing the dependent variable.
Ø Big Data Processing: Processed large volumes of data using Spark, Spark SQL, RDDs, and DataFrames. Created partitions and buckets in Hive to handle structured data. Moved log files generated from various sources to HDFS through Kafka and Spark for further processing.
Ø SQL to DataFrame: Converted Hive/SQL queries into Spark transformations using Spark RDDs.
Ø Feature Engineering Methods: Identified the most relevant features using feature selection methods and engineered them as per the business requirements.
Ø Algorithm Selection: After identifying the problem type, implemented the appropriate ML algorithms, such as Logistic Regression, Random Trees, Support Vector Machine, and Naïve Bayes.
Ø Model Training: Used Stratified K-Fold cross-validation to split the data into train and test datasets, then trained the model.
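The stratified K-fold splitting mentioned under Model Training can be sketched as follows. This is a simplified, standard-library-only illustration of the idea, not the project's code; in practice a library implementation such as scikit-learn's `StratifiedKFold` would normally be used. The function name `stratified_k_fold` and the sample labels are hypothetical.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Yield (train_idx, test_idx) pairs, keeping class proportions similar in every fold."""
    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's indices round-robin so every fold gets a similar share.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the remaining folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


# Hypothetical imbalanced labels: six of class "a", three of class "b".
labels = ["a"] * 6 + ["b"] * 3
for train_idx, test_idx in stratified_k_fold(labels, k=3):
    print(test_idx, [labels[j] for j in test_idx])
```

This sketch does not shuffle, so the splits are deterministic; each test fold here receives two "a" rows and one "b" row, matching the 2:1 ratio of the full label set.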