Sankha Subhra (RID : 6v6vlv55xqua)

Designation : Machine Learning Engineer

Location : Chennai, India

Experience : 4 Years

Rate : $10 / Hourly

Availability : Immediate

Work From : Any

Category : Information Technology & Services

Description
SANKHA1.github.io
Skills
Programming Languages:– Python, R, SAS, C, C++, Java
Deep Learning frameworks:– PyTorch, LangChain
Programming libraries:– TensorFlow, OpenCV, Scikit, Seaborn, NumPy, Pandas, Matplotlib, PySpark, Theano
Configuration management:– Git, SVN, BitBucket
Visualization Tools:– Tableau, QlikView, PowerBI
Databases:– SQL Server, MongoDB, MySQL, PostgreSQL, Neo4j, AWS DynamoDB
CI/CD tools:– Jenkins, GitHub Actions
Cloud Platforms:– AWS, Azure, GCP
Cloud Tools:– AWS S3, AWS Redshift, AWS Glue
Data Engineering Tools:– AWS Glue, Airflow, Apache Beam, Bigtable, Hadoop, Hive
Analytics Platform:– Dataiku, Alteryx
API Frameworks:– Flask, FastAPI
Work Experience
Chryselys | Data Insights Analyst | Chennai | May 2022 – Jan 2024
• Project:- Chatbot creation with LLAMA 2 7B model
Objective:- Worked in the area of Conversational AI. Developed a chatbot using the LLAMA 2 7B model on the PyTorch framework, achieving a 0.9 BLEU score.
Details:-
i) Employed the Sentence Transformers model from HuggingFace to generate embeddings from text chunks within documents.
ii) Established a vector database in the Neo4j graph database, organized into multiple collections for various studies, to store all embeddings. LlamaIndex facilitated efficient querying and comparison of multiple documents.
iii) Integrated Neo4j with LangChain to query the graph database and retrieve results.
iv) Performed instruction fine-tuning on the LLAMA 2 7B model. Used a RAG-based approach over the fine-tuned LLAMA 2 7B model, passing the retrieved results through it to generate the outputs.
v) Deployed the Docker image of the model using Amazon SageMaker on a GPU-based instance and created a CI/CD pipeline for it using Jenkins. Used Kubeflow to create the data pipeline. Used GitHub for versioning. Created a custom API for the chatbot frontend.
vi) Implemented monitoring and logging solutions for Kubernetes clusters, utilizing Prometheus and the ELK stack, contributing to enhanced observability and proactive issue resolution.
vii) Saved the chat history in MongoDB and AWS DynamoDB.
viii) Provided ongoing support for Kubernetes clusters, troubleshooting issues and applying updates to maintain optimal performance and security.
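The retrieval part of steps i) to iv) above can be illustrated with a minimal sketch. It is not the project's code: the bag-of-words `embed` function is a toy stand-in for a Sentence Transformers model, an in-memory list replaces the Neo4j store, and all function names (`chunk`, `embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
from collections import Counter


def chunk(text, size=8):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Assemble a RAG-style prompt that a fine-tuned model would complete."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In a real pipeline the prompt returned by `build_prompt` would be passed to the fine-tuned model; here it only shows how retrieved chunks and the user question are combined.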
• Project:- Abstractive Text Summarization of interview videos
Objective:- Developed an NLP model to perform Abstractive Text Summarization and Sentiment Analysis of interview videos, and to identify which races and ethnicities doctors are including in clinical trials.
Details:-
i) Extracted audio from videos using the MoviePy module and used the Google Speech Recognition API to transcribe the audio files.
ii) Performed NER (Named Entity Recognition) over the text to get the names of specific places mentioned in the text.
iii) Broke down the text into individual words (tokens) to create a vocabulary. Ensured that all input sequences had the same length by padding shorter sequences or truncating longer ones. Converted sentiment labels (positive/negative) into numerical values (e.g., 0 for negative, 1 for positive).
iv) Used the Reinforcement Learning model GPT-3 and BERT to summarize the text, implemented with TensorFlow.
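The preprocessing in step iii) can be sketched in plain Python as follows. This is an illustrative sketch only: whitespace tokenization, the `<pad>`/`<unk>` conventions, and the names `tokenize`, `build_vocab`, `encode`, and `LABELS` are assumptions, not the project's actual code.

```python
PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary tokens (assumed convention)

# Sentiment labels mapped to integers, as in the example in step iii).
LABELS = {"negative": 0, "positive": 1}


def tokenize(text):
    """Lowercase and split on whitespace (a simplification of real tokenizers)."""
    return text.lower().split()


def build_vocab(texts):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Map tokens to ids, truncate to max_len, then right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)]
    ids = ids[:max_len]
    return ids + [PAD] * (max_len - len(ids))
```

Every encoded sequence has exactly `max_len` entries, so a batch of them can be stacked into a fixed-shape array before being fed to a model.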
• Project:- Patient Classification model for drug usage
Objective:- Worked in the Healthcare domain and created a Machine Learning model for a Classification Problem that predicted the number of patients who are going to switch their drugs in the next 3 months, using historical drug usage data and demographic data.
Details:-
i) Performed Exploratory Data Analysis, Data Preprocessing, Data Cleaning, Model Building and Model Validation over historical data from the IQVIA LAAD dataset.
ii) Used CatBoost and XGBoost models, achieving 82% accuracy.
iii) Published the final report in a PowerBI dashboard.
• Worked extensively on Dataiku for deployment of ML models.
• Collaborated cross-functionally with teams to design, implement, and iterate on data-driven products and solutions, aligning with organizational goals and priorities.
• Performed advanced Data Mining techniques such as Regression Analysis.
• Created REST APIs using Flask. Automated QlikView and MS Excel reports using Python, applying OOP concepts.
• Developed a Web Scraping pipeline that automatically logs into a customer website using BeautifulSoup (BS4) and extracts specific information from images using Pytesseract.
• Created a cron job using Amazon CloudWatch that runs a Python script in Amazon SageMaker once a week at a specified time, generates a report, and sends it to stakeholders.
• Developed a framework for converting existing PowerCenter mappings to PySpark jobs.
• Created PySpark frame to bring data

 