About

With a Master's degree in Computer Science from Northeastern University and over 2 years of professional experience, I am a dedicated Machine Learning Engineer driven by a passion for leveraging data science and artificial intelligence to solve real-world challenges. My expertise lies in automated machine learning (AutoML) and natural language processing (NLP), where I excel in developing and optimizing state-of-the-art models, pipelines, and workflows tailored to diverse sectors.

I specialize in optimizing model performance and tuning neural networks, and I have improved model accuracy and training speed by applying advanced ML techniques and distributed training systems. My experience spans end-to-end ML pipeline optimization and real-time deployment, particularly for Kerberos attack detection.

Throughout my career, I have consistently demonstrated a commitment to innovation and excellence. By transforming complex data into actionable insights, I aim to enhance decision-making processes, streamline operations, and drive meaningful progress.



Skills

Data Analysis, Visualization & Statistics: Python (NumPy, Pandas, Scikit-learn, SciPy, matplotlib, Seaborn, Plotly), R, Oracle SQL Databases, MS SQL, Tableau, Power BI, NoSQL, Spark (PySpark, MapReduce, Hadoop)

Deep Learning Frameworks: PyTorch, TensorFlow, Hugging Face Transformers, MLOps, spaCy, NLTK, OpenCV

Big Data Technologies: Spark (PySpark, Hadoop, MapReduce), Databricks

Machine Learning & Deep Learning: Time Series, Machine learning algorithms for Classification (KNN, Naïve Bayes), Regression (Linear, Decision Trees, Logistic), K-means Clustering, neural networks

Certifications

Google Cloud Practitioner Machine Learning Certification

AWS Cloud Practitioner

Education & Experience

Education

Master of Science in Computer Science

2021 - 2023

Northeastern University, Khoury College of Computer Sciences, Boston, MA

Bachelor of Technology in Computer Science and Engineering

2016 - 2020

Amrita Vishwa Vidyapeetham, Tamil Nadu, India

Volunteering

Delivery and Quality Assurance (DQA) Team

2021 - Present

Statistics without Borders, USA

Volunteer as a DQA Analyst, auditing statistical and data science projects.

Professional Experience

Machine Learning Engineer

Dec 2023 - Present

Narwal Inc, Ohio

  • Worked on Kerberos attack detection by integrating MLflow in Databricks, reducing manual work by 5 hours per case.
  • Developed ETL pipelines to process over 4.3 million Active Directory event logs from Azure Data Lake Storage.
  • Applied supervised and unsupervised learning techniques, achieving a 96% AUC-ROC score, and delivered insights through 12 interactive dashboards, reducing false positive rates by 22%.
  • Implemented a model performance monitoring and retraining framework, improving prediction accuracy and reducing performance degradation by 15%.
  • Optimized the ML pipeline for a 30% increase in efficiency and scalability, and deployed and maintained multiple models in production.

Technologies Used: Python, PySpark, SQL, Databricks, MLOps, AutoML, AWS SageMaker, Hugging Face, matplotlib, pandas, scikit-learn

Data Scientist

Jul 2022 - Dec 2022

Reboot Rx, Boston, MA

  • Developed data pipelines, built dashboards, and collaborated with cross-functional biomedical teams, significantly improving the performance of the BERT pipeline.
  • Conducted a comparison study of biomedical BERT models, identifying the best-performing model with an F1 score of 77%.

Technologies Used: Pandas, scikit-learn, PyTorch, Hugging Face, NLTK

Machine Learning Researcher

Aug 2020 - Aug 2021

Peregrine Cancercure Technologies

  • Improved data extraction from published articles by 31% using Named Entity Recognition, raised adverse event reaction detection efficiency to 83% with large language models (LLMs), and built a multi-label classification pipeline for product feedback analysis that categorized product types with 89% efficiency.

Technologies Used: Pandas, scikit-learn, PyTorch, Hugging Face, NLTK, spaCy

Lead Machine Learning Engineer

Sep 2019 - Mar 2020

Omdena Inc, New York

  • Led a global team of 15 data scientists in scraping and cleaning ~1M cybercrime chat logs using Python, preprocessed the text with word tokenization, stemming, and lemmatization, and spearheaded the detection of online abuse crimes with word embeddings and LSTM models, achieving a 12% model loss, while effectively communicating insights to stakeholders.

Technologies Used: Pandas, scikit-learn, PyTorch, NLTK, matplotlib, seaborn

Data Science Intern

Apr 2019 - Jul 2019

Sierra ODC Private Ltd, India

  • Performed statistical analysis on electricity data to inform energy management strategies, improving consumption forecast accuracy by 15% using time-series models (ARIMA, Prophet, LSTMs), and spearheaded an automated solar power generation forecast pipeline with a Streamlit-powered dashboard, reducing report generation time by 80%.

Technologies Used: Pandas, scikit-learn, Prophet, ARIMA, scipy, Flask, Heroku

Publications

DEPRESSION DETECTION USING TRANSFER LEARNING

The chapter "Distress-Level Detection Using Deep Learning and Transfer Learning Methods" presents an approach to identifying depression levels by analyzing interview transcripts. The authors implement deep learning models such as ELMo, ULMFiT, and BERT, which are large language models (LLMs) enhanced with transfer learning techniques. Their primary focus is on predicting depression severity using the textual components of the DAIC-WOZ dataset, which also includes audio and video data. To improve model performance, they apply an ensemble learning technique. They also developed a user-friendly web application where individuals can input text and receive predictions about their distress level. This work showcases the effectiveness of LLMs and transfer learning in mental health applications, offering a scalable, text-based solution to predict depression severity.

FRAUDULENT DETECTION IN HEALTHCARE INSURANCE USING HYBRID MACHINE LEARNING APPROACH

The paper "Fraudulent Detection in Healthcare Insurance" outlines a machine learning approach to detecting fraudulent healthcare claims using a publicly available Medicare dataset. The tasks include classifying providers as fraudulent or non-fraudulent, addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE), and implementing a hybrid clustering and classification model. Several machine learning algorithms are also tested to identify the most efficient approach for fraud detection.

Contact

Contact Me

Email Me

kalyansrijha@gmail.com
