I am a Machine Learning Engineer with a Master's degree in Computer Science from Northeastern University and over two years of professional experience, driven by a passion for applying data science and artificial intelligence to real-world challenges. My expertise lies in automated machine learning (AutoML) and natural language processing (NLP), where I develop and optimize state-of-the-art models, pipelines, and workflows tailored to diverse sectors.
I specialize in optimizing model performance and tuning neural networks, and have improved model accuracy and training speed by implementing advanced ML techniques and leveraging distributed training systems. My experience spans end-to-end ML pipeline optimization and real-time deployment, particularly for Kerberos attack detection.
Throughout my career, I have consistently demonstrated a commitment to innovation and excellence. By transforming complex data into actionable insights, I aim to enhance decision-making processes, streamline operations, and drive meaningful progress.
Data Analysis, Visualization & Statistics: Python (NumPy, pandas, scikit-learn, SciPy, Matplotlib, Seaborn, Plotly), R, Oracle SQL Databases, MS SQL, Tableau, Power BI, NoSQL
Deep Learning & NLP Frameworks: PyTorch, TensorFlow, Hugging Face Transformers, spaCy, NLTK, OpenCV, MLOps
Big Data Technologies: Spark (PySpark), Hadoop (MapReduce), Databricks
Machine Learning & Deep Learning: Time Series, Classification (KNN, Naïve Bayes, Logistic Regression), Regression (Linear, Decision Trees), Clustering (K-means), Neural Networks
Google Cloud Practitioner Machine Learning Certification
AWS Certified Cloud Practitioner
I work across Merchant Solutions and Global Payment Solutions to improve ISV pricing decisions and increase recovered transaction revenue. I build cloud-based OCR and advanced RAG pipelines, along with Smart Retry ranking models (PAW), enabling accurate pricing insights and more reliable payment recovery workflows.
I worked with clients across healthcare, finance, and cybersecurity to design and deploy enterprise-grade AI systems, including advanced RAG pipelines, security threat-detection models, and predictive job-prioritization solutions. I built modular hybrid-retrieval RAG architectures, MLflow-based monitoring systems, and production-ready ML models on cloud platforms like AWS and Databricks, using LightGBM, cross-encoder reranking, and scalable cloud-native pipelines to deliver domain-specific AI automation.
Tech: Python, PySpark, SQL, Databricks, MLOps, AutoML, AWS SageMaker, Hugging Face, pandas, scikit-learn
Developed data pipelines and dashboards with cross-functional biomedical teams, and improved a BERT-based text pipeline for clinical literature.
Ran comparison experiments across biomedical BERT models and selected the best-performing model, achieving an F1 score of 77%.
Tech: pandas, scikit-learn, PyTorch, Hugging Face, NLTK
Extracted adverse event reactions to pharmaceutical products used by patients, using spaCy and scispaCy.
The chapter "Distress-Level Detection Using Deep Learning and Transfer Learning Methods" presents an approach to identifying depression levels by analyzing interview transcripts. The authors implement the pretrained language models ELMo, ULMFiT, and BERT, adapted to the task through transfer learning. Their primary focus is on predicting depression severity from the textual component of the DAIC-WOZ dataset, which also includes audio and video data. To improve performance, they apply an ensemble learning technique. They also developed a user-friendly web application where individuals can input text and receive a prediction of their distress level. This work showcases the effectiveness of pretrained language models and transfer learning in mental health applications, offering a scalable, text-based solution for predicting depression severity.
The paper "Fraudulent Detection in Healthcare Insurance" outlines a machine learning approach to detecting fraudulent healthcare claims using a publicly available Medicare dataset. The tasks include classifying providers as fraudulent or non-fraudulent, addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE), and implementing a hybrid clustering and classification model. Several machine learning algorithms are also compared to identify the most efficient approach for fraud detection.
kalyansrijha@gmail.com