Kaveh Abdi's Portfolio

I am a PhD student with several years of experience collaborating with industry to develop mathematical models for predictive purposes. My expertise spans engineering tools such as optimization, machine learning, and process development, as well as scientific domains including statistics, chemistry, and mathematics. I am deeply passionate about tackling challenges and exploring diverse approaches to problem-solving. In addition to my academic and industrial experience, I have gained significant expertise in leveraging cloud platforms for data science and engineering solutions. This includes deploying machine learning models on cloud infrastructure, containerization with Docker, and utilizing services such as AWS Lambda, SageMaker, and EC2 for scalable and efficient model training and deployment. I am adept at creating APIs for seamless integration and have experience with both relational and NoSQL database design to support data-driven projects. In this portfolio, I showcase several personal data science projects inspired by my curiosity about the world.

Overview of My Projects

I have completed several data science projects, summarized below. Each summary includes a link to the full details of how I carried out the work. The projects cover a wide range of topics, including applied statistics (e.g., using maximum likelihood arguments to derive specialized regression objective functions), transfer learning for natural language processing, recurrent neural networks for time series analysis, and clustering and classification algorithms for predicting the outcomes of sporting events and customer behaviour.

Deploying Scalable Machine Learning Models for Drug Discovery and LLM Fine-Tuning Using AWS Services

Using AWS services including Amazon ECR, AWS Lambda, and API Gateway, I deployed machine learning models developed for drug discovery targeting Dengue fever. I also fine-tuned a small LLM and deployed it on AWS using AWS Lambda. In this project, the models were packaged as Docker container images and deployed as Lambda functions; Lambda is a serverless service well suited to the scalable deployment of machine learning models. I created APIs for each model using three different approaches: i) Flask, ii) API Gateway, and iii) SageMaker endpoints. The user interfaces for these models were built with Streamlit. [More]
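A Docker-based Lambda function behind API Gateway ultimately runs a Python handler that receives the HTTP request as an event and returns a response object. The sketch below is a minimal, hypothetical example of such a handler, assuming the API Gateway Lambda proxy integration (the request body arrives as a JSON string in `event["body"]`, and the response is a dict with `statusCode`, `headers`, and a string `body`). The `predict` function is a placeholder rather than one of the project's models, and the handler name is an assumption; in a container image it is whatever the image's `CMD` points to.

```python
import json


def predict(features):
    # Hypothetical placeholder for a trained model. In a real container image the
    # model would typically be loaded once at module import time (cold start).
    return sum(features) / len(features)


def lambda_handler(event, context):
    """Entry point for a Lambda function behind an API Gateway proxy integration."""
    try:
        body = json.loads(event.get("body") or "{}")
        features = body["features"]
        if not isinstance(features, list) or not features:
            raise ValueError("'features' must be a non-empty list")
        score = predict([float(x) for x in features])
    except (KeyError, ValueError, TypeError) as exc:
        # Malformed JSON, a missing key, or non-numeric features -> client error
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": str(exc)}),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": score}),
    }
```

Locally, `lambda_handler({"body": json.dumps({"features": [1, 2, 3]})}, None)` returns a 200 response whose body is `{"prediction": 2.0}`.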

Fine-Tuning T5-Small Model with LoRA

In this project, I demonstrate fine-tuning the T5-small language model using LoRA (Low-Rank Adaptation) for text generation tasks. T5-small, a lightweight variant of T5, is well suited to experimentation on moderate-sized datasets and can be fine-tuned efficiently on standard hardware. The project is implemented in a Jupyter Notebook for ease of reproducibility and modularity, and it serves as a base example for fine-tuning text generation models that can be adapted to other transformer-based architectures or tasks. It showcases my ability to handle end-to-end workflows for modern NLP models, from data preprocessing to hyperparameter tuning and evaluation, and the use of LoRA reflects a focus on computationally efficient adaptation techniques for real-world applications. [More]
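LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so a layer computes W x + (alpha / r) B A x, where A is r x d_in, B is d_out x r, and B starts at zero so that training begins exactly from the pretrained model. The NumPy sketch below shows only that mechanism for a single linear layer; the class name, rank, and scaling values are illustrative assumptions, and it is not the notebook's actual T5 training code.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                         # pretrained weights, kept frozen
        self.A = rng.normal(0.0, 0.01, (r, d_in))    # trainable, small random init
        self.B = np.zeros((d_out, r))                # trainable, zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # x has shape (batch, d_in); the output has shape (batch, d_out)
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # After training, the update can be folded into W so inference costs nothing extra
        return self.weight + self.scale * self.B @ self.A
```

For a 512 x 512 layer with r = 4, the trainable update has 2 x 512 x 4 = 4,096 parameters, versus 262,144 in W itself, which is where LoRA's efficiency comes from.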

Designing a RAG System to Extract and Utilize Insights from Research Papers on Dengue Fever

This project applies a Retrieval-Augmented Generation (RAG) approach to explore drug discovery solutions targeting Dengue fever. Combining machine learning, natural language processing, and large language models, the repository pairs text retrieval over research papers with generative models to identify potential protein targets and drug candidates. [More]
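The retrieval half of a RAG pipeline ranks text passages by their similarity to the question and passes the best matches to a language model as context. The sketch below illustrates that step with simple bag-of-words cosine similarity, a deliberately simplified stand-in for the dense embeddings and vector stores typically used in practice; the function names and prompt wording are hypothetical, and the call to the generative model is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs as tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, k=2):
    # Rank passages by similarity to the query and keep the top k
    q = Counter(tokenize(query))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=2):
    # Insert the retrieved passages as grounding context for the generator
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"
```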

Atmospheric Temperature Increase by 2030

Using an LSTM-based RNN, I trained a model that predicts the average temperature increase from historical temperature and historical carbon dioxide emission data. The model was then used to forecast global atmospheric temperature through 2030. [More]
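Training an LSTM on yearly series usually starts by slicing the history into fixed-length windows, each paired with the value to be predicted next. The sketch below assumes two aligned yearly series (temperature and CO2 emissions) and produces inputs in the (samples, timesteps, features) layout that Keras LSTM layers expect by default; the window length and function name are illustrative assumptions rather than the project's settings.

```python
import numpy as np


def make_windows(temperature, co2, window=10, horizon=1):
    """Turn two aligned yearly series into (samples, window, 2) inputs and temperature targets."""
    temperature = np.asarray(temperature, dtype=float)
    co2 = np.asarray(co2, dtype=float)
    if temperature.shape != co2.shape:
        raise ValueError("series must have the same length")
    features = np.stack([temperature, co2], axis=1)   # shape (T, 2)
    n = len(temperature) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this window/horizon")
    # Each input is `window` consecutive years; the target is the temperature
    # `horizon` years after the end of that window
    X = np.stack([features[i:i + window] for i in range(n)])
    y = np.array([temperature[i + window + horizon - 1] for i in range(n)])
    return X, y
```

In practice the series would also be scaled (for example, to zero mean and unit variance) before training.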

Prediction of World Cup Results (Accuracy of Common Sense)

Using data from the 1998 to 2018 World Cups, together with geographical information, I developed a classification model to predict the game-by-game results of the 2022 World Cup in Qatar. To train this model, I used only features that reflect the public's common sense (e.g., the number of previous championships and the FIFA World Ranking). I also carried out exploratory data analysis to answer some questions I had about previous World Cups. [More]

Prediction of Dependent-Variable Variances in Total Least Squares (Errors-in-Variables) Regression (Link to the Original Paper)

Drawing on statistical theory, our research group at Queen's University developed an algorithm for training models and estimating dependent-variable variances when the independent variables are uncertain. Unlike conventional regression methods, which assume that the independent variables are known perfectly, Total Least Squares also accounts for uncertainty in the independent variables. [More]
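In the simplest errors-in-variables case (a straight line with independent, equal-variance errors in both x and y), the total least squares fit can be computed from a singular value decomposition: the right singular vector associated with the smallest singular value of the centred data matrix is normal to the best-fit line. The sketch below shows only this textbook estimator, to illustrate how TLS differs from ordinary least squares; it is not the variance-estimation algorithm developed in the paper, and the function name is hypothetical.

```python
import numpy as np


def tls_line(x, y):
    """Fit y = b0 + b1 * x by total least squares (orthogonal regression).

    Assumes the errors in x and y are independent with equal variance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    Z = np.column_stack([x - x_mean, y - y_mean])
    # The right singular vector for the smallest singular value is normal to the line
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    nx, ny = vt[-1]
    if np.isclose(ny, 0.0):
        raise ValueError("best-fit line is vertical")
    slope = -nx / ny
    intercept = y_mean - slope * x_mean   # the TLS line passes through the centroid
    return intercept, slope
```

Ordinary least squares minimizes vertical distances and tends to shrink the slope toward zero when x is noisy; this fit instead minimizes the perpendicular distances from the points to the line.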

Nonlinear Regression for a Mechanistic Model Describing Ethanol Production in a Batch Bioreactor

Nonlinear regression was used to estimate the parameters of a mechanistic model describing the production of ethanol from glucose in a batch bioreactor. Parameter estimation and a Monte Carlo uncertainty quantification technique were used to obtain the parameter estimates and their uncertainties. Based on these results, it was recommended that a model-based design of experiments technique be used to plan additional experiments so that the kinetic parameters could be estimated more reliably. [More]
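The general workflow (fitting parameters by nonlinear least squares, then quantifying their uncertainty by Monte Carlo simulation) can be sketched on a much simpler model. The example below assumes a hypothetical saturating product curve P(t) = P_max (1 - exp(-k t)) in place of the project's mechanistic bioreactor model, fits it with a plain Gauss-Newton iteration, and estimates parameter uncertainty with a parametric bootstrap: synthetic datasets are simulated around the best fit using the estimated noise level, refitted, and the spread of the refitted parameters is reported. The model, starting values, and number of simulations are illustrative choices, not those used in the project.

```python
import numpy as np


def model(theta, t):
    # Hypothetical toy model: product approaches p_max at first-order rate k
    p_max, k = theta
    return p_max * (1.0 - np.exp(-k * t))


def jacobian(theta, t):
    # Partial derivatives of the model with respect to (p_max, k)
    p_max, k = theta
    e = np.exp(-k * t)
    return np.column_stack([1.0 - e, p_max * t * e])


def fit(t, y, theta0, iters=50, tol=1e-10):
    """Gauss-Newton nonlinear least squares for the toy model."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        r = y - model(theta, t)
        step, *_ = np.linalg.lstsq(jacobian(theta, t), r, rcond=None)
        theta = theta + step
        if np.linalg.norm(step) < tol * (1.0 + np.linalg.norm(theta)):
            break
    return theta


def monte_carlo_uncertainty(t, y, theta0, n_sim=200, seed=0):
    """Parametric bootstrap: refit to datasets simulated around the best fit."""
    rng = np.random.default_rng(seed)
    theta_hat = fit(t, y, theta0)
    resid = y - model(theta_hat, t)
    sigma = np.sqrt(resid @ resid / (len(t) - len(theta_hat)))  # estimated noise level
    sims = np.array([
        fit(t, model(theta_hat, t) + rng.normal(0.0, sigma, len(t)), theta_hat)
        for _ in range(n_sim)
    ])
    return theta_hat, sims.std(axis=0, ddof=1)
```

Large standard deviations relative to the estimates, or strongly correlated estimates across the simulations, are the kind of signal that motivates model-based design of experiments.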

Systematic Regression and Model Prediction using Industrial Data

Combining machine learning concepts (the bias-variance tradeoff), chemical engineering fundamentals, polymer chemistry, and optimization, we systematically estimated the parameters of a mathematical model describing the photopolymerization of 1,6-hexanediol diacrylate with a bifunctional initiator. [More]
In a second study, we trained the mathematical model using data that account for the effects of oxygen. [More]

Sentiment Analysis of Spotify Reviews

Using transfer learning and a bidirectional LSTM-based RNN, I developed a classification model that predicts the sentiment of reviews left by app users. [More]

Identifying Potential Car Customers

Using the scikit-learn library in Python for machine learning, and Flask (along with HTML and CSS) for the web interface, I deployed a user-friendly web application on Heroku for hypothetical car salespeople. The application predicts potential customers with 90% accuracy. You can find the web application here! [More]

Credit Card Customer Segmentation

Using principal component analysis (PCA) and a k-means clustering algorithm, I grouped the customers into seven segments. Based on the PCA results, I identified the most important features distinguishing these groups. [More]
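The segmentation workflow (standardizing the features, reducing them with PCA, and clustering the projected data with k-means) can be sketched in plain NumPy. The rows of `components` are the PCA loadings, which show how strongly each original feature contributes to each component and are one way to judge which features separate the groups. This is a minimal sketch rather than the project's code: the function names are hypothetical, scikit-learn's `PCA` and `KMeans` would be the usual production choice, the sketch uses a simple farthest-point initialization instead of k-means++, and it assumes no feature has zero variance.

```python
import numpy as np


def pca(X, n_components):
    """Standardize X, then project it onto its leading principal components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)           # explained-variance ratio per component
    components = vt[:n_components]            # rows are loadings over the original features
    return Xs @ components.T, components, explained[:n_components]


def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's algorithm with a farthest-point initialization."""
    rng = np.random.default_rng(seed)
    # Start from a random point, then repeatedly add the point farthest from all chosen centroids
    centroids = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        d = np.linalg.norm(Z[:, None, :] - np.array(centroids)[None, :, :], axis=2).min(axis=1)
        centroids.append(Z[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

For the credit card data, `k` would be set to 7 to match the seven customer groups.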

Clustering the Performance of Countries in the Olympics

I carried out exploratory data analysis and clustering to identify patterns in the performance of different countries in the Olympics. The features used in this analysis were the numbers of gold, silver, and bronze medals won by each country in the Summer and Winter Olympics. I also added each country's temperature and GDP, because these strongly influence the types of sports in which countries succeed and the variety of those sports. [More]

Clustering Text Reviews

Using a combination of a transfer-learning-based approach, principal component analysis, and k-means clustering, I developed a model that clusters text reviews into different categories. [More]

Contact

Email: Kaveh.abdi1993@gmail.com