Ali Chehrazi, Ph.D.

Data Scientist, University of Waterloo

View My GitHub Profile

DATA SCIENTIST

Data scientist with a Ph.D. in engineering, strong research experience, and problem-solving skills. Proficient in Python and SQL, with expertise in core Python data science libraries such as NumPy, Pandas, Scikit-learn, and Seaborn. Experienced with key data science tools, including Tableau, Power BI, Google Colab, Jupyter Notebook, and GitHub.

SELECTED PROJECTS

House Price Prediction

This project used an XGBoost regression model to predict property sale prices from a real estate dataset with an extensive set of property attributes. Preprocessing included handling missing values. After hyperparameter tuning, the model achieved an R-squared score of 0.8549 in cross-validation and 0.9047 on the test dataset. The project shows how gradient-boosted models can be applied to housing market analysis.
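
The evaluation workflow described above (a cross-validated R-squared score followed by a single held-out test score) can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses synthetic data from `make_regression` and scikit-learn's `GradientBoostingRegressor` as a stand-in for the XGBoost model, and every parameter value is an assumption.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the real estate dataset (assumption).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stand-in model; hyperparameters are illustrative, not the tuned values.
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, random_state=0)

# 5-fold cross-validated R-squared, computed on the training split only.
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()

# Fit on the full training split, then score once on the held-out test split.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # .score() returns R-squared for regressors

print(f"CV R^2: {cv_r2:.4f}, test R^2: {test_r2:.4f}")
```

Keeping the test split out of cross-validation is what makes the test score an independent check on the tuned model.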

Credit Card Fraud Detection

This project uses a Random Forest classification model to detect fraudulent credit card transactions in a dataset of transactions made by European cardholders. The dataset contains 284,807 transactions, of which only 492 (0.172%) are fraudulent. Its features are numerical variables obtained from a PCA transformation, plus 'Time' and 'Amount'; the binary 'Class' variable is the label. Exploratory analysis highlighted the class imbalance. A Random Forest model was then trained with hyperparameter tuning, using the F1 score as the focus metric. The project achieved an accuracy of 0.836 on the training set and 0.845 on the test set.
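
The class-imbalance figures above help explain why the F1 score, rather than plain accuracy, is a reasonable metric to focus on. The short calculation below uses the transaction counts quoted in the text; the "always predict legitimate" baseline is a hypothetical classifier introduced only for illustration.

```python
total = 284_807  # transactions in the dataset (from the text)
fraud = 492      # fraudulent transactions (from the text)

fraud_rate = fraud / total

# Hypothetical baseline: label every transaction as legitimate.
tp, fp = 0, 0                  # no transaction is ever flagged as fraud
fn, tn = fraud, total - fraud  # every fraud is missed, every legitimate one is correct

accuracy = (tp + tn) / total
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (
    2 * precision * recall / (precision + recall)
    if (precision + recall)
    else 0.0
)

print(f"fraud rate: {fraud_rate:.4%}")
print(f"baseline accuracy: {accuracy:.4f}, baseline F1: {f1:.1f}")
```

A model that never flags fraud is correct more than 99.8% of the time yet has an F1 score of zero, so accuracy alone says little on this dataset.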

Bridge Inspection Data Manipulation

The aim of this project is to assemble a clean dataset that meets a set of standards/criteria from real bridge inspection data reported between 1992 and 2023 for bridges in the state of New York. Each year was stored in a separate CSV file, the data had missing values, and most of the inspection records did not meet the required standard. The output of the project was a CSV file with the selected records and a folium map of the selected bridges.
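
The consolidation step described above (one CSV per year, missing values, records filtered against a standard) might look roughly like the sketch below. The column names (`bridge_id`, `condition_rating`, `lat`, `lon`), the sample rows, and both filtering rules are hypothetical; the project's actual schema and criteria are not given here. Two in-memory strings stand in for the yearly files.

```python
import io

import pandas as pd

# Hypothetical yearly files; in-memory strings stand in for CSVs on disk.
yearly_csvs = {
    1992: "bridge_id,condition_rating,lat,lon\nB1,5,42.65,-73.75\nB2,,43.05,-76.15\n",
    1993: "bridge_id,condition_rating,lat,lon\nB1,4,42.65,-73.75\nB3,6,40.71,-74.01\n",
}

frames = []
for year, text in yearly_csvs.items():
    df = pd.read_csv(io.StringIO(text))
    df["year"] = year  # record which yearly file each row came from
    frames.append(df)

inspections = pd.concat(frames, ignore_index=True)

# Illustrative criterion 1: drop records with no reported rating.
clean = inspections.dropna(subset=["condition_rating"])

# Illustrative criterion 2: keep bridges inspected in more than one year.
years_per_bridge = clean.groupby("bridge_id")["year"].nunique()
keep_ids = years_per_bridge[years_per_bridge > 1].index
selected = clean[clean["bridge_id"].isin(keep_ids)]

# Serialize the selected records as CSV text.
csv_text = selected.to_csv(index=False)
print(csv_text)
```

The `lat` and `lon` columns of `selected` are the kind of input a mapping library such as folium would need to place one marker per bridge.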

Titanic

The aim of this project is to build a classification model that predicts whether a passenger with a given set of features survived. First, the dataset is explored and the effect of different features on the survival rate is evaluated. Then several classification models, including Logistic Regression, KNN, and Random Forest, are built for this problem.
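
One way to put the three model families named above side by side is cross-validated accuracy, sketched below. This is an illustration under stated assumptions rather than the project's code: synthetic data from `make_classification` stands in for the encoded Titanic features, and the hyperparameters are arbitrary defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for passenger features (assumption).
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=5, random_state=0
)

# Logistic Regression and KNN are scale-sensitive, so they are wrapped with a scaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into preprocessing.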

CERTIFICATES & COURSEWORK

Although I have been doing data analysis and programming since the beginning of my graduate studies in September 2015, I have taken several courses in recent years to sharpen my data analysis skills and learn about state-of-the-art tools for data scientists. In particular, I would like to highlight the following certificate and courses.

List of courses in this program