Office of Advanced Research Computing
Machine Learning Series by OARC—Random Forest Workshop
September 28, 2021 @ 1:00 pm - 4:00 pm
Please register for this workshop using the form at the bottom of this page. After you submit the registration form, we will email you the Zoom link.
Amarel account: You need an Amarel account to participate in the lab section. Apply here as soon as possible.
VPN setup: You must be connected to the Rutgers network or to the VPN to access Amarel resources.
If you have questions or need help, please contact Janet Chang.
Topics:
1. Introduction to Machine Learning (ML)
- Big data and Machine Learning
- How Machine Learning relates to other disciplines
- Machine Learning algorithms
- Classification and Regression
2. Understanding Random Forest (RF)
- Applications of Random Forests
- Why Random Forests
- The Random Forest Algorithm
- Fundamental concepts – ML, RF
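The core mechanics of the Random Forest algorithm are bootstrap sampling of rows for each tree, a random subset of features at each split, and majority voting across trees. They can be sketched in a few lines. The labs themselves use R; this plain-Python sketch is only an illustration, and the helper names (`bootstrap_indices`, `out_of_bag`, `feature_subset`, `majority_vote`) are hypothetical rather than part of any library.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement: one tree's bootstrap sample."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, boot):
    """Rows never drawn into the bootstrap sample (the tree's out-of-bag rows)."""
    drawn = set(boot)
    return [i for i in range(n) if i not in drawn]


def feature_subset(p, mtry, rng):
    """Pick mtry of the p feature indices, without replacement, for one split."""
    return sorted(rng.sample(range(p), mtry))


def majority_vote(tree_predictions):
    """Combine the class labels predicted by each tree for one sample."""
    return Counter(tree_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    boot = bootstrap_indices(10, rng)
    print("bootstrap rows:", boot)
    print("out-of-bag rows:", out_of_bag(10, boot))
    print("features for one split:", feature_subset(8, 3, rng))
    print("forest vote:", majority_vote(["tumor", "normal", "tumor"]))  # tumor
```

For large n, a bootstrap sample leaves out about (1 - 1/n)^n ≈ 36.8% of the rows on average. Those out-of-bag rows give each tree a built-in error estimate without a separate test set.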
3. Implementing Random Forest
- Feature Importance and Feature Selection
- Dealing with missing data and imbalanced data
- Finding the best node split: node impurity
- Overfitting and underfitting
- Model performance
- Model interpretability
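Node impurity determines how each tree chooses its splits. Below is a minimal sketch for a single numeric feature, assuming the Gini impurity criterion, 1 - Σ p_k², which is common for classification forests. Regression forests typically use the reduction in squared error instead. The function names are hypothetical, and the code is not taken from the workshop labs.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x, y):
    """Scan thresholds on one numeric feature x and return
    (threshold, weighted child impurity) for the split x <= threshold
    with the lowest size-weighted Gini impurity. Threshold is None if
    no split beats the parent node."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(y)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_impurity = weighted
    return best_threshold, best_impurity


if __name__ == "__main__":
    expression = [1.0, 0.5, 3.5, 4.0, 3.0]
    status = ["normal", "normal", "tumor", "tumor", "tumor"]
    print(best_split(expression, status))  # (2.0, 0.0)
```

A full tree repeats this search over the candidate features at each node (the random subset, in a forest) and splits on the feature and threshold that give the lowest weighted impurity.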
4. Lab Exercise
We will use cancer health data combined with gene expression data to build random forest models that predict outcome variables. Both classification and regression models will be covered.
Lab 1, Setting up and launching R
Lab 2, Data preparation
- 2a. preprocessing
- 2b. data partition
- 2c. missing data imputation
- 2d. feature selection
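Steps 2b and 2c can be sketched in plain Python rather than the R used in the labs. The sketch assumes a stratified split, so that rare classes appear in both partitions, and simple median imputation. The workshop may use a different imputation method, and the function names are hypothetical.

```python
import random
from statistics import median


def stratified_split(labels, test_frac, seed=0):
    """Split row indices into (train, test), sampling each class separately
    so both partitions keep roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for rows in by_class.values():
        rng.shuffle(rows)
        n_test = round(len(rows) * test_frac)
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return sorted(train), sorted(test)


def impute_median(column, train_rows):
    """Fill missing values (None) with the median of the observed training values."""
    observed = [column[i] for i in train_rows if column[i] is not None]
    fill = median(observed)
    return [fill if value is None else value for value in column]


if __name__ == "__main__":
    status = ["normal"] * 8 + ["tumor"] * 4
    train, test = stratified_split(status, test_frac=0.25)
    print("test rows:", test)  # 2 normal rows + 1 tumor row
    gene = [2.1, None, 3.4, 1.8, None, 2.6, 3.0, 2.2, 1.9, None, 2.8, 3.1]
    print(impute_median(gene, train))
```

The fill value is computed from the training rows only, so no information from the test partition leaks into the model.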
Lab 3, Building the RF model:
- 3b. handling imbalanced data
- 3c. building the RF model
- 3d. tuning the parameters
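Step 3b has two common options for class imbalance: down-sample the majority class, or give rarer classes larger weights. For step 3d, `mtry` (the number of features tried at each split) is the usual first parameter to tune. The sketch below shows both imbalance strategies, plus the `mtry` defaults used by R's randomForest package: floor(sqrt(p)) for classification and max(floor(p/3), 1) for regression. This page does not say which strategy the labs actually use, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def downsample(labels, seed=0):
    """Return row indices where every class is cut to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(rows) for rows in by_class.values())
    keep = []
    for rows in by_class.values():
        keep.extend(rng.sample(rows, n_min))
    return sorted(keep)


def balanced_class_weights(labels):
    """Weight class c by n / (k * n_c), so each class contributes equally in total."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def default_mtry(p, task):
    """Features tried per split: floor(sqrt(p)) for classification,
    max(floor(p / 3), 1) for regression."""
    if task == "classification":
        return math.isqrt(p)
    return max(p // 3, 1)


if __name__ == "__main__":
    status = ["normal"] * 90 + ["tumor"] * 10
    kept = downsample(status)
    print(Counter(status[i] for i in kept))  # 10 of each class
    print(balanced_class_weights(status))
    print(default_mtry(500, "classification"), default_mtry(500, "regression"))  # 22 166
```

Down-sampling discards data but is simple. Class weights keep every row but depend on the learner supporting weighted impurity. Tuning usually tries a few `mtry` values around the default and compares out-of-bag or cross-validated error.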
Lab 4, Validation and model performance
- 4a. prediction and confusion matrix on the test data
- 4b. ROC curve and AUC
- 4c. k-fold cross-validation
- 4d. parallel computing
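The evaluation steps in Lab 4 rest on a few simple computations, sketched here in plain Python with hypothetical function names:

- a confusion matrix;
- ROC AUC, computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half), which equals the area under the ROC curve;
- the index bookkeeping for k-fold cross-validation.

Each fold can be fitted independently, which makes folds one natural place to apply the parallel computing of step 4d.

```python
from itertools import product


def confusion_matrix(y_true, y_pred, classes):
    """Counts with rows = true class and columns = predicted class, in `classes` order."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def roc_auc(y_true, scores):
    """AUC from 0/1 labels and predicted scores for class 1 (e.g. vote fractions)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p, q in product(pos, neg):
        if p > q:
            wins += 1.0
        elif p == q:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds of n rows.
    Shuffle (or stratify) the rows beforehand in real use."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    truth = ["tumor", "tumor", "normal", "normal"]
    pred = ["tumor", "normal", "normal", "normal"]
    print(confusion_matrix(truth, pred, ["tumor", "normal"]))  # [[1, 1], [0, 2]]
    print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))  # 0.75
    for train, test in kfold_indices(10, 3):
        print(len(train), len(test))
```

The pairwise AUC loop is quadratic in the number of samples. That is fine for illustration, but a rank-based formula is preferable for large test sets.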
Lab 5, Visualization and model interpretation
- 5a. plotting the random forest tree
- 5b. plotting feature importance
- 5c. partial dependence plot (PDP)
- 5d. MDS: multidimensional scaling plot of the proximity matrix
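Two of the Lab 5 plots are driven by simple quantities:

- A partial dependence curve averages the model's predictions over the data while one feature is held at each value on a grid.
- A random forest proximity between two samples is the fraction of trees in which they land in the same terminal node. The MDS plot then embeds 1 - proximity as distances in two dimensions.

The plain-Python sketch below computes both quantities. `predict` and the per-tree leaf ids are assumed inputs that a fitted model would supply, and the function names are hypothetical. The MDS projection and the plotting are left to the lab's own tools.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value.

    predict: any callable mapping a row (dict of feature -> value) to a number,
    e.g. a forest's predicted probability for one class.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def proximity(leaf_ids):
    """leaf_ids[t][i] is the terminal node reached by sample i in tree t.
    Returns the n x n matrix of the fraction of trees in which samples i and j share a leaf."""
    n_trees, n = len(leaf_ids), len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for tree in leaf_ids:
        for i in range(n):
            for j in range(n):
                if tree[i] == tree[j]:
                    prox[i][j] += 1.0
    return [[count / n_trees for count in row] for row in prox]


if __name__ == "__main__":
    def toy_model(row):
        # Stand-in for a fitted forest: a simple linear score.
        return 2 * row["x"] + row["z"]

    rows = [{"x": 0, "z": 1}, {"x": 5, "z": 3}]
    print(partial_dependence(toy_model, rows, "x", [0, 1]))  # [2.0, 4.0]

    leaves = [[0, 0, 1], [0, 1, 1]]  # 2 trees, 3 samples
    prox = proximity(leaves)
    print(prox)  # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
    dissimilarity = [[1.0 - v for v in row] for row in prox]
    print(dissimilarity)  # input distances for an MDS projection
```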
To Register: