Random Forest — Business Insights

Draw Business Insights from RF 1. Variable importance: Look at the ranking of the important variables. If the top ones are the least actionable variables, meaning that it is impossible for the company to change them, delete them and rebuild the RF. Check whether the top variables are continuous or categorical; continuous variables tend to show up…
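
The "rank, drop the non-actionable variable, rebuild" loop described in this excerpt can be sketched roughly as follows. This is only an illustration on synthetic data, assuming scikit-learn and NumPy are available; the column names (`region_code`, `discount`, `email_count`), the choice of `region_code` as the non-actionable variable, and the helper `ranked_importance` are all hypothetical and not taken from the post.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic, hypothetical columns. "region_code" stands in for a variable
# the company cannot change (an assumption made for this sketch).
names = ["region_code", "discount", "email_count"]
X = np.column_stack([rng.normal(size=n) for _ in names])
# The target depends strongly on region_code, less on discount,
# and not at all on email_count.
y = (2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.2 * rng.normal(size=n) > 0).astype(int)


def ranked_importance(X, y, names):
    """Fit a random forest and return (name, importance) pairs, highest first."""
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [(names[i], float(model.feature_importances_[i])) for i in order]


# First pass: rank all variables.
first_pass = ranked_importance(X, y, names)

# Second pass: drop the non-actionable variable and rebuild the forest.
not_actionable = {"region_code"}
keep = [i for i, name in enumerate(names) if name not in not_actionable]
second_pass = ranked_importance(X[:, keep], y, [names[i] for i in keep])

for name, score in second_pass:
    print(f"{name}: {score:.3f}")
```

Comparing `first_pass` with `second_pass` shows how the ranking shifts once the variable nobody can act on is removed.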

Random Forest — Method and Application (Python)

Advantages of RF: Little time is needed for optimization (the default parameters are good enough). Robust to outliers and correlated variables. For continuous variables, it is able to split them into segments. Method: Create a bootstrapped dataset (sample with replacement). Create a decision tree using the bootstrapped dataset, but only use a random subset of variables at each split…
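
The two method steps named in this excerpt, bootstrapping the rows and restricting each split to a random subset of variables, can be illustrated with the standard library alone. This is a minimal sketch, not a full tree builder; the toy rows and the helper names `bootstrap_sample` and `random_feature_subset` are made up for the example.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (a bootstrapped dataset)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, k, rng):
    """Pick k distinct variables to consider as candidates at one split."""
    return rng.sample(feature_names, k)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy data: three variables and a label.
rows = [
    {"age": 25, "income": 40, "visits": 3, "label": 0},
    {"age": 32, "income": 55, "visits": 1, "label": 1},
    {"age": 47, "income": 80, "visits": 7, "label": 1},
    {"age": 51, "income": 62, "visits": 2, "label": 0},
    {"age": 38, "income": 45, "visits": 5, "label": 1},
]
features = ["age", "income", "visits"]

# Step 1: one bootstrapped dataset per tree (some rows repeat, some are left out).
sample = bootstrap_sample(rows, rng)

# Step 2: at each split, only a random subset of variables is considered.
candidates = random_feature_subset(features, k=2, rng=rng)

print(len(sample), candidates)
```

A forest repeats step 1 once per tree and step 2 at every split inside each tree.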

EDA and Feature Engineering

This will provide the basic distribution of each variable. We can use it to spot extreme or unreasonable values and handle them before further investigation. 2. Query and Merge 3. Group and Plot Data. Note: agg reduces each column of each group to an aggregated value, while apply passes each group as a whole to the function. 4. Fill NA, replace and assign values. If few…
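
The difference between `agg` and `apply` on a pandas groupby is easier to see on a tiny table. The sketch below assumes pandas is installed; the DataFrame and its column names (`segment`, `sales`, `cost`) are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "segment": ["a", "a", "b", "b"],
    "sales": [10, 20, 30, 50],
    "cost": [4, 6, 10, 20],
})

grouped = df.groupby("segment")[["sales", "cost"]]

# agg: every column of every group is reduced to a single value.
per_group_mean = grouped.agg("mean")

# apply: the function receives each group's rows as one DataFrame,
# so it can combine several columns in a single calculation.
margin = grouped.apply(
    lambda g: (g["sales"].sum() - g["cost"].sum()) / g["sales"].sum()
)

print(per_group_mean)
print(margin)
```

Here `agg` returns one row per segment with a mean for each column, while `apply` returns one margin per segment computed from both columns together.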