Changing Reference Level of a Factor Changes Results

Issue

This content is from Stack Overflow. Question asked by mapleleaf.

I am trying to understand why changing the reference level of a factor changes the results of a model. Consider this example:

library(liver)
library(caret)
library(glmnet)
library(dplyr)

data(churn)
head(churn)

# set reference levels
churn$state <- relevel(churn$state, ref = "NE")
churn$area.code <- relevel(churn$area.code, ref = "area_code_408")
churn$intl.plan <- relevel(churn$intl.plan, ref = "yes")
churn$voice.plan <- relevel(churn$voice.plan, ref = "no")

# split into train and test
set.seed(1)
train.index <- createDataPartition(churn$churn, p = 0.8, list = FALSE)
train_churn <- churn[train.index,]
test_churn  <- churn[-train.index,] 

# add class weights (inverse class frequency, scaled so the weights average to 1)
my_weights = train_churn %>%
  select(churn) %>%
  group_by(churn) %>%
  count()

weight_for_yes = (1 / my_weights$n[1]) * ((my_weights$n[1] + my_weights$n[2]) / 2.0)
weight_for_yes
weight_for_no = (1 / my_weights$n[2]) * ((my_weights$n[1] + my_weights$n[2]) / 2.0)
weight_for_no

model_weights <- ifelse(train_churn$churn == "yes", weight_for_yes, weight_for_no)

# tuning grid
myGrid <- expand.grid(
  alpha = 0, 
  lambda = seq(0,1,0.01)  
)

set.seed(1)
mod_1 <- train(churn ~
                 state +
                 area.code +
                 intl.plan +
                 voice.plan,
               data = train_churn,
               method = "glmnet",
               tuneGrid = myGrid,
               weights = model_weights)

mod_1

Tuning parameter ‘alpha’ was held constant at a value of 0
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were alpha = 0 and lambda = 0.8

prediction <- predict(mod_1, newdata = test_churn)
confusionMatrix(prediction, test_churn$churn)

I also look at a single new prediction:

new_data = data.frame(state = c("CA"),
                      area.code = c("area_code_510"),
                      intl.plan = c("yes"),
                      voice.plan = c("no"))

predict(mod_1, newdata = new_data, type = "prob")

Now I restart R, set new reference levels, and rerun all the code. This is the output:

# set new reference levels
churn$state <- relevel(churn$state, ref = "OR")
churn$area.code <- relevel(churn$area.code, ref = "area_code_415")
churn$intl.plan <- relevel(churn$intl.plan, ref = "no")
churn$voice.plan <- relevel(churn$voice.plan, ref = "yes")

Tuning parameter ‘alpha’ was held constant at a value of 0
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were alpha = 0 and lambda = 0.62.

I expected the lambdas to change, but I did not expect the confusion matrix or classification probabilities to change. Is this normal?



Solution

This question has not been answered yet.
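
As a possible starting point, and not a confirmed answer, the behaviour can be reproduced in a much smaller setting. The sketch below uses made-up toy data and a hand-written closed-form linear ridge fit (the helper `ridge_fitted` is hypothetical and is not glmnet). It assumes treatment (dummy) coding and an unpenalized intercept. With no penalty, the fitted values do not depend on the reference level. With a penalty on the dummy coefficients, every level is shrunk toward the reference level, so the fitted values change when the reference changes. glmnet additionally standardizes predictors by default and caret picks lambda by resampling, so the details of the real model will differ.

```r
# Toy data: one factor with three levels and a numeric response (invented for illustration)
set.seed(1)
d <- data.frame(g = factor(rep(c("A", "B", "C"), each = 10)))
group_means <- c(A = 0, B = 1, C = 5)
d$y <- unname(group_means[as.character(d$g)]) + rnorm(nrow(d), sd = 0.5)

# Hypothetical helper: closed-form ridge fit with an unpenalized intercept
ridge_fitted <- function(data, lambda) {
  X <- model.matrix(y ~ g, data = data)  # treatment coding, first level is the reference
  P <- diag(ncol(X))
  P[1, 1] <- 0                           # leave the intercept out of the penalty
  beta <- solve(crossprod(X) + lambda * P, crossprod(X, data$y))
  drop(X %*% beta)
}

# Same data, two different reference levels
d_refA <- d
d_refC <- transform(d, g = relevel(g, ref = "C"))

# lambda = 0 is ordinary least squares: fitted values agree up to rounding error
ols_diff <- max(abs(ridge_fitted(d_refA, 0) - ridge_fitted(d_refC, 0)))

# lambda > 0 penalizes differences from the reference level: fitted values differ
ridge_diff <- max(abs(ridge_fitted(d_refA, 10) - ridge_fitted(d_refC, 10)))

print(c(ols_diff = ols_diff, ridge_diff = ridge_diff))
```

If the same mechanism applies to the churn model, then changes in the predicted probabilities and the confusion matrix after releveling would be expected for a penalized fit, even at a fixed lambda.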

This question and answer are collected from Stack Overflow and tested by the JTuto community, and are licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.