Question 4
Domain 6
You are a junior data scientist building a logistic regression model to classify customer text messages into two categories: important/urgent and unimportant/non-urgent. You want a metric that evaluates how well the model separates the two classes, and the metric must be scale-invariant and classification-threshold-invariant. Which of the following is the best choice?
Correct answer: C
Explanation
ROC-AUC measures how well a model separates two classes based on the ranking of its predicted scores rather than their absolute values, so it is scale-invariant, and it does not depend on a chosen classification threshold. It summarizes performance across all thresholds as the area under the ROC curve, which makes it well suited to evaluating a logistic regression model on important/urgent versus unimportant/non-urgent messages.
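The ranking interpretation can be checked with a small sketch. It assumes the standard pairwise reading of AUC (the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half). The labels, probabilities, and the helper name `roc_auc` are made up for illustration and are not part of the question.

```python
import math

def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = urgent, 0 = non-urgent, with predicted probabilities.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.92, 0.35, 0.61, 0.48, 0.52, 0.10, 0.77, 0.28]

auc_raw = roc_auc(labels, probs)

# Two strictly increasing transforms of the scores: log-odds and a linear squash.
auc_logit = roc_auc(labels, [math.log(p / (1 - p)) for p in probs])
auc_scaled = roc_auc(labels, [0.1 * p + 0.4 for p in probs])

print(auc_raw, auc_logit, auc_scaled)  # 0.9375 for all three
```

No threshold appears anywhere in the computation, and both transforms preserve the ordering of the scores, so the AUC stays the same.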
Why each option is right or wrong
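A. Log Loss
Log loss is computed from the predicted probability values themselves, so it changes when the scores are rescaled and is therefore not scale-invariant.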
B. One-hot encoding
One-hot encoding is a technique for representing categorical features, not an evaluation metric.
C. ROC-AUC
ROC-AUC is the correct choice because the ROC curve plots the true positive rate against the false positive rate across all possible decision thresholds, so the score does not depend on selecting any single cutoff. In binary classification, the area under that curve is threshold-independent and also invariant to any strictly increasing transformation of the model’s output scores, because only the ranking of the predictions matters. That is exactly what the question asks for when evaluating how well a logistic regression model separates the two classes.
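D. Mean Squared Error
Mean squared error depends on the magnitude of the difference between predictions and targets, so it is not scale-invariant. It is primarily a regression metric.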
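E. Mean Absolute Error
Mean absolute error also depends on the magnitude of the prediction errors, so it is not scale-invariant. It is primarily a regression metric.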
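To see why options A, D, and E fail the scale-invariance requirement, the sketch below applies an order-preserving rescaling to a set of predicted probabilities and recomputes each metric. It also shows plain accuracy at two cutoffs as an example of a threshold-dependent measure. The data and helper functions are illustrative assumptions, not part of the question.

```python
import math

def log_loss(labels, probs):
    """Mean negative log-likelihood of the true labels."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probs)
    ) / len(labels)

def mse(labels, probs):
    """Mean squared difference between label and predicted probability."""
    return sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)

def mae(labels, probs):
    """Mean absolute difference between label and predicted probability."""
    return sum(abs(y - p) for y, p in zip(labels, probs)) / len(labels)

def accuracy(labels, probs, threshold):
    """Share of correct predictions when p >= threshold is called positive."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(labels, probs))
    return hits / len(labels)

# Hypothetical data: 1 = urgent, 0 = non-urgent.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.92, 0.35, 0.61, 0.48, 0.52, 0.10, 0.77, 0.28]

# Same ordering of messages, but scores compressed into [0.41, 0.492].
squashed = [0.1 * p + 0.4 for p in probs]

for name, metric in [("log loss", log_loss), ("MSE", mse), ("MAE", mae)]:
    print(f"{name}: {metric(labels, probs):.3f} -> {metric(labels, squashed):.3f}")

acc_at_05 = accuracy(labels, probs, 0.5)  # 0.75
acc_at_04 = accuracy(labels, probs, 0.4)  # 0.875
print(acc_at_05, acc_at_04)
```

All three error metrics change under the rescaling even though the ranking of the messages is untouched, and accuracy moves when the cutoff moves. A ranking-based metric such as ROC-AUC is unaffected by either change.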