PMLE Practice Q35

A. Use sparse representation in the test set.

B. Randomly redistribute the data, with 70% for the training set and 30% for the test set.

C. Apply one-hot encoding on the categorical variables in the test data.

A trained model expects the same feature vector at prediction time that it saw during training: the same number of columns, in the same order. If the test data is encoded differently, the input matrix dimensions will not align with what the model expects. One-hot encoding the test set with the category schema learned from the training set keeps the encoded columns identical, even when a category has zero occurrences in the test split. The column for that category is still emitted (as all zeros), which avoids the feature-mismatch error.

D. Collect more data representing all categories.
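
The reasoning given for option C can be illustrated with a minimal sketch in plain Python. It is not taken from the question itself: the helper names `fit_categories` and `one_hot` and the color data are hypothetical, chosen only to show that reusing the training-set schema keeps the column count stable when a category is absent from the test split.

```python
def fit_categories(values):
    """Return the sorted list of distinct categories seen in training."""
    return sorted(set(values))


def one_hot(values, categories):
    """Encode each value as a 0/1 row with one column per schema category."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        if value in index:  # a value outside the schema encodes as all zeros
            row[index[value]] = 1
        rows.append(row)
    return rows


# Hypothetical data: "green" appears in training but not in the test split.
train_colors = ["red", "green", "blue", "red"]
test_colors = ["blue", "red"]

# Learn the schema once, from the training data only.
schema = fit_categories(train_colors)

train_encoded = one_hot(train_colors, schema)
test_encoded = one_hot(test_colors, schema)

# Encoding the test split with its own schema drops the "green" column.
naive_test_encoded = one_hot(test_colors, fit_categories(test_colors))

print(len(train_encoded[0]), len(test_encoded[0]), len(naive_test_encoded[0]))
# prints: 3 3 2
```

In practice a library encoder would normally replace these helpers. For example, scikit-learn's `OneHotEncoder` can be fitted on the training data and then used to transform the test data, which applies the same idea.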

Question 35

Explanation

Why each option is right or wrong