Question 39
Content Domain 2: Exploratory Data Analysis
A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as Linear Regression and Logistic Regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other. This may make the model unstable. What should be done to reduce the impact of having such a large number of features?
Correct answer: C
Explanation
Principal Component Analysis (PCA) reduces dimensionality by transforming many correlated features into uncorrelated principal components and keeping only the few that capture most of the variance. Fitting the linear model on these components instead of the raw features removes the multicollinearity described in the question ("many features are highly correlated with each other"), which is what makes the coefficient estimates unstable.
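The idea can be checked with a minimal sketch in Python, assuming NumPy is available. The synthetic data (three noisy copies of one latent signal) and the helper name `pca_fit_transform` are illustrative assumptions, not part of the exam material. PCA is computed here directly from the SVD of the centered data rather than through a library class such as scikit-learn's `PCA`.

```python
import numpy as np

# Hypothetical data: three features that are noisy copies of one latent signal,
# so every pair of features is highly correlated.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.05 * rng.normal(size=n) for _ in range(3)])


def pca_fit_transform(X, n_components):
    """Project X onto its top principal components via SVD of the centered data.

    Returns the component scores and the explained-variance ratio of every component.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    scores = X_centered @ Vt[:n_components].T
    return scores, explained_ratio


scores, ratio = pca_fit_transform(X, n_components=3)
X_reduced, _ = pca_fit_transform(X, n_components=1)  # smaller feature space for the model

print(np.round(np.corrcoef(X, rowvar=False), 3))       # off-diagonal entries near 1
print(np.round(np.corrcoef(scores, rowvar=False), 3))  # off-diagonal entries near 0
print(np.round(ratio, 4))                              # first ratio close to 1
print(X_reduced.shape)                                 # (500, 1)
```

The raw correlation matrix should show off-diagonal values near 1, the component correlation matrix values near 0, and the first explained-variance ratio close to 1, which is why keeping only the first component loses little information here.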
Why each option is right or wrong
A. Perform one-hot encoding on highly correlated features.
One-hot encoding expands each categorical variable into several binary columns, which increases the feature count rather than reducing it; it does nothing about correlation among numeric features.
B. Use matrix multiplication on highly correlated features.
Matrix multiplication is a general linear-algebra operation, not a feature-reduction method; on its own it does not address multicollinearity.
C. Create a new feature space using Principal Component Analysis (PCA).
PCA is a standard dimensionality-reduction method for linear models affected by multicollinearity. It transforms the original correlated variables into orthogonal principal components, so the new predictors are mutually uncorrelated. In practice, the first few components often retain most of the variance, allowing the model to be fit on a smaller feature space and reducing the coefficient instability caused by highly correlated inputs.
D. Apply the Pearson correlation coefficient.
Pearson correlation measures the strength of the linear relationship between two variables; it can identify correlated features but does not itself reduce them.
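To make the "coefficient instability" in option C concrete, the sketch below (Python with NumPy; all data and helper names are hypothetical) refits a linear regression many times on two nearly identical features and compares it with principal component regression, that is, regressing on the first principal component alone. This is one illustrative way to show the effect, not a prescribed exam procedure.

```python
import numpy as np


def make_data(rng, n=200):
    """Hypothetical process: y depends on a latent signal that two features both copy."""
    latent = rng.normal(size=n)
    X = np.column_stack([latent + 0.01 * rng.normal(size=n),
                         latent + 0.01 * rng.normal(size=n)])
    y = 2.0 * latent + 0.5 * rng.normal(size=n)
    return X, y


def ols_slopes(X, y):
    """Ordinary least squares with an intercept; returns only the slope coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]


def first_pc_scores(X):
    """Scores on the first principal component, with a fixed sign so fits are comparable."""
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    axis = Vt[0] if Vt[0].sum() >= 0 else -Vt[0]  # SVD sign is arbitrary
    return (X_centered @ axis).reshape(-1, 1)


rng = np.random.default_rng(42)
raw, pcr = [], []
for _ in range(200):  # refit on 200 fresh samples drawn from the same process
    X, y = make_data(rng)
    raw.append(ols_slopes(X, y))                   # both correlated features
    pcr.append(ols_slopes(first_pc_scores(X), y))  # first principal component only

raw, pcr = np.array(raw), np.array(pcr)
print("spread of raw OLS slopes:", raw.std(axis=0))
print("spread of PC-regression slope:", pcr.std(axis=0))
```

The individual raw slopes are expected to swing widely from sample to sample (only their sum is well determined), while the single PC-regression slope should stay close to √2 ≈ 1.41, since the first component is roughly √2 times the latent signal.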