Question 26
Content Domain 2: Exploratory Data Analysis
A Machine Learning Specialist is building a model to predict future employment rates based on a wide range of economic factors. While exploring the data, the Specialist notices that the magnitudes of the input features vary greatly. The Specialist does not want variables with a larger magnitude to dominate the model. What should the Specialist do to prepare the data for model training?
Correct answer: C
Explanation
Apply feature scaling before training so large-valued inputs do not dominate the model. Standardization, which option C calls normalization, is the method that makes each field have "a mean of 0 and a variance of 1," which removes magnitude differences and puts features on a comparable scale.
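As a minimal sketch of the scaling described above, the snippet below standardizes each column of a small array with NumPy. The function name `standardize` and the sample values (a GDP-like column and an interest-rate-like column) are hypothetical and chosen only to show two features on very different scales.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract the column mean, divide by the column std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against constant columns, whose standard deviation is 0.
    std_safe = np.where(std == 0, 1.0, std)
    return (X - mean) / std_safe

# Hypothetical features: one in the trillions, one in single digits.
X = np.array([
    [2.1e12, 1.5],
    [2.3e12, 2.0],
    [2.6e12, 0.5],
    [2.8e12, 3.0],
])
Z = standardize(X)

print(Z.mean(axis=0))  # each column mean is approximately 0
print(Z.std(axis=0))   # each column standard deviation is approximately 1
```

In practice the mean and standard deviation would usually be computed on the training split only and then reused for validation and test data, so that no information leaks from held-out rows.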
Why each option is right or wrong
A. Apply quantile binning to group the data into categorical bins to keep any relationships in the data by replacing the magnitude with distribution.
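Quantile binning discretizes values into buckets. It changes how the values are represented, from continuous to categorical, and discards the differences between values in the same bin, so it is not feature scaling.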
B. Apply the Cartesian product transformation to create new combinations of fields that are independent of the magnitude.
Cartesian products create interaction features between variables; they do not address unequal numeric scales.
C. Apply normalization to ensure each field will have a mean of 0 and a variance of 1 to remove any significant magnitude.
The appropriate preprocessing step is feature standardization, because the inputs are on very different numeric ranges and the model would otherwise weight high-magnitude variables disproportionately. Although the option calls it normalization, the transform it describes is the z-score, x' = (x - μ)/σ, where μ and σ are the feature's mean and standard deviation. The result has mean 0 and standard deviation 1 (and therefore variance 1), which directly addresses the stated concern about magnitude dominance before training.
D. Apply the orthogonal sparse bigram (OSB) transformation to apply a fixed-size sliding window to generate new features of a similar magnitude.
OSB is a text feature-extraction transformation that builds word-pair features from a sliding window over text; it is not a method for scaling numeric economic variables.
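To illustrate why option A is not a scaling step, here is a small sketch that applies quartile binning to a single feature with NumPy. The sample values are hypothetical, and the use of `np.quantile` with `np.digitize` is just one assumed way to implement quantile binning, not the method of any particular service.

```python
import numpy as np

# Hypothetical feature spanning seven orders of magnitude.
values = np.array([1, 10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000], dtype=float)

# Quartile edges (25th, 50th, 75th percentiles) split the data into four bins.
edges = np.quantile(values, [0.25, 0.5, 0.75])

# Each value is replaced by the index of the bin it falls into.
bins = np.digitize(values, edges)

print(bins)  # [0 0 1 1 2 2 3 3]
```

The output is a set of bucket labels rather than a rescaled numeric feature: 1 and 10 become indistinguishable, as do 1,000,000 and 10,000,000.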