ML Associate Practice Q39

A. Flag values as outliers only when they appear more than once in the DataFrame.

Outlier detection is based on statistical spread, not repetition count.

B. Treat all values below the column average as outliers and remove them.

Values below the average are not automatically outliers; spread-based thresholds are required.

C. Remove rows only after sorting the column and discarding a fixed percentage from each end.

The methods the material provides are standard deviation-based rules and the IQR method, not trimming a fixed percentage from each end.

D. Use either a standard deviation-based threshold or the interquartile range to identify extreme values.

The source material states that outliers in a Spark DataFrame can be identified and removed using either standard deviation-based rules or the interquartile range (IQR) method. The question asks for a spread-based approach to detecting unusually extreme values, and this option names both of the methods the material provides, so it is the correct answer.
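To make the two spread-based rules concrete, here is a minimal sketch in plain Python rather than Spark. The function names (`stddev_bounds`, `iqr_bounds`, `outliers`), the sample data, and the multipliers (3 standard deviations, 1.5 times the IQR) are illustrative assumptions; the source material names the two methods but does not specify thresholds.

```python
import statistics


def stddev_bounds(values, k=3.0):
    """Bounds at k sample standard deviations from the mean (k=3 is an assumed convention)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd


def iqr_bounds(values, k=1.5):
    """Bounds at k * IQR below Q1 and above Q3 (k=1.5 is an assumed convention)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, bounds):
    """Return the values that fall outside the (lower, upper) bounds."""
    lower, upper = bounds
    return [v for v in values if v < lower or v > upper]


# Hypothetical column values: fifteen typical readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 10, 11, 13, 12, 95]

flagged_by_stddev = outliers(data, stddev_bounds(data))
flagged_by_iqr = outliers(data, iqr_bounds(data))

# Removing outliers means keeping only the values inside the chosen bounds.
lower, upper = iqr_bounds(data)
cleaned = [v for v in data if lower <= v <= upper]
```

On a Spark DataFrame the same bounds would typically be computed with column aggregates (for example `mean` and `stddev` from `pyspark.sql.functions`, or `DataFrame.approxQuantile` for the quartiles) and applied with a filter. Note that a single extreme value inflates the standard deviation itself, which is one reason the IQR rule is often preferred for small or heavily skewed columns.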
