Question 22
Domain 2: Data Processing
You want quick descriptive statistics such as count, mean, stddev, min, and max for numeric columns in a Spark DataFrame. Which method is the most direct fit?
Correct answer: A
Explanation
`df.summary()` is the direct Spark DataFrame method for quick descriptive statistics. It returns summary metrics like "count", "mean", "stddev", "min", and "max" for numeric columns, which matches the request for a fast overview.
Why each option is right or wrong
A. df.summary()
`DataFrame.summary()` is a Spark DataFrame API method that produces a compact descriptive-statistics table. Called with no arguments, it returns the rows `count`, `mean`, `stddev`, `min`, approximate quartiles (`25%`, `50%`, `75%`), and `max`, so every statistic in the question comes back from a single call with no extra parameters. You can also pass only the statistics you want, for example `df.summary("count", "mean", "stddev", "min", "max")`. The closely related `describe()` method returns exactly `count`, `mean`, `stddev`, `min`, and `max`, but it is not among the options here, so `summary()` is the listed choice that fits the request.
B. df.write.format("delta")
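`df.write.format("delta")` configures a DataFrameWriter to save data in Delta format (the write itself happens on a call such as `.save()` or `.saveAsTable()`); it does not compute descriptive statistics.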
C. mlflow.search_runs()
`mlflow.search_runs()` queries experiment tracking metadata, not Spark DataFrame column summaries.
D. FeatureEngineeringClient.create_table()
`FeatureEngineeringClient.create_table()` creates or registers feature tables; it does not compute summary statistics.