Question 8
Domain 1: Data Preparation for Machine Learning (ML)
A company is using an Amazon S3 bucket to collect data that will be used for ML workflows. The company needs to use AWS Glue DataBrew to clean and normalize the data. Which solution will meet these requirements?
Correct answer: B
Explanation
AWS Glue DataBrew works on a "dataset" and uses a "recipe" to define data transformations. Creating a DataBrew dataset from the S3 path lets DataBrew access the bucket data, and a recipe job applies the cleaning and normalization steps needed for ML workflows.
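The dataset, recipe, and recipe-job sequence can be sketched in Python. This minimal sketch only builds request payloads for the boto3 `databrew` client and makes no AWS calls. The S3 paths, resource names, role ARN, column names, and the two recipe operations are hypothetical placeholders. The operation names should be checked against the DataBrew recipe action reference before use.

```python
from urllib.parse import urlparse

# Sketch only: payloads for the boto3 "databrew" client; no AWS calls are made.
# All names, paths, ARNs, columns, and recipe operations are hypothetical.


def split_s3_path(path: str) -> tuple[str, str]:
    """Split 's3://bucket/prefix/' into ('bucket', 'prefix/')."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/key path, got {path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def dataset_request(name: str, s3_path: str, file_format: str = "CSV") -> dict:
    """Payload for create_dataset(): the dataset points DataBrew at an S3 path."""
    bucket, key = split_s3_path(s3_path)
    return {
        "Name": name,
        "Format": file_format,
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }


def recipe_request(name: str) -> dict:
    """Payload for create_recipe(): the recipe is an ordered list of steps."""
    return {
        "Name": name,
        "Steps": [
            # Hypothetical cleaning step: drop rows whose 'label' value is missing.
            {"Action": {"Operation": "REMOVE_MISSING",
                        "Parameters": {"sourceColumn": "label"}}},
            # Hypothetical normalization step: lower-case the 'category' column.
            {"Action": {"Operation": "LOWER_CASE",
                        "Parameters": {"sourceColumn": "category"}}},
        ],
    }


def recipe_job_request(name: str, dataset_name: str, recipe_name: str,
                       role_arn: str, output_path: str) -> dict:
    """Payload for create_recipe_job(): applies the recipe to the dataset."""
    out_bucket, out_key = split_s3_path(output_path)
    return {
        "Name": name,
        "DatasetName": dataset_name,
        # Assumes the recipe was published once with publish_recipe(), giving version "1.0".
        "RecipeReference": {"Name": recipe_name, "RecipeVersion": "1.0"},
        "RoleArn": role_arn,
        # The cleaned data is written to S3 for the ML workflow.
        "Outputs": [{
            "Location": {"Bucket": out_bucket, "Key": out_key},
            "Format": "PARQUET",
            "Overwrite": True,
        }],
    }
```

Under these assumptions, the payloads would be submitted in order through `client = boto3.client("databrew")`: `client.create_dataset(**dataset_request(...))`, `client.create_recipe(**recipe_request(...))`, `client.publish_recipe(Name=...)`, `client.create_recipe_job(**recipe_job_request(...))`, and finally `client.start_job_run(Name=...)`. Keeping the builders free of network calls makes the request shapes easy to inspect and test.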
Why each option is right or wrong
A. Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew profile job.
Profile jobs analyze data quality and statistics; they do not clean or normalize the data, so this option does not meet the requirement.
B. Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew recipe job.
Amazon S3 is a supported source for a DataBrew dataset, so the dataset is created by pointing DataBrew at the S3 path. The cleaning and normalization steps are defined in a recipe, and a recipe job applies that recipe to the dataset and writes the transformed output for the ML workflow.
C. Create a DataBrew dataset by using a Java Database Connectivity (JDBC) driver to connect to the S3 bucket. Clean and normalize the data by using a DataBrew profile job.
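Amazon S3 is object storage, so a DataBrew dataset for S3 data is created directly from the S3 path; JDBC connections are used for databases, not for S3 buckets. In addition, a profile job only reports data statistics and does not clean or normalize the data.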
D. Create a DataBrew dataset by using a Java Database Connectivity (JDBC) driver to connect to the S3 bucket. Clean and normalize the data by using a DataBrew recipe job.
A recipe job is the right way to transform the data, but the dataset definition is wrong: DataBrew reads S3 data from an S3 path, not through a JDBC connection.
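The two wrong choices can also be seen in the request shapes. The Python sketch below only builds payloads for the boto3 `databrew` client and makes no AWS calls. It shows that a profile job request has no recipe to apply, and that a JDBC-based dataset is described by an AWS Glue connection and a database table rather than an S3 bucket and key. Every name, ARN, connection, and table here is a hypothetical placeholder.

```python
# Sketch only: payloads for the boto3 "databrew" client; no AWS calls are made.
# All names, ARNs, the Glue connection, and the table are hypothetical.


def profile_job_request(name: str, dataset_name: str, role_arn: str,
                        report_bucket: str, report_key: str) -> dict:
    """Payload for create_profile_job() (options A and C).

    There is no recipe field: the job writes a data-profile report
    (statistics, data quality) to S3 but leaves the data itself unchanged.
    """
    return {
        "Name": name,
        "DatasetName": dataset_name,
        "RoleArn": role_arn,
        "OutputLocation": {"Bucket": report_bucket, "Key": report_key},
    }


def jdbc_dataset_input(glue_connection: str, table: str) -> dict:
    """create_dataset() Input for a database reached over JDBC (options C and D).

    The JDBC details live in an AWS Glue connection that points at a
    database table, so this shape describes a database source, not an S3 bucket.
    """
    return {
        "DatabaseInputDefinition": {
            "GlueConnectionName": glue_connection,
            "DatabaseTableName": table,
        }
    }
```

By contrast, an S3-backed dataset uses an `S3InputDefinition` with `Bucket` and `Key`, and only a recipe job request carries a `RecipeReference` that tells DataBrew which transformations to apply.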