Question 3
Domain 1: Data Preparation for Machine Learning (ML)
A company wants to build a real-time analytics application that uses streaming data from social media. An ML engineer must implement a solution that ingests and transforms 5 GB of data each minute. The solution also must load the data into a data store that supports fast queries for the real-time analytics. Which solution will meet these requirements?
Correct answer: D
Explanation
Amazon Kinesis Data Streams is built for real-time ingestion of streaming data at scale, and Amazon Managed Service for Apache Flink provides continuous stream processing to transform data as it arrives. Amazon DynamoDB supports fast, low-latency queries, making it suitable for storing transformed records for real-time analytics.
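The continuous, stateful transformation described above can be illustrated in plain Python. This is a conceptual sketch only, not the Apache Flink API. The class name, the one-minute tumbling window, the hashtag-count transform, and the output fields (hashtag, window_start, count) are all hypothetical. It also assumes events arrive in event-time order, whereas a real Flink job would use watermarks to handle late or out-of-order data.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # hypothetical one-minute tumbling window


class TumblingHashtagCounter:
    """Hypothetical stateful transform: counts posts per hashtag per window.

    Keeps running counts for the open window and emits them only when an
    event from a later window arrives. Assumes in-order events (no watermarks).
    """

    def __init__(self, window_ms=WINDOW_MS):
        self.window_ms = window_ms
        self.current_start = None      # start time of the open window
        self.counts = defaultdict(int)  # per-hashtag state for that window

    def _flush(self):
        # Turn the open window's state into small keyed items, then reset it.
        items = [
            {"hashtag": tag, "window_start": self.current_start, "count": n}
            for tag, n in sorted(self.counts.items())
        ]
        self.counts = defaultdict(int)
        return items

    def process(self, hashtag, event_time_ms):
        """Consume one event; return items for a window that just closed."""
        start = event_time_ms - event_time_ms % self.window_ms
        emitted = []
        if self.current_start is not None and start != self.current_start:
            emitted = self._flush()  # previous window is complete
        self.current_start = start
        self.counts[hashtag] += 1
        return emitted

    def close(self):
        """Emit whatever is left in the open window at end of stream."""
        return self._flush() if self.current_start is not None else []


counter = TumblingHashtagCounter()
for tag, ts in [("aws", 1_000), ("ml", 2_000), ("aws", 59_000), ("aws", 61_000)]:
    for item in counter.process(tag, ts):
        print(item)  # window 0 is emitted when the 61_000 ms event arrives
for item in counter.close():
    print(item)
```

Each emitted dictionary is the kind of small, keyed item that suits a key-value store such as DynamoDB, where a dashboard could look up the count for one hashtag and window with a single low-latency read.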
Why each option is right or wrong
A. Use Amazon EventBridge to ingest the social media data. Use AWS Glue to transform the data. Store the transformed data in Amazon ElastiCache (Memcached).
EventBridge is built for event routing, not high-throughput stream ingestion; ElastiCache (Memcached) is an in-memory cache without persistence, not a primary data store for analytics.
B. Use Amazon Simple Queue Service (Amazon SQS) to ingest the social media data. Use AWS Lambda to transform the data. Store the transformed data in Amazon S3.
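SQS is a message queue whose messages are deleted once consumed, so it does not provide a replayable stream that multiple consumers can read; S3 is object storage rather than a low-latency query store for real-time analytics.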
C. Use Amazon Simple Notification Service (Amazon SNS) to ingest the social media data. Use Amazon EMR to transform the data. Store the transformed data in Amazon RDS.
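SNS is a pub/sub fan-out service that pushes messages to subscribers without retaining them for replay, so it is not durable stream ingestion; RDS is a relational database and is generally less suited to sustained, high-volume real-time event writes.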
D. Use Amazon Kinesis Data Streams to ingest the social media data. Use Amazon Managed Service for Apache Flink to transform the data. Store the transformed data in Amazon DynamoDB.
Amazon Kinesis Data Streams is the AWS service designed for continuous, low-latency ingestion of streaming records. It scales horizontally by adding shards, each of which accepts up to 1 MB/s or 1,000 records/s of writes, so it can absorb high-throughput feeds such as 5 GB per minute. Amazon Managed Service for Apache Flink is the managed choice for stateful, real-time stream transformation before persistence, and Amazon DynamoDB provides single-digit millisecond read and write latency for the fast query access that a real-time analytics data store needs.
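A rough sizing check makes the 5 GB per minute requirement concrete. This sketch assumes provisioned capacity mode, the per-shard write limits of 1 MB/s or 1,000 records/s, decimal units (1 GB = 10^9 bytes; binary units shift the result by only a couple of shards), and hypothetical average record sizes. It ignores partition-key skew, which in practice usually calls for extra headroom.

```python
# Per-shard write limits for Kinesis Data Streams (provisioned mode):
# 1 MB/s or 1,000 records/s, whichever is reached first.
SHARD_BYTES_PER_SEC = 1_000_000  # decimal MB, an assumption for this sketch
SHARD_RECORDS_PER_SEC = 1_000


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def shards_needed(bytes_per_minute: int, avg_record_bytes: int) -> int:
    """Minimum shard count for a steady write rate (no headroom for skew)."""
    # Throughput limit: total bytes per minute / bytes one shard takes per minute.
    by_bytes = ceil_div(bytes_per_minute, SHARD_BYTES_PER_SEC * 60)
    # Record-rate limit: records per minute / records one shard takes per minute.
    records_per_minute = ceil_div(bytes_per_minute, avg_record_bytes)
    by_records = ceil_div(records_per_minute, SHARD_RECORDS_PER_SEC * 60)
    return max(by_bytes, by_records)


FIVE_GB = 5_000_000_000
print(shards_needed(FIVE_GB, 2_000))  # hypothetical 2 KB records -> 84
print(shards_needed(FIVE_GB, 500))    # hypothetical 500-byte records -> 167
```

With small records, the 1,000 records/s limit binds before the 1 MB/s limit, which is why the 500-byte case needs about twice as many shards. On-demand capacity mode manages shard count automatically instead of requiring this manual sizing.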