
Which one is correct?


A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift holds the past two years of historical data. Game traffic varies throughout the year based on factors such as seasonality, movie releases, and holidays. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table for each week in advance.
How should the administrator accomplish this task?
A. Feed the data into Spark MLlib and build a random forest model.
B. Feed the data into Amazon Machine Learning and build a regression model.
C. Feed the data into Apache Mahout and build a multi-classification model.
D. Feed the data into Amazon Machine Learning and build a binary classification model.


What about B, since we are looking for a number (read/write throughput)?


Normally, when predicting numbers like salary or price, we use a regression model trained on historical data. So personally, I agree with B, as regression is used to predict the read and write throughput values.
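To make the regression idea concrete, here is a minimal sketch of fitting ordinary least squares on weekly traffic history. All numbers and the single "seasonality index" feature are made up for illustration; in practice you would feed the Redshift history into Amazon Machine Learning with a numeric target, but the underlying idea is the same.

```python
# Hypothetical example: predict next week's provisioned read capacity
# from a single seasonality feature, using ordinary least squares.

def fit_linear(xs, ys):
    """Fit y = a*x + b by ordinary least squares on one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Made-up weekly history: feature = seasonality index, target = peak reads
season_index = [1.0, 2.0, 3.0, 4.0, 5.0]
read_capacity = [110.0, 205.0, 310.0, 395.0, 505.0]

a, b = fit_linear(season_index, read_capacity)
predicted = a * 6.0 + b  # forecast throughput for next week's index
```

The point is that the model's output is a continuous number (a throughput value), which is exactly what a regression model produces and what classification models (options C and D) do not.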


I analyzed the question, and even if I believe answer A could work, it is not cost effective because it requires prior setup, probably an EMR cluster to run Spark on. So my option is B.

Now C and D. C is out because Mahout requires underlying infrastructure, and a multi-classification model does not fit what they are trying to predict. Option D does not fulfill the requirement either, since binary classification cannot output a throughput number.


Predicting metrics (numbers) typically requires a regression model, and it would be very cost effective to implement this with Amazon SageMaker (serverless), part of AWS's machine learning services.


Nice one
