In general, how do ML researchers scale GPUs for different algorithms?


#1

Hi,

I saw a big-data practice exam question similar to this:

You have a task running an LSTM (Long Short-Term Memory) RNN with MXNet on EC2. What is the best strategy to arrange GPU resources? (Select 2):

A. Data parallelism with multiple GPU core distributed
B. Model parallelism with multiple GPU core distributed
C. One compute EC2 instance with elastic GPU
D. Process parallelism with multiple GPU core distributed
E. One general cluster with multiple GPU cores

Which two should I choose? Could an ML expert elaborate on this?
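Since the options hinge on the difference between data and model parallelism, here is a minimal pure-Python sketch of the idea behind data parallelism: split each batch across devices, compute a gradient per shard, then average the gradients for one shared update. This is not MXNet code, and all names (`grad_on_device`, `data_parallel_step`) are hypothetical, just to illustrate the concept.

```python
def grad_on_device(weights, shard):
    # Toy "gradient" for a linear model y = w * x: mean error on this shard.
    # A real device would run forward + backward on its shard of the batch.
    return sum(weights * x - y for x, y in shard) / len(shard)

def data_parallel_step(weights, batch, num_devices, lr=0.1):
    # Split the batch into one shard per device (round-robin here).
    shards = [batch[i::num_devices] for i in range(num_devices)]
    # Each device computes a gradient on its own shard; on real hardware
    # these run in parallel, one per GPU.
    grads = [grad_on_device(weights, s) for s in shards]
    # Average the per-device gradients and take a single shared update,
    # so every device ends the step with identical weights.
    avg_grad = sum(grads) / num_devices
    return weights - lr * avg_grad

# Toy data for y = 2 * x; w should converge toward 2.
batch = [(x, 2.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, num_devices=2)
print(round(w, 2))  # close to 2.0
```

Model parallelism, by contrast, splits the model itself (different layers or parts of the network on different GPUs) rather than the data, which matters when the model is too large to fit on a single device.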