
AWS big data security question


Company A operates in Country X. Company A maintains a large dataset of historical purchase orders that contains personal data of their customers in the form of full names and telephone numbers. The dataset consists of 5 text files, 1TB each. Currently the dataset resides on-premises due to legal requirements of storing personal data in-country. The research and development department needs to run a clustering algorithm on the dataset and wants to use the Elastic MapReduce (EMR) service in the closest AWS region. Due to geographic distance, the minimum latency between the on-premises system and the closest AWS region is 200 ms.

Which option allows Company A to do clustering in the AWS Cloud and meet the legal requirement of maintaining personal data in-country?

A. Anonymize the personal data portions of the dataset and transfer the data files into Amazon S3 in the AWS region. Have the EMR cluster read the dataset using EMRFS.
B. Establish a Direct Connect link between the on-premises system and the AWS region to reduce latency. Have the EMR cluster read the data directly from the on-premises storage system over Direct Connect.
C. Encrypt the data files according to encryption standards of Country X and store them in Amazon S3 in the AWS region. Have the EMR cluster read the dataset using EMRFS.
D. Use AWS Import/Export Snowball device to securely transfer the data to the AWS region and copy the files onto an EBS volume. Have the EMR cluster read the dataset using EMRFS.


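A. It meets both the legal and the technical requirements: once the names and telephone numbers are anonymized, the files that leave the country no longer contain personal data.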
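
As a rough illustration of what the anonymization step in option A might look like before the files are uploaded, here is a minimal Python sketch. It assumes the text files are CSV with columns named `full_name` and `phone` (hypothetical names; the question only says the files hold full names and telephone numbers). It replaces both columns with a keyed-hash token so that orders from the same customer can still be grouped for clustering. Whether a keyed hash counts as anonymization under Country X's law is a legal question; simply dropping the columns is the more conservative choice.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical column names, not taken from the question.
PERSONAL_COLUMNS = ("full_name", "phone")


def anonymize_csv(text, secret_key):
    """Return CSV text with personal columns replaced by an opaque token.

    secret_key is a bytes value that must stay on-premises; without it the
    tokens cannot be recomputed from a guessed name and phone number.
    """
    reader = csv.DictReader(io.StringIO(text))
    kept = [f for f in reader.fieldnames if f not in PERSONAL_COLUMNS]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept + ["customer_token"],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Remove the personal fields from the row and combine them.
        identity = "|".join(row.pop(col, "") for col in PERSONAL_COLUMNS)
        digest = hmac.new(secret_key, identity.encode("utf-8"),
                          hashlib.sha256).hexdigest()
        row["customer_token"] = digest[:16]
        writer.writerow(row)
    return out.getvalue()
```

For files of 1 TB each, the same per-row logic would be applied while streaming from file to file rather than holding the text in memory as this sketch does.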


To me, C or D meets the criteria.

To meet the country's legal requirement, the personal data should be encrypted. Option C specifies this, and option D also seems correct because we can select the encryption option “Always encrypt the data before storing to EBS”.

I can’t decide between C and D.




I am thinking C, because if we use Import/Export … it would take at least a couple of days, and the minimum latency is 200 ms.
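
To put rough numbers on the timing argument, here is a back-of-the-envelope sketch in Python. The dataset size (5 x 1 TB) and the 200 ms latency come from the question; the link speeds and the 64 KiB TCP window are assumptions chosen only for illustration. It separates two things: how long a bulk upload of the whole dataset would take, and how latency caps the throughput of a single TCP stream if EMR reads directly from on-premises storage.

```python
DATASET_BYTES = 5 * 10**12   # 5 files x 1 TB each, from the question
RTT_SECONDS = 0.2            # 200 ms minimum latency, from the question


def transfer_hours(total_bytes, link_bits_per_second):
    """Hours needed to push total_bytes over a link at full utilization."""
    return total_bytes * 8 / link_bits_per_second / 3600


def tcp_stream_limit_bps(window_bytes, rtt_seconds):
    """Upper bound on one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    # Assumed link speeds, not stated in the question.
    for label, bps in (("100 Mbit/s", 100 * 10**6), ("1 Gbit/s", 10**9)):
        print(label, round(transfer_hours(DATASET_BYTES, bps), 1), "hours")
    # Assumed 64 KiB window (no window scaling).
    limit = tcp_stream_limit_bps(64 * 1024, RTT_SECONDS)
    print("single-stream limit:", round(limit / 10**6, 1), "Mbit/s")
```

Under these assumptions a one-time upload takes roughly 11 hours at 1 Gbit/s and between four and five days at 100 Mbit/s, while a single unscaled TCP stream over a 200 ms round trip tops out at about 2.6 Mbit/s regardless of link speed. Latency therefore hurts repeated remote reads far more than a single bulk copy.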


Answer: Direct Connect.