Apache Hadoop or Apache Spark


#1

Hi,

Can anyone help me resolve my confusion?

Actually, I’m a Java programmer with 4.5 years of development experience, and I want good growth in my development career. I have heard a lot about big data and its technologies. Some people are advising me to learn Hadoop, while others are telling me to learn Spark. I don’t know which one would be easier to pick up with my current Java knowledge. Can you please tell me which one I should learn, Hadoop or Spark?

Please help me make the right decision!
Thanks in advance.


#2

Hi Nicky,
First of all, it is extremely important to understand what sort of Java developer you are. Based on that, I can perhaps make a suggestion to enhance your profile.
Apache Hadoop is an ecosystem, a framework consisting of several components. There are many supplementary components, and you might have heard of some of them, such as Apache Pig, Sqoop, and YARN (Spark is also one of them, but we will get to that later). All of these components run in conjunction across several machines.
Think of Hadoop as a builder’s suitcase. It has many tools:
a) Pens, to capture information (data ingestion)
b) Notebooks, to store the created designs (data storage)
c) Tools such as a screwdriver or hammer, to carry out a process (data transformation)
and so on.
Hadoop is similar: it encompasses a whole suite of components which, if used effectively, can carry out transformations on terabytes and even petabytes of data. The processing tool used by Hadoop is called MapReduce, and as a Java programmer, I feel you should first understand how MapReduce works and how to develop MapReduce applications in Java; see the sketch below.
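To make that concrete, here is a minimal sketch of the classic word count job using the org.apache.hadoop.mapreduce API. It assumes Hadoop 2.x or later; the class name WordCount and the command-line input/output paths are just placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output HDFS paths are passed on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Roughly speaking, you package this into a JAR and launch it with `hadoop jar wordcount.jar WordCount <input> <output>`, where both paths live on HDFS.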
Spark is another such tool, and it is much faster than MapReduce, particularly in the way it handles data: it keeps intermediate results in memory rather than writing them back to disk between stages. It has different operation modes, so you can run it on your local system or on a Hadoop cluster. When running on a Hadoop cluster, applications leverage the resources pooled by the machines of your cluster.
So companies usually deploy a Hadoop cluster of thousands of nodes and run Spark applications on it to transform data in near real time and present the results too!
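For comparison, here is a minimal sketch of the same word count in Spark’s Java API. It assumes Spark 2.x; setMaster("local[*]") runs it on your local machine, and you would drop that setting and submit with spark-submit and a YARN master to run it on a Hadoop cluster instead.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("word count")
                .setMaster("local[*]"); // local mode using all cores; omit when submitting to a cluster
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile(args[0]);
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // split lines into words
                    .mapToPair(word -> new Tuple2<>(word, 1))                      // emit (word, 1)
                    .reduceByKey(Integer::sum);                                    // sum counts per word
            counts.saveAsTextFile(args[1]);
        }
    }
}
```

Notice that the whole pipeline is a few chained transformations, far more compact than the equivalent Mapper and Reducer classes above.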
So Spark development can also be done in Java, but for it to be used effectively, you must know what the purpose of transforming the data is, i.e. an analyst’s purpose. You can learn what to do with data, but what is to be derived from the data should be learnt first. So my suggestion would be to start programming in MapReduce, and then, to understand how Hadoop works, undergo the HDPCA certification.
Once you have knowledge of how Hadoop functions, focus on deriving insights from data, and then you should learn Spark.
Happy programming!


#3

Hi @utsavjjha
Thank you so much for this clear and detailed answer.
It will help me a lot in taking the next step in my career growth.


#4

A really excellent solution you have shared with us. Thank you so much for this helpful topic regarding Hadoop. Keep it up.