As we discussed in an earlier blog, the future holds more than SQL and (R)DBMS: as data keeps growing we also need to know Big Data, and when we talk about Big Data we should be aware of Hadoop.
I have gone through many blogs and webcasts, but most Hadoop material is a little complicated. I want to focus on the concepts and an easy way of learning, so this blog is not a deep dive but an overview of Hadoop.

Hadoop works on HDFS (Hadoop Distributed File System), which has a NameNode and DataNodes. The NameNode holds the metadata — the information about where each piece of data is stored — while the DataNodes hold the actual data. The NameNode should be a high-capacity machine with a big configuration, and the DataNodes can be many standalone systems that spread the data across multiple servers. That is what makes it a "Distributed File System".
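To make the NameNode/DataNode split concrete, here is a toy sketch in Python — not real Hadoop code, just an illustration of the idea that the NameNode keeps only metadata (which blocks make up a file and where each block lives) while the DataNodes store the actual bytes. The class names, block size, and round-robin placement are my own simplifications for the example.

```python
class DataNode:
    """Holds actual data blocks (the real bytes)."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}                    # block_id -> block data

    def store(self, block_id, data):
        self.blocks[block_id] = data

class NameNode:
    """Holds only metadata: which blocks form a file and where they live."""
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.metadata = {}                  # filename -> [(block_id, datanode), ...]

    def write(self, filename, data, block_size=4):
        locations = []
        for i in range(0, len(data), block_size):
            block_id = f"{filename}_blk{i // block_size}"
            # pick a DataNode round-robin; the data itself goes to the DataNode
            node = self.datanodes[(i // block_size) % len(self.datanodes)]
            node.store(block_id, data[i:i + block_size])
            locations.append((block_id, node))   # NameNode records metadata only
        self.metadata[filename] = locations

    def read(self, filename):
        # reassemble the file from blocks scattered across DataNodes
        return "".join(node.blocks[bid] for bid, node in self.metadata[filename])

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn = NameNode(nodes)
nn.write("report.txt", "hello distributed world")
print(nn.read("report.txt"))    # the file comes back whole, though no single node has it all
```

Real HDFS adds replication (each block stored on several DataNodes) and many other details, but the division of labor is the same: metadata on the NameNode, data on the DataNodes.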
Working directly on HDFS is quite difficult: retrieving and saving data requires MapReduce programming, which takes real programming expertise. To overcome this, several supporting tools are used; some of them are as follows:
- Apache Pig
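To show what "MapReduce programming" means in practice, here is the classic word-count example written as Hadoop Streaming-style map and reduce steps. On a real cluster the mapper and reducer would be separate scripts reading stdin and writing stdout; here they are plain Python functions so the shape of the computation is visible. This is a sketch of the general pattern, not code taken from any particular Hadoop tutorial.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: sum the counts for each word.
    The sort stands in for Hadoop's shuffle, which groups pairs by key."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

data = ["Hadoop stores data", "Hadoop processes data"]
print(dict(reducer(mapper(data))))
# -> {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

Tools like Pig exist precisely so you can express this kind of job in a few lines of a higher-level language instead of hand-writing the map and reduce steps.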
The products above are open source (Apache) and do not come with vendor support.
There are three vendors who have built enterprise products on top of this open source software and who provide support for HDFS systems. They are as follows:
Cloudera – using the Cloudera Director Plugin for Google Cloud Platform
Hortonworks – using bdutil support for Hortonworks HDP
MapR – using bdutil support for MapR
This is basic but quite important information if you want to work with a Hadoop system. When we come across these tools, we should know that they are all built on the Hadoop file system, HDFS.
I will talk more about these in future blogs, and will also write about other NoSQL technologies, like MongoDB, which does not use HDFS.
I have just started writing on Big Data/NoSQL, so I would appreciate your comments and feedback.