In our last blog post on Hadoop and big data, we discussed Hadoop and the tools and utilities used to connect to HDFS. Continuing from there: Hadoop is mostly used for big data, and the data in Hadoop is stored in HDFS. The NameNode holds the pointers (addresses) to where the data lives, and the DataNodes hold the actual data. There are multiple DataNodes, and the data on each one is replicated to other DataNodes for redundancy. Because the data is spread across multiple nodes, reads can be served by several nodes at once, which makes access faster.
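To make the split between metadata and data concrete, here is a minimal sketch in plain Python. It is a toy model, not the HDFS API: the class name `ToyNameNode`, the round-robin placement, the node names, and the tiny block size are all assumptions made for illustration. Real HDFS uses much larger blocks and a smarter placement policy.

```python
import itertools


class ToyNameNode:
    """Toy model: keeps only metadata (which nodes hold which block)."""

    def __init__(self, data_nodes, replication=3):
        # data_nodes: dict of node name -> {block_id: bytes}, standing in
        # for the DataNodes that hold the actual data (hypothetical layout).
        if replication > len(data_nodes):
            raise ValueError("replication cannot exceed the number of nodes")
        self.data_nodes = data_nodes
        self.replication = replication
        self.block_map = {}  # path -> list of (block_id, [node names])
        self._placement = itertools.cycle(sorted(data_nodes))

    def write(self, path, data, block_size=4):
        """Split data into blocks and copy each block to several nodes."""
        blocks = []
        for offset in range(0, len(data), block_size):
            block_id = f"{path}#{offset // block_size}"
            chunk = data[offset:offset + block_size]
            # Simple round-robin placement (an assumption of this sketch).
            targets = [next(self._placement) for _ in range(self.replication)]
            for name in targets:
                self.data_nodes[name][block_id] = chunk
            blocks.append((block_id, targets))
        self.block_map[path] = blocks

    def read(self, path, failed=()):
        """Reassemble a file, skipping any nodes listed as failed."""
        result = b""
        for block_id, holders in self.block_map[path]:
            live = [name for name in holders if name not in failed]
            if not live:
                raise IOError(f"all replicas of {block_id} are unavailable")
            result += self.data_nodes[live[0]][block_id]
        return result


if __name__ == "__main__":
    nodes = {"dn1": {}, "dn2": {}, "dn3": {}, "dn4": {}}
    name_node = ToyNameNode(nodes, replication=3)
    name_node.write("/logs/a.txt", b"hello hadoop")
    # The file is still readable when two of the four nodes are down.
    print(name_node.read("/logs/a.txt", failed={"dn1", "dn2"}))
```

Note that the toy NameNode never stores file contents itself; it only remembers where each block lives, which is why losing a DataNode does not lose the file as long as another replica survives.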
We can also dedicate one node as a static backup node for the data. Hadoop is typically used for data warehousing, and because it is big data, the data stored on it is bulky and is mainly read. So whether we use Hive, Impala, or any other tool, HDFS data mostly serves as a READ-ONLY data warehouse: once the data is inserted into HDFS, it is used to generate reports and retrieve data. Mappers read the data on the DataNodes.
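As a rough illustration of how mappers read data to produce a report, here is a small sketch of the map and reduce steps in plain Python. It does not use Hadoop itself: the `region,amount` record format, the function names, and the in-memory "blocks" are all assumptions made for this example.

```python
from collections import defaultdict


def mapper(line):
    """Emit one (region, amount) pair for a 'region,amount' record."""
    region, amount = line.strip().split(",")
    yield region, float(amount)


def reducer(key, values):
    """Sum all amounts seen for one region."""
    return key, sum(values)


def run_report(blocks):
    """Run the mapper over every block, then reduce per key.

    In Hadoop, each block would be read by a mapper running close to
    the DataNode that stores it; here the blocks are plain lists.
    """
    grouped = defaultdict(list)
    for block in blocks:
        for line in block:
            for key, value in mapper(line):
                grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two hypothetical blocks of already-loaded sales records.
    blocks = [["east,10", "west,5"], ["east,2.5"]]
    print(run_report(blocks))
```

The data is only read here, never modified in place, which matches the read-only warehouse usage described above.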
*HDFS/big data is effective for data reads and does not work well for UPDATES.