Future DBA – Hadoop Big Data 1

In our last blog post on Hadoop and Big Data, we discussed Hadoop and the tools and utilities used to connect to HDFS. Continuing from there: Hadoop is mostly used for big data, and the data in Hadoop is stored in HDFS. The NameNode holds the pointers (addresses) to where the data is located, and the DataNodes hold the actual data. There are multiple DataNodes, and the data on each one is replicated to other DataNodes for redundancy. Because the data is spread across multiple nodes, it can also be accessed faster.
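To make the split between pointers and actual data concrete, here is a small toy model in Python. It is only a sketch and not the real HDFS API: the class names ToyNameNode and ToyDataNode, the 4-character block size, and the round-robin placement are all assumptions made for illustration (real HDFS uses much larger blocks and its own placement policy).

```python
class ToyDataNode:
    """Stores the actual block contents (hypothetical stand-in for a DataNode)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}   # (filename, block_index) -> block contents
        self.alive = True  # flip to False to simulate a failed node


class ToyNameNode:
    """Stores only metadata: which DataNodes hold each block."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication cannot exceed the number of DataNodes")
        self.datanodes = {dn.name: dn for dn in datanodes}
        self.order = [dn.name for dn in datanodes]
        self.replication = replication
        self.block_map = {}  # (filename, block_index) -> list of DataNode names

    def write(self, filename, data, block_size=4):
        # Split the file into fixed-size blocks.
        blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
        for idx, block in enumerate(blocks):
            # Assumed round-robin placement, chosen only to keep the sketch short.
            targets = [self.order[(idx + k) % len(self.order)]
                       for k in range(self.replication)]
            for name in targets:
                self.datanodes[name].blocks[(filename, idx)] = block
            self.block_map[(filename, idx)] = targets

    def read(self, filename):
        parts = []
        idx = 0
        while (filename, idx) in self.block_map:
            # Use the first replica that lives on a healthy node.
            for name in self.block_map[(filename, idx)]:
                node = self.datanodes[name]
                if node.alive:
                    parts.append(node.blocks[(filename, idx)])
                    break
            else:
                raise IOError(f"all replicas of block {idx} are unavailable")
            idx += 1
        return "".join(parts)


nodes = [ToyDataNode(f"dn{i}") for i in range(4)]
namenode = ToyNameNode(nodes, replication=3)
namenode.write("report.txt", "hello hadoop")

nodes[0].alive = False                    # simulate losing one DataNode
recovered = namenode.read("report.txt")   # still readable from the replicas
```

The point of the sketch is that the NameNode object never holds file contents, only the block map, and that losing a single node does not lose data because every block lives on more than one DataNode.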

We can also set aside one node as a static node to store data as a backup. Hadoop is used for data warehousing, and because it is Big Data, the data stored on it is bulky and is used mainly for reads. So whether we use Hive, Impala, or any other tool, HDFS data mostly serves as a READ-ONLY data warehouse: once the data is inserted into HDFS, it is used to generate reports and retrieve data. Mappers read the data on the DataNodes.
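The way mappers read data for a report can be sketched in plain Python. This is not Hadoop code. The function names mapper and reducer, the "region,amount" record layout, and the sample values are made up for illustration. Each inner list stands in for the slice of data that sits on one DataNode.

```python
from collections import defaultdict


def mapper(split):
    """Read one node-local slice of data and emit (region, amount) pairs."""
    for line in split:
        region, amount = line.split(",")
        yield region, int(amount)


def reducer(pairs):
    """Combine the pairs emitted by all mappers into one total per region."""
    totals = defaultdict(int)
    for region, amount in pairs:
        totals[region] += amount
    return dict(totals)


# Hypothetical data: one inner list per DataNode.
splits = [
    ["east,10", "west,5"],
    ["east,7"],
    ["west,3", "north,1"],
]

# Every split gets its own mapper; the outputs are then reduced into a report.
mapped = [pair for split in splits for pair in mapper(split)]
report = reducer(mapped)
```

Nothing in this flow modifies the stored records. The data is only read and summarized, which is the read-only reporting pattern described above.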

*HDFS / Big Data is effective for data reads and does not work well for UPDATES.


