Future DBA – Hadoop Big Data

As we discussed in an earlier post, the future of the DBA role is about more than SQL and the (R)DBMS. As data keeps growing we have to know Big Data as well, and when we talk about Big Data we should be aware of Hadoop.

I have gone through many blogs and webcasts, but the Hadoop material is a little complicated, and I want to focus on the concepts and an easy way of learning. This post is not a deep dive but an overview of Hadoop.

Hadoop works on HDFS (Hadoop Distributed File System), which has a NameNode and DataNodes. The NameNode holds the metadata, that is, which data is stored where, and the DataNodes hold the actual data. The NameNode should be a high-capacity, well-configured server, while the DataNodes can be many standalone systems, so the data is distributed across multiple servers. That is why it is called a “Distributed File System”.
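To make the split between metadata and data concrete, here is a minimal toy sketch in Python. It is not the real HDFS API: the class names, `put_file`, `get_file`, the tiny block size, and the round-robin placement are all assumptions made for illustration, and replication is left out entirely.

```python
# Toy model only, not the real HDFS API. All names here are hypothetical.
from itertools import cycle

BLOCK_SIZE = 8  # bytes, tiny on purpose; real HDFS blocks are far larger


class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self, data_nodes):
        self.data_nodes = {node.name: node for node in data_nodes}
        self.metadata = {}  # filename -> list of (block_id, data_node_name)
        self._next_node = cycle(data_nodes)

    def allocate(self, filename, block_count):
        # Assumed placement policy: simple round-robin, no replication.
        placements = [
            (f"{filename}#{i}", next(self._next_node).name)
            for i in range(block_count)
        ]
        self.metadata[filename] = placements
        return placements


def put_file(name_node, filename, data):
    """Ask the NameNode where blocks go, then store them on the DataNodes."""
    chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placements = name_node.allocate(filename, len(chunks))
    for (block_id, node_name), chunk in zip(placements, chunks):
        name_node.data_nodes[node_name].blocks[block_id] = chunk


def get_file(name_node, filename):
    """Look up block locations in the metadata, then read from the DataNodes."""
    return b"".join(
        name_node.data_nodes[node_name].blocks[block_id]
        for block_id, node_name in name_node.metadata[filename]
    )
```

In this sketch the NameNode never stores file contents, only the list of block locations, while the bytes themselves are spread over the DataNodes.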

Working directly on HDFS is quite difficult: processing the data stored on it traditionally requires MapReduce programming, so saving and retrieving data takes programming expertise. To overcome this, several supporting tools are used. Some of them are as follows:

  • Apache Pig
  • Apache Hive
  • Apache HBase
  • Apache Phoenix
  • Apache Spark
  • Apache ZooKeeper
  • Cloudera Impala
  • Apache Flume
  • Apache Sqoop
  • Apache Oozie
  • Apache Storm

All of the above products are open source (Apache) and, on their own, do not come with vendor support.
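To give a feel for the MapReduce style of programming that these tools try to hide, here is a minimal word-count sketch in plain Python. It only imitates the map, shuffle and reduce steps in a single process; it does not use Hadoop itself, and the function names are my own choices for illustration.

```python
# Single-process imitation of the MapReduce pattern; not Hadoop code.
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

Even a simple count needs three separate steps here, which gives a rough idea of why SQL-like layers such as Hive are attractive for someone coming from a DBA background.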

 

Three vendors have built enterprise products on top of this open source software and provide support for HDFS-based systems. They are as follows:

Cloudera – using the Cloudera Director Plugin for Google Cloud Platform

Hortonworks – using bdutil support for Hortonworks HDP

MapR – using bdutil support for MapR

This is basic but quite important information if you want to work with Hadoop. So when we see these tools, we should know that they are based on the Hadoop file system, HDFS.

I will talk more about these in future posts, and will also write about other NoSQL technologies, like MongoDB, which does not use HDFS.

I have just started writing on Big Data/NoSQL, so I appreciate your comments and feedback.

Reference:

https://en.wikipedia.org/wiki/Apache_Hadoop#cite_note-8

 

