Learning for this week.
As you know, Microsoft can now retrieve data from Hadoop through PolyBase, and it has extended SQL Server with external languages (R and Python). Microsoft now wants to make sure SQL Server is feasible in every environment available in the market. Earlier I blogged on MongoDB, which is gaining ground with developers for its document store and container concepts. Cosmos DB has an architecture similar to MongoDB, with horizontally scaled storage, containers, and documents, building on DocumentDB.
Microsoft has introduced many things recently (SQL Server 2016/2017):
SQL Server on Linux
SQL Server – PolyBase (Hadoop compatibility for data retrieval)
Azure Cosmos DB – MongoDB compatibility
This shows that Microsoft does not want to be isolated on the Windows platform or restricted to small-scale workloads, and it has proven that it is doing well in this area. The one gap I see is subject matter expertise (SME) in these newly introduced concepts: until we learn how to manage them, we cannot become experts in them.
Yes, that's true: the new version of SQL Server will be released this year as "SQL Server 2017", and it will run on Windows and Linux (and on macOS through Docker containers).
SQL Server vNext on Linux now supports Availability Groups for HA/DR.
SQL Server now runs on Linux using the SQL Platform Abstraction Layer (SQLPAL), which works as a kind of virtual Windows layer on Linux, so I think Microsoft should be able to bring over the features we use on Windows Server.
On that note, with vNext CTP 1.4 Microsoft has introduced SQL Server Agent functionality on Linux.
Yes, that is true: SQL Server now runs on Linux.
I am learning it and would love to write more blogs soon.
In our earlier blog we introduced MongoDB. So how does MongoDB handle big data? MongoDB has a concept called sharding with replication. Sharding uses a cluster-like configuration, and data is load-balanced, that is, distributed evenly across multiple shards according to the shard key.
If we compare this with the Hadoop HDFS concept, instead of a NameNode MongoDB has config servers, and a shard plays the role of a data node. However, in MongoDB the data is partitioned across multiple shards, whereas in Hadoop the data is replicated to multiple data nodes. MongoDB maintains redundancy through replication (each shard is typically a replica set).
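To make the config server's job concrete, here is a minimal sketch of range-based routing, assuming a simplified chunk map held in memory. The bounds, the shard names (shardA, shardB, shardC), and the route() function are hypothetical; real MongoDB stores chunk ranges on the config servers, uses MinKey/MaxKey as outer bounds, and routes through mongos.

```python
import bisect

# Hypothetical chunk map, like the metadata a config server holds.
# Each chunk covers shard-key values from its lower bound (inclusive)
# up to the next chunk's lower bound (exclusive).
CHUNK_LOWER_BOUNDS = [0, 1000, 2000, 3000]               # must stay sorted
CHUNK_OWNERS = ["shardA", "shardB", "shardC", "shardA"]  # shard holding each chunk


def route(shard_key_value: int) -> str:
    """Return the shard that owns the chunk containing shard_key_value."""
    if shard_key_value < CHUNK_LOWER_BOUNDS[0]:
        raise ValueError("key is below the first chunk's range")
    # bisect_right gives the first bound greater than the key,
    # so the owning chunk is the one just before it.
    idx = bisect.bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return CHUNK_OWNERS[idx]
```

For example, route(1500) returns "shardB" and route(4200) returns "shardA", because the last chunk is open-ended.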
The balancer makes sure that data is distributed evenly across all the shards. If the data becomes unbalanced, the balancer runs in the background and migrates data between shards to rebalance it.
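The balancing idea can be sketched as "move one chunk from the most-loaded shard to the least-loaded shard until they are close enough". This is a simplified, count-based policy under assumptions: the balance() function and the threshold of 2 are hypothetical, and MongoDB's real balancer uses version-specific migration thresholds (newer versions balance on data size rather than chunk count).

```python
from collections import Counter


def balance(chunk_owners: list[str], shards: list[str], threshold: int = 2) -> list[tuple[int, str, str]]:
    """Migrate chunks until the busiest and idlest shard differ by less than threshold.

    chunk_owners[i] is the shard holding chunk i; it is modified in place.
    Returns the migrations performed as (chunk_index, from_shard, to_shard).
    """
    if threshold < 2:
        # With threshold 1, moving a chunk could just swap which shard is busiest forever.
        raise ValueError("threshold must be at least 2")
    migrations = []
    while True:
        counts = Counter({s: 0 for s in shards})  # include shards with no chunks
        counts.update(chunk_owners)
        most = max(shards, key=lambda s: counts[s])
        least = min(shards, key=lambda s: counts[s])
        if counts[most] - counts[least] < threshold:
            return migrations
        # Move the first chunk found on the most-loaded shard.
        idx = chunk_owners.index(most)
        chunk_owners[idx] = least
        migrations.append((idx, most, least))
```

Starting with six chunks on shard A, one on B, and none on C, this performs three migrations and ends with three chunks on A and two each on B and C.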
Here the shard key plays a very important role.
I will write more on it later.
We have discussed Hadoop and its HDFS management tools for big data systems. Hadoop scales horizontally by distributing data across nodes, can manage big/heavy data, and can be used for data warehousing.
There is another big data system called MongoDB, which is also open source. It is a very developer-friendly NoSQL system. As you know, an RDBMS has a predefined record structure and a fixed row layout, so if developers want to change the metadata, for example by adding a column or changing a column's data type, the change in turn has to be applied to all the existing data and its related indexes. MongoDB, by contrast, is a document-oriented NoSQL database.
MongoDB treats a record as a document, so you can add fields (columns) dynamically, and developers do not need to supply every field in each record/document. This flexibility is why developers like this database system.
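The difference from a fixed-schema table can be illustrated without a server. In this minimal sketch a "collection" is just a Python list of dicts standing in for MongoDB documents; the users collection and the find() helper are hypothetical (against a real server you would use a driver such as PyMongo instead).

```python
# A "collection" modeled as a list of documents (dicts).
users = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi"},  # no email: the field is simply absent
]

# Adding new fields touches only the document that needs them;
# nothing like ALTER TABLE ... ADD COLUMN has to rewrite every existing row.
users.append({"_id": 3, "name": "Meera", "phone": "555-0100", "tags": ["admin"]})


def find(collection, **criteria):
    """Return documents where every given field exists and equals the given value."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]
```

Here find(users, name="Ravi") matches one document even though it has no email, and find(users, tags=["admin"]) matches only the document that introduced that field.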
MongoDB is written mainly in C++ (with parts in C and JavaScript), and its JSON-like documents map naturally onto the objects developers already work with.
In our last Future DBA blog, we discussed the Hadoop HDFS system. As we know, managing HDFS directly is quite difficult, so with distributions from vendors such as Cloudera, Hortonworks, and MapR we can use integrated GUI tools/utilities and manage it easily and efficiently.
HDFS data can be queried and inserted using the Hive utility, which gives us SQL-like access to HDFS data: we can create tables and access the data just as we would with SQL queries.
Hive requires a metastore, which can be any RDBMS (an open source one such as MySQL or PostgreSQL, or another RDBMS). The metastore stores Hive's metadata, while the actual data is stored in HDFS.
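The split between metadata and data can be sketched as follows, assuming an in-memory SQLite database as the stand-in RDBMS and a plain dict as a stand-in for HDFS. The tbls table, its columns, the path, and read_table() are all hypothetical and far simpler than Hive's real metastore schema; the point is only that the RDBMS knows where a table lives and what its columns are, while the rows themselves live elsewhere.

```python
import sqlite3

# Stand-in metastore: a small relational database holding only table metadata.
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE tbls (name TEXT PRIMARY KEY, cols TEXT, location TEXT)")
meta.execute(
    "INSERT INTO tbls VALUES (?, ?, ?)",
    ("sales", "region,amount", "hdfs:///user/hive/warehouse/sales"),
)

# Stand-in HDFS: the actual rows, keyed by location, are NOT in the metastore.
fake_hdfs = {
    "hdfs:///user/hive/warehouse/sales": ["north,100", "south,250", "north,50"],
}


def read_table(name):
    """Look up columns and location in the metastore, then read rows from 'HDFS'."""
    row = meta.execute("SELECT cols, location FROM tbls WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    columns, location = row[0].split(","), row[1]
    return [dict(zip(columns, line.split(","))) for line in fake_hdfs[location]]
```

Losing the metastore in this model loses the table definitions but not the data files, which is why the metastore database is usually backed up separately.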
Hive translates queries into MapReduce jobs to retrieve data from HDFS.
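As a rough illustration, a query such as SELECT region, SUM(amount) FROM sales GROUP BY region maps onto the three MapReduce phases sketched below. This is a single-process sketch with hypothetical names (sales_lines, mapper, shuffle, reducer, run_job); on a cluster, many mappers would run in parallel on the data nodes and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain

sales_lines = ["north,100", "south,250", "north,50", "east,75"]  # rows as stored in HDFS


def mapper(line):
    """Map phase: parse one line and emit a (key, value) pair."""
    region, amount = line.split(",")
    yield region, int(amount)


def shuffle(pairs):
    """Shuffle phase: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: aggregate one group, here SUM(amount)."""
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())
```

run_job(sales_lines) returns {"north": 150, "south": 250, "east": 75}, the same result the GROUP BY query would give.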
So as DBAs, we can work with HDFS data efficiently, much as we do with our RDBMS.
In our last Hadoop Big Data blog, we discussed Hadoop and its tools/utilities for connecting to HDFS. Continuing from that: Hadoop is mostly used for big data, and data on Hadoop is stored in HDFS. The NameNode holds the pointers/addresses of the data locations, and the data nodes contain the actual data. There are multiple data nodes, and each piece of data is replicated to several of them for redundancy; having the data on multiple nodes also makes access faster.
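The NameNode/data node split can be sketched as a block map. This minimal model assumes a tiny 4-byte block size so the example stays readable (HDFS defaults to 128 MB blocks) and HDFS's default replication factor of 3; the simple round-robin placement and the write_file()/read_file() names are hypothetical, since real HDFS placement is rack-aware.

```python
BLOCK_SIZE = 4   # bytes per block in this sketch; real HDFS defaults to 128 MB
REPLICATION = 3  # HDFS's default replication factor


def write_file(data: bytes, datanodes: list[str]):
    """Split data into blocks and store each block on REPLICATION distinct data nodes.

    Returns (block_map, storage): block_map is the NameNode's metadata,
    a list of (block_id, [data nodes holding it]); storage is each data node's blocks.
    """
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least REPLICATION data nodes")
    block_map = []
    storage = {dn: {} for dn in datanodes}
    for block_id, start in enumerate(range(0, len(data), BLOCK_SIZE)):
        block = data[start:start + BLOCK_SIZE]
        # Round-robin placement; the modulo keeps the REPLICATION targets distinct.
        targets = [datanodes[(block_id + i) % len(datanodes)] for i in range(REPLICATION)]
        for dn in targets:
            storage[dn][block_id] = block
        block_map.append((block_id, targets))
    return block_map, storage


def read_file(block_map, storage, dead=frozenset()):
    """Rebuild the file by reading each block from any live replica."""
    out = bytearray()
    for block_id, replicas in block_map:
        live = [dn for dn in replicas if dn not in dead]
        if not live:
            raise OSError(f"all replicas of block {block_id} are lost")
        out += storage[live[0]][block_id]
    return bytes(out)
```

With four data nodes, a file still reads back correctly when any one node is down, because every block has two other live replicas.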
We can also dedicate one node as a static backup node. Hadoop is used for data warehousing, and because it is big data, the data stored in it is bulky/huge and used mainly for reads. So whether we use Hive, Impala, or another tool, HDFS data is mostly used as a read-only data warehouse: once data has been inserted into HDFS, it is used to generate reports and retrieve data. Mappers read the data on the data nodes.
*HDFS/big data is efficient for data reads but does not work well for UPDATES.