Future DBA – Big Data -MongoDB 2

In our earlier blog we gave an introduction to MongoDB. So how does MongoDB fit into Big Data? MongoDB supports sharding together with replication. With sharding it runs in a cluster-like configuration, and the data is load-balanced, that is, distributed evenly across multiple shards according to the shard key.

If we compare this with the Hadoop HDFS concept, the config server plays a role like the NameNode, and a shard is like a DataNode. The difference is that in MongoDB the data is distributed across multiple shards, whereas in a Hadoop system the data is replicated to multiple DataNodes. MongoDB maintains redundancy by using replication.

The balancer makes sure that data is distributed evenly across all the shards. If the data is not balanced, the balancer runs in the background and rebalances it.

Here the shard key plays a very important role.
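To see why the shard key matters, here is a minimal Python sketch. It is a toy model, not MongoDB's actual implementation: the functions `shard_for` and `balance` and the sample keys are hypothetical names made up for illustration. It hashes a shard key to pick a shard, and then mimics a balancer that moves chunks from the busiest shard to the emptiest one.

```python
import hashlib
from collections import Counter


def shard_for(shard_key: str, num_shards: int) -> int:
    """Pick a shard by hashing the shard key (toy stand-in for hashed sharding)."""
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def balance(chunk_counts: dict) -> dict:
    """Move one chunk at a time from the busiest to the emptiest shard."""
    counts = dict(chunk_counts)
    while max(counts.values()) - min(counts.values()) > 1:
        busiest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        counts[busiest] -= 1
        counts[emptiest] += 1
    return counts


# A key with many distinct values spreads documents over all shards.
placement = Counter(shard_for(f"customer-{i}", 3) for i in range(9000))
print(placement)

# A key with a single value sends every document to the same shard.
skewed = Counter(shard_for("IN", 3) for _ in range(9000))
print(skewed)

# The toy balancer evens out an unbalanced cluster.
print(balance({"shard0": 10, "shard1": 2, "shard2": 3}))
```

The second counter is the point of the example: with a shard key that has only one value, no balancer can help, because every document belongs to the same place.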

I will write more on it later.

Posted in BIGDATA, Future DBA, Others

Future DBA – Big Data -MongoDB 1

We have discussed Hadoop and its HDFS management tools for big data systems. Hadoop scales horizontally to distribute data, can be used for data warehousing, and can manage big, heavy data.

There is another Big Data system called MongoDB, which is also open source. It is a very developer-friendly NoSQL system. As you know, an RDBMS has a predefined record structure and a fixed row layout. So if developers want to change the metadata, by adding a column or changing the data type of a column, the change in turn has to be applied to all the existing data and its related indexes. MongoDB is a document-oriented NoSQL database.

MongoDB treats a record as a document. You can add fields (columns) to it dynamically, and a developer does not have to supply every field in each record/document. This is why developers like this database system.

MongoDB is written in C, C++, and JavaScript, and it works the way developers are used to working.
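The idea of a flexible document can be sketched in plain Python, without any MongoDB driver. This is only an illustration under assumptions: the `customers` collection, its field names, and the `find` helper are all made up for the example.

```python
# Each "document" is just a dict; documents in one collection need not share fields.
customers = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Ravi"},  # no city supplied for this document
]

# Add a new field to one document only; the other document is untouched
# and there is no table-wide schema change.
customers[1]["loyalty_points"] = 120


def find(collection, **criteria):
    """Return the documents whose fields match all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


pune_customers = find(customers, city="Pune")
print(pune_customers)
```

In an RDBMS the same change would mean altering the table for every row; here only the document that needs the new field carries it.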

https://en.wikipedia.org/wiki/MongoDB

Posted in BIGDATA, Others

Future DBA – Hive Big Data 2

In our last blog on Future DBA, we discussed the Hadoop HDFS system. As we know, managing HDFS is quite difficult, so with the help of vendors (Cloudera, Hortonworks, MapR) we can integrate the tools and utilities into a GUI and manage the system easily and efficiently.

HDFS data can be retrieved and inserted using Hive, which gives us SQL-like access to HDFS data, so we can create and access the data just as we would with SQL queries.

Hive requires a metastore, which can be any RDBMS, for example the open source MySQL or PostgreSQL. The metastore holds Hive's metadata, while the actual data is stored in HDFS.

Hive uses the MapReduce process to retrieve data from HDFS.
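As a rough illustration of what a MapReduce job does for a SQL-like query, here is a small Python sketch of a "count orders per country" query, similar in spirit to a GROUP BY. It is not Hive code and does not use any Hadoop API; the sample rows and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made for the example.

```python
from collections import defaultdict

# Sample rows standing in for data files stored in HDFS.
rows = [
    {"order_id": 1, "country": "IN"},
    {"order_id": 2, "country": "US"},
    {"order_id": 3, "country": "IN"},
    {"order_id": 4, "country": "UK"},
]


def map_phase(records):
    """Mapper: emit a (key, 1) pair for every row."""
    for record in records:
        yield record["country"], 1


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: collapse each group of values into one result per key."""
    return {key: sum(values) for key, values in groups.items()}


orders_per_country = reduce_phase(shuffle(map_phase(rows)))
print(orders_per_country)
```

With Hive, the DBA writes only the SQL-like query, and the map, shuffle, and reduce steps are generated behind the scenes.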

So as DBAs we can work with HDFS data efficiently, just as we do with our RDBMS.

Posted in BI, BIGDATA, Future DBA, NoSQL

Future DBA – Hadoop Big Data 1

In our last blog on Hadoop Big Data we discussed Hadoop and the tools and utilities that connect to HDFS. To continue from there: Hadoop is mostly used for big data, and the data in Hadoop is stored in HDFS. The NameNode holds the pointers to (the addresses of) the data locations, and the DataNodes hold the actual data. There are multiple DataNodes, and the data on one DataNode is replicated to other DataNodes for redundancy. Reading the data from multiple nodes is also faster.

We can also dedicate one node to storing data as a backup node. Hadoop is used for data warehousing, and since this is Big Data, the data stored on it is huge and is used mainly for reads. So with Hive, Impala, or any other tool, HDFS data mostly serves as a READ-ONLY data warehouse, used to generate reports and fetch data once the data has been inserted into HDFS. Mappers read the data on the DataNodes.
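The split between the NameNode (metadata only) and the DataNodes (actual data, replicated) can be sketched as a toy model in Python. This is not the HDFS API: the class `TinyCluster`, the node names, the round-robin placement, and the replication factor of 3 used as a default are assumptions chosen for the example.

```python
from itertools import cycle


class TinyCluster:
    """Toy model: one NameNode dict plus several DataNode dicts."""

    def __init__(self, node_names, replication=3):
        self.replication = min(replication, len(node_names))
        self.data_nodes = {name: {} for name in node_names}  # node -> {block_id: data}
        self.name_node = {}                                   # block_id -> [node, ...]
        self._next_node = cycle(node_names)                   # round-robin placement

    def write(self, block_id, payload):
        """Store the block on several DataNodes and record where it went."""
        targets = [next(self._next_node) for _ in range(self.replication)]
        for node in targets:
            self.data_nodes[node][block_id] = payload
        self.name_node[block_id] = targets

    def read(self, block_id, down=()):
        """Ask the NameNode for locations, then read from the first live replica."""
        for node in self.name_node[block_id]:
            if node not in down:
                return self.data_nodes[node][block_id]
        raise IOError(f"no live replica for {block_id}")


cluster = TinyCluster(["dn1", "dn2", "dn3", "dn4"])
cluster.write("blk-1", b"order rows 1-1000")
cluster.write("blk-2", b"order rows 1001-2000")

print(cluster.name_node)                             # metadata only: where each block lives
print(cluster.read("blk-1", down={"dn1", "dn2"}))    # still readable with two nodes down
```

The NameNode dictionary never holds the data itself, only the addresses, which is why losing a DataNode does not lose the block as long as another replica is alive.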

*HDFS/Big Data is effective for data reads and does not work well for UPDATES.

Posted in BIGDATA, Others

Future DBA – Hadoop Big Data

As we discussed in an earlier blog, the future is about more than SQL (R)DBMS. As data keeps growing we have to know Big Data as well, and when we talk about Big Data we should be aware of Hadoop.

I have gone through many blogs and webcasts, but Hadoop-related material is a little complicated. I want to focus on the concepts and an easy way of learning, so this blog is not a deep dive but an overview of Hadoop. Hadoop works on HDFS (Hadoop Distributed File System), which has a NameNode and DataNodes. The NameNode contains the metadata, that is, which data is stored where, and the DataNodes contain the actual data. The NameNode should be a high-capacity system with a big configuration, while the DataNodes can be many standalone systems, so that the data is distributed across multiple servers. That is why it is called a "Distributed File System".

Working directly on HDFS is quite difficult: it requires MapReduce programming, and retrieving and saving data requires programming expertise. To overcome this, several supporting tools are used. Some of them are as follows:

  • Apache Pig
  • Apache Hive
  • Apache HBase
  • Apache Phoenix
  • Apache Spark
  • Apache ZooKeeper
  • Cloudera Impala
  • Apache Flume
  • Apache Sqoop
  • Apache Oozie
  • Apache Storm

All the above products are open source (Apache) and do not come with vendor support.

There are three vendors who have built enterprise products on top of this open source software and who provide support for HDFS systems. They are as follows:

Cloudera – using the Cloudera Director Plugin for Google Cloud Platform

Hortonworks – using bdutil support for Hortonworks HDP

MapR – using bdutil support for MapR

This is basic but quite important information if you want to go with a Hadoop system. So when we see these tools, we should know that they are based on the Hadoop file system, HDFS.

I will talk more about these in future blogs, and I will also write about other NoSQL technologies, like MongoDB, which does not use HDFS.

I have just started writing on Big Data/NoSQL, so I appreciate your comments and feedback.

Reference:

https://en.wikipedia.org/wiki/Apache_Hadoop#cite_note-8

Posted in Future DBA, NoSQL, Others

Operational Analytics -SQL Server 2016

What is Operational Analytics?

Operational Analytics is a combination of two words, "Operational" and "Analytics". Your OLTP system is the operational system, where the day-to-day work happens, e.g. the order table keeps being updated. Analytics is OLAP, where the order table is analysed after ETL: the operational data is moved, with nightly jobs or in other ways, to an OLAP system such as Analysis Services or a BI system. There the OLTP data is analysed and used by managers or decision makers to make decisions.

In earlier days, when we had to analyse the data, we had to wait for some time, because querying the OLTP system directly is quite expensive and can make the system hang due to locking and incompatible locks.

Now management teams want to analyse the data as soon as an order takes place, to see how things are going, understand the system, and decide accordingly.

Hence "Operational Analytics" came into place, where both kinds of work can be done on the same system at the same time. Other database systems have incorporated this, and so has SQL Server.

SQL Server 2016 achieves this with the help of:

  • In-Memory system
  • Updatable nonclustered columnstore index (NCCI)
  • Compression delay and filtered indexes

So we keep the critical/hot data in in-memory tables, put a columnstore index on those tables, and use compression delay so that rows are compressed into the columnstore only after that delay, which helps while the data is still changing.
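The idea behind compression delay can be shown with a toy model in Python. This is not how SQL Server is implemented internally and it uses no SQL Server API: the class `ToyColumnstore`, its fields, and the minute-based clock are assumptions made up to illustrate the concept that recently changed (hot) rows stay uncompressed until they have been quiet for the configured delay.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumnstore:
    compression_delay: int                           # minutes a row must stay unchanged
    delta: dict = field(default_factory=dict)        # row_id -> (value, last_modified_minute)
    compressed: dict = field(default_factory=dict)   # row_id -> value

    def upsert(self, row_id, value, now):
        """A new or changed row goes to the uncompressed delta area with a fresh timestamp."""
        self.compressed.pop(row_id, None)
        self.delta[row_id] = (value, now)

    def compress(self, now):
        """Compress only the rows that have been quiet for at least the delay."""
        for row_id, (value, modified) in list(self.delta.items()):
            if now - modified >= self.compression_delay:
                self.compressed[row_id] = value
                del self.delta[row_id]


store = ToyColumnstore(compression_delay=10)
store.upsert("order-1", "shipped", now=0)
store.upsert("order-2", "new", now=8)      # still hot at minute 10
store.compress(now=10)

print(sorted(store.compressed))   # rows that are cold enough to compress
print(sorted(store.delta))        # rows still likely to change
```

The design point is that compressing a row that is about to be updated again is wasted work, so the delay keeps hot rows out of the compressed area.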

The details are in the following article:

https://msdn.microsoft.com/en-IN/library/dn817827.aspx

This is a hot topic and I would like to write more on it.

*BTW: Sunil Agarwal has written and presented webcasts extensively on this.

https://social.technet.microsoft.com/Profile/Sunil%2bAgarwal/activity

https://blogs.technet.microsoft.com/dataplatforminsider/

Posted in Future DBA, Others, sql 2016, Whats New

Distributed AG

I was going through Allan Hirt’s 24 Hours of PASS recording, in which he explains the new Distributed AG feature of SQL Server Availability Groups. You can get it here:

http://www.sqlpass.org/24hours/2016/summitpreview/Sessions.aspx

He is an expert in clustering and AGs and has published several blogs and books.

A Distributed AG is an advanced extension of the AG: it is like an AG of AGs.

Each underlying AG sits on its own WSFC, between two instances.

eg.

WSFC 1: Instance A is in an AG with Instance B, and

WSFC 2: Instance C is in an AG with Instance D.

So we can set up a Distributed AG from Instance A to Instance C.

Then all four instances are kept in sync with each other. But there are issues if we make the link synchronous, so the recommendation is to make the link from WSFC 1 to WSFC 2 asynchronous so that things work efficiently. Otherwise Instance A will be quite slow, because it has to wait for everything to sync.

It has its limitations:

  • It can be configured only with T-SQL queries, not with the GUI or PowerShell (this may be overcome in a future release). This is an important limitation.
  • It is supported in Enterprise edition only.
  • Automatic failover to the secondary AG is not supported for now.

Thanks Allan🙂

reference:

https://msdn.microsoft.com/en-us/library/mt651673.aspx?f=255&MSPPError=-2147217396

Posted in High Availability, Others, sql 2016

Future DBA part 2

I am not saying you should be an expert in other technologies, but you should be aware of what each one is, what its purpose is, and what its characteristics and basics are.

When we talk about an RDBMS (DBMS) like SQL, most things are common across products. NoSQL, by contrast, has different categories.

It is important to understand the different types of NoSQL and their purposes if you want to learn it.

From wiki (https://en.wikipedia.org/wiki/NoSQL)

more on NoSQL

http://www.nosql-database.org/

and

A funny picture.

nosql

More coming next

Posted in Future DBA, Others

Future DBA Part 1

As I stated in my earlier blog on the future DBA, we should learn as many things as we can to stand strong in the market. I have been a DBA for around 15 years and I am still learning... so cool. So far I have blogged only on SQL Server, but now I am learning other technologies as well and I will be blogging about them. Initially I was thinking of having a separate site for this, but later I realized that this blog carries my name and is not specific to MS SQL, so I decided to continue blogging here. I have been blogging for the last 8 years (since July 2008), I cannot believe it, and I love it very much. Thanks all.

I will now be blogging more about data in general while continuing with Microsoft SQL Server (my first love). This includes SQL Standard, NoSQL, and so on.

Wiki has a very nice comparison of RDBMSs:

https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

and here is the one for NoSQL:

http://www.nosql-database.org/

The sky is the limit, so it is important to understand that the field is huge.

Posted in Future DBA, Others

World is changing so DBA should

I was going through the SQLPASS recordings and for the first time I saw a session called “MongoDB for the SQL Server Professional”. It made me think that even the community is realizing that things are changing, and that a DBA has to be more than a DBA (with knowledge of other technologies) and has to learn new things. So do I.

Yes, I am exploring new things. I have to, to stand in the market, because things are not the same. Nowadays, even if you are an expert SQL DBA like me, it is not enough. You have to know many other things as well: how they work, what they offer, and what is related to them. And as you learn, you may realize that a similar thing already exists in your own technology and in others as well.

For example, as you may have observed, Microsoft has invested many dollars in SQL Server 2016, and when you explore it, its features are all things that are already there in the market. To stand, Microsoft has to be capable of delivering what the market demands: in-memory, support for R, JSON, PolyBase, Azure, StretchDB, SSDT, Power BI, SQL Server on Linux, open source PowerShell, and so on. That makes SQL Server one of the best in the market.

I blog on SQL 2016 here … https://thakurvinay.wordpress.com/category/sql-2016/

BUT... as stated earlier, things are not the same. We have to explore more to stand strong. When you learn other technologies you will realize it.

I would like to make this a series in the category “Future DBA”, which will include NoSQL, Big Data, open source, and more. Stay tuned...

Posted in Future DBA, Others