Apache Hadoop Ozone+HDDs and Kubernetes

Hadoop is already great at Big Data, and now it has something new: a concept called “Ozone”, which works with an HDFS DataNode plugin and HDDS.

“Ozone is designed to scale to tens of billions of files and blocks and, in the future, even more.
Small files or huge number of datanodes are no longer a limitation”

So Hadoop has now gone even bigger: with this, the NameNode-style metadata is stored in “blocks”, and Ozone can handle trillions of objects and an effectively unlimited number of DataNodes.

With the help of Kubernetes, it becomes feasible to deploy and distribute the system efficiently.



Posted in BIGDATA, Others, Whats New | Tagged , | Leave a comment


Have you heard the name Kubernetes? It is another open-source project in the market, started by Google, and it works on containers, like Docker, in the cloud.
Kubernetes v1.0 was released on July 21, 2015; Google then donated the project to the newly formed Cloud Native Computing Foundation (CNCF).
The future of applications and the cloud is changing, so you should know what is going on. Kubernetes is supported by almost all the cloud providers:
Microsoft with its Azure Kubernetes Service (AKS), Amazon with its Elastic Container Service for Kubernetes (EKS), Google with its Google Kubernetes Engine (GKE), and IBM with its IBM Cloud Kubernetes Service (IKS).

My understanding is that Kubernetes is, at its core, a load balancer and distributed orchestrator for containers, for heavily loaded applications.

Keep yourself updated.


Posted in Cloud, Others | Tagged | Leave a comment

Microsoft Acquires GitHub after LinkedIn

Microsoft acquiring GitHub is very big news, coming after LinkedIn. Considering the information it now has at GitHub, Microsoft will grow like anything: the wealth of open-source code on GitHub will help Microsoft improve in many areas, including Big Data.

We may see major changes in Microsoft products; I hope to see those.

Microsoft's biggest acquisitions:

GITHUB (2018): $7.5 BIL
LINKEDIN (2016): $28.1 BIL
NOKIA (2013)*: $7.2 BIL
SKYPE (2011): $8.5 BIL
VISIO (1999): $1.3 BIL

Microsoft would be definitely going strong considering the market requirement, and they are investing quite a lot for it.


Posted in Basic, News, Others, Whats New | Tagged , , | Leave a comment

Day 25 Optimization

Locking depends on the isolation level and the storage engine.

MySQL uses table-level locking (instead of page-, row-, or column-level locking) for all storage engines except InnoDB, which uses row-level locking.

The locking system in MySQL:

Implicit locking:

To maintain the ACID properties, every transaction has to take locks internally so that the end user retrieves valid data. To achieve this, the engine uses implicit locks, such as a write lock that blocks readers.

Explicit locking:

For some transactions, the user wants to make sure their transaction runs uninterrupted, so they take an explicit lock for the duration of the transaction.

Blocking is locking at the session level: one transaction is writing to an object, and if another transaction tries to access the same object, it will wait until the first write transaction completes (the first session is blocking the second session).
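As a sketch, explicit locking and the resulting blocking can be reproduced in two sessions; the table name `accounts` is a hypothetical example:

```sql
-- Session 1: take an explicit write lock on a hypothetical table
LOCK TABLES accounts WRITE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session 2: this statement now waits (session 1 is blocking session 2)
SELECT * FROM accounts WHERE id = 1;

-- Session 1: releasing the lock lets session 2 proceed
UNLOCK TABLES;
```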

To get the blocking information:


mysql> SHOW FULL PROCESSLIST; ## FULL shows the complete query text, i.e. extra information.

The following query provides additional detail, joining the requesting (blocked) transaction and the blocking transaction:

SELECT
    pl.id,
    pl.user,
    pl.state,
    r.trx_id,
    r.trx_mysql_thread_id,
    r.trx_query              AS query,
    b.trx_id                 AS blocking_trx_id,
    b.trx_mysql_thread_id    AS blocking_thread,
    b.trx_query              AS blocking_query
FROM information_schema.innodb_lock_waits AS ilw
INNER JOIN information_schema.innodb_trx AS r
    ON r.trx_id = ilw.requesting_trx_id
INNER JOIN information_schema.innodb_trx AS b
    ON b.trx_id = ilw.blocking_trx_id
INNER JOIN information_schema.processlist AS pl
    ON pl.id = r.trx_mysql_thread_id;

  • MySQL uses memory according to the value assigned to the variable “innodb_buffer_pool_size”; a standard value is 50%–70% of total OS memory. The variable is dynamic (as of MySQL 5.7), so there is no need to restart mysql for a change to take effect.
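For instance, the buffer pool can be resized online on MySQL 5.7+; the 4 GB value below is just an illustrative ~50% of an 8 GB server:

```sql
-- check the current size (in bytes)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- resize online; no restart needed on MySQL 5.7+
SET GLOBAL innodb_buffer_pool_size = 4294967296;  -- 4 GB
```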


To know how many times we acquired table locks immediately versus how many times we had to wait:

mysql> SHOW GLOBAL STATUS LIKE 'Table_locks%';

| Table_locks_immediate | Table_locks_waited |

To get benchmark information about how much time is required to run a calculation:

mysql> SELECT BENCHMARK(1000000, 1+1);
1 row in set (0.32 sec)

To get structure information about a table:

mysql> DESCRIBE tbl_name;

Similarly, the SHOW CREATE TABLE, SHOW TABLE STATUS, and SHOW INDEX statements give information about tables.


To get the execution plan for a statement, EXPLAIN is the command:

{EXPLAIN | DESCRIBE | DESC} tbl_name [col_name | wild]

{EXPLAIN | DESCRIBE | DESC} [explain_type] {explainable_stmt | FOR CONNECTION connection_id}

explain_type: {
    FORMAT = format_name
}

explainable_stmt: {
    SELECT statement
  | DELETE statement
  | INSERT statement
  | REPLACE statement
  | UPDATE statement
}

ANALYZE TABLE syntax

ANALYZE TABLE is used to update the statistics of the table. During this operation the table is locked.

The statistics information is stored in mysql.innodb_table_stats and mysql.innodb_index_stats, and can also be accessed via the mysqlstats plugin.

OPTIMIZE TABLE syntax

OPTIMIZE TABLE is used to reorganize the indexes and data of the table for better performance.

CHECK TABLE syntax

CHECK TABLE checks the table for any errors.

CHECKSUM TABLE syntax

CHECKSUM TABLE is used to verify the consistency of the table; it is also used to validate data against a backup.

REPAIR TABLE syntax

REPAIR TABLE is used to repair a table when an issue is found.
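Putting the maintenance statements together, a sketch against a hypothetical table `orders` looks like this (REPAIR TABLE applies to MyISAM tables, not InnoDB):

```sql
ANALYZE TABLE orders;    -- refresh index statistics (brief lock)
OPTIMIZE TABLE orders;   -- rebuild data and indexes to reclaim space
CHECK TABLE orders;      -- look for errors
CHECKSUM TABLE orders;   -- checksum for consistency / backup validation
REPAIR TABLE orders;     -- attempt repair (MyISAM only)
```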





Posted in DeadLock, Lock/Blocking, MySQL, Others, Performance Tuning, What I learned today | Tagged , , , | Leave a comment

Day 24 Replication Setup

Today we will discuss how to set up replication:

  1. Master – Slave Replication:

Prerequisites:

  • 2 (Linux) systems (MasterServer and SlaveServer)
  • MySQL installed on both
  • Connectivity between both servers

On the Master Server:

  • Edit my.cnf with the following information (server-id must be set and unique; log-bin enables the binary log):

server-id = 1
log-bin = MySQL_Binlog


  • Create a replication login which has the REPLICATION SLAVE privilege:

mysql> CREATE USER 'repl'@'%.%' IDENTIFIED BY 'mypass';

mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%.%';


  • Before the backup, make sure to gather the data needed for replication, i.e. the bin-log file name and position:

mysql> SHOW MASTER STATUS;

+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql_Binlog.006 | 100      | mysql        |                  |
+------------------+----------+--------------+------------------+

This information is important: only with it can we set up the replication correctly (bringing master and slave in sync).


  • Copy the data from the master and restore it on the slave:

mysqldump --all-databases --master-data > Master_Dump.bkp

It is important to include the --master-data option, which automatically appends the CHANGE MASTER TO statement required on the slave to start the replication.

We have to restart mysql for my.cnf (bin-log enabled) and the other settings to take effect:

sudo /etc/init.d/mysql restart

On Slave Server:

  • Make the system aware that this is the slave by setting server-id = 2 in my.cnf.




Restore the database:

mysql -u root -p < Master_Dump.bkp

Now we are ready to set up the replication configuration and define the relation between master and slave. Here we have to provide the information we captured from SHOW MASTER STATUS:

mysql> CHANGE MASTER TO
   ->     MASTER_HOST='MasterServer',
   ->     MASTER_USER='repl',
   ->     MASTER_PASSWORD='mypass',
   ->     MASTER_LOG_FILE='mysql_Binlog.006',
   ->     MASTER_LOG_POS=100;


Once we run the CHANGE MASTER TO command, replication is set up between master and slave.

Now, to start the replication, we have to start the slave:

mysql> START SLAVE;

That's it, we have replication running.

To see the status of the replication:

mysql> SHOW SLAVE STATUS\G


It will show all the information, some of the important information would be:

Slave_IO_State: Waiting for master to send event

Means replication is running well and there is no lag.

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

Means the slave's I/O and SQL threads are running and the connection is good: replication is active.

Replicate_Do_DB: DB
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:

These variables/options filter or provide information about database/table-specific replication.
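These filters are normally set in the slave's my.cnf; the database and table names below are hypothetical:

```ini
# my.cnf on the slave (illustrative names)
replicate-do-db         = appdb
replicate-ignore-table  = appdb.debug_log
replicate-wild-do-table = appdb.orders%
```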


Last_Errno / Last_Error: show the most recent error that occurred, if any.

Seconds_Behind_Master: 0

Another very important field, used to validate that replication is up to date.

Read_Master_Log_Pos: xxx
Relay_Log_File: xxx.xx
Relay_Log_Pos: xx
Relay_Master_Log_File: xxx.xx

These provide information about slave logging: how the slave is working through the logs received from the master.


To set up master – master replication:

You have to enable bin-log on the slave as well (the second master), and keep its server-id unique (it must not reuse the first master's server-id; here it stays server-id = 2).

Follow the same steps: note the SHOW MASTER STATUS parameters, create the repl user,

and run the CHANGE MASTER TO configuration with those master status parameters on each side.




Posted in Disaster Recovery, High Avaliability, Internal, MySQL, Others, Replication, What I learned today | Tagged , , | Leave a comment

Day 23 Replication

Replication means making a copy of the objects and moving transactions from the primary server to a secondary/standby/slave server. There are two types of replication:

Master-Master Replication

Master-Slave Replication

Like other RDBMSs, MySQL replication works on transactions. The prerequisite for replication is to enable the log-bin (bin log) on the server. In MySQL, when you enable the bin log it logs transactions from all schemas/databases, so we can set up replication for all databases. To enable replication for a specific database only, we have to filter: specify the database, or exclude databases from replication (we can even include/exclude individual tables/objects).
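As a sketch, the master-side include/exclude filters go in my.cnf; `appdb` and `test` are hypothetical database names:

```ini
# my.cnf on the master (illustrative names)
[mysqld]
log-bin          = mysql-binlog
binlog-do-db     = appdb   # log (and thus replicate) only this database
binlog-ignore-db = test    # or exclude this one instead
```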

Master-Master replication: in this type of replication both servers accept connections, and changes performed on either server replicate to the other node. That means you can make changes on both nodes.

With this replication it is possible that conflicts between transactions/objects slow things down, and sometimes replication may even fail because simultaneous transactions are not managed.

Master-Slave replication: in this type of replication the master is read-write and the slave is read-only. Transactions move from master to slave, and the slave is used for reporting/OLAP operations or for taking backups, which makes the master more efficient. This is the most useful and most commonly used replication.

The master should have a user with the REPLICATION SLAVE privilege, and the slave uses that user to connect to the master:

mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'password';

mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';


Finally, once we back up the master and restore onto the slave, we need to configure the replication with the log-bin position from which replication will start, and the slave will start reading transactions from the master.

mysql> SHOW MASTER STATUS;

Using this command we can get the log-bin file name and the position within the log-bin from which to retrieve transactions.

To enable or set up the replication:

mysql> CHANGE MASTER TO
    ->     MASTER_HOST='master_host_name',
    ->     MASTER_USER='replication_user_name',
    ->     MASTER_PASSWORD='replication_password',
    ->     MASTER_LOG_FILE='recorded_log_file_name',
    ->     MASTER_LOG_POS=recorded_log_position;


*Replication cannot use UNIX socket files. You must be able to connect to the master MySQL server Host using TCP/IP

Starting and stopping the slave/replication can be done using the following commands:

mysql> START SLAVE;
mysql> STOP SLAVE;

There is other way of configuration replication using GUID



Posted in MySQL, Others, Replication, What I learned today | Tagged , | Leave a comment

Day 22 Other Storage Engines

So far we have discussed the InnoDB, MyISAM, and NDB storage engines, which are the most standard/common and important storage engines MySQL has. There are some other storage engines which are specific to particular business requirements and used only for their special purposes. These storage engines may require specific configuration and may not be for general purpose, but we can integrate their tables with the standard/common storage engines to make efficient use of our system; that is the specialty of MySQL storage engines…

  • Memory: As the name implies, this storage engine stores all data in memory; it is also known as the HEAP engine. It provides great performance on data but is less durable.


  • CSV: This storage engine stores text data (comma-separated values); it is mostly used to import and export data in CSV format. Tables in this format are not indexed.


  • Archive: This storage engine is used to archive large data sets for better performance. Data is un-indexed; it is used to store historical data that is generally not queried but must be kept for reference.


  • Blackhole: It does not store data, similar to the Unix /dev/null device. This is used for replication configurations where DML statements are sent to slave servers, but the master server does not keep its own copy of the data.


  • Merge: It logically groups identical MyISAM tables and references them as one object. Recommended for VLDBs, such as OLAP workloads.


  • Federated: Offers the ability to link separate MySQL servers to create one logical database from many physical servers. Good for distributed or data mart environments.


  • Example: This engine serves as an example in the MySQL source code that illustrates how to begin writing new storage engines.
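To illustrate, the engine is chosen per table with the ENGINE clause; the table definitions here are hypothetical:

```sql
CREATE TABLE session_cache (k VARCHAR(64) PRIMARY KEY, v VARCHAR(255))
    ENGINE = MEMORY;     -- in-memory, fast but not durable

CREATE TABLE audit_archive (id INT, msg TEXT)
    ENGINE = ARCHIVE;    -- compressed, insert/select only

CREATE TABLE relay_only (id INT, msg TEXT)
    ENGINE = BLACKHOLE;  -- discards data, but still writes the binary log
```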




Posted in MySQL, Others, Performance Tuning | Tagged , | Leave a comment

Day 21 NDB Storage Engine (Cluster)

MySQL also supports high availability: the NDB storage engine provides a highly available, shared-nothing system. NDB Cluster integrates the standard MySQL server with an in-memory clustered storage engine called NDB (which stands for “Network DataBase”).

NDB Cluster requires 3 types of node to complete the setup:

  • Management node (ndb_mgmd):

This node is used to manage the cluster: using this node we can configure, start, and stop the cluster, and only from this node can we run the backup.

  • Data node (ndbd):

As the name suggests, this node contains the actual data. There should be more than 1 data node for data redundancy (replicas). The default is 2: these replicas contain the same information, and if one node goes down the data is still available on the other node.

E.g. if we have 4 data nodes with a replica count of 2, there will be 2 groups of 2 replicas, and each group contains part of the data (a partition set). The daemons are ndbd (data node daemon) and ndbmtd (multi-threaded).

NDB Cluster tables are normally stored completely in memory rather than on disk (this is why we refer to NDB Cluster as an in-memory database). The data is flushed from memory to disk on the data nodes periodically, using LCPs and GCPs.

Local Checkpoint (LCP):

This is a checkpoint per data node: it saves the data from memory to disk and occurs every few minutes, depending on the amount of data stored by the node, the level of cluster activity, and other factors.

Global Checkpoint (GCP):

GCP occurs every few seconds, when transactions for all nodes are synchronized and the redo-log is flushed to disk.

  • SQL node (mysqld):

This node runs the mysqld service; through this node the cluster data is accessed. It is also called the API node.

Each data node or SQL node requires a my.cnf file as follows (HostNm stands for the management node's host name):

[mysqld]
# Options for the mysqld process:
ndbcluster                        # run the NDB storage engine

[mysql_cluster]
# Options for NDB Cluster processes:
ndb-connectstring=HostNm          # location of the management server

The management node needs a config.ini file; this is important and contains all the information about the NDB Cluster (SQL node, data nodes, and management node). For our representative setup, the config.ini file should read as follows:

[ndbd default]
NoOfReplicas=2    # replicas
DataMemory=80M    # memory allocated for data storage
IndexMemory=18M   # memory allocated for index storage

[ndb_mgmd]
HostName=HostNm                  # MGM node
DataDir=/var/lib/mysql-cluster   # MGM node log files location

[ndbd]
HostName=DataHost1               # data node 1 (placeholder host name)
DataDir=/usr/local/mysql/data

[ndbd]
HostName=DataHost2               # data node 2 (placeholder host name)
DataDir=/usr/local/mysql/data

[mysqld]
HostName=SQLHost                 # SQL node (placeholder host name)
The management node should be started first, followed by the data nodes, and then finally by any SQL nodes:

shell> ndb_mgmd -f /var/lib/mysql-cluster/config.ini    # on the management host
shell> ndbd                                             # on each data node
shell> mysqld_safe &                                    # on each SQL node

NDB supports only the READ COMMITTED transaction isolation level.




Posted in Disaster Recovery, High Avaliability, Isolation Level, MySQL, Others | Tagged , , , | Leave a comment

Day 20 Innodb Storage engine

InnoDB is the major storage engine and has been the default storage engine since MySQL 5.5. A standard RDBMS requires the ACID properties, and making the system enterprise-grade requires transaction management. Before this, MySQL was known as part of the LAMP stack and mostly used by web developers and small systems. During this period (2010) MySQL was part of Sun, and they were progressing quite well on enterprise support. (This could be one of the reasons Oracle acquired Sun Microsystems; my guess :) )

InnoDB is a complete RDBMS engine and supports all the features that the other major RDBMS systems support. When we discuss MySQL, it will mostly be about InnoDB; so, in general, MySQL is InnoDB.

When you create a table with the InnoDB storage engine, a .frm file is created in the database directory, while the data and indexes go into the shared system tablespace (the ibdata file), and InnoDB also uses two redo log files (ib_logfile0 and ib_logfile1). We can change the setting innodb_file_per_table=1 so that each table gets its own .ibd data file (similar to the MyISAM storage engine).
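For example, the per-table tablespace behavior can be checked and enabled at runtime (it affects newly created tables only; existing tables stay in the shared tablespace until rebuilt):

```sql
SHOW VARIABLES LIKE 'innodb_file_per_table';

SET GLOBAL innodb_file_per_table = 1;  -- new InnoDB tables get their own .ibd file
```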

As stated, it fulfils all the properties of the other RDBMSs, including:

  1. Support for the ACID transaction properties
  2. Support for recovery and point-in-time recovery
  3. High-standard troubleshooting utilities (the information_schema, performance_schema, and sys databases)
  4. A stable, consistent system with enterprise support, mostly bug-free
  5. A highly capable query optimizer and tuning mechanisms
  6. The most widely used storage engine




Posted in MySQL, Others | Tagged , , | Leave a comment

Day 19 MyISAM Storage engine

The MyISAM storage engine has been available since the early stages of MySQL. It is based on the older ISAM (Indexed Sequential Access Method) code and is used for read-intensive operations. As described yesterday, this engine was the default until MySQL 5.5 and is still widely used; it is able to store huge amounts of data, since there are no transactions, only heavy reading of data.

When we use the MyISAM engine for a table, it creates 3 physical files for that table:

.FRM: format file – stores metadata information about the table/object

.MYD: data file – stores the actual data of the object

.MYI: index file – stores the index data/information.

MyISAM has the following utilities:

mysqlcheck / myisamchk – for checking consistency; they can recover from corruption

myisampack (Compressed MyISAM) – for compression, for faster data retrieval and optimum space utilization.

  • A MyISAM table can have at most (2^32)^2 rows
  • Max indexes per table = 64
  • Max columns per index = 16
  • Indexes can be on BLOB and TEXT columns
  • Indexed columns can contain NULL

Storage formats:

Fixed/Static – fixed row length; every column has a fixed size.

Dynamic – variable row length; variable-length column types (dynamic).

Compressed – compressed by the myisampack utility; the data becomes read-only.

When you use CREATE TABLE or ALTER TABLE, the ROW_FORMAT option lets us specify the storage format explicitly.

To uncompress the data: myisamchk --unpack.
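As a sketch, packing and unpacking operate offline on the table's files; the path and table name below are hypothetical, and indexes must be rebuilt after packing:

```shell
myisampack /var/lib/mysql/appdb/history          # compress (table becomes read-only)
myisamchk -rq /var/lib/mysql/appdb/history       # rebuild indexes after packing

myisamchk --unpack /var/lib/mysql/appdb/history  # restore to uncompressed form
```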

Consider the following points before choosing MyISAM:

  1. Locking is table-level.
  2. Good for big, read-intensive databases.
  3. Good for the TEXT and BLOB data types.
  4. Performs better with the Fixed format for faster reads (it can take more disk space).
  5. Performance is great with indexes, as the statistics are accurate.
  6. Corruption is handled well (repair utilities).
  7. No referential integrity (foreign keys).




Posted in General, History, Internal, MySQL, Others | Tagged , , , , | Leave a comment