Denali – Day 23: Data Quality Services(DQS)
Data intelligence vs. human intelligence: Sql server RDBMS is a database system which has data in it, most of the time, we enter the data but as a user they enter the data assuming things are good, like “USA, US or America are same for user as a knowledgeable person but for the system it is not, so using DQS we can integrate the human intelligence to make our data become more productive and can cleanse or correct the data.
Not installed by default.
Custom Install/activate using sql server Engine setup, to check the option.
Once setup done, need to activate or install the DQS using GUI – Data Quality Installer or DQSInstaller.exe. to complete the installation.
Need to provide strong password for master key.
It has two types
Useful for BI.
SSIS for data cleansing or TOOL: DQSInstaller.exe
Metadata DATABASE: DQS_Main, DQS_Projects, DQS_Staging_data
To DQS Tool have three major components in it.
- Knowledge Base Management
- Data Quality Projects:
KB is to let DQS build the knowledge and let it declare what is correct, means educate or knowledge the system, once system is having data, we can build the project to produce the required output like correct the data or make the report or produce the duplicate data, and to administrate the same using Administration.
Knowledge Base Management
Data Quality Knowledge Base (DQKB) Build KB to be used into projects.
KB – steward
Knowledge Discovery: DQKB… discovery
It is the first step to analyze your system to generate the KB.
Select source and table
Table columns to kb domain
Create a new domain for each column to validate
Reviewed — have three options correct, error, invalid
Check Domain Management
Create domain rules
Domain Management/ composite domain:
Once you analyze the system you can create a Domain Management and build the Business rules.
Right click KB and select a Kb, select domain management…
Create a composite domain.
Build a Business Rule
We can even create a composite domain
Composite domain: eg. Address.
*use this to clean the data, using DQS Project Cleansing.
Matching policy is a KB to check for duplicate on your system.
Open the Created KB.
And select Matching policy.
Check for duplication or matching per requirement.
Data Quality Projects:
Once you have KB ready, you can use that KB in a project, Cleansing is a project to validate the KB and produce the required result. Create a project and cleanse with the specific Rules, start the cleansing will provide you the results. Which can be corrected (approved) and export the output result into xml or sql server.
In xls sheet. Ongoing. New DQP , project
Xls columns to kb domain.
Job Title: Suggested, new, invalid, corrected, correct -à Approve/Reject
We can correct/approval the correction
Lastly: exporting the cleansing data.
Destination … xls or sql data (cannot to existing table)
“It uses KB gathering in a data quality KB. Cleansing results can be exported to xls or sql table”
*Composite Domain to cleanse data
Matching: CTP3. :
Matching is used to check for duplicates /matching records and produced and exports it.
Reflects business requirement to find duplicate records
Better KB more matching results would be.
*You can install DQS tool independently with SSIS on separate server without Engine.
DQS is integrated with both SQL Server Integration Services and Master Data Services.
Eg. If we have an .xls file which has some data, which is not clear and we want it to be produced meaningful output, so we will create a KB with that xls and linked the columns with our domain. Create rules,
Create a new cleansing project and use that xls file and use the kb, and associate it, and start the cleansing; it will provide the result which matches with the rules.
We can approve the resulted data to corrected, and then after completion of cleansing, we can export back the resulted information to xls or sql server new table.
If we want to configure “Azure” can be done in an administration part.
*I am learning it so please suggest if I am wrong here.