The exciting news of the Azure Data Catalog public preview came from Joseph Sirosh’s blog post yesterday: Azure Data Catalog will be available for public preview this Monday, the 13th of July. This is a great step forward for using metadata alongside tools for extracting and visualizing data. I would like to share my thoughts about what a data catalog is and what to expect from it on Monday.
What is a data catalog? A data catalog contains metadata related to a data source. That metadata can contain tags, comments, descriptions, and annotations about the data source and about its tables, views, indexes, and all other objects. Most of you have worked with databases (readers of my blog are usually database pros 😉). In every database environment there are two sides: business and IT (or, let’s say, the owners, producers, and consumers of the database technology). Business usually understands the concepts, while IT understands the database structure. Metadata stored in a data catalog is the connection between these two.
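To make the idea concrete, here is a minimal sketch of what a catalog entry could look like. This is not the Azure Data Catalog API; the `CatalogEntry` and `Catalog` names, their fields, and the sample `SalesDW` source are all hypothetical, chosen only to illustrate how business-facing tags and descriptions can point at IT-facing database objects.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one object in a data source (hypothetical model)."""
    source: str        # the data source, e.g. a database name
    object_name: str   # the IT-facing name: a table, view, index, ...
    object_type: str
    description: str = ""                              # business-facing explanation
    tags: set[str] = field(default_factory=set)        # business terms
    comments: list[str] = field(default_factory=list)  # free-form notes


class Catalog:
    """An in-memory catalog keyed by (source, object name)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[(entry.source, entry.object_name)] = entry

    def find_by_tag(self, tag):
        # Case-insensitive lookup: business users search by concept,
        # and get back the database objects that carry that concept.
        wanted = tag.lower()
        return [
            e for e in self._entries.values()
            if wanted in {t.lower() for t in e.tags}
        ]


catalog = Catalog()
catalog.register(CatalogEntry(
    source="SalesDW",  # made-up sample data
    object_name="dbo.FactInternetSales",
    object_type="table",
    description="One row per online order line.",
    tags={"Revenue", "Online Sales"},
))
catalog.register(CatalogEntry(
    source="SalesDW",
    object_name="dbo.DimCustomer",
    object_type="table",
    description="One row per customer.",
    tags={"Customer"},
))

matches = catalog.find_by_tag("revenue")
print([e.object_name for e in matches])
```

In this sketch, a business user searching for the concept "revenue" is led to the table that IT knows as `dbo.FactInternetSales`, which is the kind of connection between the two sides described above.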
Azure Data Catalog comes with the promise of a crowd-sourced catalog. This means people will be able to define, edit, and update the annotations, tags, and metadata stored for a dataset. This is a big advantage, as the data source metadata will be kept as a single version of the truth. However, there will be a need for a metadata keeper, someone like a data steward who verifies the metadata definitions; otherwise anyone might write something about the data source that is not valid. So my expectation is that Azure Data Catalog will manage that somehow (maybe by utilizing a data steward, the way MDS does for keeping master data).
The video above explains the highlights of Azure Data Catalog.
Azure Data Catalog enables users, from business users to developers and administrators, to explain the data, produce metadata in the catalog, and also view and use it. With so many data analytics and data analysis solutions and products on the market nowadays, a metadata catalog such as Azure Data Catalog is well placed to serve other products. Products such as Power Query for data mash-up, Power View for data visualization, and many other tools can benefit from Azure Data Catalog.
In my view, Azure Data Catalog is a great step forward towards better data management and information management. It is the missing link that connects the chain of information management from data extraction to delivery and insights. Thanks to the Microsoft team for stepping forward towards this; I am looking forward to working with it this coming Monday.