25 July 2022 - 1 MIN READ
What is a data catalog?
A catalog, in its literal sense, is a book or document containing a complete list of things, usually arranged systematically. In data management, a data catalog is an organised inventory of all data assets within an organisation. It helps data professionals quickly locate the most relevant data for gaining business insights.
A data catalog combines a collection of metadata with data management features (e.g., access permissions) and search tools; it is also called a metadata catalog. The metadata summarise or describe the underlying data assets. Data catalogs have emerged as a powerful tool for data management and data governance.
These data assets include, but are not limited to:
- Structured (tabular) data
- Unstructured data
- Reports and query results
- Data visualisations and dashboards
- Machine learning models
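To make the idea concrete, a catalog entry can be thought of as a metadata record that describes an asset and points to it without containing it. The following is a minimal Python sketch under that assumption; the names `AssetType` and `CatalogEntry` and the fields `location`, `owner`, `tags`, and `allowed_roles` are hypothetical and not taken from any particular catalog product.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class AssetType(Enum):
    """Kinds of data assets from the list above (illustrative, not exhaustive)."""
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"
    REPORT = "report"
    DASHBOARD = "dashboard"
    ML_MODEL = "ml_model"


@dataclass
class CatalogEntry:
    """Metadata describing one asset; the asset itself lives elsewhere."""
    name: str
    asset_type: AssetType
    location: str  # hypothetical pointer to where the asset is stored
    description: str = ""
    owner: str = ""
    tags: Set[str] = field(default_factory=set)
    allowed_roles: Set[str] = field(default_factory=set)  # simple access permissions

    def can_access(self, role: str) -> bool:
        """Return True if the given role may use the underlying asset."""
        return role in self.allowed_roles


# Example entry with made-up values.
entry = CatalogEntry(
    name="monthly_sales",
    asset_type=AssetType.STRUCTURED,
    location="warehouse.sales.monthly",
    description="Monthly sales totals per region",
    owner="finance-team",
    tags={"sales", "finance"},
    allowed_roles={"analyst"},
)
print(entry.can_access("analyst"))  # True
print(entry.can_access("intern"))   # False
```

Real catalogs usually store far richer metadata (schemas, lineage, quality scores), but the principle is the same: the entry holds a description and a pointer, never the data.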
A data catalog should, at the very least, answer the following questions:
- Where can I get my data?
- Are these data relevant and important?
- What do these data indicate?
- How can I make use of these data?
Generally, a data catalog gives users access to tools that let them do the following:
- Search the catalog with flexible search and filtering options
- Discover data assets
- Govern data in compliance with organisational or governmental rules
Metadata Indexing: The underlying data assets are not indexed by the data catalog. Only the metadata describing these data assets is indexed.
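The metadata-only indexing described above, together with basic search and filtering, can be sketched as a small inverted index. This is an illustrative Python example, not how any specific catalog is implemented; the `METADATA` records and the functions `tokenize`, `build_index`, and `search` are hypothetical. Note that the code only ever reads names, descriptions, and tags, never the assets themselves.

```python
import re
from collections import defaultdict

# Hypothetical metadata records; the data assets themselves are never read.
METADATA = {
    "sales_2022": {
        "type": "structured",
        "description": "Monthly sales totals per region",
        "tags": ["sales", "finance"],
    },
    "churn_model": {
        "type": "ml_model",
        "description": "Model predicting customer churn",
        "tags": ["customers", "marketing"],
    },
    "sales_dashboard": {
        "type": "dashboard",
        "description": "Dashboard of regional sales trends",
        "tags": ["sales"],
    },
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(metadata):
    """Map each word found in the metadata to the names of assets that use it."""
    index = defaultdict(set)
    for name, record in metadata.items():
        words = tokenize(name) + tokenize(record["description"])
        for tag in record["tags"]:
            words += tokenize(tag)
        for word in words:
            index[word].add(name)
    return index


def search(index, metadata, query, asset_type=None):
    """Return sorted names of assets matching every query word, optionally filtered by type."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    if asset_type is not None:
        matches = {n for n in matches if metadata[n]["type"] == asset_type}
    return sorted(matches)


index = build_index(METADATA)
print(search(index, METADATA, "sales"))
# ['sales_2022', 'sales_dashboard']
print(search(index, METADATA, "sales", asset_type="dashboard"))
# ['sales_dashboard']
```

Because only the metadata is indexed, the index stays small and can be rebuilt quickly, while access to the underlying data remains under the control of the systems that store it.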