Unity Catalog is a unified governance solution for data and AI assets on the Databricks lakehouse. It provides centralized access control, auditing, lineage, and data discovery capabilities across Azure Databricks workspaces 1. Unity Catalog offers a single place to administer data access policies that apply across all workspaces, and its security model is based on standard ANSI SQL, allowing administrators to grant permissions in their existing data lake using familiar syntax, at the level of catalogs, databases (also called schemas), tables, and views 1.
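As a rough illustration of that SQL-based permission model, the sketch below builds GRANT statements at the catalog, schema, and table levels. The helper functions and the object names (catalog `main`, schema `sales`, table `orders`, group `analysts`) are hypothetical and chosen only for this example. The code only produces the statement text; running it against a workspace would need a SQL session, which is not shown here.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def grant_statement(privilege: str, securable_type: str,
                    securable_name: str, principal: str) -> str:
    """Build a GRANT statement for a dot-separated securable name."""
    qualified = ".".join(quote_ident(part) for part in securable_name.split("."))
    return f"GRANT {privilege} ON {securable_type} {qualified} TO {quote_ident(principal)}"


# Hypothetical objects and group, used only for illustration.
statements = [
    grant_statement("USE CATALOG", "CATALOG", "main", "analysts"),
    grant_statement("USE SCHEMA", "SCHEMA", "main.sales", "analysts"),
    grant_statement("SELECT", "TABLE", "main.sales.orders", "analysts"),
]

for statement in statements:
    print(statement)
```

Granting usage on the parent catalog and schema alongside SELECT on the table reflects the way permissions are expressed at each level of the hierarchy.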
The hierarchy of primary data objects in Unity Catalog flows from metastore to table or volume. The metastore is the top-level container for metadata, and each metastore exposes a three-level namespace (catalog.schema.table) that organizes your data 1. Catalogs are the first layer of the object hierarchy and are used to organize your data assets. Schemas, also known as databases, are the second layer and contain tables and views. Tables, views, and volumes sit at the lowest level of the hierarchy, with volumes providing governance for non-tabular data 1.
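The three-level namespace can be modeled in a few lines. The following is a minimal sketch, assuming plain dot-separated names; the `ObjectName` class and the name `main.sales.orders` are hypothetical, and backtick-quoted identifiers that themselves contain dots are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectName:
    """A fully qualified name in the form catalog.schema.object."""
    catalog: str
    schema: str
    name: str

    @classmethod
    def parse(cls, qualified: str) -> "ObjectName":
        """Split a dotted name into its three levels, rejecting other shapes."""
        parts = qualified.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.object, got {qualified!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.name}"


# Hypothetical table name, used only for illustration.
orders = ObjectName.parse("main.sales.orders")
print(orders.catalog, orders.schema, orders.name)
```

A two-part name such as `sales.orders` raises `ValueError` in this sketch, which makes the missing catalog level explicit instead of guessing a default.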
Unity Catalog lets you tag and document data assets, and provides a search interface to help data consumers find data. It also automatically captures user-level audit logs that record access to your data, and captures lineage data that tracks how data assets are created and used across all languages 1. Additionally, Unity Catalog lets you easily access and query your account’s operational data, including audit logs, billable usage, and lineage 1.
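The tagging, documentation, and audit features described above might be exercised with statements like the ones this sketch generates. Everything here is an assumption made for illustration: the helper functions, the table name `main.sales.orders`, the tag values, and the audit table `system.access.audit` with its `event_time`, `event_date`, `user_identity.email`, and `action_name` columns should all be checked against your own workspace before use. The code only builds SQL text and does not execute it.

```python
def sql_string(value: str) -> str:
    """Render a simple SQL string literal; refuse values that need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in literal: {value!r}")
    return "'" + value + "'"


def set_tags_statement(table: str, tags: dict) -> str:
    """Build an ALTER TABLE ... SET TAGS statement from a dict of tags."""
    pairs = ", ".join(f"{sql_string(k)} = {sql_string(v)}" for k, v in tags.items())
    return f"ALTER TABLE {table} SET TAGS ({pairs})"


def comment_statement(table: str, comment: str) -> str:
    """Build a COMMENT ON TABLE statement that documents a table."""
    return f"COMMENT ON TABLE {table} IS {sql_string(comment)}"


def recent_audit_query(days: int) -> str:
    """Build a query over the assumed audit table for the last `days` days."""
    return (
        "SELECT event_time, user_identity.email, action_name "
        "FROM system.access.audit "
        f"WHERE event_date >= date_sub(current_date(), {int(days)}) "
        "ORDER BY event_time DESC"
    )


# Hypothetical table, tags, and comment, used only for illustration.
print(set_tags_statement("main.sales.orders", {"domain": "sales", "pii": "false"}))
print(comment_statement("main.sales.orders", "One row per customer order"))
print(recent_audit_query(7))
```

Rejecting quotes and backslashes keeps the literal handling trivially safe for a sketch; real code would more likely use parameterized queries.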
In conclusion, Unity Catalog provides centralized access control, auditing, lineage, and data discovery across Azure Databricks workspaces. Access policies are administered in one place with familiar ANSI SQL syntax and apply to every workspace. Tagging, documentation, and search help data consumers find data, while user-level audit logs and cross-language lineage record how data assets are accessed, created, and used 1.