
GitHub - unitycatalog/unitycatalog: Open, Multi-modal Catalog for Data & AI

Jun 14, 2024 - github.com
The article introduces Unity Catalog, an open, interoperable catalog for data and AI. It supports any format, engine, and asset, including Delta Lake, Apache Iceberg, Apache Parquet, CSV, and more. It also covers unstructured data and AI assets, and it is extensible to the Iceberg REST Catalog and Hive Metastore (HMS) interfaces for client compatibility. The catalog is fully open, with an OpenAPI spec and an open source implementation, and it provides unified governance for data and AI.

The article also provides a quick start guide to Unity Catalog. It walks the user through running the UC server, exploring its contents, and operating on Delta tables with the CLI and DuckDB. The prerequisites are cloning the repository and making sure the JAVA_HOME environment variable points to JDK 11 or later. The article closes by pointing to a full tutorial for more details, noting that the APIs and compatibility are still evolving, and giving instructions for compiling and testing.
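The JAVA_HOME prerequisite can be checked before starting the server. The following is a minimal sketch, not part of the Unity Catalog project: the helper names `parse_java_major` and `java_home_is_jdk11_plus` are hypothetical, and the check assumes a Unix-style layout where the launcher lives at `$JAVA_HOME/bin/java`.

```python
import os
import re
import subprocess
from pathlib import Path


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_292").
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def java_home_is_jdk11_plus() -> bool:
    """Return True if JAVA_HOME points at a Java install with major version >= 11."""
    java_home = os.environ.get("JAVA_HOME")
    if not java_home:
        return False
    # Assumption: Unix-style layout; on Windows the launcher is java.exe.
    java_bin = Path(java_home) / "bin" / "java"
    if not java_bin.exists():
        return False
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(
        [str(java_bin), "-version"], capture_output=True, text=True
    )
    return parse_java_major(result.stderr) >= 11


if __name__ == "__main__":
    print("JAVA_HOME OK" if java_home_is_jdk11_plus() else "JAVA_HOME missing or older than 11")
```

Keeping the version parsing separate from the environment lookup makes the parsing logic easy to exercise without a Java installation.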

Key takeaways:

  • Unity Catalog is an open and interoperable catalog for data and AI, supporting any format, engine, and asset.
  • It provides support for multi-format tables, unstructured data, and AI assets, plus plugin support for client compatibility.
  • Unity Catalog offers unified governance for data and AI, with asset-level access control enforced through temporary credential vending via REST APIs.
  • Users can operate on Delta tables with the CLI and DuckDB, and DuckDB can be connected to Unity Catalog by specifying a secret.
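
Because the catalog is exposed through REST APIs, its contents can in principle be explored from any HTTP client as well as from the CLI. The sketch below is illustrative only. The server address `http://localhost:8080`, the path prefix `/api/2.1/unity-catalog`, the `catalog_name` and `schema_name` query parameters, and the example catalog and schema names `unity` and `default` are all assumptions that do not appear in the summary above; check them against the project's OpenAPI spec before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumptions, not taken from the summary: local server address and API prefix.
BASE_URL = "http://localhost:8080"
API_PREFIX = "/api/2.1/unity-catalog"


def list_tables_url(catalog: str, schema: str, base_url: str = BASE_URL) -> str:
    """Build the (assumed) URL for listing the tables in one schema."""
    query = urllib.parse.urlencode(
        {"catalog_name": catalog, "schema_name": schema}
    )
    return f"{base_url}{API_PREFIX}/tables?{query}"


def list_tables(catalog: str, schema: str, base_url: str = BASE_URL) -> dict:
    """Fetch the table listing as parsed JSON. Requires a running server."""
    url = list_tables_url(catalog, schema, base_url)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # "unity" and "default" are assumed example names, not confirmed above.
    print(list_tables_url("unity", "default"))
```

Only `list_tables` touches the network; `list_tables_url` can be used on its own to inspect the request that would be sent.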