Argus Catalog

An integrated AI·Data·API metadata platform that governs data, models, APIs, and AI agents in a single catalog. With strong support for air-gapped and on-premises environments, it preserves enterprise-wide data sovereignty because data never has to leave the organization's network.

Apache License 2.0 · Open Source

GitHub Repository

Concept Diagram

Highlights

Unified governance of data, models, APIs & AI

Brings the data catalog, ML model registry, API catalog, and AI Agent catalog together to deliver an enterprise-wide single source of truth (SSOT).

Auto-sync across 11 data sources

Automatically collects metadata from Hive, Impala, Kudu, Trino, StarRocks, Greenplum, Iceberg REST, PostgreSQL, MySQL, Oracle, and MSSQL, keeping schemas, statistics, and lineage up to date.

Enterprise

Column-level cross-platform lineage

Automatically traces end-to-end lineage at the dataset and column level via SQL parsing, and generates ER diagrams from DDL parsing.

Enterprise

Air-gapped / on-prem + local LLMs

Integrates with OpenAI and Anthropic as well as local LLMs such as Ollama, enabling full AI governance even in closed networks that data never leaves.

Platform Architecture

An end-to-end metadata platform in which the Catalog UI, Server, Extensions, and SDK work together as one system.

Catalog UI

Next.js · React

Dataset discovery & management

Lineage & ERD visualization

Model registry dashboard

Quality dashboard

API & AI Agent catalog

Semantic search & AI assistant

Catalog Server

FastAPI · PostgreSQL

REST API (v1)

pgvector hybrid search

S3/MinIO model store

MLflow & OCI compatible

Data quality engine

AI metadata generation

Extensions

Enterprise

Sync · Plugins · Analyzer

Metadata Sync (11 sources)

Impala Query Agent

Trino Query Listener

StarRocks Audit Plugin

Source code analysis (Java/Python)

LDAP user sync

SDK & CLI

Python SDK

argus-model CLI

OCI-based model Push/Pull

HuggingFace import

Air-gapped transfer workflow

Presigned URL upload

Manifest management

Supported Data Sources (11)

Hive · Impala · Kudu · Trino · StarRocks · Greenplum · Iceberg REST · PostgreSQL · MySQL · Oracle · MSSQL

Core Capabilities

Data catalog, search, quality, governance, ML model registry, and AI: the six pillars of enterprise metadata management in a single platform.

Data Catalog

The core for discovering, trusting, and governing datasets.

URN-based dataset registration, search, tags & ownership

Column-level lineage & DDL-based ERD

Data standards dictionary & glossary (morphological analysis)

pgvector keyword + semantic hybrid search

Search & Discovery

Find data fast with hybrid search that blends keywords and meaning.

pgvector embedding-based keyword + semantic hybrid search

Unified search across datasets, APIs, models & glossary

Faceted filters by tag, owner & domain

Morphological-analysis-tuned Korean search
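
As a rough illustration of how keyword and semantic scores can be blended, the sketch below ranks documents by a weighted sum of term overlap and cosine similarity. It is not Argus Catalog's actual ranking: the function names, the `alpha` weight, and the toy two-dimensional embeddings are assumptions for illustration, and a real deployment would presumably compute these scores in PostgreSQL with pgvector.

```python
import math
from typing import Sequence


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (case-insensitive)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    if not terms:
        return 0.0
    return sum(1 for t in terms if t in words) / len(terms)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank (name, text, embedding) tuples by a blended score.

    alpha weights the keyword score; (1 - alpha) weights the semantic score.
    """
    scored = []
    for name, text, vec in docs:
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec)
        scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical catalog entries with made-up embeddings.
docs = [
    ("ops.logs", "server access logs", [0.0, 1.0]),
    ("crm.clients", "client purchase history", [0.9, 0.1]),
    ("sales.orders", "customer orders table", [1.0, 0.0]),
]
ranking = hybrid_rank("customer orders", [1.0, 0.0], docs)
print(ranking)
```

In this toy data, `crm.clients` shares no keyword with the query but still ranks above `ops.logs` because its embedding is close to the query vector, which is the point of combining the two signals.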

Data Quality

Profiles source databases directly and validates with rules.

Profiling (incl. mode) & 10 validation rule types

CUSTOM_SQL / CUSTOM_PYTHON user-defined rules

Auto-synced quality scores (GOOD/WARN/BAD) & trends

Upstream quality-propagation warnings via lineage
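
One way to picture upstream quality-propagation warnings is a walk over the lineage graph: when a dataset is scored BAD, every dataset downstream of it receives a warning. The sketch below assumes a plain edge list and a score dictionary; the dataset names and the `downstream_warnings` helper are hypothetical and do not reflect Argus Catalog's internal data model.

```python
from collections import defaultdict, deque


def downstream_warnings(edges, scores):
    """Map each downstream dataset to the sorted BAD upstream datasets it depends on.

    edges:  iterable of (upstream, downstream) lineage pairs
    scores: dict of dataset -> "GOOD" | "WARN" | "BAD"
    """
    children = defaultdict(list)
    for up, down in edges:
        children[up].append(down)

    warnings = defaultdict(set)
    for source, score in scores.items():
        if score != "BAD":
            continue
        # Breadth-first walk from each BAD dataset through its descendants.
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for child in children[node]:
                if child not in seen:
                    seen.add(child)
                    warnings[child].add(source)
                    queue.append(child)
    return {dataset: sorted(sources) for dataset, sources in warnings.items()}


# Hypothetical lineage: raw.orders -> stg.orders -> mart.revenue <- raw.customers
edges = [
    ("raw.orders", "stg.orders"),
    ("stg.orders", "mart.revenue"),
    ("raw.customers", "mart.revenue"),
]
scores = {"raw.orders": "BAD", "raw.customers": "GOOD", "stg.orders": "GOOD"}
result = downstream_warnings(edges, scores)
print(result)
```

The `seen` set keeps the walk from looping if the lineage graph happens to contain a cycle.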

Metadata Governance

Catalogs not just data but APIs and AI agents too.

API catalog — OpenAPI spec registration, version diff & lint

AI Agent catalog — tools/MCP, evaluation & metering

URN-based unified metadata management

Schema-change impact analysis & webhook alerts

ML Model Registry

MLflow/OCI-compatible model governance with air-gapped import.

MLflow integration & version/stage management (STAGING/PRODUCTION)

Metric comparison & model cards

OCI model hub (HuggingFace-style browser)

argus-model CLI & air-gapped import

AI

Auto-generates metadata with LLMs and answers questions by querying the catalog.

AI metadata generation (descriptions, tags, PII detection; approval-based)

Tool-use AI assistant (catalog/schema/quality/lineage tools)

Answers grounded in real data

OpenAI, Anthropic & Ollama (local LLM) integration

Catalog Federation

Federate multiple Argus instances into one for unified search and browsing, with air-gap-friendly HARVEST mirroring and local promotion.

LIVE / HARVEST / HYBRID federation modes

Unified search · browse · cross-instance lineage

HARVEST mirror · hub-model re-embedding · sample mirroring

Promote (import) mirrored datasets to local

Enterprise

Query-based lineage & relationship collection

Automatically collects lineage and relationships from real queries on operational SQL engines.

Query event collection for Hive, Impala, Trino & StarRocks

Automatic column-level runtime lineage extraction

Usage-based column JOIN relationship analysis

Multi-dialect SQL parser (incl. Impala)

Enterprise

Static source-code analysis

Extracts DB table mappings from application source code to enrich lineage.

Java — JPA, Hibernate, MyBatis & Spring JDBC

Python — SQLAlchemy, Django ORM & DB-API

Automatic ORM/SQL → table mapping extraction

Automatic catalog lineage enrichment

Enterprise

Enterprise connector sync

Bulk-syncs metadata from a wide range of sources automatically.

Metadata collection across 11 data sources

Support for Greenplum, Iceberg REST, Kudu & more

Schema, statistics & DDL synchronization

CLI/cron batch operation

Enterprise

LDAP/AD user sync

Auto-manages catalog users with the corporate directory as the source of truth.

OpenLDAP & Active Directory integration

User add, deactivate, reactivate & department update

Dry-run preview & cron batch

Safeguard against deactivating local accounts
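
The sync behavior listed above can be thought of as a diff between the corporate directory and the catalog's user table. The following sketch computes such a plan without applying it, which is roughly what a dry-run preview would show. The record fields (`active`, `department`, `local`) and the `plan_sync` function are assumptions made for this example, not the actual module's schema or API.

```python
def plan_sync(directory, catalog):
    """Return the actions a sync would take, without applying them.

    directory: dict of username -> {"department": str}
    catalog:   dict of username -> {"active": bool, "department": str, "local": bool}
    """
    actions = []
    for name, entry in sorted(directory.items()):
        user = catalog.get(name)
        if user is None:
            actions.append(("add", name))
            continue
        if not user["active"]:
            actions.append(("reactivate", name))
        if user["department"] != entry["department"]:
            actions.append(("update_department", name))
    for name, user in sorted(catalog.items()):
        # Local accounts are not managed by the directory, so never deactivate them.
        if name not in directory and user["active"] and not user["local"]:
            actions.append(("deactivate", name))
    return actions


# Hypothetical directory and catalog contents.
directory = {
    "alice": {"department": "Data"},
    "bob": {"department": "Platform"},
    "dave": {"department": "ML"},
}
catalog = {
    "alice": {"active": True, "department": "Analytics", "local": False},
    "bob": {"active": False, "department": "Platform", "local": False},
    "carol": {"active": True, "department": "Data", "local": False},
    "admin": {"active": True, "department": "", "local": True},
}
plan = plan_sync(directory, catalog)
for action, name in plan:
    print(action, name)
```

Because the plan is just a list of tuples, a scheduled batch run could log it for review first and apply it only once the preview looks right.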

Editions

Argus Catalog is available in two editions: use the open-source core freely with Community, or step up to Enterprise when you need extension modules and dedicated technical support.

Community

Apache License 2.0 · Free

Use the entire open-source core without restrictions and run it yourself.

GitHub Repository

Recommended

Enterprise

Enterprise customer support

Everything in Community, plus extension modules and SLA-backed dedicated technical support.

Feature Comparison

Community

Enterprise

Core Capabilities

Data Catalog · Search & Discovery · Data Quality

Metadata Governance (API · AI Agent)

ML Model Registry · AI metadata/assistant

Metadata sync for 11 data sources

Catalog Federation (instance federation · mirror · local promotion)

Extension Modules (Enterprise)

Query-based lineage & relationship collection

Hive Query Collector · Impala Query Collector · Trino Query Collector · StarRocks Query Collector · Query Collection & Processing Service · Column Relationship Analyzer · SQL Parser · Impala SQL Parser

Static source-code analysis

Java Source-Code Analyzer · Python Source-Code Analyzer

Enterprise connector sync

Metadata Sync Service

LDAP/AD user sync

LDAP/AD User Sync

Support & Services (Enterprise)

SLA-backed dedicated technical support

Priority hotfixes & security patches

Installation, deployment & migration support

Training, onboarding & architecture consulting

Air-gapped deployment support · roadmap priority

Support Channel

Community: GitHub Issues

Enterprise: Dedicated support

An open-source metadata platform

Argus Catalog is open source on GitHub under the Apache License 2.0. Apart from the metadata ingestion connectors, the entire core engine (backend, frontend, SDK, AI agent, and quality batch) is public, so enterprises can audit the code directly, extend it to fit their environment, and operate it without sending data to external services.

Apache 2.0 with no commercial-use restrictions
Verify and extend the code yourself
Self-host in air-gapped / on-premises environments

GitHub Repository · Read the open-source announcement