
Argus Catalog

An integrated AI·Data·API metadata platform that governs data, models, APIs, and AI agents in a single catalog. With strong support for air-gapped and on-premises environments, it preserves enterprise-wide data sovereignty because data never has to leave the organization's network.

Apache License 2.0 · Open Source · GitHub Repository

Concept Diagram

Argus Catalog Platform Architecture

Highlights

01

Unified governance of data, models, APIs & AI

Brings the data catalog, ML model registry, API catalog, and AI Agent catalog together to deliver an enterprise-wide single source of truth (SSOT).

02

Auto-sync across 11 data sources

Automatically collects metadata from Hive, Impala, Kudu, Trino, StarRocks, Greenplum, Iceberg REST, PostgreSQL, MySQL, Oracle, and MSSQL, keeping schemas, statistics, and lineage up to date.

Enterprise
03

Column-level cross-platform lineage

Automatically traces end-to-end lineage at the dataset and column level via SQL parsing, and generates ER diagrams from DDL parsing.

Enterprise
04

Air-gapped / on-prem + local LLMs

Integrates with OpenAI and Anthropic as well as local LLMs such as Ollama, enabling full AI governance even in closed networks where data never leaves the premises.

Platform Architecture

An end-to-end metadata platform in which the Catalog UI, Server, Extensions, and SDK work together as one system.

Catalog UI
Next.js · React
Dataset discovery & management
Lineage & ERD visualization
Model registry dashboard
Quality dashboard
API & AI Agent catalog
Semantic search & AI assistant
Catalog Server
FastAPI · PostgreSQL
REST API (v1)
pgvector hybrid search
S3/MinIO model store
MLflow & OCI compatible
Data quality engine
AI metadata generation
Extensions
Enterprise
Sync · Plugins · Analyzer
Metadata Sync (11 sources)
Impala Query Agent
Trino Query Listener
StarRocks Audit Plugin
Source code analysis (Java/Python)
LDAP user sync
SDK & CLI
Python SDK
argus-model CLI
OCI-based model Push/Pull
HuggingFace import
Air-gapped transfer workflow
Presigned URL upload
Manifest management
Supported Data Sources (11)
Hive · Impala · Kudu · Trino · StarRocks · Greenplum · Iceberg REST · PostgreSQL · MySQL · Oracle · MSSQL

Core Capabilities

From data catalog and search to quality & governance, ML model registry, and AI — the six pillars of enterprise metadata management in a single platform.

Data Catalog

The core for discovering, trusting, and governing datasets.

URN-based dataset registration, search, tags & ownership
Column-level lineage & DDL-based ERD
Data standards dictionary & glossary (morphological analysis)
pgvector keyword + semantic hybrid search
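As an illustration of URN-based registration, here is a minimal sketch of building and parsing a dataset URN. The layout `urn:argus:dataset:<platform>:<database>.<table>` and the `DatasetUrn` class are assumptions made for this example; the actual URN scheme used by Argus Catalog is not specified here.

```python
from dataclasses import dataclass

# Assumed prefix for illustration only; not the documented Argus URN scheme.
URN_PREFIX = "urn:argus:dataset:"


@dataclass(frozen=True)
class DatasetUrn:
    """Hypothetical dataset identifier: platform + database + table."""
    platform: str
    database: str
    table: str

    def __str__(self) -> str:
        return f"{URN_PREFIX}{self.platform}:{self.database}.{self.table}"

    @classmethod
    def parse(cls, text: str) -> "DatasetUrn":
        """Parse a URN string back into its parts, rejecting malformed input."""
        if not text.startswith(URN_PREFIX):
            raise ValueError(f"not a dataset URN: {text!r}")
        platform, sep, qualified = text[len(URN_PREFIX):].partition(":")
        database, dot, table = qualified.partition(".")
        if not (platform and sep and database and dot and table):
            raise ValueError(f"malformed dataset URN: {text!r}")
        return cls(platform, database, table)


urn = DatasetUrn("trino", "sales", "orders")
urn_text = str(urn)
round_trip = DatasetUrn.parse(urn_text)
```

A stable, parseable identifier like this is what lets tags, ownership, and lineage all refer to the same dataset regardless of which source it was synced from.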

Search & Discovery

Find data fast with hybrid search that blends keywords and meaning.

pgvector embedding-based keyword + semantic hybrid search
Unified search across datasets, APIs, models & glossary
Faceted filters by tag, owner & domain
Morphological-analysis-tuned Korean search
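One common way to blend a keyword ranking with a semantic (embedding) ranking is Reciprocal Rank Fusion (RRF). The sketch below shows that technique in plain Python; it is not taken from Argus Catalog, whose actual scoring method is not described here, and the hit lists are made-up sample data.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list.

    Each item scores 1 / (k + rank) per list it appears in; scores are summed.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from a keyword query and a vector-similarity query.
keyword_hits = ["orders", "order_items", "customers"]
semantic_hits = ["purchases", "orders", "customers"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Items that appear in both lists ("orders", "customers") rise above items found by only one method, which is the behavior a hybrid search is meant to provide.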

Data Quality

Profiles source databases directly and validates the results against rules.

Profiling (incl. mode) & 10 validation rule types
CUSTOM_SQL / CUSTOM_PYTHON user-defined rules
Auto-synced quality scores (GOOD/WARN/BAD) & trends
Upstream quality-propagation warnings via lineage
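The idea of quality scores and upstream warnings can be sketched as follows. The thresholds (0.95 and 0.80), the function names, and the shape of the lineage map are assumptions for illustration; only the GOOD/WARN/BAD labels come from the text above.

```python
def grade(pass_ratio, good_min=0.95, warn_min=0.80):
    """Map the share of passed checks to a label (thresholds are assumed)."""
    if pass_ratio >= good_min:
        return "GOOD"
    if pass_ratio >= warn_min:
        return "WARN"
    return "BAD"


def upstream_warnings(dataset, upstream, grades):
    """Walk lineage upstream and collect every ancestor that is not GOOD.

    upstream: dict mapping a dataset to the datasets it reads from.
    grades:   dict mapping a dataset to its quality label.
    """
    seen, flagged = set(), []
    stack = list(upstream.get(dataset, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        label = grades.get(node, "GOOD")
        if label != "GOOD":
            flagged.append((node, label))
        stack.extend(upstream.get(node, []))
    return sorted(flagged)


# Hypothetical lineage and check results.
upstream = {
    "mart.revenue": ["stg.orders", "stg.customers"],
    "stg.orders": ["raw.orders"],
}
grades = {
    "stg.orders": grade(0.99),
    "stg.customers": grade(0.90),
    "raw.orders": grade(0.50),
}
warnings = upstream_warnings("mart.revenue", upstream, grades)
```

Here "mart.revenue" inherits a warning from "raw.orders" even though its direct parent "stg.orders" is GOOD, which is the point of propagating quality along lineage.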

Metadata Governance

Catalogs not just data but APIs and AI agents too.

API catalog — OpenAPI spec registration, version diff & lint
AI Agent catalog — tools/MCP, evaluation & metering
URN-based unified metadata management
Schema-change impact analysis & webhook alerts
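Schema-change impact analysis can be pictured as a schema diff combined with a lineage lookup. The following is a simplified sketch under assumed data shapes (column-to-type dictionaries and a column-lineage map); it covers direct downstream columns only and is not the Argus Catalog implementation.

```python
def diff_schema(old, new):
    """Compare two {column: type} mappings and classify the changes."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in shared if old[c] != new[c]),
    }


def impacted_columns(dataset, changes, column_lineage):
    """List direct downstream columns fed by a removed or retyped column.

    column_lineage: dict mapping (dataset, column) to a list of
    downstream (dataset, column) pairs.
    """
    hits = set()
    for column in changes["removed"] + changes["retyped"]:
        hits.update(column_lineage.get((dataset, column), []))
    return sorted(hits)


# Hypothetical schema versions and lineage edges.
old = {"id": "bigint", "amount": "decimal(10,2)", "memo": "varchar"}
new = {"id": "bigint", "amount": "double", "currency": "varchar"}
lineage = {
    ("sales.orders", "amount"): [("mart.revenue", "total")],
    ("sales.orders", "memo"): [("mart.notes", "text")],
}

changes = diff_schema(old, new)
impact = impacted_columns("sales.orders", changes, lineage)
```

The resulting list is the kind of payload a webhook alert could carry to the owners of the affected downstream datasets.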

ML Model Registry

MLflow/OCI-compatible model governance with air-gapped import.

MLflow integration & version/stage management (STAGING/PRODUCTION)
Metric comparison & model cards
OCI model hub (HuggingFace-style browser)
argus-model CLI & air-gapped import

AI

Auto-generates metadata with LLMs and answers questions by querying the catalog.

AI metadata generation (descriptions, tags, PII detection; approval-based)
Tool-use AI assistant (catalog/schema/quality/lineage tools)
Answers grounded in real data
OpenAI, Anthropic & Ollama (local LLM) integration

Catalog Federation

Federate multiple Argus instances into one for unified search and browsing, with air-gap-friendly HARVEST mirroring and local promotion.

LIVE / HARVEST / HYBRID federation modes
Unified search · browse · cross-instance lineage
HARVEST mirror · hub-model re-embedding · sample mirroring
Promote (import) mirrored datasets to local
Enterprise

Query-based lineage & relationship collection

Automatically collects lineage and relationships from real queries on operational SQL engines.

Query event collection for Hive, Impala, Trino & StarRocks
Automatic column-level runtime lineage extraction
Usage-based column JOIN relationship analysis
Multi-dialect SQL parser (incl. Impala)
Enterprise

Static source-code analysis

Extracts DB table mappings from application source code to enrich lineage.

Java — JPA, Hibernate, MyBatis & Spring JDBC
Python — SQLAlchemy, Django ORM & DB-API
Automatic ORM/SQL → table mapping extraction
Automatic catalog lineage enrichment
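To give a sense of what ORM-to-table extraction involves, here is a small sketch that uses Python's standard `ast` module to find SQLAlchemy-style declarative classes by their `__tablename__` attribute. The function name and sample source are hypothetical, and the Enterprise analyzer may work quite differently.

```python
import ast


def extract_table_mappings(source):
    """Return {class_name: table_name} for classes that assign a string
    literal to __tablename__. The source is parsed, never executed."""
    mappings = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for stmt in node.body:
            if not isinstance(stmt, ast.Assign):
                continue
            names = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            value = stmt.value
            if (
                "__tablename__" in names
                and isinstance(value, ast.Constant)
                and isinstance(value.value, str)
            ):
                mappings[node.name] = value.value
    return mappings


# Hypothetical application code to analyze.
SAMPLE = '''
class Order(Base):
    __tablename__ = "orders"
    id = Column(Integer, primary_key=True)

class Helper:
    pass
'''

table_map = extract_table_mappings(SAMPLE)
```

Because the code is only parsed, the analysis works without installing the application's dependencies, which suits air-gapped environments.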
Enterprise

Enterprise connector sync

Bulk-syncs metadata from a wide range of sources automatically.

Metadata collection across 11 data sources
Support for Greenplum, Iceberg REST, Kudu & more
Schema, statistics & DDL synchronization
CLI/cron batch operation
Enterprise

LDAP/AD user sync

Auto-manages catalog users with the corporate directory as the source of truth.

OpenLDAP & Active Directory integration
User add, deactivate, reactivate & department update
Dry-run preview & cron batch
Safeguard against deactivating local accounts
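The sync behavior listed above can be sketched as a function that computes a plan without changing anything, which is what a dry-run preview amounts to. The record shapes, the action names, and the `source` field used to protect local accounts are all assumptions for this example.

```python
def plan_user_sync(directory, catalog):
    """Compute sync actions without applying them (a dry run).

    directory: dict mapping username to department, from LDAP/AD.
    catalog:   dict mapping username to a record with keys
               "source" ("ldap" or "local"), "active", and "department".
    """
    plan = []
    for name, department in sorted(directory.items()):
        user = catalog.get(name)
        if user is None:
            plan.append(("add", name))
        elif user["source"] == "local":
            continue  # never touch locally created accounts
        elif not user["active"]:
            plan.append(("reactivate", name))
        elif user["department"] != department:
            plan.append(("update_department", name))
    for name, user in sorted(catalog.items()):
        if name in directory or not user["active"]:
            continue
        if user["source"] == "local":
            continue  # safeguard: local accounts are not deactivated
        plan.append(("deactivate", name))
    return plan


# Hypothetical directory snapshot and catalog state.
directory = {"alice": "data", "bob": "ml", "carol": "ops"}
catalog = {
    "alice": {"source": "ldap", "active": True, "department": "finance"},
    "bob": {"source": "ldap", "active": False, "department": "ml"},
    "dave": {"source": "ldap", "active": True, "department": "ops"},
    "admin": {"source": "local", "active": True, "department": "it"},
}

plan = plan_user_sync(directory, catalog)
```

Separating planning from applying means the same plan can be printed for review first and executed later by a cron batch.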

Editions

Argus Catalog is available in two editions: use the open-source core freely with Community, or step up to Enterprise when you need extension modules and dedicated technical support.

Community

Apache License 2.0 · Free

Use the entire open-source core without restrictions and run it yourself.

Recommended

Enterprise

Enterprise customer support

Everything in Community, plus extension modules and SLA-backed dedicated technical support.

Feature Comparison
Community
Enterprise
Core Capabilities
Data Catalog · Search & Discovery · Data Quality
Metadata Governance (API · AI Agent)
ML Model Registry · AI metadata/assistant
Metadata sync for 11 data sources
Catalog Federation (instance federation · mirror · local promotion)
Extension Modules (Enterprise)
Query-based lineage & relationship collection
Hive Query Collector · Impala Query Collector · Trino Query Collector · StarRocks Query Collector · Query Collection & Processing Service · Column Relationship Analyzer · SQL Parser · Impala SQL Parser
Static source-code analysis
Java Source-Code Analyzer · Python Source-Code Analyzer
Enterprise connector sync
Metadata Sync Service
LDAP/AD user sync
LDAP/AD User Sync
Support & Services (Enterprise)
SLA-backed dedicated technical support
Priority hotfixes & security patches
Installation, deployment & migration support
Training, onboarding & architecture consulting
Air-gapped deployment support · roadmap priority
Support Channel
Support channel: GitHub Issues (Community) · Dedicated support (Enterprise)

An open-source metadata platform

Argus Catalog is fully open-sourced on GitHub under the Apache License 2.0. Apart from the metadata ingestion connectors, the entire core engine — backend, frontend, SDK, AI agent, and quality batch — is public, so enterprises can verify the code directly, extend it to fit their environment, and operate it without any external data leakage.

  • Apache 2.0 with no commercial-use restrictions
  • Verify and extend the code yourself
  • Self-host in air-gapped / on-premises environments