Programme Overview
Training Description
Who Should Attend
This course is ideal for:
- Data Stewards and Metadata Managers
- Data Engineers and Architects
- Data Governance Professionals
- Business Intelligence Developers
- Data Scientists and Analysts
- IT Compliance Officers
- Enterprise Architects
- Information Management Officers
Session Objectives
- Understand the principles and value of metadata management
- Learn to implement enterprise metadata standards and models
- Gain skills in selecting and deploying data catalog platforms
- Establish metadata-driven data governance frameworks
- Automate metadata ingestion and lineage capture
- Create and manage business glossaries and taxonomies
- Enhance data discoverability and classification
- Enable collaborative metadata stewardship workflows
- Integrate metadata into analytics and ML environments
- Enforce data access and usage policies via metadata
- Monitor metadata quality, versioning, and impact analysis
About the Course
In today’s data-driven landscape, organizations are increasingly dependent on accurate, accessible, and governed metadata to drive value from their data assets. The Managing Metadata and Data Catalogs Training Course is designed to equip professionals with the tools and techniques needed to organize, maintain, and leverage metadata effectively using enterprise-grade data catalog platforms. This 10-day intensive course explores strategies for metadata lifecycle management, active data cataloging, automated lineage, policy enforcement, and business glossary integration. Through hands-on practice and expert-led modules, participants will gain the critical competencies to support data discovery, governance, compliance, and analytics initiatives at scale.
Curriculum & Topics
15 Topics | 10 Days
- Subtopic 1.1: What is metadata and why it matters
- Subtopic 1.2: Types of metadata: technical, business, operational
- Subtopic 1.3: Metadata lifecycle stages
- Subtopic 1.4: Benefits to governance, analytics, and compliance
- Subtopic 1.5: Common metadata management challenges
- Subtopic 2.1: Metadata modeling concepts
- Subtopic 2.2: Standards like ISO 11179 and Dublin Core
- Subtopic 2.3: Entity relationship modeling for metadata
- Subtopic 2.4: Aligning metadata models with business needs
- Subtopic 2.5: Mapping technical assets to business terms
- Subtopic 3.1: What is a data catalog and how it works
- Subtopic 3.2: Catalog components: search, lineage, tagging
- Subtopic 3.3: Comparing open-source and commercial catalogs
- Subtopic 3.4: Use cases across industries
- Subtopic 3.5: Cataloging structured and unstructured data
- Subtopic 4.1: Automated metadata discovery techniques
- Subtopic 4.2: Tagging, classification, and categorization
- Subtopic 4.3: Sensitivity detection and PII tagging
- Subtopic 4.4: Leveraging AI/ML for smart classification
- Subtopic 4.5: Organizing metadata using taxonomy
- Subtopic 5.1: Techniques for ingesting metadata from sources
- Subtopic 5.2: APIs, connectors, and crawlers
- Subtopic 5.3: Scheduling metadata synchronization
- Subtopic 5.4: Metadata harvesting from lakes and warehouses
- Subtopic 5.5: Event-driven ingestion workflows
- Subtopic 6.1: Difference between glossary and catalog
- Subtopic 6.2: Business term definition and approval
- Subtopic 6.3: Relationships and synonyms between terms
- Subtopic 6.4: Integrating glossary with catalog search
- Subtopic 6.5: Governance of glossary changes
- Subtopic 7.1: Capturing and visualizing data lineage
- Subtopic 7.2: Lineage at schema, table, and column levels
- Subtopic 7.3: Upstream and downstream dependency analysis
- Subtopic 7.4: Using lineage for impact and root cause analysis
- Subtopic 7.5: Change propagation and notification
- Subtopic 8.1: Roles and responsibilities in metadata governance
- Subtopic 8.2: Setting policies and stewardship workflows
- Subtopic 8.3: Governance models: centralized, federated, hybrid
- Subtopic 8.4: Stewardship tools and collaboration
- Subtopic 8.5: Ensuring metadata quality and accuracy
- Subtopic 9.1: Data catalog components and deployment models
- Subtopic 9.2: On-premise, cloud, and hybrid deployments
- Subtopic 9.3: Integration with security and identity systems
- Subtopic 9.4: Metadata storage and scalability
- Subtopic 9.5: Access control and role-based views
- Subtopic 10.1: Open-source options: Amundsen, DataHub, Apache Atlas
- Subtopic 10.2: Commercial platforms: Alation, Collibra, Informatica
- Subtopic 10.3: Feature comparison and selection criteria
- Subtopic 10.4: Licensing and pricing considerations
- Subtopic 10.5: Vendor support and extensibility
- Subtopic 11.1: Faceted search, autocomplete, and filtering
- Subtopic 11.2: Metadata enrichment for better UX
- Subtopic 11.3: Recommendation engines in data catalogs
- Subtopic 11.4: Bookmarking, annotations, and feedback
- Subtopic 11.5: Personalization and usage tracking
- Subtopic 12.1: Metadata pipeline automation using CI/CD
- Subtopic 12.2: Policy triggers and automated validation
- Subtopic 12.3: Scheduling lineage refresh and quality scans
- Subtopic 12.4: Workflow orchestration with Airflow or Prefect
- Subtopic 12.5: Alerting on stale or incomplete metadata
- Subtopic 13.1: Linking metadata to access control systems
- Subtopic 13.2: Tag-based access and masking rules
- Subtopic 13.3: Conditional access based on sensitivity
- Subtopic 13.4: Auditing metadata usage
- Subtopic 13.5: Policy enforcement and compliance reporting
- Subtopic 14.1: Using metadata in feature engineering
- Subtopic 14.2: Discovering reusable data assets
- Subtopic 14.3: Tracking model input/output lineage
- Subtopic 14.4: Ensuring explainability and transparency
- Subtopic 14.5: Metadata in MLOps pipelines
- Subtopic 15.1: Trends: active metadata, knowledge graphs
- Subtopic 15.2: Metadata interoperability and APIs
- Subtopic 15.3: AI/ML enhancements for catalogs
- Subtopic 15.4: Self-service and crowdsourced metadata
- Subtopic 15.5: Building an enterprise metadata strategy