Nairobi, Kenya

254728269396

Data Governance and Lineage Training: Building Trustworthy, Compliant & Transparent Data Ecosystems

ONSITE OR VIRTUAL

Programme Overview

Who Should Attend

This course is ideal for:

  1. Data Engineers and Data Architects
  2. Data Governance Officers
  3. Compliance and Risk Management Professionals
  4. Cloud Data Engineers
  5. Metadata Managers and Data Stewards
  6. DevOps and DataOps Teams
  7. Machine Learning and AI Practitioners
  8. Business Intelligence Developers
Session Objectives
  • Understand the foundational principles of data governance
  • Implement automated data lineage tracking across environments
  • Manage metadata using data catalogs and governance tools
  • Integrate data governance into CI/CD and DevOps workflows
  • Ensure data privacy and regulatory compliance
  • Monitor and improve data quality and policy enforcement
  • Build collaborative governance frameworks
  • Design architectures for scalable governance
  • Apply governance to ML and AI data pipelines
  • Visualize governance KPIs and audit metrics
  • Enable real-time and hybrid data governance
About the Course

Modern enterprises rely on massive volumes of data for decision-making, analytics, and AI innovation. However, without robust data governance and transparent data lineage, even the most advanced data platforms risk compliance breaches, poor data quality, and organizational mistrust. The Data Governance and Lineage Training Course equips engineers and data professionals with the knowledge and tools to implement governance frameworks, establish traceable data flows, and align with data privacy regulations. With a focus on hybrid, multi-cloud, and real-time environments, the course bridges the gap between technical engineering and enterprise compliance, enabling scalable, secure, and auditable data ecosystems.

Participants will explore how to automate metadata tracking, integrate governance into modern DevOps pipelines, and ensure regulatory alignment across industries. The course uses real-world scenarios and hands-on exercises with platforms such as Apache Atlas, Collibra, and DataHub so that learners can build a resilient data governance infrastructure.

Curriculum & Topics

15 Topics | 10 Days

  • Subtopic 1.1: Core concepts and business importance of data governance
  • Subtopic 1.2: Governance frameworks and maturity models
  • Subtopic 1.3: Key governance roles and accountability structures
  • Subtopic 1.4: Policies, stewardship, and compliance alignment
  • Subtopic 1.5: Aligning data strategy with business goals

  • Subtopic 2.1: Types of lineages: business, technical, operational
  • Subtopic 2.2: Capturing data flow from source to destination
  • Subtopic 2.3: Visualizing and mapping transformations
  • Subtopic 2.4: Tools for automated lineage capture
  • Subtopic 2.5: Practical use cases: audit, debugging, reporting

  • Subtopic 3.1: Role of metadata in governance
  • Subtopic 3.2: Enterprise metadata harvesting techniques
  • Subtopic 3.3: Evaluating and deploying data catalogs
  • Subtopic 3.4: Metadata synchronization across tools
  • Subtopic 3.5: Organizing metadata taxonomies and glossaries

  • Subtopic 4.1: Dimensions of data quality
  • Subtopic 4.2: Techniques for profiling and validation
  • Subtopic 4.3: Automating quality monitoring and alerts
  • Subtopic 4.4: Root cause analysis of quality issues
  • Subtopic 4.5: Reporting data health and metrics

  • Subtopic 5.1: Regulatory frameworks: GDPR, CCPA, HIPAA, etc.
  • Subtopic 5.2: Access control and data masking
  • Subtopic 5.3: Creating and managing policy rules
  • Subtopic 5.4: Auditing and forensic tracing
  • Subtopic 5.5: Data retention and deletion policies

  • Subtopic 6.1: Reference architecture for governance tooling
  • Subtopic 6.2: Centralized vs federated governance
  • Subtopic 6.3: APIs, connectors, and orchestration tools
  • Subtopic 6.4: Governance across cloud-native and legacy systems
  • Subtopic 6.5: Event-driven governance design

  • Subtopic 7.1: Inserting governance checks in data flows
  • Subtopic 7.2: Documenting ETL/ELT transformations
  • Subtopic 7.3: Integration with Apache Airflow and dbt
  • Subtopic 7.4: Tracking schema evolution and data changes
  • Subtopic 7.5: Versioning pipelines for compliance

  • Subtopic 8.1: Open-source tools: Apache Atlas, DataHub, Amundsen
  • Subtopic 8.2: Commercial platforms: Collibra, Alation, Informatica
  • Subtopic 8.3: Integration with cloud providers and data lakes
  • Subtopic 8.4: Plugin architectures and extensibility
  • Subtopic 8.5: Tool comparison matrix and best fit

  • Subtopic 9.1: Defining governance rules in YAML/JSON
  • Subtopic 9.2: Integrating with Git, Jenkins, and GitHub Actions
  • Subtopic 9.3: Approval gates for metadata and access
  • Subtopic 9.4: Infrastructure as code for policy automation
  • Subtopic 9.5: CI/CD pipelines for governance assets

  • Subtopic 10.1: Roles and responsibilities of data stewards
  • Subtopic 10.2: Workflows for stewardship tasks
  • Subtopic 10.3: Engaging business teams in governance
  • Subtopic 10.4: Resolving data ownership conflicts
  • Subtopic 10.5: Promoting a governance culture

  • Subtopic 11.1: Governance in model training and deployment
  • Subtopic 11.2: Capturing lineage in ML features and datasets
  • Subtopic 11.3: Managing bias, drift, and explainability
  • Subtopic 11.4: Versioning models and audit trails
  • Subtopic 11.5: Securing synthetic and sensitive datasets

  • Subtopic 12.1: Discovery and classification across cloud environments
  • Subtopic 12.2: Unified policy enforcement
  • Subtopic 12.3: Metadata sharing between platforms
  • Subtopic 12.4: Avoiding silos in hybrid architectures
  • Subtopic 12.5: Data fabric for consistent governance

  • Subtopic 13.1: Defining governance performance metrics
  • Subtopic 13.2: Building dashboards for lineage and quality
  • Subtopic 13.3: SLA tracking for data pipelines
  • Subtopic 13.4: Reporting for audits and regulators
  • Subtopic 13.5: Alerting and notification strategies

  • Subtopic 14.1: Designing data producer-consumer agreements
  • Subtopic 14.2: Schema enforcement and compatibility testing
  • Subtopic 14.3: Breaking change detection and rollback
  • Subtopic 14.4: Automating SLAs in data flows
  • Subtopic 14.5: Enabling reliable data sharing

  • Subtopic 15.1: Establishing short- and long-term governance goals
  • Subtopic 15.2: Prioritizing based on risk and value
  • Subtopic 15.3: Change management and stakeholder buy-in
  • Subtopic 15.4: Scaling governance programs
  • Subtopic 15.5: Staying current with industry trends and tools

$2,000

This Programme Includes

  • Certificate of completion
  • Training manual
  • Reference materials
  • 10 o'clock tea
  • Lunch
  • 4 o'clock tea

Course Highlights
  • 10 Days Intensive Training
  • 15 Core Learning Topics
  • 10 Days Professional Sessions
  • Expert-led Training Delivery

PB Training Institute of Research and Consultancy