Programme Overview
Who Should Attend
This course is ideal for:
- Cloud Data Engineers
- DevOps Engineers working with data infrastructure
- Big Data Developers
- Cloud Architects
- DataOps and MLOps Professionals
- Technical Project Managers
- Enterprise Data Platform Engineers
- Analytics Engineers
Session Objectives
- Understand the fundamentals of serverless computing and its benefits for data engineering
- Learn how to design serverless data pipelines for batch and streaming data
- Explore integration with cloud-native services for data ingestion, transformation, and output
- Build event-driven architectures that respond to data triggers efficiently
- Master monitoring, logging, and alerting in serverless environments
- Apply cost optimization strategies for serverless workflows
- Ensure scalability, fault-tolerance, and reliability in serverless pipelines
- Implement real-time analytics and data lake ingestion
- Enforce security and governance across serverless components
- Use infrastructure as code to manage and automate deployments
- Gain hands-on experience through practical labs and a capstone project
About the Course
In today’s cloud-native era, serverless data engineering is revolutionizing how organizations build and manage data workflows. By eliminating the need to provision and manage infrastructure, serverless architectures empower data engineers to focus on designing highly scalable, resilient, and cost-efficient pipelines. This course equips participants with the skills to build event-driven data processing systems using serverless technologies such as AWS Lambda, Azure Functions, and Google Cloud Functions. Learners will explore best practices for orchestration, data ingestion, real-time processing, monitoring, and governance in a serverless environment, enabling them to accelerate innovation and reduce operational complexity.
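The event-driven pattern this course centers on can be sketched in a few lines. The handler below mirrors the shape of an AWS Lambda function triggered by an S3 object-notification event; the bucket and key names, and the `handle_s3_event` function itself, are illustrative assumptions rather than course material.

```python
import json

def handle_s3_event(event):
    """Minimal sketch of an event-driven ingestion step (AWS Lambda style).

    The event follows S3's notification shape ("Records" -> "s3" -> bucket/object);
    a real handler would fetch each object (e.g. via boto3) and pass it downstream.
    """
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Illustrative only: real code would read the object and transform it here.
        results.append({"bucket": bucket, "key": key, "status": "queued"})
    return {"processed": len(results), "items": results}

# Example invocation with a fabricated event payload:
fake_event = {"Records": [{"s3": {"bucket": {"name": "raw-data"},
                                  "object": {"key": "orders/2024/01/orders.json"}}}]}
print(json.dumps(handle_s3_event(fake_event)))
```

Because the function does nothing but react to the event payload, the same code scales from one file drop per day to thousands per second with no capacity planning, which is the core value proposition examined throughout the course.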
Curriculum & Topics
15 Topics | 10 Days
- Subtopic 1.1: Overview of serverless computing
- Subtopic 1.2: Benefits and challenges for data processing
- Subtopic 1.3: Key cloud provider offerings: AWS, GCP, Azure
- Subtopic 1.4: Serverless vs. container-based architectures
- Subtopic 1.5: Use cases in modern data engineering
- Subtopic 2.1: AWS Lambda, Azure Functions, Google Cloud Functions
- Subtopic 2.2: Function lifecycle and execution model
- Subtopic 2.3: Writing and deploying serverless functions
- Subtopic 2.4: Managing concurrency and limits
- Subtopic 2.5: Using frameworks like Serverless Framework and SAM
- Subtopic 3.1: Sources of events: file drops, API calls, queues
- Subtopic 3.2: Triggering pipelines on data events
- Subtopic 3.3: Designing event producers and consumers
- Subtopic 3.4: Ensuring idempotency and retry strategies
- Subtopic 3.5: Chaining and fan-out patterns
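The idempotency and retry strategies of Subtopic 3.4 hinge on one idea: a redelivered event must not repeat its side effect. A minimal sketch, assuming each event carries a unique `id`; the in-memory set is a stand-in for a durable store such as DynamoDB, which a real pipeline would use.

```python
processed_ids = set()  # stand-in for a durable store such as DynamoDB

def process_once(event):
    """Idempotent consumer: a retried delivery of the same event is a no-op."""
    event_id = event["id"]
    if event_id in processed_ids:
        return "skipped"          # duplicate delivery (e.g. an at-least-once retry)
    processed_ids.add(event_id)
    # ... the actual side effect (write, publish, transform) would go here ...
    return "processed"
```

Because queues and event buses typically guarantee at-least-once rather than exactly-once delivery, this check is what makes aggressive retry policies safe to enable.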
- Subtopic 4.1: Using Kinesis, Pub/Sub, and Event Hubs
- Subtopic 4.2: Handling structured and unstructured data
- Subtopic 4.3: Real-time vs. batch ingestion
- Subtopic 4.4: Validating and transforming incoming data
- Subtopic 4.5: Integrating with APIs and third-party data sources
- Subtopic 5.1: Using Amazon S3, Azure Blob, and GCS
- Subtopic 5.2: Object lifecycle management and versioning
- Subtopic 5.3: Event notifications on file uploads
- Subtopic 5.4: Data partitioning and organization
- Subtopic 5.5: Storage security and access policies
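The data partitioning of Subtopic 5.4 usually means encoding partition values into the object key. A sketch of the common Hive-style `dt=`/`hour=` layout; the dataset and file names are illustrative assumptions.

```python
from datetime import datetime, timezone

def partitioned_key(dataset, event_time, filename):
    """Build a Hive-style partition path (dt=YYYY-MM-DD/hour=HH) so that
    query engines such as Athena or BigQuery can prune partitions by date."""
    return f"{dataset}/dt={event_time:%Y-%m-%d}/hour={event_time:%H}/{filename}"

ts = datetime(2024, 5, 1, 13, 30, tzinfo=timezone.utc)
print(partitioned_key("orders", ts, "part-0001.parquet"))
# orders/dt=2024-05-01/hour=13/part-0001.parquet
```

A query filtered to one day then scans only that day's prefixes instead of the whole bucket, which directly reduces both latency and per-byte-scanned query cost.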
- Subtopic 6.1: Stream processing with Lambda and Kinesis
- Subtopic 6.2: Aggregation and windowing techniques
- Subtopic 6.3: Handling late-arriving and out-of-order data
- Subtopic 6.4: Delivering processed results to sinks
- Subtopic 6.5: Combining stream and batch processing
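The windowing techniques of Subtopic 6.2 can be illustrated without any streaming service: the sketch below buckets timestamped events into fixed-size tumbling windows in memory. Events are `(epoch_seconds, value)` pairs, an assumption made for the example; note that a late event simply lands in its original window, which previews the late-data discussion in Subtopic 6.3.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Assign each (timestamp, value) event to a fixed tumbling window
    and sum the values per window."""
    windows = defaultdict(int)
    for ts, value in events:
        window_start = ts - (ts % window_seconds)  # floor to window boundary
        windows[window_start] += value
    return dict(windows)

events = [(5, 1), (59, 2), (61, 3), (125, 4)]
print(tumbling_window_counts(events))  # {0: 3, 60: 3, 120: 4}
```

In a real pipeline the same grouping logic runs inside a Lambda consuming a Kinesis batch, with the per-window partial sums flushed to a sink such as DynamoDB.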
- Subtopic 7.1: AWS Step Functions, Azure Durable Functions
- Subtopic 7.2: Defining state machines and workflows
- Subtopic 7.3: Error handling and retries in orchestration
- Subtopic 7.4: Chaining multi-step pipelines
- Subtopic 7.5: Visualizing and monitoring execution
- Subtopic 8.1: Querying with Athena, BigQuery, Synapse Serverless
- Subtopic 8.2: ETL and ELT design in serverless context
- Subtopic 8.3: Leveraging Glue and Dataflow for transformation
- Subtopic 8.4: Schema inference and metadata cataloging
- Subtopic 8.5: Using PySpark and SQL for data prep
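The ELT style of Subtopics 8.1-8.2 pushes transformation into SQL run by a serverless query engine. The pattern can be tried locally with Python's built-in sqlite3 standing in for Athena or BigQuery; the table and column names are illustrative assumptions.

```python
import sqlite3

# In-memory database stands in for a serverless query engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount REAL, country TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                 [("o1", 10.0, "DE"), ("o2", 25.5, "DE"), ("o3", 7.0, "FR")])

# ELT: load raw data first, then transform with SQL inside the engine.
rows = conn.execute(
    "SELECT country, COUNT(*) AS orders, SUM(amount) AS revenue "
    "FROM raw_orders GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('DE', 2, 35.5), ('FR', 1, 7.0)]
```

The same `GROUP BY` statement, pointed at partitioned Parquet files in object storage, is exactly what an Athena or BigQuery job would run, with no cluster to manage.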
- Subtopic 9.1: Logging with CloudWatch, Stackdriver, Azure Monitor
- Subtopic 9.2: Tracing and profiling functions
- Subtopic 9.3: Creating metrics and dashboards
- Subtopic 9.4: Handling cold starts and latency issues
- Subtopic 9.5: Alerting and anomaly detection
- Subtopic 10.1: Understanding billing for serverless workloads
- Subtopic 10.2: Reducing invocations and execution time
- Subtopic 10.3: Managing data transfer costs
- Subtopic 10.4: Setting budgets and usage alerts
- Subtopic 10.5: Comparing serverless vs. managed alternatives
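The billing model of Subtopic 10.1 usually has two dimensions: a per-request charge and a per-GB-second compute charge. A sketch of the arithmetic, with unit prices hard-coded as illustrative assumptions (check your provider's current price list before relying on them):

```python
def lambda_cost(invocations, avg_duration_ms, memory_mb,
                price_per_request=0.20 / 1_000_000,   # assumed $/request
                price_per_gb_second=0.0000166667):    # assumed $/GB-second
    """Estimate cost from the two usual serverless billing dimensions."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return invocations * price_per_request + gb_seconds * price_per_gb_second

# 10M invocations/month, 200 ms average duration, 512 MB memory:
print(round(lambda_cost(10_000_000, 200, 512), 2))  # 18.67
```

The formula makes the optimization levers of Subtopic 10.2 concrete: halving either the average duration or the memory allocation halves the GB-second term, which dominates the bill here.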
- Subtopic 11.1: IAM policies for least privilege
- Subtopic 11.2: Securing secrets and API keys
- Subtopic 11.3: Encrypting data at rest and in transit
- Subtopic 11.4: Managing authentication and authorization
- Subtopic 11.5: Reviewing serverless security best practices
- Subtopic 12.1: Using Terraform and CloudFormation
- Subtopic 12.2: Creating reproducible deployments
- Subtopic 12.3: Versioning infrastructure and code
- Subtopic 12.4: Managing environments and secrets
- Subtopic 12.5: Automated testing and validation
- Subtopic 13.1: Creating APIs using API Gateway and Lambda
- Subtopic 13.2: Designing REST and GraphQL endpoints
- Subtopic 13.3: Rate limiting and throttling
- Subtopic 13.4: Integrating with data stores
- Subtopic 13.5: Securing APIs with OAuth and tokens
- Subtopic 14.1: Deploying lightweight models with serverless
- Subtopic 14.2: Triggering predictions on data events
- Subtopic 14.3: Integrating with SageMaker, Vertex AI, ML.NET
- Subtopic 14.4: Streaming inference vs. batch inference
- Subtopic 14.5: Scaling ML workloads with autoscaling functions
- Subtopic 15.1: Building a full serverless data pipeline
- Subtopic 15.2: Real-world case studies from e-commerce and finance
- Subtopic 15.3: End-to-end implementation and demo
- Subtopic 15.4: Troubleshooting and final optimization
- Subtopic 15.5: Presentation and feedback session