Programme Overview
Who Should Attend
This course is ideal for:
- Cloud Data Engineers
- DevOps Engineers working with data infrastructure
- Big Data Developers
- Cloud Architects
- DataOps and MLOps Professionals
- Technical Project Managers
- Enterprise Data Platform Engineers
- Analytics Engineers
Session Objectives
- Understand the fundamentals of serverless computing and its benefits for data engineering
- Learn how to design serverless data pipelines for batch and streaming data
- Explore integration with cloud-native services for data ingestion, transformation, and output
- Build event-driven architectures that respond to data triggers efficiently
- Master monitoring, logging, and alerting in serverless environments
- Apply cost optimization strategies for serverless workflows
- Ensure scalability, fault-tolerance, and reliability in serverless pipelines
- Implement real-time analytics and data lake ingestion
- Enforce security and governance across serverless components
- Use infrastructure as code to manage and automate deployments
- Gain hands-on experience through practical labs and a capstone project
About the Course
In today’s cloud-native era, serverless data engineering is revolutionizing how organizations build and manage data workflows. By eliminating the need to provision and manage infrastructure, serverless architectures empower data engineers to focus on designing highly scalable, resilient, and cost-efficient pipelines. This course equips participants with the skills to build event-driven data processing systems using serverless technologies such as AWS Lambda, Azure Functions, and Google Cloud Functions. Learners will explore best practices for orchestration, data ingestion, real-time processing, monitoring, and governance in a serverless environment, enabling them to accelerate innovation and reduce operational complexity.
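The event-driven pattern this course centers on can be sketched in a few lines. The handler below mirrors the shape of an AWS Lambda function triggered by an S3 object-notification event; the bucket and key names, and the `handle_s3_event` function itself, are illustrative assumptions rather than course material.

```python
import json

def handle_s3_event(event):
    """Minimal sketch of an event-driven ingestion step (AWS Lambda style).

    The event follows S3's notification shape ("Records" -> "s3" -> bucket/object);
    a real handler would fetch each object (e.g. via boto3) and pass it downstream.
    """
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Illustrative only: real code would read the object and transform it here.
        results.append({"bucket": bucket, "key": key, "status": "queued"})
    return {"processed": len(results), "items": results}

# Example invocation with a fabricated event payload:
fake_event = {"Records": [{"s3": {"bucket": {"name": "raw-data"},
                                  "object": {"key": "orders/2024/01/orders.json"}}}]}
print(json.dumps(handle_s3_event(fake_event)))
```

Because the function does nothing but react to the event payload, the same code scales from one file drop per day to thousands per second with no capacity planning, which is the core value proposition examined throughout the course.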
Curriculum & Topics
15 Topics | 10 Days
- Subtopic 1.1: Overview of serverless computing
- Subtopic 1.2: Benefits and challenges for data processing
- Subtopic 1.3: Key cloud provider offerings: AWS, GCP, Azure
- Subtopic 1.4: Serverless vs. container-based architectures
- Subtopic 1.5: Use cases in modern data engineering
- Subtopic 2.1: AWS Lambda, Azure Functions, Google Cloud Functions
- Subtopic 2.2: Function lifecycle and execution model
- Subtopic 2.3: Writing and deploying serverless functions
- Subtopic 2.4: Managing concurrency and limits
- Subtopic 2.5: Using frameworks like Serverless Framework and SAM
- Subtopic 3.1: Sources of events: file drops, API calls, queues
- Subtopic 3.2: Triggering pipelines on data events
- Subtopic 3.3: Designing event producers and consumers
- Subtopic 3.4: Ensuring idempotency and retry strategies
- Subtopic 3.5: Chaining and fan-out patterns
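The idempotency and retry strategies of Subtopic 3.4 hinge on one idea: a redelivered event must not repeat its side effect. A minimal sketch, assuming each event carries a unique `id`; the in-memory set is a stand-in for a durable store such as DynamoDB, which a real pipeline would use.

```python
processed_ids = set()  # stand-in for a durable store such as DynamoDB

def process_once(event):
    """Idempotent consumer: a retried delivery of the same event is a no-op."""
    event_id = event["id"]
    if event_id in processed_ids:
        return "skipped"          # duplicate delivery (e.g. an at-least-once retry)
    processed_ids.add(event_id)
    # ... the actual side effect (write, publish, transform) would go here ...
    return "processed"
```

Because queues and event buses typically guarantee at-least-once rather than exactly-once delivery, this check is what makes aggressive retry policies safe to enable.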
- Subtopic 4.1: Using Kinesis, Pub/Sub, and Event Hubs
- Subtopic 4.2: Handling structured and unstructured data
- Subtopic 4.3: Real-time vs. batch ingestion
- Subtopic 4.4: Validating and transforming incoming data
- Subtopic 4.5: Integrating with APIs and third-party data sources
- Subtopic 5.1: Using Amazon S3, Azure Blob, and GCS
- Subtopic 5.2: Object lifecycle management and versioning
- Subtopic 5.3: Event notifications on file uploads
- Subtopic 5.4: Data partitioning and organization
- Subtopic 5.5: Storage security and access policies
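The data partitioning of Subtopic 5.4 usually means encoding partition values into the object key. A sketch of the common Hive-style `dt=`/`hour=` layout; the dataset and file names are illustrative assumptions.

```python
from datetime import datetime, timezone

def partitioned_key(dataset, event_time, filename):
    """Build a Hive-style partition path (dt=YYYY-MM-DD/hour=HH) so that
    query engines such as Athena or BigQuery can prune partitions by date."""
    return f"{dataset}/dt={event_time:%Y-%m-%d}/hour={event_time:%H}/{filename}"

ts = datetime(2024, 5, 1, 13, 30, tzinfo=timezone.utc)
print(partitioned_key("orders", ts, "part-0001.parquet"))
# orders/dt=2024-05-01/hour=13/part-0001.parquet
```

A query filtered to one day then scans only that day's prefixes instead of the whole bucket, which directly reduces both latency and per-byte-scanned query cost.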
- Subtopic 6.1: Stream processing with Lambda and Kinesis
- Subtopic 6.2: Aggregation and windowing techniques
- Subtopic 6.3: Handling late-arriving and out-of-order data
- Subtopic 6.4: Delivering processed results to sinks
- Subtopic 6.5: Combining stream and batch processing
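The windowing techniques of Subtopic 6.2 can be illustrated without any streaming service: the sketch below buckets timestamped events into fixed-size tumbling windows in memory. Events are `(epoch_seconds, value)` pairs, an assumption made for the example; note that a late event simply lands in its original window, which previews the late-data discussion in Subtopic 6.3.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Assign each (timestamp, value) event to a fixed tumbling window
    and sum the values per window."""
    windows = defaultdict(int)
    for ts, value in events:
        window_start = ts - (ts % window_seconds)  # floor to window boundary
        windows[window_start] += value
    return dict(windows)

events = [(5, 1), (59, 2), (61, 3), (125, 4)]
print(tumbling_window_counts(events))  # {0: 3, 60: 3, 120: 4}
```

In a real pipeline the same grouping logic runs inside a Lambda consuming a Kinesis batch, with the per-window partial sums flushed to a sink such as DynamoDB.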
- Subtopic 7.1: AWS Step Functions, Azure Durable Functions
- Subtopic 7.2: Defining state machines and workflows
- Subtopic 7.3: Error handling and retries in orchestration
- Subtopic 7.4: Chaining multi-step pipelines
- Subtopic 7.5: Visualizing and monitoring execution
- Subtopic 8.1: Querying with Athena, BigQuery, Synapse Serverless
- Subtopic 8.2: ETL and ELT design in serverless context
- Subtopic 8.3: Leveraging Glue and Dataflow for transformation
- Subtopic 8.4: Schema inference and metadata cataloging
- Subtopic 8.5: Using PySpark and SQL for data prep
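The ELT style of Subtopics 8.1-8.2 pushes transformation into SQL run by a serverless query engine. The pattern can be tried locally with Python's built-in sqlite3 standing in for Athena or BigQuery; the table and column names are illustrative assumptions.

```python
import sqlite3

# In-memory database stands in for a serverless query engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount REAL, country TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                 [("o1", 10.0, "DE"), ("o2", 25.5, "DE"), ("o3", 7.0, "FR")])

# ELT: load raw data first, then transform with SQL inside the engine.
rows = conn.execute(
    "SELECT country, COUNT(*) AS orders, SUM(amount) AS revenue "
    "FROM raw_orders GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('DE', 2, 35.5), ('FR', 1, 7.0)]
```

The same `GROUP BY` statement, pointed at partitioned Parquet files in object storage, is exactly what an Athena or BigQuery job would run, with no cluster to manage.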
- Subtopic 9.1: Logging with CloudWatch, Stackdriver, Azure Monitor
- Subtopic 9.2: Tracing and profiling functions
- Subtopic 9.3: Creating metrics and dashboards
- Subtopic 9.4: Handling cold starts and latency issues
- Subtopic 9.5: Alerting and anomaly detection
- Subtopic 10.1: Understanding billing for serverless workloads
- Subtopic 10.2: Reducing invocations and execution time
- Subtopic 10.3: Managing data transfer costs
- Subtopic 10.4: Setting budgets and usage alerts
- Subtopic 10.5: Comparing serverless vs. managed alternatives
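The billing model of Subtopic 10.1 usually has two dimensions: a per-request charge and a per-GB-second compute charge. A sketch of the arithmetic, with unit prices hard-coded as illustrative assumptions (check your provider's current price list before relying on them):

```python
def lambda_cost(invocations, avg_duration_ms, memory_mb,
                price_per_request=0.20 / 1_000_000,   # assumed $/request
                price_per_gb_second=0.0000166667):    # assumed $/GB-second
    """Estimate cost from the two usual serverless billing dimensions."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return invocations * price_per_request + gb_seconds * price_per_gb_second

# 10M invocations/month, 200 ms average duration, 512 MB memory:
print(round(lambda_cost(10_000_000, 200, 512), 2))  # 18.67
```

The formula makes the optimization levers of Subtopic 10.2 concrete: halving either the average duration or the memory allocation halves the GB-second term, which dominates the bill here.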
- Subtopic 11.1: IAM policies for least privilege
- Subtopic 11.2: Securing secrets and API keys
- Subtopic 11.3: Encrypting data at rest and in transit
- Subtopic 11.4: Managing authentication and authorization
- Subtopic 11.5: Reviewing serverless security best practices
- Subtopic 12.1: Using Terraform and CloudFormation
- Subtopic 12.2: Creating reproducible deployments
- Subtopic 12.3: Versioning infrastructure and code
- Subtopic 12.4: Managing environments and secrets
- Subtopic 12.5: Automated testing and validation
- Subtopic 13.1: Creating APIs using API Gateway and Lambda
- Subtopic 13.2: Designing REST and GraphQL endpoints
- Subtopic 13.3: Rate limiting and throttling
- Subtopic 13.4: Integrating with data stores
- Subtopic 13.5: Securing APIs with OAuth and tokens
- Subtopic 14.1: Deploying lightweight models with serverless
- Subtopic 14.2: Triggering predictions on data events
- Subtopic 14.3: Integrating with SageMaker, Vertex AI, ML.NET
- Subtopic 14.4: Streaming inference vs. batch inference
- Subtopic 14.5: Scaling ML workloads with autoscaling functions
- Subtopic 15.1: Building a full serverless data pipeline
- Subtopic 15.2: Real-world case studies from e-commerce and finance
- Subtopic 15.3: End-to-end implementation and demo
- Subtopic 15.4: Troubleshooting and final optimization
- Subtopic 15.5: Presentation and feedback session