Programme Overview
Training Description
Who Should Attend
This course is designed for:
1. Data Engineers building scalable data infrastructure
2. Database Administrators transitioning to NoSQL platforms
3. Backend Developers implementing NoSQL-driven applications
4. DevOps Engineers managing distributed systems
5. Cloud Engineers deploying NoSQL services at scale
6. Data Architects designing flexible and performant schemas
7. ETL Developers optimizing data ingestion into NoSQL systems
8. Full-Stack Developers integrating NoSQL into backend APIs
Session Objectives
- Understand the core principles of NoSQL database types and architectures
- Gain deep hands-on experience with MongoDB and Apache Cassandra
- Design high-performance NoSQL schemas for real-world applications
- Implement advanced indexing, sharding, and replication strategies
- Integrate NoSQL databases with modern data pipelines
- Optimize NoSQL query performance and throughput
- Ensure data consistency and availability across distributed clusters
- Monitor and maintain large-scale NoSQL infrastructure
- Secure NoSQL deployments with proper access control and encryption
- Compare and evaluate NoSQL options for specific business needs
- Build resilient, cloud-native NoSQL-powered systems
About the Course
As data engineering increasingly demands handling diverse, large-scale, and high-velocity data, mastering NoSQL databases becomes crucial. This Advanced NoSQL for Data Engineers (MongoDB, Cassandra, etc.) training course equips participants with practical, in-depth expertise to design, optimize, and manage the distributed NoSQL systems used in modern big data pipelines. Through a hands-on approach, participants will explore schema design, indexing, querying, replication, sharding, and performance tuning on leading NoSQL platforms such as MongoDB, Apache Cassandra, and Redis. The course is ideal for professionals building real-time analytics platforms, microservices backends, or IoT-scale infrastructure, and prepares engineers to use NoSQL technologies for scalability, flexibility, and high availability.
General Notes
- This course will be delivered by our experienced experts and professionals in data analysis. The workshop will be highly interactive.
- Training manuals and additional reference materials are provided to the participants.
- Upon successful completion of this course, participants will be issued with a certificate.
- The training will be conducted by PB Institute of Research and Technology.
- The training fee covers tuition, training materials, lunch, and the training venue. Accommodation and airport transfers are arranged for participants upon request.
Curriculum & Topics
15 Topics | 10 Days
- Subtopic 1.1: Understanding the evolution from relational to NoSQL databases
- Subtopic 1.2: Classification: key-value, document, wide-column, and graph databases
- Subtopic 1.3: Use cases and advantages of NoSQL over traditional RDBMS
- Subtopic 1.4: CAP theorem and its implications on NoSQL systems
- Subtopic 1.5: Overview of popular NoSQL platforms and ecosystems
- Subtopic 2.1: Document data model and BSON structure
- Subtopic 2.2: CRUD operations with MongoDB shell and drivers
- Subtopic 2.3: Indexing strategies for performance optimization
- Subtopic 2.4: Schema design best practices for flexible structures
- Subtopic 2.5: Aggregation framework and pipeline patterns
- Subtopic 3.1: Replication and replica set configuration
- Subtopic 3.2: Sharding and horizontal scaling techniques
- Subtopic 3.3: Transactions and ACID compliance in MongoDB
- Subtopic 3.4: Backup, restore, and disaster recovery planning
- Subtopic 3.5: Performance tuning and profiling queries
- Subtopic 4.1: Introduction to wide-column data models
- Subtopic 4.2: Understanding Cassandra architecture and write path
- Subtopic 4.3: Keyspaces, tables, and CQL (Cassandra Query Language)
- Subtopic 4.4: Partitions and clustering for distributed data
- Subtopic 4.5: Data modeling patterns for time series and events
- Subtopic 5.1: Replication strategies and consistency levels
- Subtopic 5.2: Read/write performance optimization
- Subtopic 5.3: Compaction, caching, and garbage collection settings
- Subtopic 5.4: Using nodetool and cqlsh for cluster management
- Subtopic 5.5: Monitoring metrics and alerts in production
- Subtopic 6.1: Key-value store concepts with Redis
- Subtopic 6.2: Use cases: caching, pub/sub, real-time counters
- Subtopic 6.3: Data structures: sets, lists, sorted sets, hashes
- Subtopic 6.4: Persistence options and memory optimization
- Subtopic 6.5: Redis Cluster and Sentinel configuration
- Subtopic 7.1: Designing for reads vs writes in NoSQL systems
- Subtopic 7.2: Denormalization and embedded document strategies
- Subtopic 7.3: Modeling one-to-many and many-to-many relationships
- Subtopic 7.4: Choosing partition keys and avoiding hotspots
- Subtopic 7.5: Trade-offs between flexibility and consistency
- Subtopic 8.1: Indexing techniques and query planners
- Subtopic 8.2: Aggregation tuning and pipeline optimization
- Subtopic 8.3: Query profiling tools in MongoDB and Cassandra
- Subtopic 8.4: Latency reduction and throughput scaling
- Subtopic 8.5: Identifying and resolving anti-patterns
- Subtopic 9.1: Authentication and role-based access in MongoDB
- Subtopic 9.2: Secure client connections using TLS/SSL
- Subtopic 9.3: Auditing and activity logging
- Subtopic 9.4: Data encryption at rest and in transit
- Subtopic 9.5: Security hardening of NoSQL clusters
- Subtopic 10.1: Snapshot-based backup strategies
- Subtopic 10.2: Point-in-time recovery techniques
- Subtopic 10.3: Cluster failover and leader election
- Subtopic 10.4: Data migration across environments
- Subtopic 10.5: Ensuring uptime with distributed replication
- Subtopic 11.1: Connecting NoSQL systems with Apache Kafka and Spark
- Subtopic 11.2: Streaming data ingestion from microservices
- Subtopic 11.3: ETL workflows with NoSQL as sink or source
- Subtopic 11.4: Data enrichment and transformation patterns
- Subtopic 11.5: Real-time analytics architecture
- Subtopic 12.1: Prometheus and Grafana for MongoDB/Cassandra metrics
- Subtopic 12.2: Query performance dashboards and latency tracking
- Subtopic 12.3: Disk I/O, memory, and CPU usage monitoring
- Subtopic 12.4: Alerting strategies for cluster health
- Subtopic 12.5: Log aggregation and analysis
- Subtopic 13.1: Managed services (MongoDB Atlas, Amazon Keyspaces)
- Subtopic 13.2: Infrastructure as Code for provisioning clusters
- Subtopic 13.3: Auto-scaling and load balancing
- Subtopic 13.4: Cost optimization strategies in cloud environments
- Subtopic 13.5: Multi-region replication and latency considerations
- Subtopic 14.1: Benchmarking MongoDB vs Cassandra vs Redis
- Subtopic 14.2: Selecting the right NoSQL database by workload
- Subtopic 14.3: Hybrid architectures with SQL + NoSQL
- Subtopic 14.4: Polyglot persistence strategies
- Subtopic 14.5: Business case evaluation
- Subtopic 15.1: Designing a high-throughput document and key-value data store
- Subtopic 15.2: Implementing schema, indexing, and access controls
- Subtopic 15.3: Integrating the system with real-time ingestion pipelines
- Subtopic 15.4: Monitoring, securing, and deploying the solution
- Subtopic 15.5: Presenting project outcomes and scalability plans