4.5 out of 5
7 reviews on Udemy

System Design for Big Data Pipelines

Analyze, Design and Build scalable, resilient and cost-effective Big Data pipelines with a methodical process
Instructor:
V2 Maestros, LLC
173 students enrolled
English [Auto]
Learn about the building blocks of a big data pipeline, their functions and challenges
Adopt an end-to-end methodical approach to designing a big data pipeline
Explore techniques to ensure overall scaling of a big data pipeline
Study design patterns for building blocks, their advantages, shortcomings, applications and available technologies
Focus additionally on Infrastructure, Operations and Security for Big Data deployments
Apply the lessons of the course in a Batch and a Realtime use case study

Big data technologies have grown rapidly over the past few years and have spread into nearly every domain and industry in software development. Working with them has become a core skill for software engineers. Robust and effective big data pipelines are needed to support the growing volume of data and applications. These pipelines have become business critical, helping to increase revenue and reduce cost.

Do quality big data pipelines happen by magic? No: building and maintaining them requires high-quality designs that are scalable, reliable and cost effective.

How do you build an end-to-end big data pipeline that uses big data technologies and practices effectively to solve business problems? How do you integrate these technologies in a scalable and reliable manner? How do you deploy, secure and operate them? How do you look at the overall forest and not just the individual trees? This course addresses that skill gap.

What are the topics covered in this course?

We start off by discussing the building blocks of big data pipelines, their functions and challenges.

We introduce a structured design process for building big data pipelines.

We then discuss individual building blocks, focusing on the design patterns available, their advantages, shortcomings, use cases and available technologies.

We recommend several best practices across the course.

Finally, we work through two use cases to illustrate how to apply the lessons of the course to real-world problems. One is a batch use case and the other is a real-time use case.

Introduction & Expectations

1
Need for Quality Pipeline Design

Discuss the need for quality pipeline design for big data pipelines. Explore the key activities in building such a design

2
Course Coverage and Pre-requisites

Familiarize yourself with the topics covered, the out-of-scope topics and the pre-requisites for the course.

3
Cloud Serverless Technologies

Discuss how serverless technologies from cloud providers relate to the contents of this course.

Building Blocks for Big Data Pipelines

1
The Big Data Pipeline Network

Describe the overall pipeline network and the building blocks in the network

2
Data Acquisition Blocks

Discuss the features and challenges for the data acquisition block in a big data pipeline

3
Data Transport Blocks

Discuss the features and challenges for the data transport block in a big data pipeline

4
Data Processing Blocks

Discuss the features and challenges for the data processing block in a big data pipeline

5
Data Storage Blocks

Discuss the features and challenges for the data storage block in a big data pipeline

6
Data Serving Blocks

Discuss the features and challenges for the data serving block in a big data pipeline

7
Data Pipeline Infrastructure

Discuss the features and challenges for the pipeline infrastructure in a big data pipeline

8
Data Pipeline Operations

Discuss the features and challenges for the operations block in a big data pipeline

System Design Process

1
System Design Process Overview

Study the overall System Design Process to be followed for Big Data Pipeline Design

2
Analyze Functional Requirements

Explore the functional requirements provided for the use case and look for key indicators that require special attention for big data processing.

3
Analyze Pipeline Input

Analyze the input data to the big data pipeline to understand various characteristics like format, protocol and availability schedules

4
Analyze Non-functional Requirements

Analyze the non-functional requirements for the big data pipelines, especially those that relate to big data like scalability and fault tolerance

5
Draw a Pipeline Flowchart

Create a pipeline flowchart that captures the steps and workflow needed to convert inputs to outputs

6
Create a Skeleton Design

Add Big Data specific patterns and techniques to the flowchart and create a skeleton design

7
Analyze Scaling

Analyze scaling of the skeleton architecture to ensure horizontal scalability and detect bottlenecks.

8
Select Technologies

Choose the right technologies for the building blocks used in the solution

9
Design Infrastructure and Operations

Design infrastructure, Security and Serviceability for the big data pipeline

10
Develop a Test Strategy

Create a test strategy for testing the big data pipeline that covers regression, scaling and automation

Scalable Pipelines - Design Principles

1
Batch vs Realtime Pipelines

Compare the characteristics of Batch Pipelines and Realtime Pipelines and analyze suitability for use cases

2
Distributed Architectures

Distributed Architectures help ensure horizontal scalability for handling big data traffic. Discuss the key features and levers for distributed architectures

3
Microservices based Architectures

The principles of Microservices architectures still apply when designing big data pipelines. Explore key principles and how they apply to big data pipelines.

4
Batch Pipelines - Best Practices

Discuss key best practices when designing batch big data pipelines

5
Realtime Pipelines - Best Practices

Discuss key design practices when designing realtime big data pipelines

6
Performance Benchmarking for Big Data Pipelines

Explore the options for benchmarking performance for a big data pipeline

Data Acquisition Design

1
File Transfer Pattern

Analyze the File Transfer Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.

2
Extraction Client Pattern

Analyze the Extraction Client Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.

3
Ingestion API Pattern

Analyze the Ingestion API Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.

4
Pub Sub Acquisition Pattern

Analyze the Pub Sub Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.

5
Data Acquisition Design Practices

Explore Design Best Practices for Big Data Acquisition

Data Transport Design

1
Extract Load Pattern

Analyze the Extract Load Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies.

2
Request Response Pattern

Analyze the Request Response Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies.

3
Event Streaming Pattern

Analyze the Event Streaming Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies.

4
Data Transport Design Practices

Explore some Best Practices for Big Data Transport Design

Data Processing & Transformation Design

1
Data Processing Patterns

Explore several Data Processing Patterns that can be used for Big Data Processing Design.

2
Distributed Processing with Big Data

Study how Big Data Processing Engines work behind the scenes to process data in a horizontally scalable manner

3
Batch Processing Design Practices - Part 1

Discuss best practices for designing batch processing jobs for big data processing

4
Batch Processing Design Practices - Part 2

Discuss best practices for designing batch processing jobs for big data processing

5
Stream Processing Design Practices

Discuss best practices for designing stream processing jobs for big data processing

6
Batch vs Realtime Processing

Study the differences between batch and realtime when it comes to processing jobs. Explore how the design changes based on this distinction

7
Input and Output Considerations for Processing

Discuss the importance of, and techniques for, reading inputs and writing outputs in a scalable manner inside a processing job

8
Processing Engine Technologies

Compare popular processing engine technologies available in the market today.

Storage Design

1
Distributed File System Pattern

Analyze the Distributed File System Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.

2
Relational Database Pattern

Analyze the Relational Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.

3
Document Database Pattern

Analyze the Document Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.

4
Columnar Database Pattern

Analyze the Columnar Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.

5
Graph Database Pattern

Analyze the Graph Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.

6
Distributed Cache Pattern

Analyze the Distributed Cache Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.

7
Data Storage Design Practices - 1

Discuss Data Storage Best Practices when building big data pipelines

8
Data Storage Design Practices - 2

Discuss Data Storage Best Practices when building big data pipelines

Serving Design

1
Query Interface Pattern

Analyze the Query Interface Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.

2
Serving API Pattern

Analyze the Serving API Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.

3
Push Client Pattern

Analyze the Push Client Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.

4
Publish Subscribe Pattern

Analyze the Publish Subscribe Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.

5
Data Serving Design Practices

Discuss Best Practices for Data Serving when building big data pipelines

Infrastructure and Deployments

1
Infrastructure Technologies

Discuss the infrastructure technologies available for deploying and operating big data technologies

2
Microservices Deployments

Use the microservices deployment patterns for building and deploying building blocks in a big data pipeline

3
Processing Jobs Deployments

Discuss the deployment options for deploying processing jobs in a big data pipeline. Compare their benefits and use cases

4
Databases and Queues Deployments

Discuss the deployment options for deploying databases and queues in a big data pipeline. Compare their benefits and use cases

5
Geographical Distribution

Review the use cases where geographically distributed pipelines are needed. Discuss some best practices for such pipelines

Security

1
Pipeline Security by Design

Review the principles of building security by design into big data pipelines

2
Secure External Interfaces

Explore the options and best practices for securing external interfaces in a big data pipeline

3
Secure Data Storage

Explore the options and best practices for securing data storage in a big data pipeline

4
Privacy Considerations

Review the privacy considerations while dealing with data inside a pipeline and the best practices to protect private data

5
Multi-Tenancy Considerations

Discuss the implications on data security and privacy when building multi-tenant applications. Review some best practices for securing multi-tenant applications

Serviceability

1
Elements of Serviceability

Review the elements of building end-to-end serviceability in the big data pipeline

2
Monitoring Pipelines

Explore the components and workflow in a typical monitoring pipeline

3
Data to Monitor

Discuss several types of metrics that need to be collected and monitored when operating a big data pipeline

4
Pipeline Troubleshooting

Discuss the types of problems encountered when operating a big data pipeline. Review the best practices for dealing with those issues

Use Case I : Customer Journey Analytics (CJA)

1
Problem Definition for CJA

Define the business problem to solve for the Customer Journey Analytics use case

2
Study CJA Functional Requirements

Study the requirements for the CJA use case to understand its inputs, outputs and processing requirements

3
Analyze CJA Input Data

Analyze the Input data for the use case to understand its format, protocol and availability

4
Study CJA Non-Functional Requirements

Study the non-functional requirements for the CJA use case to understand elements like scalability, security and resiliency

5
Study CJA Pipeline Flowchart

Draw a Pipeline Flowchart for the CJA use case, to design the data flow and processing aspects

6
Create CJA Skeleton Design

Create a Skeleton Design for the CJA use case using the flowchart. Add design elements for Big Data patterns and integrations

7
Analyze CJA Scaling

Analyze the CJA Skeleton architecture to ensure that the pipeline is horizontally scalable end-to-end. Look for potential bottlenecks

8
Select Technologies for CJA

Select technologies for the building blocks used in the CJA pipeline. Use the selection criteria table to compare alternatives and choose the right technology

9
Design Infrastructure and Operations for CJA

Design the deployment patterns, security measures and serviceability elements for the CJA pipeline

Use Case II : Suspicious Login Alerting (SLA)

1
Problem Definition for SLA

Define the problem for the Suspicious Login Alerting Use Case

2
Study SLA Functional Requirements

Study the Functional requirements for the SLA use case, including its inputs, outputs and processing requirements

3
Analyze SLA Input Data

Analyze the Input Data for the SLA use case to understand its format, source, protocol and limitations

4
Study SLA Non-Functional Requirements

Explore the non-functional requirements for the SLA use case, to understand the scalability, security and other needs

5
Draw SLA Pipeline Flowchart

Use the requirements to draw a pipeline flowchart for the SLA use case, capturing the workflow and data processing steps

6
Create SLA Skeleton Design

Enhance the pipeline flowchart for the SLA use case, by adding patterns and scalable techniques to create a skeleton design


Detailed Rating

5 stars: 3
4 stars: 4
3 stars: 0
2 stars: 0
1 star: 0
30-Day Money-Back Guarantee

Includes

7 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion