Big data technologies have grown rapidly over the past few years and have penetrated every domain and industry in software development; working with them has become a core skill for software engineers. Robust and effective big data pipelines are needed to support the growing volume of data and applications in the big data world. These pipelines have become business critical and help increase revenue and reduce costs.
Quality big data pipelines do not happen by magic. Building and maintaining them requires high quality designs that are scalable, reliable and cost effective.
How do you build an end-to-end big data pipeline that leverages big data technologies and practices effectively to solve business problems? How do you integrate them in a scalable and reliable manner? How do you deploy, secure and operate them? How do you look at the overall forest and not just the individual trees? This course focuses on this skill gap.
What are the topics covered in this course?
We start off by discussing the building blocks of big data pipelines, their functions and challenges.
We introduce a structured design process for building big data pipelines.
We then discuss individual building blocks, focusing on the design patterns available, their advantages, shortcomings, use cases and available technologies.
We recommend several best practices across the course.
Finally, we implement two use cases to illustrate how to apply the learnings from the course to real world problems: one batch use case and one realtime use case.
Introduction & Expectations
Discuss the need for quality design in big data pipelines. Explore the key activities involved in building such a design
Get familiar with the topics covered, the out-of-scope topics and the prerequisites for the course.
Discuss how serverless technologies from cloud providers relate to the contents of this course.
Building Blocks for Big Data Pipelines
Describe the overall pipeline network and the building blocks in the network
Discuss the features and challenges for the data acquisition block in a big data pipeline
Discuss the features and challenges for the data transport block in a big data pipeline
Discuss the features and challenges for the data processing block in a big data pipeline
Discuss the features and challenges for the data storage block in a big data pipeline
Discuss the features and challenges for the data serving block in a big data pipeline
Discuss the features and challenges for the pipeline infrastructure in a big data pipeline
Discuss the features and challenges for the operations block in a big data pipeline
System Design Process
Study the overall System Design Process to be followed for Big Data Pipeline Design
Explore the functional requirements provided for the use case and look for key indicators that require special attention for big data processing.
Analyze the input data to the big data pipeline to understand various characteristics like format, protocol and availability schedules
Analyze the non-functional requirements for the big data pipelines, especially those that relate to big data like scalability and fault tolerance
Create a pipeline flowchart that captures the steps and workflow needed to convert inputs to outputs
Add Big Data specific patterns and techniques to the flowchart and create a skeleton design
Analyze scaling of the skeleton architecture to ensure horizontal scalability and detect bottlenecks.
Choose the right technologies for the building blocks used in the solution
Design infrastructure, security and serviceability for the big data pipeline
Create a test strategy for testing the big data pipeline that covers regression, scaling and automation
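To make the test strategy concrete, here is a minimal sketch of a regression test for a single transformation step. It assumes pytest as the test runner, and the enrich_event function and its fields are purely illustrative, not part of the course's use cases.

```python
# A minimal, hypothetical regression test for one transformation step.
# enrich_event() and its field names are illustrative assumptions.
import pytest


def enrich_event(event: dict) -> dict:
    """Toy transformation: normalise the country code and add a derived field."""
    return {
        **event,
        "country": event["country"].upper(),
        "is_mobile": event.get("device", "").lower() in {"ios", "android"},
    }


def test_enrich_event_normalises_country():
    out = enrich_event({"user_id": "u1", "country": "de", "device": "iOS"})
    assert out["country"] == "DE"
    assert out["is_mobile"] is True


def test_enrich_event_handles_missing_device():
    out = enrich_event({"user_id": "u2", "country": "us"})
    assert out["is_mobile"] is False
```

Scaling and automation tests build on the same idea but run against the deployed pipeline with production-like data volumes.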
Scalable Pipelines - Design Principles
Compare the characteristics of Batch Pipelines and Realtime Pipelines and analyze suitability for use cases
Distributed architectures help ensure horizontal scalability for handling big data traffic. Discuss the key features and levers of distributed architectures
The principles of microservices architectures still apply when designing big data pipelines. Explore the key principles and how they carry over to big data pipelines.
Discuss key best practices when designing batch big data pipelines
Discuss key design practices when designing realtime big data pipelines
Explore the options for benchmarking performance for a big data pipeline
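As a starting point for benchmarking, the sketch below measures single-process throughput (records per second) for one pipeline stage. The stage and sample records are illustrative assumptions; a real benchmark would exercise the deployed pipeline end-to-end with production-like data.

```python
# A minimal throughput benchmark harness, assuming the stage under test can
# be called as a plain Python function over a batch of records.
import time


def benchmark(stage, records, repeats=3):
    """Run `stage` over `records` several times and report records/second."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for record in records:
            stage(record)
        elapsed = time.perf_counter() - start
        rates.append(len(records) / elapsed)
    return {
        "min_rps": min(rates),
        "max_rps": max(rates),
        "avg_rps": sum(rates) / len(rates),
    }


if __name__ == "__main__":
    sample = [{"user_id": i, "amount": i * 1.5} for i in range(100_000)]
    print(benchmark(lambda r: {**r, "amount_cents": int(r["amount"] * 100)}, sample))
```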
Data Acquisition Design
Analyze the File Transfer Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Extraction Client Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Ingestion API Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies. (A minimal sketch of this pattern follows this module.)
Analyze the Pub Sub Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.
Explore Design Best Practices for Big Data Acquisition
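To illustrate the Ingestion API pattern referenced above, together with the common best practices of validating and de-duplicating events at the edge, here is a minimal sketch using Flask. The endpoint, payload fields and in-memory de-duplication set are illustrative assumptions, not a production design.

```python
# A minimal Ingestion API sketch: accept events over HTTP, validate them,
# and drop duplicates by event_id before handing off to the transport layer.
from flask import Flask, jsonify, request

app = Flask(__name__)
seen_ids = set()  # stand-in for a durable de-duplication store


@app.route("/ingest", methods=["POST"])
def ingest():
    event = request.get_json(force=True)
    if not event or "event_id" not in event:
        return jsonify({"error": "event_id is required"}), 400
    if event["event_id"] in seen_ids:
        return jsonify({"status": "duplicate"}), 200
    seen_ids.add(event["event_id"])
    # In a real pipeline the event would be handed to the transport layer here.
    return jsonify({"status": "accepted"}), 202


if __name__ == "__main__":
    app.run(port=8080)
```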
Data Transport Design
Analyze the Extract Load Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Request Response Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Event Streaming Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies. (A minimal sketch of this pattern follows this module.)
Explore some Best Practices for Big Data Transport Design
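To ground the Event Streaming pattern referenced above, the sketch below shows a producer and a consumer-group consumer. Kafka and the kafka-python client are assumptions made for illustration (the module compares several transport technologies), and the topic, key and group names are hypothetical.

```python
# A minimal Event Streaming sketch, assuming Kafka and the kafka-python client.
import json

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # Key by user id so events for the same user land on the same partition,
    # preserving per-user ordering while still scaling across partitions.
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", key="user-42", value={"page": "/checkout", "ts": 1700000000})
producer.flush()

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    group_id="enrichment-service",  # consumer groups enable horizontal scale-out
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.key, message.value)
```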
Data Processing & Transformation Design
Explore several Data Processing Patterns that can be used for Big Data Processing Design.
Study how Big Data Processing Engines work behind the scenes to process data in a horizontally scalable manner
Discuss best practices for designing batch processing jobs for big data processing
Discuss best practices for designing stream processing jobs for big data processing
Study the differences between batch and realtime when it comes to processing jobs. Explore how the design changes based on this distinction
Discuss the importance and techniques for reading inputs and writing outputs in a scalable manner inside a processing job
Compare popular processing engine technologies available in the market today.
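As one concrete example of a batch processing job, here is a minimal sketch assuming PySpark as the chosen engine (this module compares several engines). The input path, columns and aggregation are illustrative assumptions.

```python
# A minimal batch job sketch, assuming PySpark; paths and columns are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-purchase-aggregates").getOrCreate()

# Read a partitioned input in a splittable format so the engine can
# parallelise the scan across executors.
events = spark.read.parquet("s3://example-bucket/events/date=2024-01-01/")

daily_totals = (
    events
    .filter(F.col("event_type") == "purchase")
    .groupBy("user_id")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("purchases"))
)

# Write the output partitioned by a query-friendly key for the serving layer.
daily_totals.write.mode("overwrite").parquet(
    "s3://example-bucket/aggregates/daily/2024-01-01/"
)

spark.stop()
```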
Storage Design
Analyze the Distributed File System Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Relational Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Document Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Columnar Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Graph Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Distributed Cache Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies. (A minimal sketch of this pattern follows this module.)
Discuss Data Storage Best Practices when building big data pipelines
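To illustrate the Distributed Cache pattern referenced above in its common cache-aside (read-through) usage, here is a minimal sketch. A plain Python dict with an expiry stands in for a real distributed cache such as Redis or Memcached, and the profile lookup is an illustrative placeholder.

```python
# A minimal cache-aside sketch: check the cache first, fall back to the
# backing store on a miss, and populate the cache for subsequent reads.
import time

CACHE_TTL_SECONDS = 300
_cache = {}  # user_id -> (cached_at, profile); stand-in for a distributed cache


def load_profile_from_store(user_id):
    """Placeholder for the slow path: a query against the backing data store."""
    return {"user_id": user_id, "segment": "gold"}


def get_profile(user_id):
    now = time.time()
    hit = _cache.get(user_id)
    if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]                      # cache hit: serve without touching the store
    profile = load_profile_from_store(user_id)
    _cache[user_id] = (now, profile)       # populate on miss so later reads are cheap
    return profile


print(get_profile("user-42"))  # miss: loads from the store
print(get_profile("user-42"))  # hit: served from the cache
```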
Serving Design
Analyze the Query Interface Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies. (A minimal sketch of this pattern follows this module.)
Analyze the Serving API Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Push Client Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.
Analyze the Publish Subscribe Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.
Discuss Best Practices for Data Serving when building big data pipelines
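To illustrate the Query Interface pattern referenced above, the sketch below loads precomputed aggregates into a SQL store and serves a consumer query against them. sqlite3 stands in for an analytical store, and the table and columns are illustrative assumptions.

```python
# A minimal Query Interface sketch: precomputed aggregates are written to a
# SQL store and consumers query them directly.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_user_totals (user_id TEXT, day TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO daily_user_totals VALUES (?, ?, ?)",
    [("user-1", "2024-01-01", 120.0), ("user-2", "2024-01-01", 35.5)],
)

# A consumer-facing query: top spenders for a given day.
rows = conn.execute(
    "SELECT user_id, total_amount FROM daily_user_totals "
    "WHERE day = ? ORDER BY total_amount DESC LIMIT 10",
    ("2024-01-01",),
).fetchall()
print(rows)
```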
Infrastructure and Deployments
Discuss the infrastructure technologies available for deploying and operating big data technologies
Apply microservices deployment patterns to build and deploy the building blocks in a big data pipeline
Discuss the deployment options for deploying processing jobs in a big data pipeline. Compare their benefits and use cases
Discuss the deployment options for deploying databases and queues in a big data pipeline. Compare their benefits and use cases
Review the use cases where geographically distributed pipelines are needed. Discuss some best practices for such deployments
Security
Review the principles of building security by design into big data pipelines
Explore the options and best practices for securing external interfaces in a big data pipeline
Explore the options and best practices for securing data storage in a big data pipeline
Review the privacy considerations while dealing with data inside a pipeline and the best practices to protect private data
Discuss the implications on data security and privacy when building multi-tenant applications. Review some best practices for securing multi-tenant applications
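As one concrete illustration of the privacy and multi-tenancy practices discussed in this module, here is a minimal sketch that pseudonymises user identifiers with a per-tenant HMAC key, so raw identifiers never enter the pipeline and tenants cannot be correlated with each other. The hard-coded key map is a deliberate simplification; a real design would use a secrets manager or KMS.

```python
# A minimal pseudonymisation sketch: per-tenant HMAC of user identifiers.
import hashlib
import hmac

TENANT_KEYS = {
    "tenant-a": b"tenant-a-secret-key",
    "tenant-b": b"tenant-b-secret-key",
}


def pseudonymise(tenant_id, user_id):
    key = TENANT_KEYS[tenant_id]
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# The same user id yields different pseudonyms per tenant, while the mapping
# stays stable within a tenant so joins and aggregations still work downstream.
print(pseudonymise("tenant-a", "alice@example.com"))
print(pseudonymise("tenant-b", "alice@example.com"))
```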
Serviceability
Review the elements of building end-to-end serviceability in the big data pipeline
Explore the components and workflow in a typical monitoring pipeline
Discuss several types of metrics that need to be collected and monitored when operating a big data pipeline
Discuss the types of problems encountered when operating a big data pipeline. Review the best practices for dealing with those issues
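As one concrete example of an operational check, the sketch below detects a growing backlog between what producers have written and what the pipeline has processed, and emits a metric plus an alert. The offset-fetching functions are illustrative placeholders; in practice these numbers come from the queue or monitoring APIs.

```python
# A minimal backlog-monitoring sketch; the fetch_* functions are placeholders.
import time

BACKLOG_ALERT_THRESHOLD = 100_000


def fetch_latest_offset():
    """Placeholder: newest offset/sequence number written by producers."""
    return 1_250_000


def fetch_committed_offset():
    """Placeholder: last offset the pipeline has successfully processed."""
    return 1_100_000


def check_backlog():
    lag = fetch_latest_offset() - fetch_committed_offset()
    metric = {"metric": "pipeline_backlog", "value": lag, "ts": int(time.time())}
    print(metric)  # in practice, emit to the monitoring pipeline
    if lag > BACKLOG_ALERT_THRESHOLD:
        print(f"ALERT: backlog of {lag} records exceeds threshold")


check_backlog()
```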
Use Case I : Customer Journey Analytics (CJA)
Define the business problem to solve for the Customer Journey Analytics use case
Study the requirements for the CJA use case to understand its inputs, outputs and processing requirements
Analyze the Input data for the use case to understand its format, protocol and availability
Study the non-functional requirements for the CJA use case to understand elements like scalability, security and resiliency
Draw a Pipeline Flowchart for the CJA use case, to design the data flow and processing aspects
Create a Skeleton Design for the CJA use case using the flowchart. Add design elements for Big Data patterns and integrations
Analyze the CJA skeleton architecture to ensure that the pipeline is horizontally scalable end-to-end. Look for potential bottlenecks
Select technologies for the building blocks used in the CJA pipeline. Use the selection criteria table to compare alternatives and choose the right technology
Design the deployment patterns, security measures and serviceability elements for the CJA pipeline
Use Case II : Suspicious Login Alerting (SLA)
Define the problem for the Suspicious Login Alerting Use Case
Study the Functional requirements for the SLA use case, including its inputs, outputs and processing requirements
Analyze the Input Data for the SLA use case to understand its format, source, protocol and limitations
Explore the non-functional requirements for the SLA use case, to understand the scalability, security and other needs
Use the requirements to draw a pipeline flowchart for the SLA use case, capturing the workflow and data processing steps
Enhance the pipeline flowchart for the SLA use case by adding patterns and scalable techniques to create a skeleton design