Streaming Big Data with Spark Streaming, Scala, and Spark 3!

Hands-on examples of processing massive streams of data – in real-time, on a cluster – with Apache Spark Streaming.

What you’ll learn

  • Process massive streams of real-time data using Spark Streaming
  • Integrate Spark Streaming with data sources, including Kafka, Flume, and Kinesis
  • Use Spark 2’s Structured Streaming API
  • Create Spark applications using the Scala programming language
  • Output transformed real-time data to Cassandra or file systems
  • Integrate Spark Streaming with Spark SQL to query streaming data in real-time
  • Train machine learning models with streaming data, and use those models for real-time predictions
  • Ingest Apache access log data and transform streams of it
  • Receive real-time streams of Twitter feeds
  • Maintain stateful data across a continuous stream of input data
  • Query streaming data across sliding windows of time


Requirements

  • To follow along with the examples, you’ll need a personal computer.
  • We’ll walk through installing the required software in the first lecture: The Scala IDE, Spark, and a JDK.
  • My “Taming Big Data with Apache Spark – Hands On!”.
  • The course includes a crash course in the Scala programming language if you’re new to it; if you already know Scala, even better.


New! Updated for Spark 3.0.0!

“Big Data” analysis is a hot and highly valuable skill. Thing is, “big data” never stops flowing!

You’ll be learning from an ex-engineer and senior manager from Amazon and IMDb.

This course gets your hands on some real, live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You’ll write and run real Spark Streaming jobs right at home on your own PC, and toward the end of the course, we’ll show you how to take those jobs to a real Hadoop cluster and run them in a production environment too.

Across more than 30 lectures and almost 6 hours of video content, you’ll:

  • Get a crash course in the Scala programming language
  • Learn how Apache Spark operates on a cluster
  • Use structured streaming to stream into data frames in real-time
  • Analyze streaming data over sliding windows of time
  • Maintain stateful information across streams of data
  • Connect Spark Streaming with highly scalable sources of data, including Kafka, Flume, and Kinesis
  • Dump streams of data in real-time to NoSQL databases such as Cassandra
  • Run SQL queries on streamed data in real-time
  • Train machine learning models in real-time with streaming data, and use them to make predictions that keep getting better over time
  • Package, deploy, and run self-contained Spark Streaming code on a real Hadoop cluster using Amazon Elastic MapReduce

Who this course is for:

  • Students with some prior programming or scripting ability should take this course.
  • Students with no prior software engineering or programming experience should seek an introductory programming course first.

Last updated 12/2019
