Streaming Listeners
Streaming listeners are Spark listeners interested in streaming events like batch submitted, started or completed.
Streaming listeners implement the org.apache.spark.streaming.scheduler.StreamingListener interface and process StreamingListenerEvent events.
The following streaming listeners are available in Spark Streaming:
StreamingListenerEvent Events
- StreamingListenerBatchSubmitted is posted when streaming jobs are submitted for execution and triggers StreamingListener.onBatchSubmitted (see StreamingJobProgressListener.onBatchSubmitted).
- StreamingListenerBatchStarted triggers StreamingListener.onBatchStarted.
- StreamingListenerBatchCompleted is posted to inform that a collection of streaming jobs has completed, i.e. all the streaming jobs in JobSet have stopped their execution.
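The three events above map onto callbacks of the StreamingListener trait. The following is a minimal sketch of a custom listener that counts them, assuming Spark Streaming is on the classpath. The class name BatchCountingListener and its counter fields are hypothetical and are not part of Spark.

```scala
import java.util.concurrent.atomic.AtomicLong

import org.apache.spark.streaming.scheduler.{
  StreamingListener,
  StreamingListenerBatchCompleted,
  StreamingListenerBatchStarted,
  StreamingListenerBatchSubmitted
}

// Hypothetical listener (the name is an assumption, not a Spark class):
// counts how many batch events of each kind it has seen.
class BatchCountingListener extends StreamingListener {
  val submitted = new AtomicLong(0L)
  val started = new AtomicLong(0L)
  val completed = new AtomicLong(0L)

  override def onBatchSubmitted(batchSubmitted: StreamingListenerBatchSubmitted): Unit = {
    submitted.incrementAndGet()
    println(s"Batch ${batchSubmitted.batchInfo.batchTime} submitted")
  }

  override def onBatchStarted(batchStarted: StreamingListenerBatchStarted): Unit = {
    started.incrementAndGet()
    println(s"Batch ${batchStarted.batchInfo.batchTime} started")
  }

  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    completed.incrementAndGet()
    // totalDelay is an Option[Long]; -1 is printed here when it is not defined.
    val delayMs = batchCompleted.batchInfo.totalDelay.getOrElse(-1L)
    println(s"Batch ${batchCompleted.batchInfo.batchTime} completed, total delay: $delayMs ms")
  }
}
```

Given an existing StreamingContext named ssc (assumed here), such a listener could be registered with ssc.addStreamingListener(new BatchCountingListener) before the context is started.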
StreamingJobProgressListener
StreamingJobProgressListener is a streaming listener that collects information for StreamingSource and the Streaming page in the web UI.
Note: A StreamingJobProgressListener is created when a StreamingContext is created, and is later registered as a StreamingListener and SparkListener when the Streaming tab is created.
onBatchSubmitted
For StreamingListenerBatchSubmitted(batchInfo: BatchInfo) events, onBatchSubmitted stores the batchInfo batch information in the internal waitingBatchUIData registry, keyed by batch time.
The number of entries in the waitingBatchUIData registry contributes to numUnprocessedBatches (together with runningBatchUIData), waitingBatches, and retainedBatches. The registry is also used to look up the batch data for a batch time (in getBatchUIData).
numUnprocessedBatches and waitingBatches are used in StreamingSource.
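The bookkeeping described above can be illustrated with a small self-contained model. This is a sketch only, not Spark's implementation: BatchInfoModel, BatchRegistryModel, and the markRunning helper are hypothetical names introduced for illustration, and batch times are modelled as plain milliseconds.

```scala
import scala.collection.mutable

// Simplified stand-in for Spark's BatchInfo (an assumption for this sketch).
final case class BatchInfoModel(batchTimeMs: Long, numRecords: Long)

// Toy model of the registries named in the text, keyed by batch time.
class BatchRegistryModel {
  private val waitingBatchUIData = mutable.LinkedHashMap.empty[Long, BatchInfoModel]
  private val runningBatchUIData = mutable.LinkedHashMap.empty[Long, BatchInfoModel]

  // Store the batch information per batch time, as described for onBatchSubmitted.
  def onBatchSubmitted(batchInfo: BatchInfoModel): Unit = synchronized {
    waitingBatchUIData(batchInfo.batchTimeMs) = batchInfo
  }

  // Assumption for illustration only: move a batch from waiting to running.
  def markRunning(batchTimeMs: Long): Unit = synchronized {
    waitingBatchUIData.remove(batchTimeMs).foreach { info =>
      runningBatchUIData(batchTimeMs) = info
    }
  }

  // Snapshot of the batches that are still waiting, in submission order.
  def waitingBatches: Seq[BatchInfoModel] = synchronized {
    waitingBatchUIData.values.toList
  }

  // Waiting and running batches together count as unprocessed.
  def numUnprocessedBatches: Long = synchronized {
    (waitingBatchUIData.size + runningBatchUIData.size).toLong
  }

  // Look up the batch data for a batch time in either registry.
  def getBatchUIData(batchTimeMs: Long): Option[BatchInfoModel] = synchronized {
    waitingBatchUIData.get(batchTimeMs).orElse(runningBatchUIData.get(batchTimeMs))
  }
}
```

In this model a batch that moves from waiting to running leaves numUnprocessedBatches unchanged, which mirrors the statement that both registries contribute to that number.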
Note: waitingBatches and runningBatches are displayed together in Active Batches in the Streaming tab in the web UI.
onBatchStarted
Caution: FIXME
onBatchCompleted
Caution: FIXME
Retained Batches
retainedBatches are the waiting, running, and completed batches that the web UI uses to display streaming statistics.
The number of retained batches is controlled by spark.streaming.ui.retainedBatches.
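As a sketch of how the setting might be applied, the property can be set on a SparkConf before the StreamingContext is created. The object name, application name, master URL, and the value 500 below are placeholders chosen for illustration. The Spark configuration documentation lists 1000 as the default for this property.

```scala
import org.apache.spark.SparkConf

// Hypothetical holder object; the app name and master URL are placeholders.
object RetainedBatchesConf {
  val conf: SparkConf = new SparkConf()
    .setAppName("retained-batches-example")
    .setMaster("local[2]")
    // Keep statistics for the last 500 batches (arbitrary example value).
    .set("spark.streaming.ui.retainedBatches", "500")
}
```

A StreamingContext built from this configuration would then retain at most that many batches for the web UI.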