Spark Listeners — Intercepting Events from Spark Scheduler

A SparkListener intercepts the events that the Spark scheduler emits over the course of a Spark application's execution.

A Spark listener is an implementation of the SparkListener developer API, which extends SparkListenerInterface and implements every callback method as a no-op.

Spark itself uses Spark listeners for the web UI, event persistence (for History Server), dynamic allocation of executors, and other services.

You can develop your own custom Spark listener using the SparkListener developer API and register it using the SparkContext.addSparkListener method or the spark.extraListeners setting. Because every SparkListener callback defaults to a no-op, you override only the callbacks for the events you care about and so process just a subset of the scheduling events.
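
As a minimal sketch of the approach described above, the listener below overrides a single callback and is registered programmatically. The names SucceededJobCounter and RegisterListenerExample, the local[*] master and the application name are hypothetical choices made for illustration; SparkListener, onJobEnd, JobSucceeded and SparkContext.addSparkListener are part of Spark's API.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{JobSucceeded, SparkListener, SparkListenerJobEnd}

// Hypothetical listener: overrides one callback and inherits no-ops for the rest.
class SucceededJobCounter extends SparkListener {
  // Callbacks run on a listener-bus thread, not the caller's thread,
  // so the counter is thread-safe.
  val succeeded = new AtomicInteger(0)

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    // Jobs that did not succeed are ignored in this sketch.
    if (jobEnd.jobResult == JobSucceeded) succeeded.incrementAndGet()
  }
}

object RegisterListenerExample {
  def main(args: Array[String]): Unit = {
    // local[*] and the app name are assumptions for a self-contained run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("listener-example")
    val sc = new SparkContext(conf)
    val counter = new SucceededJobCounter
    sc.addSparkListener(counter) // programmatic registration
    try {
      sc.parallelize(1 to 10).count() // an action, so it triggers a job
    } finally {
      sc.stop()
    }
    println(s"Succeeded jobs: ${counter.succeeded.get}")
  }
}
```

Events are delivered asynchronously, so the counter is read only after the context has been stopped.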

Tip
Developing a custom SparkListener is an excellent introduction to low-level details of Spark’s Execution Model. Check out the exercise Developing Custom SparkListener to monitor DAGScheduler in Scala.
Tip
Enable the INFO logging level for the org.apache.spark.SparkContext logger to see when custom Spark listeners are registered.

INFO SparkContext: Registered listener org.apache.spark.scheduler.StatsReportListener
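
Registration through the spark.extraListeners setting can be sketched as follows. This is an illustrative example, not a prescribed setup: the ExtraListenersExample object, its buildConf helper, the local[*] master and the application name are assumptions, while StatsReportListener is the built-in listener named in the log line above. A class listed in spark.extraListeners needs either a zero-argument constructor or a constructor that accepts a SparkConf.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ExtraListenersExample {
  // Hypothetical helper: builds a configuration that asks Spark to
  // instantiate and register the listener while SparkContext starts.
  def buildConf(): SparkConf =
    new SparkConf()
      .setMaster("local[*]")                 // assumption: local run
      .setAppName("extra-listeners-example") // assumption: any name works
      .set("spark.extraListeners", "org.apache.spark.scheduler.StatsReportListener")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      sc.parallelize(1 to 100).count() // run a job so the listener has events to report
    } finally {
      sc.stop()
    }
  }
}
```

The same setting can be passed on the command line with spark-submit --conf spark.extraListeners=org.apache.spark.scheduler.StatsReportListener, and several listeners can be given as a comma-separated list of class names.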

SparkListenerInterface — Internal Contract for Spark Listeners

SparkListenerInterface is a private[spark] contract for Spark listeners to intercept events from the Spark scheduler.

Note
SparkListener and SparkFirehoseListener are direct implementations of the SparkListenerInterface contract that help you develop more sophisticated Spark listeners.
Table 1. SparkListenerInterface Methods (listed in alphabetical order)
| Method | Event | Reason |
| --- | --- | --- |
| onApplicationEnd | SparkListenerApplicationEnd | SparkContext does postApplicationEnd. |
| onApplicationStart | SparkListenerApplicationStart | SparkContext does postApplicationStart. |
| onBlockManagerAdded | SparkListenerBlockManagerAdded | BlockManagerMasterEndpoint has registered a BlockManager. |
| onBlockManagerRemoved | SparkListenerBlockManagerRemoved | BlockManagerMasterEndpoint has removed a BlockManager. |
| onBlockUpdated | SparkListenerBlockUpdated | BlockManagerMasterEndpoint has received an UpdateBlockInfo message. |
| onEnvironmentUpdate | SparkListenerEnvironmentUpdate | SparkContext does postEnvironmentUpdate. |
| onExecutorAdded | SparkListenerExecutorAdded | DriverEndpoint RPC endpoint (of CoarseGrainedSchedulerBackend) handles RegisterExecutor message, MesosFineGrainedSchedulerBackend does resourceOffers, and LocalSchedulerBackendEndpoint starts. |
| onExecutorMetricsUpdate | SparkListenerExecutorMetricsUpdate | |
| onExecutorRemoved | SparkListenerExecutorRemoved | DriverEndpoint RPC endpoint (of CoarseGrainedSchedulerBackend) does removeExecutor and MesosFineGrainedSchedulerBackend does removeExecutor. |
| onJobEnd | SparkListenerJobEnd | DAGScheduler does cleanUpAfterSchedulerStop, handleTaskCompletion, failJobAndIndependentStages, and markMapStageJobAsFinished. |
| onJobStart | SparkListenerJobStart | DAGScheduler handles JobSubmitted and MapStageSubmitted messages. |
| onOtherEvent | SparkListenerEvent | |
| onStageCompleted | SparkListenerStageCompleted | DAGScheduler does markStageAsFinished. |
| onStageSubmitted | SparkListenerStageSubmitted | DAGScheduler does submitMissingTasks. |
| onTaskEnd | SparkListenerTaskEnd | DAGScheduler handles a task completion. |
| onTaskGettingResult | SparkListenerTaskGettingResult | DAGScheduler handles GettingResultEvent event. |
| onTaskStart | SparkListenerTaskStart | DAGScheduler is informed that a task is being started. |
| onUnpersistRDD | SparkListenerUnpersistRDD | SparkContext does unpersistRDD. |
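
To illustrate how callbacks from Table 1 can be combined, here is a sketch of a listener that correlates onJobStart and onJobEnd through the job id to measure how long each job ran. The class name JobDurationListener and its fields are hypothetical; the jobId and time fields belong to the SparkListenerJobStart and SparkListenerJobEnd events, and the timestamps are assumed to be in milliseconds.

```scala
import scala.collection.concurrent.TrieMap

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Hypothetical listener: pairs the start and end events of each job.
class JobDurationListener extends SparkListener {
  // jobId -> start timestamp of jobs that are still running
  private val startTimes = TrieMap.empty[Int, Long]

  // jobId -> elapsed time of finished jobs; a concurrent map so that
  // other threads can read it while events keep arriving
  val durationsMs = TrieMap.empty[Int, Long]

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    startTimes.put(jobStart.jobId, jobStart.time)
  }

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    // Ignore end events for jobs whose start this listener never saw,
    // e.g. when it was registered while a job was already running.
    startTimes.remove(jobEnd.jobId).foreach { start =>
      durationsMs.put(jobEnd.jobId, jobEnd.time - start)
    }
  }
}
```

Once registered with SparkContext.addSparkListener, the listener fills durationsMs as jobs finish. Keeping only the two relevant overrides is possible because the remaining callbacks stay no-ops.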

Built-In Spark Listeners

Table 2. Built-In Spark Listeners
| Spark Listener | Description |
| --- | --- |
| EventLoggingListener | Logs JSON-encoded events to a file that can later be read by History Server |
| StatsReportListener | |
| SparkFirehoseListener | Allows users to receive all SparkListenerEvent events by overriding the single onEvent method only. |
| ExecutorAllocationListener | |
| HeartbeatReceiver | |
| StreamingJobProgressListener | |
| ExecutorsListener | Prepares information for Executors tab in web UI |
| StorageStatusListener, RDDOperationGraphListener, EnvironmentListener, BlockStatusListener and StorageListener | For web UI |
| SpillListener | |
| ApplicationEventListener | |
| StreamingQueryListenerBus | |
| SQLListener / SQLHistoryListener | Support for History Server |
| StreamingListenerBus | |
| JobProgressListener | |
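
The single-method style of SparkFirehoseListener mentioned in Table 2 can be sketched like this. EventTypeCounter and its counts field are hypothetical names; the sketch assumes that every callback of SparkFirehoseListener forwards its event to onEvent and that events reach a listener from one thread at a time.

```scala
import scala.collection.concurrent.TrieMap

import org.apache.spark.SparkFirehoseListener
import org.apache.spark.scheduler.SparkListenerEvent

// Hypothetical listener: counts how many events of each class were received.
class EventTypeCounter extends SparkFirehoseListener {
  // event class name -> number of events seen so far
  val counts = TrieMap.empty[String, Int]

  // The single override receives every event, whatever its type.
  override def onEvent(event: SparkListenerEvent): Unit = {
    val name = event.getClass.getName
    // Read-then-write is not atomic; this relies on the single-threaded
    // delivery assumption stated above.
    counts.update(name, counts.getOrElse(name, 0) + 1)
  }
}
```

A firehose listener suits cross-cutting concerns such as auditing or forwarding all events to an external system, whereas a SparkListener subclass is the simpler choice when only a few event types matter.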
