YarnSchedulerBackend — Foundation for Coarse-Grained Scheduler Backends for YARN
YarnSchedulerBackend is a CoarseGrainedSchedulerBackend that acts as the foundation for the concrete deploy mode-specific Spark scheduler backends for YARN, i.e. YarnClientSchedulerBackend and YarnClusterSchedulerBackend for client deploy mode and cluster deploy mode, respectively.
YarnSchedulerBackend registers itself as the YarnScheduler RPC endpoint in the RPC Environment.
YarnSchedulerBackend is ready to accept task launch requests as soon as sufficient executors are registered (how many are required depends on whether dynamic allocation is enabled).
Note: With no extra configuration, YarnSchedulerBackend is ready for task launch requests when 80% of all the requested executors are available.
Note: YarnSchedulerBackend is a private[spark] abstract class and is never created directly (only indirectly through the concrete implementations YarnClientSchedulerBackend and YarnClusterSchedulerBackend).
| Name | Initial Value | Description |
|---|---|---|
| | | Ratio for minimum number of registered executors to claim ... Minimum expected number of executors that is used to ensure that sufficient resources are available (and start accepting task launch requests). |
| | YarnSchedulerEndpoint object | |
| | | Total expected number of executors that is used to ensure that sufficient resources are available (and start accepting task launch requests). Updated to the final value when Spark on YARN starts (in client mode or cluster mode). |
| | (undefined) | YARN’s ApplicationAttemptId of a Spark application. Only defined in ... Set when YarnClusterSchedulerBackend starts (and bindToYarn is called) using YARN’s ... Used for applicationAttemptId, which is a part of the SchedulerBackend Contract. |
| | | Controls whether to reset ... Disabled (i.e. ... |
Resetting YarnSchedulerBackend — reset Method
Note: reset is a part of the CoarseGrainedSchedulerBackend Contract.
reset resets the parent CoarseGrainedSchedulerBackend and the ExecutorAllocationManager (accessible as SparkContext.executorAllocationManager).
doRequestTotalExecutors Method
def doRequestTotalExecutors(requestedTotal: Int): Boolean
Note: doRequestTotalExecutors is a part of the CoarseGrainedSchedulerBackend Contract.
doRequestTotalExecutors sends a blocking RequestExecutors message to the YarnScheduler RPC endpoint with the input requestedTotal and the internal localityAwareTasks and hostToLocalTaskCount attributes.
Caution: FIXME The internal attributes are already set. When and how?
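The request described above can be pictured with a minimal, self-contained sketch. It does not use Spark's classes: `EndpointRef`, `askSync`, `SketchBackend` and `RequestExecutorsDemo` are hypothetical stand-ins, and the `RequestExecutors` case class here carries only the three fields named in the text (the actual message may carry more).

```scala
// Simplified stand-in for the RequestExecutors message; field names follow the text.
final case class RequestExecutors(
    requestedTotal: Int,
    localityAwareTasks: Int,
    hostToLocalTaskCount: Map[String, Int])

// Hypothetical stand-in for an RPC endpoint reference with a blocking ask.
trait EndpointRef {
  def askSync(message: Any): Boolean
}

// Hypothetical, heavily simplified backend (not Spark's actual class).
class SketchBackend(yarnScheduler: EndpointRef) {
  // Internal attributes, assumed to have been set before the request is sent.
  var localityAwareTasks: Int = 0
  var hostToLocalTaskCount: Map[String, Int] = Map.empty

  // Bundles the input and the internal attributes into one message and blocks for the reply.
  def doRequestTotalExecutors(requestedTotal: Int): Boolean =
    yarnScheduler.askSync(
      RequestExecutors(requestedTotal, localityAwareTasks, hostToLocalTaskCount))
}

object RequestExecutorsDemo {
  // Test double that records the last message and always acknowledges it.
  class RecordingEndpoint extends EndpointRef {
    var last: Option[Any] = None
    def askSync(message: Any): Boolean = { last = Some(message); true }
  }

  def run(): (Boolean, Option[Any]) = {
    val endpoint = new RecordingEndpoint
    val backend = new SketchBackend(endpoint)
    backend.localityAwareTasks = 2
    backend.hostToLocalTaskCount = Map("host-1" -> 2)
    val acknowledged = backend.doRequestTotalExecutors(5)
    (acknowledged, endpoint.last)
  }
}
```

In this sketch the Boolean result is simply whatever the endpoint replies, which is why the call has to block until the reply arrives.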
Starting the Backend — start Method
start creates a SchedulerExtensionServiceBinding object (using SparkContext, appId, and attemptId) and starts it (using SchedulerExtensionServices.start(binding)).
Note: A SchedulerExtensionServices object is created when YarnSchedulerBackend is initialized and available as services.
Finally, start calls the parent’s CoarseGrainedSchedulerBackend.start.
Stopping the Backend — stop Method
stop calls the parent’s CoarseGrainedSchedulerBackend.requestTotalExecutors (using (0, 0, Map.empty) parameters).
Caution: FIXME Explain what 0, 0, Map.empty means once the method is described for the parent.
stop then calls the parent’s CoarseGrainedSchedulerBackend.stop.
Finally, stop stops the internal SchedulerExtensionServices object (using services.stop()).
Caution: FIXME Link the description of services.stop() here.
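The ordering of calls in start and stop, as described in the two sections above, can be illustrated with a small model that only records which step runs when. `LifecycleSketch` and its private helpers are hypothetical names, not Spark classes, and the sketch leaves out any error handling the real methods may have.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical model that records the order of calls only.
class LifecycleSketch {
  val calls: ListBuffer[String] = ListBuffer.empty[String]

  // Stand-ins for the parent CoarseGrainedSchedulerBackend methods.
  private def parentStart(): Unit = { calls += "parent.start"; () }
  private def parentStop(): Unit = { calls += "parent.stop"; () }
  private def requestTotalExecutors(
      total: Int,
      localityAwareTasks: Int,
      hostToLocalTaskCount: Map[String, Int]): Unit = {
    calls += s"requestTotalExecutors($total, $localityAwareTasks, ${hostToLocalTaskCount.size} hosts)"
    ()
  }

  // Stand-ins for the extension services (available as `services` in the text).
  private def servicesStart(): Unit = { calls += "services.start"; () }
  private def servicesStop(): Unit = { calls += "services.stop"; () }

  // start: extension services first, then the parent backend.
  def start(): Unit = {
    servicesStart()
    parentStart()
  }

  // stop: request zero executors, stop the parent, then stop the extension services.
  def stop(): Unit = {
    requestTotalExecutors(0, 0, Map.empty)
    parentStop()
    servicesStop()
  }
}
```

Running `start()` followed by `stop()` on a fresh `LifecycleSketch` leaves the five steps in `calls` in the order the text gives them.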
Recording Application and Attempt Ids — bindToYarn Method
bindToYarn(appId: ApplicationId, attemptId: Option[ApplicationAttemptId]): Unit
bindToYarn sets the internal appId and attemptId to the value of the input parameters, appId and attemptId, respectively.
Note: start requires appId.
Requesting YARN for Spark Application’s Current Attempt Id — applicationAttemptId Method
applicationAttemptId(): Option[String]
Note: applicationAttemptId is a part of the SchedulerBackend Contract.
applicationAttemptId requests the internal YARN’s ApplicationAttemptId for the Spark application’s current attempt id.
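How bindToYarn, start and applicationAttemptId relate can be sketched as follows. `AppId` and `AppAttemptId` are hypothetical stand-ins for YARN's ApplicationId and ApplicationAttemptId, `BindingSketch` is not Spark's class, and rendering the attempt number as the returned string is an assumption of this sketch.

```scala
// Hypothetical stand-ins for YARN's ApplicationId and ApplicationAttemptId.
final case class AppId(id: Int)
final case class AppAttemptId(appId: AppId, attempt: Int)

// Hypothetical, simplified model of the id bookkeeping.
class BindingSketch {
  private var appId: Option[AppId] = None
  private var attemptId: Option[AppAttemptId] = None

  // Records both ids; the attempt id is optional.
  def bindToYarn(appId: AppId, attemptId: Option[AppAttemptId]): Unit = {
    this.appId = Some(appId)
    this.attemptId = attemptId
  }

  // Assumption: the attempt number is what gets exposed as the String.
  def applicationAttemptId(): Option[String] =
    attemptId.map(_.attempt.toString)

  // start requires appId, so bindToYarn has to be called first.
  def start(): Unit =
    require(appId.isDefined, "application ID unset")
}
```

A backend that was bound with `None` as the attempt id can still start, but `applicationAttemptId()` then returns `None`.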
Creating YarnSchedulerBackend Instance
Note: This section is only to take notes about the required components to instantiate the base services.
YarnSchedulerBackend takes the following when created:
YarnSchedulerBackend initializes the internal properties.
Checking if Enough Executors Are Available — sufficientResourcesRegistered Method
sufficientResourcesRegistered(): Boolean
Note: sufficientResourcesRegistered is a part of the CoarseGrainedSchedulerBackend contract that makes sure that sufficient resources are available.
sufficientResourcesRegistered returns true when totalRegisteredExecutors is at least minRegisteredRatio times totalExpectedExecutors.
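The check can be written down as a short sketch. `RegistrationSketch` is a hypothetical class; only the three names totalRegisteredExecutors, minRegisteredRatio and totalExpectedExecutors come from the text, and the use of an atomic counter is an assumption.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, simplified model of the readiness check.
class RegistrationSketch(minRegisteredRatio: Double, totalExpectedExecutors: Int) {
  // Number of executors registered so far (atomic counter is an assumption).
  val totalRegisteredExecutors = new AtomicInteger(0)

  // True once the registered count reaches the required share of the expected total.
  def sufficientResourcesRegistered(): Boolean =
    totalRegisteredExecutors.get() >= totalExpectedExecutors * minRegisteredRatio
}
```

For example, with a ratio of 0.8 and 10 expected executors, the check fails with 7 registered executors and passes with 8.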