ShuffleExchange Physical Operator

ShuffleExchange is a unary physical operator. It corresponds to Repartition (with shuffle enabled) and RepartitionByExpression logical operators (as translated in BasicOperators strategy).

When created, ShuffleExchange takes a Partitioning, a single child physical operator and an optional ExchangeCoordinator.

Table 1. ShuffleExchange Metrics
Name Description

dataSize

data size total (min, med, max)

nodeName is computed based on the optional ExchangeCoordinator with Exchange prefix and possibly (coordinator id: [coordinator-hash-code]).

Caution
FIXME A screenshot with the node in execution DAG in web UI.

outputPartitioning is the input Partitioning.

While preparing execution (using doPrepare), ShuffleExchange registers itself with the ExchangeCoordinator if available.

Caution
FIXME When could ExchangeCoordinator not be available?

When doExecute, ShuffleExchange computes a ShuffledRowRDD and caches it (to reuse avoiding possibly expensive executions).

doExecute Method

doExecute(): RDD[InternalRow]
Note
doExecute is a part of the SparkPlan contract.
Caution
FIXME

prepareShuffleDependency Internal Method

prepareShuffleDependency(): ShuffleDependency[Int, InternalRow, InternalRow]
Caution
FIXME

prepareShuffleDependency Helper Method

prepareShuffleDependency(
  rdd: RDD[InternalRow],
  outputAttributes: Seq[Attribute],
  newPartitioning: Partitioning,
  serializer: Serializer): ShuffleDependency[Int, InternalRow, InternalRow]

prepareShuffleDependency creates a ShuffleDependency dependency.

Note
prepareShuffleDependency is used when ShuffleExchange prepares a ShuffleDependency (as part of…​FIXME), CollectLimitExec and TakeOrderedAndProjectExec physical operators are executed.

results matching ""

    No results matching ""