doExecute(): RDD[InternalRow]
ShuffleExchange
Physical Operator
ShuffleExchange
is a unary physical operator. It corresponds to Repartition
(with shuffle enabled) and RepartitionByExpression
logical operators (as translated in BasicOperators
strategy).
When created, ShuffleExchange
takes a Partitioning
, a single child
physical operator and an optional ExchangeCoordinator.
Name | Description |
---|---|
data size total (min, med, max) |
nodeName
is computed based on the optional ExchangeCoordinator with Exchange prefix and possibly (coordinator id: [coordinator-hash-code]).
Caution
|
FIXME A screenshot with the node in execution DAG in web UI. |
outputPartitioning
is the input Partitioning
.
While preparing execution (using doPrepare
), ShuffleExchange
registers itself with the ExchangeCoordinator if available.
Caution
|
FIXME When could ExchangeCoordinator not be available?
|
When doExecute, ShuffleExchange
computes a ShuffledRowRDD and caches it (to reuse avoiding possibly expensive executions).
doExecute
Method
Note
|
doExecute is a part of the SparkPlan contract.
|
Caution
|
FIXME |
prepareShuffleDependency
Internal Method
prepareShuffleDependency(): ShuffleDependency[Int, InternalRow, InternalRow]
Caution
|
FIXME |
prepareShuffleDependency
Helper Method
prepareShuffleDependency(
rdd: RDD[InternalRow],
outputAttributes: Seq[Attribute],
newPartitioning: Partitioning,
serializer: Serializer): ShuffleDependency[Int, InternalRow, InternalRow]
prepareShuffleDependency
creates a ShuffleDependency dependency.
Note
|
prepareShuffleDependency is used when ShuffleExchange prepares a ShuffleDependency (as part of…FIXME), CollectLimitExec and TakeOrderedAndProjectExec physical operators are executed.
|