doExecute(): RDD[InternalRow]
SparkPlan — Physical Execution Plan
SparkPlan is the base QueryPlan for physical operators that together build the physical execution plan of a structured query (which is also modelled as…a Dataset!).
The SparkPlan contract assumes that concrete physical operators define a doExecute method, which is executed when the final execute method is called.
Note: The final execute is triggered when the QueryExecution (of a Dataset) is requested for an RDD[InternalRow].
When executed, a SparkPlan produces RDDs of InternalRow (i.e. RDD[InternalRow]s).
Caution: FIXME SparkPlan is Serializable. Why?
Note: The naming convention for physical operators in Spark's source code is to have their names end with the Exec suffix, e.g. DebugExec or LocalTableScanExec.
Tip: Read InternalRow to learn about the internal binary row format.
| Name | Description |
|---|---|
SparkPlan has the following final methods that prepare the environment and pass calls on to the corresponding methods that constitute the SparkPlan Contract:
- execute calls doExecute
- prepare calls doPrepare
- executeBroadcast calls doExecuteBroadcast
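The pairing of final methods with their do* counterparts is a template-method arrangement, which can be sketched in plain Scala. This is a simplified model and not Spark's code: PlanSketch and TwoRowsExec are hypothetical names, Seq[String] stands in for RDD[InternalRow], a plain value stands in for a broadcast variable, and the default bodies of doPrepare and doExecuteBroadcast are assumptions of the sketch.

```scala
// Simplified stand-in for SparkPlan; not Spark's actual code.
// Seq[String] stands in for RDD[InternalRow], and a plain value for a
// broadcast variable, so the sketch runs without Spark on the classpath.
abstract class PlanSketch {
  // Final entry points: callers use these, subclasses cannot override them.
  final def execute(): Seq[String] = doExecute()
  final def prepare(): Unit = doPrepare()
  final def executeBroadcast[T](): T = doExecuteBroadcast[T]()

  // Contract: doExecute is mandatory; the other two have defaults here
  // (the defaults are an assumption of this sketch).
  protected def doExecute(): Seq[String]
  protected def doPrepare(): Unit = ()
  protected def doExecuteBroadcast[T](): T =
    throw new UnsupportedOperationException("doExecuteBroadcast not implemented")
}

// Hypothetical operator; the name follows the Exec suffix convention.
class TwoRowsExec extends PlanSketch {
  var prepared = false
  override protected def doPrepare(): Unit = { prepared = true }
  override protected def doExecute(): Seq[String] = Seq("row-1", "row-2")
}
```

Calling `new TwoRowsExec().execute()` goes through the final method and ends up in the operator's own doExecute, which is the only method the operator had to define.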
| Name | Description |
|---|---|
waitForSubqueries Method
Caution: FIXME
prepare Method
Caution: FIXME
executeCollect Method
Caution: FIXME
SparkPlan Contract
The SparkPlan contract requires that concrete physical operators (aka physical plans) define their own custom doExecute method.
| Name | Description |
|---|---|
Prepares execution |
Caution: FIXME Why are there two executes?
Executing Query in Scope (after Preparations) — executeQuery Final Method
executeQuery[T](query: => T): T
executeQuery executes query in a scope (i.e. so that all RDDs created will have the same scope).
Internally, executeQuery calls prepare and waitForSubqueries before executing query.
Note: executeQuery is executed as part of execute and executeBroadcast, and when CodegenSupport produces Java source code.
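The call order described for executeQuery depends on query being a by-name parameter (`query: => T`): the argument is not evaluated at the call site, only when executeQuery references it. A minimal sketch of that ordering follows. It is a plain-Scala model rather than Spark's implementation: QueryScopeSketch and its log buffer are hypothetical, and the RDD scope handling is left out entirely.

```scala
import scala.collection.mutable.ListBuffer

// Simplified model of the call order described above; not Spark's code.
class QueryScopeSketch {
  // Records the order of calls so it can be inspected afterwards.
  val log: ListBuffer[String] = ListBuffer.empty[String]

  def prepare(): Unit = { log += "prepare" }
  def waitForSubqueries(): Unit = { log += "waitForSubqueries" }

  // `query` is a by-name parameter: its body runs only when referenced
  // below, i.e. after both preparation steps have completed.
  final def executeQuery[T](query: => T): T = {
    prepare()
    waitForSubqueries()
    query
  }

  // Stand-in for execute: wraps the operator's work in executeQuery.
  final def execute(): Seq[Int] = executeQuery {
    log += "doExecute"
    Seq(1, 2, 3)
  }
}
```

After one call to `execute()`, the log holds the three entries in the order prepare, waitForSubqueries, doExecute, even though the block passed to executeQuery appears first in the source text.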
Computing Query Result As Broadcast Variable — executeBroadcast Final Method
executeBroadcast[T](): broadcast.Broadcast[T]
executeBroadcast returns the results of the query as a broadcast variable.
Internally, executeBroadcast executes doExecuteBroadcast inside executeQuery.
Note: executeBroadcast is executed in BroadcastHashJoinExec, BroadcastNestedLoopJoinExec and ReusedExchangeExec.
SQLMetric
SQLMetric is an accumulator that accumulates and produces long metric values.
There are three known SQLMetrics:
- sum
- size
- timing
metrics Lookup Table
metrics: Map[String, SQLMetric] = Map.empty
metrics is a private[sql] lookup table of supported SQLMetrics by their names.
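To make the idea of a name-keyed table of long accumulators concrete, here is a small self-contained sketch. It does not use Spark's SQLMetric class: MetricSketch and ScanSketchExec are hypothetical, and the metric names used as keys are chosen purely for illustration.

```scala
// Simplified stand-in for SQLMetric: accumulates Long values and carries
// a metric type ("sum", "size" or "timing"). Not Spark's class.
final class MetricSketch(val metricType: String) {
  private var total: Long = 0L
  def add(v: Long): Unit = { total += v }
  def value: Long = total
}

// Hypothetical operator that exposes its metrics by name.
class ScanSketchExec {
  // Lookup table of supported metrics; the keys are illustrative only.
  val metrics: Map[String, MetricSketch] = Map(
    "numOutputRows" -> new MetricSketch("sum"),
    "dataSize"      -> new MetricSketch("size"),
    "scanTime"      -> new MetricSketch("timing")
  )

  // Passes rows through unchanged while updating the metrics.
  def run(rows: Seq[String]): Seq[String] = {
    val started = System.nanoTime()
    rows.foreach { r =>
      metrics("numOutputRows").add(1L)
      metrics("dataSize").add(r.length.toLong)
    }
    // Elapsed time in milliseconds.
    metrics("scanTime").add((System.nanoTime() - started) / 1000000L)
    rows
  }
}
```

Looking a metric up by name, e.g. `metrics("numOutputRows").value`, is what lets other code read an operator's counters without knowing its concrete type.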