SparkPlanner — Default Query Planner (with no Hive Support)

SparkPlanner is a concrete QueryPlanner (indirectly, by extending SparkStrategies) that allows plugging in a collection of additional SparkStrategy transformations.
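To make the idea of pluggable strategies concrete, here is a minimal Python model of how a QueryPlanner-style planner consults its strategies. It is a toy, not Spark code. The node classes (`Scan`, `Sample`, `ScanExec`, `SampleExec`), the strategy functions and the `ToyPlanner` class are all hypothetical. Each strategy takes a logical plan and returns a list of candidate physical plans, with an empty list meaning "not applicable". Spark's `Strategy.apply` follows the same convention, returning a `Seq[SparkPlan]` that is `Nil` when the strategy does not apply.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List

# Hypothetical logical operators (stand-ins for Catalyst LogicalPlan nodes).
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Sample:
    fraction: float
    table: str

# Hypothetical physical operators (stand-ins for SparkPlan nodes).
@dataclass(frozen=True)
class ScanExec:
    table: str

@dataclass(frozen=True)
class SampleExec:
    fraction: float
    table: str

# A strategy returns candidate physical plans; [] means "not applicable".
Strategy = Callable[[object], List[object]]

def builtin_scan(plan: object) -> List[object]:
    return [ScanExec(plan.table)] if isinstance(plan, Scan) else []

def extra_sample(plan: object) -> List[object]:
    # A plugged-in strategy that handles a node the built-in one ignores.
    return [SampleExec(plan.fraction, plan.table)] if isinstance(plan, Sample) else []

class ToyPlanner:
    def __init__(self, extra_strategies: List[Strategy]):
        # Extra strategies are consulted before the built-in one.
        self.strategies = list(extra_strategies) + [builtin_scan]

    def plan(self, logical: object) -> Iterator[object]:
        # Lazily yield every candidate from every strategy, in order.
        for strategy in self.strategies:
            yield from strategy(logical)

print(list(ToyPlanner([]).plan(Sample(0.1, "t"))))        # []
print(next(ToyPlanner([extra_sample]).plan(Sample(0.1, "t"))))
```

Without the extra strategy the `Sample` node produces no candidate at all; plugging `extra_sample` in is enough to make it plannable, while `Scan` keeps being handled by the built-in strategy.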

Table 1. SparkStrategy Transformations in SparkPlanner (in alphabetical order)

  SparkStrategy
  -------------
  Aggregation
  BasicOperators
  DataSourceStrategy
  DDLStrategy
  FileSourceStrategy
  InMemoryScans
  JoinSelection
  SpecialLimits

SparkPlanner requires a SparkContext, a SQLConf, and a collection of Strategy objects (as extraStrategies) when created.

SparkPlanner defines a numPartitions method that returns the value of spark.sql.shuffle.partitions, i.e. the number of partitions to use when shuffling data for joins and aggregations. It is later used by the BasicOperators strategy when planning the RepartitionByExpression logical operator.
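The following Python sketch models both behaviours under stated assumptions. The SQL configuration is a plain dict of string values (a hypothetical stand-in for SQLConf), 200 is used as the fallback because it is the documented default of spark.sql.shuffle.partitions, and a RepartitionByExpression without an explicit partition count is assumed to fall back to numPartitions. The functions `num_partitions` and `plan_repartition_by_expression` are hypothetical, and the returned tuple is only a label for a hash-partitioned shuffle, not a real physical operator.

```python
from typing import Dict, Optional, Sequence, Tuple

DEFAULT_SHUFFLE_PARTITIONS = 200  # documented default of spark.sql.shuffle.partitions

def num_partitions(conf: Dict[str, str]) -> int:
    """Model of SparkPlanner.numPartitions: read spark.sql.shuffle.partitions."""
    return int(conf.get("spark.sql.shuffle.partitions", DEFAULT_SHUFFLE_PARTITIONS))

def plan_repartition_by_expression(
    expressions: Sequence[str],
    n_partitions: Optional[int],
    conf: Dict[str, str],
) -> Tuple[str, Tuple[str, ...], int]:
    """Toy plan for RepartitionByExpression: hash-partition by the expressions,
    using num_partitions(conf) when no explicit count is given."""
    n = n_partitions if n_partitions is not None else num_partitions(conf)
    return ("HashPartitioning", tuple(expressions), n)

conf = {"spark.sql.shuffle.partitions": "8"}
print(plan_repartition_by_expression(["id"], None, {}))    # ('HashPartitioning', ('id',), 200)
print(plan_repartition_by_expression(["id"], None, conf))  # ('HashPartitioning', ('id',), 8)
print(plan_repartition_by_expression(["id"], 4, conf))     # ('HashPartitioning', ('id',), 4)
```

In a real session the setting itself is changed with, for example, `spark.conf.set("spark.sql.shuffle.partitions", "8")`.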

The required strategies collection is made up of the extraStrategies extension point (passed in as a constructor argument) followed by the predefined collection of Strategy objects.
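A minimal Python model of how the strategies collection is assembled from the constructor arguments. This is a sketch only: strategies are represented by their names as strings, `ToySparkPlanner` and `MyStrategy` are hypothetical, and `None` stands in for a real SparkContext.

```python
from typing import Any, Dict, List, Sequence

# Built-in strategies in the order SparkPlanner uses them (names only).
PREDEFINED_STRATEGIES: List[str] = [
    "FileSourceStrategy",
    "DataSourceStrategy",
    "DDLStrategy",
    "SpecialLimits",
    "Aggregation",
    "JoinSelection",
    "InMemoryScans",
    "BasicOperators",
]

class ToySparkPlanner:
    def __init__(self, spark_context: Any, conf: Dict[str, str], extra_strategies: Sequence[str]):
        # Mirrors the three constructor arguments: SparkContext, SQLConf, extraStrategies.
        self.spark_context = spark_context
        self.conf = conf
        self.extra_strategies = list(extra_strategies)

    @property
    def strategies(self) -> List[str]:
        # extraStrategies first, then the predefined collection.
        return self.extra_strategies + PREDEFINED_STRATEGIES

planner = ToySparkPlanner(spark_context=None, conf={}, extra_strategies=["MyStrategy"])
print(planner.strategies[:3])  # ['MyStrategy', 'FileSourceStrategy', 'DataSourceStrategy']
```

In Spark itself, user-defined strategies typically reach extraStrategies through `spark.experimental.extraStrategies`.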

The required collectPlaceholders method returns a collection of PlanLater placeholders paired with their corresponding logical plans.

The required prunePlans method does nothing, i.e. it returns the input plans unchanged.
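Both methods can be modelled in Python as a tree walk and an identity function. In this sketch `LogicalNode`, `PhysicalNode`, `PlanLater`, `collect_placeholders` and `prune_plans` are hypothetical stand-ins, and operator names such as `SortMergeJoinExec` are used only as labels. The walk is pre-order, so placeholders come back in left-to-right order.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

@dataclass(frozen=True)
class LogicalNode:
    # Hypothetical logical plan node.
    name: str

@dataclass(frozen=True)
class PlanLater:
    # Placeholder for a logical subtree that has not been planned yet.
    plan: LogicalNode

@dataclass(frozen=True)
class PhysicalNode:
    # Hypothetical physical plan node; children may include PlanLater placeholders.
    name: str
    children: Tuple[object, ...] = ()

def collect_placeholders(plan: object) -> List[Tuple[PlanLater, LogicalNode]]:
    """Pre-order walk returning (placeholder, logical plan) pairs."""
    if isinstance(plan, PlanLater):
        return [(plan, plan.plan)]
    found: List[Tuple[PlanLater, LogicalNode]] = []
    for child in plan.children:
        found.extend(collect_placeholders(child))
    return found

def prune_plans(plans: Iterator[object]) -> Iterator[object]:
    """No-op pruning: return the candidate plans unchanged."""
    return plans

physical = PhysicalNode("SortMergeJoinExec", (
    PlanLater(LogicalNode("left")),
    PhysicalNode("ProjectExec", (PlanLater(LogicalNode("right")),)),
))
pairs = collect_placeholders(physical)
print([logical.name for _, logical in pairs])  # ['left', 'right']
```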

Note

The order of the SparkStrategy transformations in SparkPlanner is as follows:

  1. extraStrategies

  2. FileSourceStrategy

  3. DataSourceStrategy

  4. DDLStrategy

  5. SpecialLimits

  6. Aggregation

  7. JoinSelection

  8. InMemoryScans

  9. BasicOperators
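The order matters because candidate physical plans are generated strategy by strategy, and QueryExecution takes the first candidate. The Python sketch below, with hypothetical strategy functions and plain string labels for plans, shows the consequence: a strategy in extraStrategies that handles an operator shadows any later built-in strategy that would also handle it, while operators it ignores still fall through to the built-ins.

```python
from typing import Callable, Iterator, List

# A strategy maps a logical operator name to candidate physical plan labels.
Strategy = Callable[[str], List[str]]

def extra_join(op: str) -> List[str]:
    # Hypothetical user strategy placed in extraStrategies.
    return ["MyCustomJoinExec"] if op == "Join" else []

def join_selection(op: str) -> List[str]:
    # Stand-in for the built-in JoinSelection strategy.
    return ["SortMergeJoinExec"] if op == "Join" else []

def basic_operators(op: str) -> List[str]:
    # Stand-in for BasicOperators, the last strategy in the list.
    return ["ProjectExec"] if op == "Project" else []

def plan(op: str, strategies: List[Strategy]) -> Iterator[str]:
    # Candidates are generated lazily, strategy by strategy, in list order.
    for strategy in strategies:
        yield from strategy(op)

built_in: List[Strategy] = [join_selection, basic_operators]
print(next(plan("Join", built_in)))                    # SortMergeJoinExec
print(next(plan("Join", [extra_join] + built_in)))     # MyCustomJoinExec
print(next(plan("Project", [extra_join] + built_in)))  # ProjectExec
```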
