import org.apache.spark.sql.expressions.Window

// Window specification with ordering only (no partitioning)
val orderId = Window.orderBy('id)
val dataset = spark.range(5).withColumn("group", 'id % 3)

// rank over the whole dataset ordered by id
scala> dataset.select('*, rank over orderId as "rank").show
+---+-----+----+
| id|group|rank|
+---+-----+----+
|  0|    0|   1|
|  1|    1|   2|
|  2|    2|   3|
|  3|    0|   4|
|  4|    1|   5|
+---+-----+----+
WindowExec Physical Operator
WindowExec is a unary physical operator with a collection of NamedExpressions (the window expressions), a collection of Expressions (the partition specification), a collection of SortOrders (the ordering specification) and a child physical operator.
The output of WindowExec is the output of the child physical plan followed by the attributes of the window expressions.
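The output described above can be observed from the public API. The following is a minimal sketch, not the operator's own code: it assumes a local SparkSession (the `local[*]` master and the application name are illustrative choices) and uses the same dataset and window specification as the example at the top. It prints the physical plan, which should contain a Window operator, and checks that the child's columns come first, followed by the window expression.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank}

// Assumption: a local session; in spark-shell this returns the existing one
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("windowexec-output")
  .getOrCreate()

val dataset = spark.range(5).withColumn("group", col("id") % 3)

// Ordering only, no partitioning (same as orderId above)
val orderId = Window.orderBy(col("id"))
val ranked = dataset.select(col("*"), rank().over(orderId).as("rank"))

// Prints the physical plan; look for the Window operator in it
ranked.explain()

// Columns of the child (id, group) followed by the window expression (rank)
val columns = ranked.columns.toSeq
println(columns.mkString(", "))
```

The exact text printed by `explain` differs between Spark versions, so no expected plan is shown here.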
When executed (e.g. by show) with no partition specification, WindowExec prints out the following WARN message to the logs:
WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
Tip: Enable … Add the following line to … Refer to Logging.
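The WARN message above is logged only when the window specification has no partitioning. As a hedged sketch (again assuming a local SparkSession; the `local[*]` master, the application name and the name `byGroup` are illustrative), the same rank query can partition by the group column, so rows are distributed by group instead of being moved to a single partition. Ranks then restart within each group.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank}

// Assumption: a local session; in spark-shell this returns the existing one
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("windowexec-partitioned")
  .getOrCreate()

val dataset = spark.range(5).withColumn("group", col("id") % 3)

// Partition by group, order by id within each partition
val byGroup = Window.partitionBy(col("group")).orderBy(col("id"))
val rankedByGroup = dataset.select(col("*"), rank().over(byGroup).as("rank"))

// Collect the ranks in id order to make the result deterministic
val ranks = rankedByGroup
  .orderBy(col("id"))
  .collect()
  .map(_.getAs[Int]("rank"))
  .toSeq

println(ranks.mkString(", "))  // prints: 1, 1, 1, 2, 2
```

Ids 0 and 3 fall into group 0, ids 1 and 4 into group 1, and id 2 alone into group 2, which is why the first three rows all get rank 1.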
Caution: FIXME Describe ClusteredDistribution
When the number of rows exceeds 4096, WindowExec creates an UnsafeExternalSorter.
Caution: FIXME What’s UnsafeExternalSorter?