ShuffleExternalSorter — Cache-Efficient Sorter
ShuffleExternalSorter is a specialized cache-efficient sorter that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, ShuffleExternalSorter can fit more of the array into cache.
ShuffleExternalSorter is a MemoryConsumer.
Tip: Enable INFO logging level for the org.apache.spark.shuffle.sort.ShuffleExternalSorter logger to see what happens inside.
Add the following line to your log4j.properties file:
log4j.logger.org.apache.spark.shuffle.sort.ShuffleExternalSorter=INFO
Refer to Logging.
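The 8-bytes-per-record idea from the introduction can be illustrated with a small sketch. It packs a partition id, a memory page number, and an offset within the page into a single long, assuming a 24/13/27-bit split. The class name PackedPointerSketch and all of its methods are hypothetical and exist only for this example; they are not Spark's API.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch only (not Spark code). Assumed layout of the 64-bit word:
// [24-bit partition id][13-bit page number][27-bit offset in page]
final class PackedPointerSketch {
    static final int PARTITION_BITS = 24;
    static final int PAGE_BITS = 13;
    static final int OFFSET_BITS = 27;

    static final long PARTITION_MASK = (1L << PARTITION_BITS) - 1;
    static final long PAGE_MASK = (1L << PAGE_BITS) - 1;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;

    // Packs the three fields into one 8-byte value, rejecting out-of-range input.
    static long pack(int partitionId, int pageNumber, long offsetInPage) {
        if (partitionId < 0 || partitionId > PARTITION_MASK) {
            throw new IllegalArgumentException("partition id out of range: " + partitionId);
        }
        if (pageNumber < 0 || pageNumber > PAGE_MASK) {
            throw new IllegalArgumentException("page number out of range: " + pageNumber);
        }
        if (offsetInPage < 0 || offsetInPage > OFFSET_MASK) {
            throw new IllegalArgumentException("offset out of range: " + offsetInPage);
        }
        return ((long) partitionId << (PAGE_BITS + OFFSET_BITS))
            | ((long) pageNumber << OFFSET_BITS)
            | offsetInPage;
    }

    static int partitionId(long packed) {
        // Unsigned shift so that large partition ids are not read as negative.
        return (int) ((packed >>> (PAGE_BITS + OFFSET_BITS)) & PARTITION_MASK);
    }

    static int pageNumber(long packed) {
        return (int) ((packed >>> OFFSET_BITS) & PAGE_MASK);
    }

    static long offsetInPage(long packed) {
        return packed & OFFSET_MASK;
    }

    // Returns a copy of the pointers ordered by partition id (stable sort).
    static long[] sortByPartition(long[] pointers) {
        return Arrays.stream(pointers)
            .boxed()
            .sorted(Comparator.comparingInt((Long p) -> partitionId(p)))
            .mapToLong(Long::longValue)
            .toArray();
    }
}
```

Because every record costs one long in the sorting array, ordering records by partition only touches this compact array and never the record bytes themselves, which is what makes the approach cache-friendly.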
getMemoryUsage Method
Caution: FIXME
closeAndGetSpills Method
Caution: FIXME
insertRecord Method
Caution: FIXME
freeMemory Method
Caution: FIXME
getPeakMemoryUsedBytes Method
Caution: FIXME
writeSortedFile Method
Caution: FIXME
cleanupResources Method
Caution: FIXME
Creating ShuffleExternalSorter Instance
ShuffleExternalSorter takes the following when created:
- memoryManager: TaskMemoryManager
- blockManager: BlockManager
- taskContext: TaskContext
- initialSize
- numPartitions
- writeMetrics: ShuffleWriteMetrics
ShuffleExternalSorter initializes itself as a MemoryConsumer (with pageSize as the minimum of PackedRecordPointer.MAXIMUM_PAGE_SIZE_BYTES and pageSizeBytes, and Tungsten memory mode).
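The page size rule above amounts to a simple cap. The sketch below assumes the maximum page size corresponds to a 27-bit in-page offset (128 MB); the class PageSizeSketch and the method effectivePageSize are hypothetical names used only for illustration.

```java
// Illustrative sketch only (not Spark code).
final class PageSizeSketch {
    // Assumption: the largest page addressable with a 27-bit offset.
    static final long MAXIMUM_PAGE_SIZE_BYTES = 1L << 27;

    // The sorter never requests pages larger than the packed pointer can address.
    static long effectivePageSize(long pageSizeBytes) {
        return Math.min(MAXIMUM_PAGE_SIZE_BYTES, pageSizeBytes);
    }
}
```

Under these assumptions, a configured page size of 64 MB is used as is, while a 1 GB page size is capped at 128 MB.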
ShuffleExternalSorter uses spark.shuffle.file.buffer (for fileBufferSizeBytes) and spark.shuffle.spill.numElementsForceSpillThreshold (for numElementsForSpillThreshold) Spark properties.
ShuffleExternalSorter creates a ShuffleInMemorySorter (with spark.shuffle.sort.useRadixSort Spark property enabled by default).
ShuffleExternalSorter initializes the internal registries and counters.
Note: ShuffleExternalSorter is created when UnsafeShuffleWriter is opened (which happens when UnsafeShuffleWriter is created).
Freeing Execution Memory by Spilling To Disk — spill Method
long spill(long size, MemoryConsumer trigger) throws IOException
Note: spill is part of the MemoryConsumer contract to sort and spill the current records due to memory pressure.
spill frees execution memory, updates TaskMetrics, and in the end returns the spill size.
Note: spill returns 0 when ShuffleExternalSorter has no ShuffleInMemorySorter or the ShuffleInMemorySorter manages no records.
You should see the following INFO message in the logs:
INFO Thread [id] spilling sort data of [memoryUsage] to disk ([size] times so far)
spill writes the sorted file (with isLastFile disabled).
spill frees memory and records the spill size.
spill resets the internal ShuffleInMemorySorter (which in turn frees the underlying in-memory pointer array).
spill returns the spill size.
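The order of the steps in spill can be modeled with a toy class. This is a minimal sketch, not Spark's implementation: SpillFlowSketch, its fields, and its methods are hypothetical, the "disk" is an in-memory list, records are sorted numerically as a stand-in for partition ordering, and the spill size is simply 8 bytes per buffered pointer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative toy model of the spill flow (not Spark code).
final class SpillFlowSketch {
    // Stand-in for the in-memory sorter's pointer array.
    private final List<Long> inMemory = new ArrayList<>();
    // Stand-in for spill files written to disk.
    private final List<long[]> spills = new ArrayList<>();
    // Stand-in for the spilled-bytes task metric.
    private long spilledBytes = 0L;

    void insertRecord(long packedPointer) {
        inMemory.add(packedPointer);
    }

    long spill() {
        // Nothing buffered: nothing to free, so report 0.
        if (inMemory.isEmpty()) {
            return 0L;
        }
        // 1. Write the sorted records out (this is not the last file).
        long[] sorted = inMemory.stream().mapToLong(Long::longValue).sorted().toArray();
        spills.add(sorted);
        // 2. Free memory and record the spill size.
        long freed = (long) inMemory.size() * Long.BYTES;
        spilledBytes += freed;
        // 3. Reset the in-memory buffer.
        inMemory.clear();
        // 4. Return the spill size.
        return freed;
    }

    int spillCount() {
        return spills.size();
    }

    long spilledBytes() {
        return spilledBytes;
    }
}
```

In this model, calling spill on an empty sorter returns 0 and leaves the spill count unchanged, which mirrors the early-return case described in the note above.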