CacheManager — In-Memory Cache for Cached Tables

CacheManager is an in-memory cache for cached tables (as logical plans). It uses the internal cachedData collection of CachedData to track logical plans and their cached InMemoryRelation representation.

CacheManager is shared across SparkSessions through SharedState.

sparkSession.sharedState.cacheManager
Note
A Spark developer can use CacheManager to cache Datasets using cache or persist operators.
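
The sharing through SharedState and the effect of the cache operator can be checked with a short session like the one below. It is a minimal sketch meant for spark-shell or a Scala script, not a definitive recipe: the local master and the application name are arbitrary assumptions, and sharedState is an unstable, internal-facing API whose shape may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; master and appName are arbitrary.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cache-manager-demo")
  .getOrCreate()

val cacheManager = spark.sharedState.cacheManager

// A session derived from the first one reuses the same SharedState,
// and therefore the same CacheManager instance.
val other = spark.newSession()
assert(other.sharedState.cacheManager eq cacheManager)

// cache() registers the Dataset's logical plan with the CacheManager.
val ds = spark.range(5)
ds.cache()
assert(!cacheManager.isEmpty)
```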

cachedData Internal Registry

cachedData is a collection of CachedData with logical plans and their cached InMemoryRelation representation.

cachedData is cleared when…​FIXME

invalidateCachedPath Method

Caution
FIXME

invalidateCache Method

Caution
FIXME

lookupCachedData Method

Caution
FIXME

uncacheQuery Method

Caution
FIXME

isEmpty Method

Caution
FIXME

Caching Dataset — cacheQuery Method

When you cache or persist a Dataset, both operators delegate to the cacheQuery method.

cacheQuery(
  query: Dataset[_],
  tableName: Option[String] = None,
  storageLevel: StorageLevel = MEMORY_AND_DISK): Unit

cacheQuery obtains the analyzed logical plan of the query and saves it as an InMemoryRelation in the internal cachedData collection of cached queries.

If the query has already been cached, cacheQuery logs the following WARN message instead:

WARN CacheManager: Asked to cache already cached data.
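
The behaviour described above can be observed from user code, because persist and cache end up in cacheQuery. The following is a hedged sketch for spark-shell or a Scala script. The session settings and the example query are assumptions, and lookupCachedData and cachedRepresentation are internal APIs that may change between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Assumption: a throwaway local session; master and appName are arbitrary.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cache-query-demo")
  .getOrCreate()

val cacheManager = spark.sharedState.cacheManager

// An arbitrary example query that has not been cached in this session yet.
val ds = spark.range(10).filter("id % 2 = 0")
assert(cacheManager.lookupCachedData(ds).isEmpty)

// persist goes through cacheQuery with the requested storage level.
ds.persist(StorageLevel.MEMORY_ONLY)

// The plan is now tracked together with its InMemoryRelation.
val cached = cacheManager.lookupCachedData(ds)
assert(cached.isDefined)
val relation = cached.get.cachedRepresentation

// A second request for the same plan is where the WARN message
// quoted above is expected to show up in the logs.
ds.cache()

// Remove the entry again so the example leaves no cached data behind.
ds.unpersist()
```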

Removing All Cached Tables From In-Memory Cache — clearCache Method

clearCache(): Unit

clearCache acquires a write lock and unpersists RDD[CachedBatch]s of the queries in cachedData before removing them altogether.

Note
clearCache is executed when the CatalogImpl is requested to clearCache.
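
The locking pattern that clearCache follows can be illustrated without Spark. The model below is a simplified sketch and not CacheManager's actual code: MiniCacheManager, Entry and FakeBatches are hypothetical names introduced only for this illustration, and a plain String stands in for a logical plan.

```scala
import java.util.concurrent.locks.{Lock, ReentrantReadWriteLock}
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a cached RDD[CachedBatch]; not a Spark class.
final class FakeBatches {
  var persisted: Boolean = true
  def unpersist(): Unit = { persisted = false }
}

// Hypothetical stand-in for CachedData: a plan and its cached form.
final case class Entry(plan: String, batches: FakeBatches)

final class MiniCacheManager {
  private val lock = new ReentrantReadWriteLock
  private val cachedData = ArrayBuffer.empty[Entry]

  // Runs body while holding the given lock and always releases it.
  private def withLock[A](l: Lock)(body: => A): A = {
    l.lock()
    try body finally l.unlock()
  }

  def add(entry: Entry): Unit = withLock(lock.writeLock()) {
    cachedData += entry
    ()
  }

  def isEmpty: Boolean = withLock(lock.readLock()) { cachedData.isEmpty }

  // Under the write lock: unpersist every entry first, then drop them all.
  def clearCache(): Unit = withLock(lock.writeLock()) {
    cachedData.foreach(_.batches.unpersist())
    cachedData.clear()
  }
}

val batches = new FakeBatches
val manager = new MiniCacheManager
manager.add(Entry("range(5)", batches))
manager.clearCache()
assert(manager.isEmpty && !batches.persisted)
```

Taking the write lock for the whole operation means no reader can observe a state in which some entries are already unpersisted but still registered.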

CachedData

Caution
FIXME
