val spark: SparkSession = ...

val schema = spark.read
  .format("csv")
  .option("header", true)
  .option("inferSchema", true)
  .load("csv-logs/*.csv")
  .schema

val df = spark.readStream
  .format("csv")
  .schema(schema)
  .load("csv-logs/*.csv")
DataStreamReader
DataStreamReader is an interface for reading streaming data into a DataFrame from data sources with a specified format, schema, and options.
DataStreamReader supports the built-in formats json, csv, parquet, and text. parquet is the default data source, as configured by the spark.sql.sources.default setting.
A DataStreamReader is available through the SparkSession.readStream method.
format
format(source: String): DataStreamReader
format specifies the source format of the streaming data source.
schema
schema(schema: StructType): DataStreamReader
schema specifies the schema of the streaming data source.
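As an alternative to inferring the schema with a batch read, the schema can be spelled out by hand and passed to schema. The following is a minimal sketch; the column names, the local master, and the csv-logs path are assumptions made for illustration, not part of the original example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("schema-sketch")
  .getOrCreate()

// Hypothetical columns; replace them with the real layout of your files.
val logSchema = new StructType()
  .add("timestamp", StringType)
  .add("level", StringType)
  .add("bytes", LongType)

// The explicit schema is handed to the reader before load is called.
val logs = spark.readStream
  .format("csv")
  .schema(logSchema)
  .load("csv-logs/*.csv")
```

Declaring the schema up front avoids the extra batch pass over the files, at the cost of keeping the definition in sync with the data by hand.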
option Methods
option(key: String, value: String): DataStreamReader
option(key: String, value: Boolean): DataStreamReader
option(key: String, value: Long): DataStreamReader
option(key: String, value: Double): DataStreamReader
The option family of methods specifies additional options for a streaming data source.
Values of String, Boolean, Long, and Double types are supported for user convenience; internally they are converted to String.
Note: You can also set options in bulk using the options method. You have to do the type conversion yourself, though.
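The four option overloads can be sketched as follows. The option names used here (header, maxFilesPerTrigger, samplingRatio, sep) are assumed to be ones the csv file source understands; they were picked only to show one value of each type.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("option-sketch")
  .getOrCreate()

// One call per overload; every value ends up stored as a String.
val reader = spark.readStream
  .format("csv")
  .option("header", true)            // Boolean overload
  .option("maxFilesPerTrigger", 1L)  // Long overload
  .option("samplingRatio", 0.5)      // Double overload
  .option("sep", ";")                // String overload
```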
options
options(options: scala.collection.Map[String, String]): DataStreamReader
The options method specifies one or more options of the streaming input data source in a single call.
Note: You can also set options one by one using the option method.
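A short sketch of setting options in bulk. Because options takes a Map[String, String], typed values are converted with toString by hand; the option names are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("options-sketch")
  .getOrCreate()

// Manual conversion of Boolean and Long values to String.
val csvOptions: Map[String, String] = Map(
  "header" -> true.toString,
  "maxFilesPerTrigger" -> 1L.toString,
  "sep" -> ";"
)

// All entries are applied to the reader at once.
val reader = spark.readStream
  .format("csv")
  .options(csvOptions)
```

Keeping the options in a Map is convenient when they come from configuration rather than from code.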
load Methods
load(): DataFrame
load(path: String): DataFrame (1)
(1) Specifies the path option before passing calls to load()
load loads streaming input data as a DataFrame.
Internally, load creates a DataFrame from the current SparkSession and a StreamingRelation (of a DataSource based on schema, format, and options).
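The two load variants can be compared side by side. This sketch assumes the text format, which needs no user-specified schema, and a hypothetical text-logs directory.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("load-sketch")
  .getOrCreate()

// Path given as an argument to load.
val viaArgument = spark.readStream
  .format("text")
  .load("text-logs/*.txt")

// Path given as the "path" option, followed by the no-argument load.
val viaOption = spark.readStream
  .format("text")
  .option("path", "text-logs/*.txt")
  .load()
```

Both values are streaming DataFrames; nothing is read until a streaming query is started on them.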
Built-in Formats
json(path: String): DataFrame
csv(path: String): DataFrame
parquet(path: String): DataFrame
text(path: String): DataFrame
DataStreamReader can load streaming data from data sources of the following formats:
- json
- csv
- parquet
- text
Each of these methods simply calls format with the corresponding format name, followed by load(path).
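The relationship between a format-specific method and the format/load pair can be sketched with json. The schema fields and the json-logs path are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructType}

// Local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("shorthand-sketch")
  .getOrCreate()

// Hypothetical schema for the incoming JSON records.
val eventSchema = new StructType()
  .add("id", StringType)
  .add("message", StringType)

// Format-specific shorthand.
val shorthand = spark.readStream
  .schema(eventSchema)
  .json("json-logs/*.json")

// The same source described with format and load.
val explicit = spark.readStream
  .format("json")
  .schema(eventSchema)
  .load("json-logs/*.json")
```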