Let's use the DataFrame dfSequence created from a sequence (see "Create DataFrame from sequence:" above).
The same methods also apply to a typed Dataset, since a DataFrame is simply a Dataset[Row].
-
Schema:
<dataset>.printSchema
: Prints the schema to the console in a nice tree format.
<dataset>.dtypes
: Returns all column names and their data types as an array.
<dataset>.schema
: Returns the schema of this Dataset as a StructType.
<dataset>.columns
: Returns all column names as an array.
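Here is a minimal sketch of the four schema methods. It is written as a Scala script in spark-shell style and assumes Spark 3.x with spark-sql on the classpath. The two-column data is a hypothetical stand-in for dfSequence, whose real contents come from the "Create DataFrame from sequence" section.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, StructType}

// Local session for the example; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("schema-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfSequence
val dfSequence = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

// Prints a tree like:
// root
//  |-- name: string (nullable = true)
//  |-- age: integer (nullable = false)
dfSequence.printSchema()

// (columnName, dataTypeName) pairs
val types: Array[(String, String)] = dfSequence.dtypes

// Full schema as a StructType; individual fields can be looked up by name
val schema: StructType = dfSequence.schema
val ageType: DataType = schema("age").dataType

// Column names only
val cols: Array[String] = dfSequence.columns

spark.stop()
```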
-
Columns:
-
<dataset>.withColumn(columnName, column)
: Returns a new Dataset by adding a column or replacing the existing column that has the same name.
-
<dataset>.withColumnRenamed(existingColumnName, newColumnName)
: Returns a new Dataset with a column renamed.
-
<dataset>.drop(columnNames)
: Returns a new Dataset with columns dropped.
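The following sketch shows the three column methods on a hypothetical stand-in for dfSequence, using the same Spark 3.x script style. Each call returns a new Dataset and leaves dfSequence unchanged. The names `ageNextYear` and `firstName` are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder().master("local[*]").appName("columns-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfSequence
val dfSequence = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

// Adds a new column derived from an existing one
val withNext = dfSequence.withColumn("ageNextYear", col("age") + 1)

// Reusing an existing name replaces that column instead of adding a new one
val replaced = dfSequence.withColumn("age", lit(0))

// Renames a column; if the source column does not exist, this is a no-op
val renamed = dfSequence.withColumnRenamed("name", "firstName")

// Drops one or more columns by name
val dropped = withNext.drop("age", "ageNextYear")

// Collect plain values so they can be inspected after the session stops
val withNextCols = withNext.columns
val replacedCols = replaced.columns
val replacedAges = replaced.select("age").as[Int].collect()
val renamedCols = renamed.columns
val droppedCols = dropped.columns

spark.stop()
```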
-
Rows:
<dataset>.count
: Returns the number of rows in the Dataset.
<dataset>.show
: Displays the top 20 rows of the Dataset in a tabular form. By default, strings longer than 20 characters are truncated.
<dataset>.head
: Returns the first row.
<dataset>.first
: Returns the first row. Alias for head().
<dataset>.take(n)
: Returns the first n rows of the Dataset as an array.
<dataset>.limit(n)
: Returns a new Dataset by taking the first n rows. Unlike head and take, which are actions that return an array, limit returns a new Dataset.
<dataset>.distinct
: Returns a new Dataset that contains only the unique rows from this Dataset.
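The row methods split into two groups. Actions (count, head, first, take) return local values. Transformations (limit, distinct) return another Dataset. The sketch below uses a hypothetical three-row stand-in for dfSequence that deliberately repeats one row so that distinct has something to remove. Spark 3.x is assumed.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("rows-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfSequence; ("alice", 30) appears twice on purpose
val dfSequence = Seq(("alice", 30), ("bob", 25), ("alice", 30)).toDF("name", "age")

val n: Long = dfSequence.count()                 // action: number of rows
dfSequence.show()                                // prints up to 20 rows as a table
val firstRow: Row = dfSequence.head()            // action: first row
val sameRow: Row = dfSequence.first()            // alias for head()
val firstTwo: Array[Row] = dfSequence.take(2)    // action: rows returned to the driver
val limited: DataFrame = dfSequence.limit(2)     // transformation: still a Dataset
val unique: DataFrame = dfSequence.distinct()    // transformation: duplicate rows removed

val limitedCount = limited.count()
val uniqueCount = unique.count()

spark.stop()
```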
-
Select:
-
<dataset>.select(columnName, columnNames)
: Selects a set of columns.
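select has a String overload (column names) and a Column overload that also accepts expressions and aliases. The sketch below shows both, using a hypothetical three-column stand-in for dfSequence. The alias `doubleAge` is illustrative and Spark 3.x is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("select-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfSequence
val dfSequence = Seq(("alice", 30, "NYC"), ("bob", 25, "LA")).toDF("name", "age", "city")

// String overload: select(col: String, cols: String*)
val byName = dfSequence.select("name", "age")

// Column overload: expressions and aliases are allowed
val byExpr = dfSequence.select(col("name"), (col("age") * 2).as("doubleAge"))

val byNameCols = byName.columns
val doubled = byExpr.select("doubleAge").as[Int].collect().sorted

spark.stop()
```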
-
Filter:
-
<dataset>.filter(condition)
: Filters rows using the given condition.
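The condition can be written as a Column expression or as a SQL expression string. where is an alias for filter. The sketch below uses a hypothetical stand-in for dfSequence and assumes Spark 3.x.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("filter-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfSequence
val dfSequence = Seq(("alice", 30), ("bob", 25), ("carol", 41)).toDF("name", "age")

// Condition as a Column expression
val over26 = dfSequence.filter(col("age") > 26)

// The same condition as a SQL expression string
val over26Sql = dfSequence.filter("age > 26")

// where() is an alias for filter(); === is Column equality
val named = dfSequence.where(col("name") === "bob")

val over26Names = over26.select("name").as[String].collect().sorted
val over26SqlNames = over26Sql.select("name").as[String].collect().sorted
val namedCount = named.count()

spark.stop()
```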
-
Sort:
-
<dataset>.sort(columns)
: Returns a new Dataset sorted by the given expressions.
-
<dataset>.orderBy(columns)
: Returns a new Dataset sorted by the given expressions. This is an alias of sort.
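Sort order is ascending by default. When several keys are given, they are applied left to right, and Column.desc / Column.asc set the direction per key. The sketch below uses a hypothetical stand-in for dfSequence with a tie on age, so that the second key matters. Spark 3.x is assumed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("sort-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfSequence; bob and carol share the same age
val dfSequence = Seq(("bob", 25), ("alice", 30), ("carol", 25)).toDF("name", "age")

// Ascending by age, then by name to break ties
val byAge = dfSequence.sort("age", "name")

// orderBy is an alias of sort; here age is descending and name ascending
val byAgeDesc = dfSequence.orderBy(col("age").desc, col("name").asc)

val ascNames = byAge.select("name").as[String].collect()
val descNames = byAgeDesc.select("name").as[String].collect()

spark.stop()
```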
-
Views:
-
<dataset>.createTempView(viewName)
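: Creates a local temporary view using the given name. The view's lifetime is tied to the SparkSession that created this Dataset. An AnalysisException is thrown if a view with that name already exists.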
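Once registered, a temporary view can be queried with SQL through the same SparkSession. The sketch below assumes Spark 3.x and uses a hypothetical stand-in for dfSequence and a hypothetical view name, `people`. It also shows that registering the same name twice fails, while createOrReplaceTempView overwrites the existing view.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

val spark = SparkSession.builder().master("local[*]").appName("views-demo").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for dfSequence
val dfSequence = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

// Register under a hypothetical name; visible only within this SparkSession
dfSequence.createTempView("people")

// Query the view with SQL through the same session
val olderNames = spark.sql("SELECT name FROM people WHERE age >= 30").as[String].collect()

// Registering the same name again throws an AnalysisException...
val duplicateFailed = Try(dfSequence.createTempView("people")).isFailure

// ...whereas createOrReplaceTempView overwrites the existing view
dfSequence.createOrReplaceTempView("people")

spark.stop()
```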