PySpark Pivot Example
Mar 25, 2021
This post walks through a pivot in PySpark using Python: we take a small example of product prices per store and apply the pivot function to it.
Step1: create a DataFrame from the product information.
from pyspark.sql import SparkSession

# Start a local Spark session that uses all available cores.
spark = SparkSession.builder.master("local[*]").appName("pivot_app").getOrCreate()
sc = spark.sparkContext
sc.setLogLevel("ERROR")

# One tuple per (product_id, store, price).
product = [
    (1, "store1", 95),
    (1, "store2", 100),
    (1, "store3", 105),
    (2, "store1", 70),
    (2, "store3", 80),
]
product_column = ["product_id", "store", "price"]
product_df = spark.createDataFrame(product, product_column)
product_df.show()
We get the following DataFrame.
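Step2: apply the pivot operation. We group by product_id, pivot on store so that each store becomes its own column, and take the maximum price for each cell.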
result_df = (
    product_df
    .groupBy("product_id").pivot("store").max("price")
)
result_df.show()
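To make it easier to see what the pivot in Step2 computes, here is a minimal sketch of the same logic in plain Python, without Spark. The helper name pivot_max is hypothetical and only for illustration; it is not part of PySpark. It groups the rows by product_id, creates one column per distinct store (in sorted order), and keeps the maximum price per cell.

```python
from collections import defaultdict

# Same rows as the product list in Step1: (product_id, store, price).
product = [
    (1, "store1", 95),
    (1, "store2", 100),
    (1, "store3", 105),
    (2, "store1", 70),
    (2, "store3", 80),
]


def pivot_max(rows):
    """Hypothetical helper: group by product_id, one column per store, max price per cell."""
    # The distinct store names become the output columns.
    stores = sorted({store for _, store, _ in rows})

    # cells[product_id][store] holds the largest price seen so far.
    cells = defaultdict(dict)
    for product_id, store, price in rows:
        current = cells[product_id].get(store)
        cells[product_id][store] = price if current is None else max(current, price)

    # A product with no row for a store gets None, much like null in Spark.
    return {
        product_id: {store: by_store.get(store) for store in stores}
        for product_id, by_store in cells.items()
    }


pivoted = pivot_max(product)
for product_id, row in sorted(pivoted.items()):
    print(product_id, row)
# 1 {'store1': 95, 'store2': 100, 'store3': 105}
# 2 {'store1': 70, 'store2': None, 'store3': 80}
```

Product 2 has no row for store2, so that cell is empty. In the Spark result the same cell shows up as null.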