PySpark Pivot Example


This post shows how to run a pivot operation in PySpark with Python. We take sample product and store price data and apply the pivot function to it.

Step 1: Create a data frame from the product information.

from pyspark.sql import SparkSession

# Start a local Spark session that uses all available cores.
spark = SparkSession.builder.master("local[*]").appName("pivote_app").getOrCreate()
sc = spark.sparkContext
sc.setLogLevel("ERROR")  # only show error-level log messages

# Each tuple is (product_id, store, price).
product = [
    (1, "store1", 95),
    (1, "store2", 100),
    (1, "store3", 105),
    (2, "store1", 70),
    (2, "store3", 80),
]
product_column = ["product_id", "store", "price"]
product_df = spark.createDataFrame(product, product_column)

product_df.show()

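This prints the product data frame: five rows, one for each product and store combination, with the columns product_id, store, and price.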

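Step 2: Apply the pivot operation. Group the rows by product_id, turn each distinct store value into its own column, and fill each cell with the maximum price for that product and store.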
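To make the pivot concrete before running it in Spark, here is a rough plain-Python sketch of what the operation computes on the same sample rows. The helper name pivot_max is hypothetical and exists only for illustration; it is not part of PySpark, and Spark does this work in a distributed way rather than with an in-memory dictionary.

```python
from collections import defaultdict

# Same sample rows as in Step 1: (product_id, store, price).
product = [
    (1, "store1", 95),
    (1, "store2", 100),
    (1, "store3", 105),
    (2, "store1", 70),
    (2, "store3", 80),
]


def pivot_max(rows):
    """Hypothetical helper: pivot (key, column, value) rows, keeping the max value per cell."""
    # Every distinct store becomes an output column.
    columns = sorted({col for _, col, _ in rows})
    cells = defaultdict(dict)
    for key, col, value in rows:
        current = cells[key].get(col)
        cells[key][col] = value if current is None else max(current, value)
    # Missing (key, column) pairs become None, mirroring the nulls Spark produces.
    return {key: {col: by_col.get(col) for col in columns} for key, by_col in cells.items()}


pivoted = pivot_max(product)
for product_id, prices in sorted(pivoted.items()):
    print(product_id, prices)
```

Product 2 has no row for store2, so that cell is empty in the pivoted result. The Spark code for the same operation follows.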

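result_df = (
    product_df
    .groupBy("product_id")  # one output row per product
    .pivot("store")         # one output column per distinct store
    .max("price")           # cell value: highest price for that product and store
)
result_df.show()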


Written by Rupesh Kumar Singh

An IT professional with 10+ years of experience, Python | pandas| Django | Flask | Superset | pyspark | FullStack | Hadoop | AWS | php | no-SQL | ETL | Data-pip
