Spark Dataset with example

Spark dataset is distributed collection of typed objects.This was introduced in Spark 1.6 version.

It consolidates features of both RDD and Data frame with fast execution response and memory optimization of data processing .The methodology used in DataSet is Encoder which enhances execution and processing response of data over the network compared to others which depends on JAVA or Kryo Serializer .

Dataset concept came into feature with the following advantages.

Spark Dataset Features

Thus developers can get more functional benefits working with Dataset API.

Dataset API is available only for JAVA and Scala as a single interface which reduces the burden of libraries .

As of Spark 2.0.0,Dataframe is currently a mere type alias for Dataset[Row] . Users use Dataset<Row>to represent a DataFrame inJava API.

Creating a Dataset in Scala:

Creating a Dataset from a RDD

val rdd = sc.parallelize(Seq(("Apple",180), ("Banana",40))) //creating rdd by calling parallelize() method on collection of data

val dataset = rdd.toDS() // converting RDD to DataSet by calling toDS() method

dataset.show() // to view the content of dataset

Creating a Dataset from a DataFrame

case class Shop(fruitname: String, quantity:Long) //creating a case class named Shop with attributes fruitname and quantity

val data = Seq(("Apple",180), ("Banana",40)) // define a collection

val df = sc.parallelize(data).toDF() //creating rdd and calling toDF() method to

convert it into dataframe

val dataset = df.as[Shop] //creating dataset by using case class reflection

dataset.show() //to view the content of dataset

Learning Coding!!

See you in next blog!

ITechShree-Data-Analytics-Technologies

Spark Dataset with example

Creating a Dataset from a DataFrame

Posted by: D Gorai

Post a Comment

0 Comments

Labels

Random Posts

Popular Posts

Flume Spooling directory example

Apache Spark interview questions Set 2

Learn Flume

Menu Footer Widget

ITechShree-Data-Analytics-Technologies

Spark Dataset with example

Creating a Dataset from a DataFrame

Posted by: D Gorai

You may like these posts

Post a Comment

0 Comments

Labels

Random Posts

Popular Posts

Flume Spooling directory example

Apache Spark interview questions Set 2

Learn Flume

Menu Footer Widget