May 31, 2019 - SQLRelease

RDD, DataFrame, and DataSet – Introduction to Spark Data Abstraction

Spark / Gopal Krishna Ranjan / May 31, 2019 / big data processing

Apache Spark is a general-purpose distributed computing engine used for big data processing, covering both batch and stream processing. It provides high-level APIs such as Spark SQL, Spark Streaming, MLlib, and GraphX for working with the core functionality of Apache Spark. Spark also provides several core data abstractions on top of the distributed collection of […]
