Apache Spark is a general-purpose big data processing engine. It is a powerful cluster computing framework that can scale from a single machine up to thousands of nodes, and it can run on clusters managed by Hadoop YARN, Apache Mesos, or Spark's own standalone cluster manager. To read more about the Spark big data processing framework, see the post "Big Data processing using Apache Spark – Introduction". Here, in this post, we will learn how to install Apache Spark on a local Windows machine in pseudo-distributed mode (managed by Spark's standalone cluster manager) and run it using PySpark (Spark's Python API).
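To give a sense of where the installation below leads, here is a minimal PySpark sketch of connecting to a standalone master once setup is complete. It assumes Spark is installed, the master is running on the standalone manager's default port (7077) on the local machine, and pyspark is importable; the application name is an arbitrary placeholder.

```python
# A minimal sketch: connect PySpark to a locally running standalone master.
# Assumes the master was started on the default port 7077; swap in
# "local[*]" to run without the cluster manager at all.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("windows-install-check")   # arbitrary placeholder name
    .master("spark://localhost:7077")
    .getOrCreate()
)

# Quick sanity check: distribute a small range and sum it on the cluster.
total = spark.sparkContext.parallelize(range(1, 101)).sum()
print(total)  # expected output: 5050

spark.stop()
```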
Install Spark on Local Windows Machine
To install Apache Spark on a local Windows machine, we need to follow the steps below:
Step 1 – Download and install Java JDK 8
Java JDK 8 is required as a prerequisite for the Apache Spark installation. We can check whether a suitable JDK is already installed by running java -version from a command prompt.
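For completeness, here is a small Python sketch that performs that same check programmatically; note that java -version writes its version banner to stderr rather than stdout.

```python
# A minimal sketch: check that a Java runtime is reachable on the PATH.
# "java -version" prints its version banner to stderr, not stdout.
import subprocess

try:
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print(result.stderr.strip())  # e.g. java version "1.8.0_xxx"
except FileNotFoundError:
    print("Java was not found on the PATH; install JDK 8 first.")
```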