Apache Spark is a fast cluster computing technology. It builds on the Hadoop MapReduce model and extends it. The main feature of Spark is in-memory cluster computing, which significantly increases the processing speed of an application.
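In the API, in-memory computing surfaces as caching. A minimal sketch from the Scala shell (the file name data.txt is just a placeholder):

    // Keep the RDD in executor memory so repeated actions avoid recomputation.
    val data = sc.textFile("data.txt").cache()
    println(data.count())  // first action reads from disk and fills the cache
    println(data.count())  // second action is served from memory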
Spark provides high-level APIs in Java, Scala, Python and R. It also provides interactive shells for Scala and Python: the Scala shell is started with ./bin/spark-shell and the Python shell with ./bin/pyspark.
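For example, a quick session in spark-shell might look like this (README.md stands in for any local text file):

    // Count the lines that mention Spark (sc is predefined in spark-shell).
    val lines = sc.textFile("README.md")
    println(lines.filter(_.contains("Spark")).count())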
Spark can be up to 100 times faster than Hadoop MapReduce, particularly for in-memory workloads. It achieves this through parallel distributed data processing: datasets are split into partitions that the cluster processes concurrently.
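Partitioning is visible directly in the API; a small sketch (the partition count of 8 is arbitrary):

    // Create an RDD spread across 8 partitions; each is processed in parallel.
    val numbers = sc.parallelize(1 to 1000000, 8)
    println(numbers.getNumPartitions)  // 8
    println(numbers.map(_ * 2L).sum())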
Spark supports multiple data sources such as Parquet, JSON, Hive and Cassandra, in addition to text files, CSV and relational databases.
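Reading these sources looks roughly the same through the DataFrame API; a sketch with placeholder paths:

    // spark (a SparkSession) is predefined in spark-shell.
    val parquetDF = spark.read.parquet("data/events.parquet")
    val jsonDF    = spark.read.json("data/events.json")
    val csvDF     = spark.read.option("header", "true").csv("data/events.csv")
    csvDF.printSchema()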
Setup Apache Spark on Windows
Set JAVA_HOME using the command below, substituting your own Java installation directory for the placeholder path:

    setx JAVA_HOME "C:\path\to\java" /M
Install Scala from https://www.scala-lang.org/download/ and set SCALA_HOME to the Scala installation directory:

    setx SCALA_HOME "C:\path\to\scala" /M
Download and unzip Apache Spark from https://spark.apache.org/downloads.html and set SPARK_HOME to the unzipped Spark directory (the folder that contains bin):

    setx SPARK_HOME "C:\path\to\spark" /M
Download and unzip the Hadoop common library (it supplies winutils.exe, which Spark needs on Windows) and set HADOOP_HOME to the directory that contains its bin folder:

    setx HADOOP_HOME "C:\path\to\hadoop" /M
Install Python from https://www.python.org/downloads/ if you want to use the PySpark shell.
Open a command prompt, navigate to the Spark bin folder, and run the spark-shell command to verify the installation.
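Once the shell starts, a quick sanity check at the scala> prompt confirms the context is live:

    // Print the Spark version and run a trivial distributed job.
    println(sc.version)
    println(sc.parallelize(1 to 100).reduce(_ + _))  // 5050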