INSTALL_SPARK
This document will help you set up Spark standalone on a HyperStore cluster.
"Standalone" simply means Spark runs on its own built-in cluster manager, without Hadoop YARN or Mesos.
1. Go to the Spark download page (http://spark.apache.org/downloads.html),
select release 1.5.2 and the package "Pre-built for Hadoop 2.6 and later",
and download spark-1.5.2-bin-hadoop2.6.tgz.
2. Upload spark-1.5.2-bin-hadoop2.6.tgz to one Cloudian HyperStore node.
3. Unpack the uploaded spark-1.5.2-bin-hadoop2.6.tgz into your Spark installation directory (e.g. /opt).
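Steps 1-3 can be sketched as shell commands. This is only a sketch: the archive.apache.org URL, the node name cloudian-node1, and /opt as the install directory are assumptions, so adjust them for your environment. RUN defaults to echo so the commands are printed rather than executed; set RUN= (empty) to actually run them.

```shell
# Sketch of steps 1-3 (dry run by default; set RUN= to execute).
# URL, node name, and /opt target directory are assumptions.
RUN=${RUN:-echo}
SPARK_TGZ=spark-1.5.2-bin-hadoop2.6.tgz
SPARK_HOME=/opt/${SPARK_TGZ%.tgz}

# 1. download the pre-built package (older releases live on the archive mirror)
$RUN wget "https://archive.apache.org/dist/spark/spark-1.5.2/${SPARK_TGZ}"
# 2. upload it to one HyperStore node (run from your workstation)
$RUN scp "${SPARK_TGZ}" "root@cloudian-node1:/tmp/"
# 3. unpack into the installation directory (run on the node)
$RUN tar -xzf "/tmp/${SPARK_TGZ}" -C /opt
echo "Spark will live in ${SPARK_HOME}"
```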
4. Upload the hap/build/*.jar files listed below to a shared location (e.g. /usr/local/lib/) on the node:
4-1: hadoop-aws-2.7.1.jar
4-2: hap-5.2.1.jar
4-3: aws-java-sdk-1.7.4.jar
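Step 4 amounts to copying the three jars onto the node; a sketch, again with an assumed node name and with RUN=echo as a dry run:

```shell
# Sketch of step 4 (dry run by default; set RUN= to execute).
# Node name and build directory path are assumptions.
RUN=${RUN:-echo}
JARS="hadoop-aws-2.7.1.jar hap-5.2.1.jar aws-java-sdk-1.7.4.jar"
for jar in $JARS; do
  $RUN scp "hap/build/${jar}" "root@cloudian-node1:/usr/local/lib/"
done
```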
5. Go to the Spark installation directory, create spark-env.sh from the template:
# cp conf/spark-env.sh.template conf/spark-env.sh
then add the Cloudian-related classpaths to SPARK_CLASSPATH in spark-env.sh,
e.g.
SPARK_CLASSPATH=/usr/local/lib/*:/opt/cloudian/conf:/opt/cloudian/lib/apache-cassandra-2.0.11.jar:/opt/cloudian/lib/apache-cassandra-clientutil-2.0.11.jar:/opt/cloudian/lib/apache-cassandra-thrift-2.0.11.jar:/opt/cloudian/lib/cassandra-driver-core-2.1.4.jar:/opt/cloudian/lib/cloudian-s3-5.2.jar:/opt/cloudian/lib/commons-pool-1.5.5.jar:/opt/cloudian/lib/jetty-util-9.2.3.v20140905.jar:/opt/cloudian/lib/hector-core-1.1-4.jar:/opt/cloudian/lib/guava-17.0.jar:/opt/cloudian/lib/jedis-2.0.1-jmx.jar:/opt/cloudian/lib/snappy-java-1.1.0.1.jar:/opt/cloudian/lib/httpclient-4.3.6.jar:/opt/cloudian/lib/httpcore-4.4.1.jar
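Since spark-env.sh is sourced as a shell script, the long line above can equivalently be built up in a loop, which is easier to read and audit. The jar versions below are copied from the example and should be checked against what is actually present under /opt/cloudian/lib on your cluster:

```shell
# Equivalent to the single long SPARK_CLASSPATH line, built incrementally.
# Jar versions are from the example above -- verify against /opt/cloudian/lib.
CL=/opt/cloudian/lib
SPARK_CLASSPATH="/usr/local/lib/*:/opt/cloudian/conf"
for jar in apache-cassandra-2.0.11 apache-cassandra-clientutil-2.0.11 \
           apache-cassandra-thrift-2.0.11 cassandra-driver-core-2.1.4 \
           cloudian-s3-5.2 commons-pool-1.5.5 jetty-util-9.2.3.v20140905 \
           hector-core-1.1-4 guava-17.0 jedis-2.0.1-jmx snappy-java-1.1.0.1 \
           httpclient-4.3.6 httpcore-4.4.1; do
  SPARK_CLASSPATH="${SPARK_CLASSPATH}:${CL}/${jar}.jar"
done
export SPARK_CLASSPATH
```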
6. Create conf/spark-defaults.conf from the template and modify it:
# cp conf/spark-defaults.conf.template conf/spark-defaults.conf
7. Add the hsfs (s3a) related properties:
# grep hadoop conf/spark-defaults.conf
spark.hadoop.fs.s3a.access.key ACCESS_KEY
spark.hadoop.fs.s3a.secret.key SECRET_KEY
spark.hadoop.fs.s3a.connection.ssl.enabled true|false
spark.hadoop.fs.s3a.endpoint S3.DOMAIN.COM:S3_PORT
spark.hadoop.fs.hsfs.impl com.cloudian.hadoop.HyperStoreFileSystem
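As a concrete, hypothetical illustration of the placeholders above (the endpoint s3.example.com, port 80, and credentials are made up, and `true|false` means you pick exactly one value for the SSL setting):

```
spark.hadoop.fs.s3a.access.key              AKIAEXAMPLEKEY
spark.hadoop.fs.s3a.secret.key              exampleSecretKey123
spark.hadoop.fs.s3a.connection.ssl.enabled  false
spark.hadoop.fs.s3a.endpoint                s3.example.com:80
spark.hadoop.fs.hsfs.impl                   com.cloudian.hadoop.HyperStoreFileSystem
```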
8. Copy /usr/local/lib/* and the Spark installation directory, including the modified spark-env.sh and spark-defaults.conf, to the other nodes.
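Step 8 can be sketched as an rsync loop. The node names below are hypothetical, and RUN=echo again makes it a dry run; set RUN= (empty) to actually copy:

```shell
# Sketch of step 8 (dry run by default; set RUN= to execute).
# Node names are hypothetical -- substitute your remaining HyperStore nodes.
RUN=${RUN:-echo}
NODES="cloudian-node2 cloudian-node6"
for node in $NODES; do
  $RUN rsync -a /usr/local/lib/ "root@${node}:/usr/local/lib/"
  $RUN rsync -a /opt/spark-1.5.2-bin-hadoop2.6/ "root@${node}:/opt/spark-1.5.2-bin-hadoop2.6/"
done
```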
9. Start the Spark master/slave services on each node.
e.g. master on cloudian-node1:
[root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# sbin/start-master.sh
e.g. slave (2 cores, 2 GB memory) on cloudian-node6:
[root@cloudian-node6 spark-1.5.2-bin-hadoop2.6]# sbin/start-slave.sh -c 2 -m 2g spark://cloudian-node1:7077
10. Check the status of the launched Spark cluster on the Spark master UI.
e.g. master on cloudian-node1:
http://cloudian-node1:8080/