1. Start jupyter notebook in local mode
In a terminal, enter the following commands to open the jupyter notebook interface:
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*]
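This command works by overriding which Python executable the pyspark launcher starts as the driver. A minimal sketch of that lookup, simplified from the documented fallback behavior of the bin/pyspark script (the function here is illustrative, not Spark's actual code):

```python
def resolve_driver_python(env):
    # bin/pyspark prefers PYSPARK_DRIVER_PYTHON, then PYSPARK_PYTHON,
    # then a plain "python" interpreter (simplified sketch)
    return env.get("PYSPARK_DRIVER_PYTHON") or env.get("PYSPARK_PYTHON") or "python"

# With the variables from the command above, the driver program becomes
# jupyter, and PYSPARK_DRIVER_PYTHON_OPTS="notebook" is passed to it as
# its command-line argument, so `pyspark` opens a notebook server.
env = {"PYSPARK_DRIVER_PYTHON": "jupyter", "PYSPARK_DRIVER_PYTHON_OPTS": "notebook"}
print(resolve_driver_python(env))
```

Because the variables are given inline on the command line, they apply only to that single launch and do not change the shell environment.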
2. Start jupyter notebook in Hadoop YARN-client mode
First start the Hadoop cluster: boot the master, data1, data2, and data3 virtual machines, then run in the terminal:
start-all.sh
Then launch jupyter notebook running against Hadoop in YARN-client mode (note that HADOOP_CONF_DIR must point at the Hadoop configuration directory):
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
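To avoid retyping these variables at every launch, they can optionally be exported once per shell session (or added to ~/.bashrc), after which a bare `pyspark` picks them up. This is a convenience sketch assuming the same Hadoop install path as above:

```shell
# optional: export once, then `pyspark` alone starts the notebook on YARN
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export MASTER=yarn-client
```

Exported variables affect every subsequent pyspark launch in that shell, so unset them (or open a new terminal) before switching back to local or Standalone mode.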
3. Run jupyter notebook on a Spark Standalone cluster
1) Start the Hadoop cluster: boot the master, data1, data2, and data3 virtual machines, then run in the terminal:
start-all.sh
2) Start the Spark Standalone cluster:
/usr/local/spark/sbin/start-all.sh
3) Launch jupyter notebook against the Standalone cluster (--total-executor-cores caps the total cores used and --executor-memory sets the memory per executor; --num-executors is a YARN option and has no effect in Standalone mode):
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 2 --executor-memory 512m
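The --executor-memory value uses JVM-style size suffixes, so 512m means 512 MiB per executor. A small illustrative parser of that convention (a sketch, not Spark's actual parsing code):

```python
def parse_memory(size):
    # JVM-style binary suffixes: k = KiB, m = MiB, g = GiB
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = size[-1].lower()
    return int(size[:-1]) * units[suffix]

print(parse_memory("512m"))  # 536870912 bytes, i.e. 512 MiB
```

When sizing this value, leave headroom on each worker VM for the OS and the Hadoop daemons; 512m is a deliberately small setting suited to the virtual machines used in this guide.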