Some personal takeaways after going through part of the Spark Summit 2017 slide decks:
- Apache-Kylin--Speed-Up-Cubing-with-Apache-Spark-with-Luke-Han-and-Shaofeng-Shi-iteblog: Kylin uses Spark to speed up the cube build process that previously ran on MapReduce (MR).
- Many talks cover Spark applications for IoT: ETL and real-time analysis.
- A-Deep-Dive-into-Spark-SQL's-Catalyst-Optimizer-with-Yin-Huai-iteblog: the optimizations applied by Spark SQL.
- Apache-Spark-and-Apache-Ignite--Where-Fast-Data-Meets-the-IoT-with-Denis-Magda-iteblog: Ignite, a distributed in-memory SQL analytics system for big data.
- Best Practices for Using Alluxio with Spark: Alluxio caches a single copy of a file, so multiple Spark apps avoid re-reading it and each holding its own duplicate copy in memory.
- Cost-Based Optimizer in Apache Spark 2.2: possibly the earliest proposal of the Spark CBO?
- demystifying-dataframe-and-dataset-with-kazuaki-ishizaki: how Spark 2.2 speeds up Dataset, covering data conversion (boxing/unboxing), serialization, and bytecode.
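
To make the "cube build" in the Kylin item more concrete, here is a toy, pure-Python sketch of layer-by-layer cubing as I understand the idea: start from the base cuboid (all dimensions) and derive each smaller cuboid by aggregating a parent cuboid that has one more dimension. This is not Kylin's or Spark's code. The function names `aggregate` and `build_cube_by_layer`, the sample records, and the SUM measure are all assumptions made up for illustration. In a Spark version each layer would presumably be a distributed dataset instead of a Python list.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(rows, dims, keep):
    """Group `rows` by the dimensions in `keep` and sum the measure.

    `rows` is a list of (key_tuple, measure) pairs whose key columns
    follow the order given by `dims`.
    """
    positions = [dims.index(d) for d in keep]
    totals = defaultdict(int)
    for key, measure in rows:
        totals[tuple(key[i] for i in positions)] += measure
    return list(totals.items())


def build_cube_by_layer(records, dims):
    """Hypothetical helper: build all 2^N cuboids, one layer at a time."""
    dims = tuple(dims)
    # Layer N: the base cuboid, aggregated over every dimension.
    layer = {dims: aggregate(records, dims, dims)}
    cube = dict(layer)
    while layer:
        next_layer = {}
        for parent_dims, parent_rows in layer.items():
            if not parent_dims:
                continue  # the empty cuboid (grand total) has no children
            # Each child drops exactly one dimension from its parent.
            for child_dims in combinations(parent_dims, len(parent_dims) - 1):
                if child_dims not in next_layer:
                    next_layer[child_dims] = aggregate(
                        parent_rows, parent_dims, child_dims
                    )
        cube.update(next_layer)
        layer = next_layer
    return cube


# Made-up sample data: (year, country, channel) -> sales
dims = ("year", "country", "channel")
records = [
    (("2017", "US", "web"), 10),
    (("2017", "US", "app"), 5),
    (("2017", "CN", "web"), 7),
    (("2016", "US", "web"), 3),
]

cube = build_cube_by_layer(records, dims)
# 3 dimensions give 2**3 = 8 cuboids, from the base cuboid down to the grand total.
for cuboid_dims in sorted(cube, key=len, reverse=True):
    print(cuboid_dims, sorted(cube[cuboid_dims]))
```

Each child cuboid is computed from an already aggregated parent rather than from the raw records, so the amount of data to scan shrinks from layer to layer.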