Preparation
Code example
1. Install the Scala plugin
Development tool: IntelliJ IDEA
2. Build file
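This example uses Maven as the build tool. In practice we found that sbt is slow to pull dependencies, and every time a dependency is updated or added it re-checks all of the dependencies, which consumes a lot of CPU and gets in the way of development, so we recommend Maven.
Plugins: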
<scala.version>2.11.8</scala.version>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
<args>
<arg>-target:jvm-1.8</arg>
</args>
</configuration>
</plugin>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<configuration>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
</plugin>
Dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.2.1</version>
</dependency>
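The `_2.11` suffix on each Spark artifactId is the Scala binary version, and it has to agree with the `scala.version` property (2.11.8 here). As a minimal sketch of that rule, the check below derives the binary version from a full version string and compares it with the artifact suffix. `ScalaVersionCheck`, `binaryVersion`, and `matches` are hypothetical names made up for illustration; they are not part of Maven or Spark.

```scala
// Hypothetical helper, not part of Spark or Maven: checks that an artifactId
// suffix such as "_2.11" matches the binary version of a full Scala version.
object ScalaVersionCheck {
  // "2.11.8" -> "2.11" (major.minor is the binary version for Scala 2.x)
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // True when the artifactId ends with "_" + binary version
  def matches(artifactId: String, scalaVersion: String): Boolean =
    artifactId.endsWith("_" + binaryVersion(scalaVersion))

  def main(args: Array[String]): Unit = {
    val scalaVersion = "2.11.8"
    val artifacts = Seq("spark-core_2.11", "spark-sql_2.11", "spark-hive_2.11")
    artifacts.foreach { id =>
      println(s"$id compatible with Scala $scalaVersion: ${matches(id, scalaVersion)}")
    }
  }
}
```

If the Scala version is changed later, the three artifact suffixes need to change with it; a mismatch typically shows up as binary incompatibility errors at runtime rather than at dependency resolution.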
3. Write the code
Data: people.json
{"name":"zhangsan","age":25}
{"name":"wangwu","age":20}
{"name":"lisi","age":28}
{"name":"mazi","age":18}
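The Spark job in this step reads these four records and subtracts 18 from each age. Before involving Spark, that per-record logic can be sketched on plain Scala collections. This is only an illustration under stated assumptions: `AgeDiffSketch`, `Person`, and `yearsOver18` are names invented here, and the records are written out by hand rather than parsed from the file.

```scala
// Plain-Scala sketch (no Spark) of the per-record logic; Person and
// yearsOver18 are illustrative names, not from the original code.
object AgeDiffSketch {
  final case class Person(name: String, age: Long)

  // The four records from people.json, written out by hand
  val people: Seq[Person] = Seq(
    Person("zhangsan", 25),
    Person("wangwu", 20),
    Person("lisi", 28),
    Person("mazi", 18)
  )

  // Same arithmetic as the Spark job: pair each name with age - 18
  def yearsOver18(ps: Seq[Person]): Seq[(String, Long)] =
    ps.map(p => (p.name, p.age - 18))

  def main(args: Array[String]): Unit =
    yearsOver18(people).foreach { case (name, diff) => println(s"$name\t$diff") }
}
```

Running `main` should pair zhangsan with 7, wangwu with 2, lisi with 10, and mazi with 0, which are the values to expect in the second column of the Spark result.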
Create a new Scala object named HelloWorld.
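import org.apache.spark.sql.SparkSession
object HelloWorld {
  def main(args: Array[String]): Unit = {
    // Local session with two worker threads
    val spark = SparkSession
      .builder()
      .master("local[2]")
      .appName("hello world")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()
    // Brings in the encoders needed by map and toDF
    import spark.implicits._
    // Each line of people.json is one JSON record
    val peopleDF = spark.read.json("src/main/resources/people.json")
    // JSON integers are inferred as Long; compute each person's age minus 18
    val newPeopleDF = peopleDF.map { row =>
      val name = row.getAs[String]("name")
      val age = row.getAs[Long]("age")
      (name, age - 18)
    }.toDF("name", "理黄花大闺女的年龄差") // column name kept from the original; roughly "age gap from 18"
    newPeopleDF.show()
    spark.stop()
  }
}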
Output: