(1) First, either build neo4j-spark-connector_*.jar yourself and add it to the project, or add the Maven dependency.
Download the source:
https://github.com/neo4j-contrib/neo4j-spark-connector
and build it with Maven.
Or pull it directly from the Maven repository:
<!-- https://mvnrepository.com/artifact/neo4j-contrib/neo4j-spark-connector -->
<dependency>
    <groupId>neo4j-contrib</groupId>
    <artifactId>neo4j-spark-connector</artifactId>
    <version>2.4.0-M6</version>
</dependency>
(2) Test code
Query data and get an RDD:
import org.apache.spark.{SparkConf, SparkContext}
import org.neo4j.spark.Neo4jRowRDD

object Neo4J_Test3 {
  def main(args: Array[String]): Unit = {
    // The connector reads its Bolt connection settings from the Spark configuration
    val conf: SparkConf = new SparkConf().setAppName("InitSpark").setMaster("local[*]")
    conf.set("spark.neo4j.bolt.url", "bolt://192.168.72.143:7687")
    conf.set("spark.neo4j.bolt.user", "neo4j")
    conf.set("spark.neo4j.bolt.password", "1a2b3c4d")
    val sc = new SparkContext(conf)
    // Run the Cypher query and expose the result rows as an RDD
    val cypher = "match (n:Person) return n.pid"
    val neo: Neo4jRowRDD = Neo4jRowRDD(sc, cypher)
    println(neo.count())
  }
}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.neo4j.spark.Neo4j

object Neo4j_Test4 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("InitSpark").setMaster("local[*]")
    conf.set("spark.neo4j.bolt.url", "bolt://192.168.72.143:7687")
    conf.set("spark.neo4j.bolt.user", "neo4j")
    conf.set("spark.neo4j.bolt.password", "1a2b3c4d")
    val sc = new SparkContext(conf)
    // Same query through the Neo4j builder: set the Cypher statement, then load it as RDD[Row]
    val neo4j = new Neo4j(sc)
    val new_neo4j: Neo4j = neo4j.cypher("match (n:Person) return n.pid")
    val rdd: RDD[Row] = new_neo4j.loadRowRdd
    println(rdd.count())
  }
}
Query data and get a DataFrame:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.neo4j.spark.Neo4j

object Neo4j_Test5 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("InitSpark").setMaster("local[*]")
    conf.set("spark.neo4j.bolt.url", "bolt://192.168.72.143:7687")
    conf.set("spark.neo4j.bolt.user", "neo4j")
    conf.set("spark.neo4j.bolt.password", "1a2b3c4d")
    val sc = new SparkContext(conf)
    // Load the query result as a DataFrame, one column per returned expression
    val neo4j = new Neo4j(sc)
    val new_neo4j: Neo4j = neo4j.cypher("match (n:Movies) return n.movieId, n.title")
    val dataFrame: DataFrame = new_neo4j.loadDataFrame
    dataFrame.show()
  }
}
Graph algorithms can be run the same way, and the result can be returned either as an RDD or as a DataFrame:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.neo4j.spark.Neo4j

object Neo4j_Test6 {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("InitSpark").setMaster("local[*]")
    conf.set("spark.neo4j.bolt.url", "bolt://192.168.72.143:7687")
    conf.set("spark.neo4j.bolt.user", "neo4j")
    conf.set("spark.neo4j.bolt.password", "1a2b3c4d")
    val sc = new SparkContext(conf)
    val neo4j = new Neo4j(sc)
    // The algo.* procedures require the Neo4j Graph Algorithms plugin on the server
    val new_neo4j: Neo4j = neo4j.cypher(
      "CALL algo.pageRank.stream('Page', 'LINKS', {iterations:20, dampingFactor:0.85}) " +
        "YIELD nodeId, score " +
        "RETURN algo.asNode(nodeId).name AS page, score " +
        "ORDER BY score DESC")
    val frame: DataFrame = new_neo4j.loadDataFrame
    frame.show()
  }
}
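All four test objects above repeat the same connection settings. One possible way to factor them out is sketched below. Neo4jConf, build, fromEnv and the NEO4J_PASSWORD environment variable are hypothetical names made up for this illustration, not part of neo4j-spark-connector; only the SparkConf calls and the three spark.neo4j.bolt.* keys come from the test code above.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper for illustration; not part of neo4j-spark-connector.
object Neo4jConf {
  // Builds a local SparkConf carrying the three Bolt settings used in the tests above.
  def build(url: String, user: String, password: String): SparkConf =
    new SparkConf()
      .setAppName("InitSpark")
      .setMaster("local[*]")
      .set("spark.neo4j.bolt.url", url)
      .set("spark.neo4j.bolt.user", user)
      .set("spark.neo4j.bolt.password", password)

  // NEO4J_PASSWORD is an assumed variable name; if it is unset,
  // the password from the test code above is used instead.
  def fromEnv(): SparkConf =
    build(
      "bolt://192.168.72.143:7687",
      "neo4j",
      sys.env.getOrElse("NEO4J_PASSWORD", "1a2b3c4d"))
}
```

With such a helper, each main method could start with val sc = new SparkContext(Neo4jConf.fromEnv()), which keeps the password out of every test object.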
(3) For details on how to use this library, see:
https://github.com/neo4j-contrib/neo4j-spark-connector
(4) Possible problems
Running the test code as-is may fail with:
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.neo4j.driver.v1.exceptions.AuthenticationException: Unsupported authentication token, scheme 'none' is only allowed when auth is disabled.
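In that case, edit the Neo4j configuration file conf/neo4j.conf and append this line at the end:
dbms.security.auth_enabled=false
Restart Neo4j and run the test again; the connection now succeeds. Note that this setting turns off authentication for the whole Neo4j instance, so it is only suitable for a test environment.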