Connecting Neo4j to Spark (using neo4j-spark-connector.jar)

(1) First, either build the neo4j-spark-connector_*.jar yourself and add it to the project, or add the Maven dependency.
Download the source:

https://github.com/neo4j-contrib/neo4j-spark-connector

Build it with Maven.

Or pull it directly from the Maven repository:

<!-- https://mvnrepository.com/artifact/neo4j-contrib/neo4j-spark-connector -->
<dependency>
    <groupId>neo4j-contrib</groupId>
    <artifactId>neo4j-spark-connector</artifactId>
    <version>2.4.0-M6</version>
</dependency>
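
For sbt builds, an equivalent of the Maven dependency above might look like the following sketch. The coordinates are copied from the Maven snippet. The resolver name and URL are assumptions (the connector has been distributed through the Spark Packages repository rather than Maven Central), so check them against the project's README before relying on them.

```scala
// build.sbt (sketch, not taken from the original post)

// Assumption: the artifact is served from the Spark Packages repository.
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

// Same group, artifact and version as the Maven dependency.
libraryDependencies += "neo4j-contrib" % "neo4j-spark-connector" % "2.4.0-M6"
```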

(2) Test code
Query data and get an RDD:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.neo4j.spark.{Neo4j, Neo4jRowRDD}

// Variant 1: build a Neo4jRowRDD directly from a Cypher query
object Neo4J_Test3 {
  def main(args: Array[String]): Unit = {

    // The connector reads its connection settings from the Spark configuration
    val conf: SparkConf = new SparkConf().setAppName("InitSpark").setMaster("local[*]")
    conf.set("spark.neo4j.bolt.url", "bolt://192.168.72.143:7687")
    conf.set("spark.neo4j.bolt.user", "neo4j")
    conf.set("spark.neo4j.bolt.password", "1a2b3c4d")

    val sc = new SparkContext(conf)

    val cypher = "match (n:Person) return n.pid"
    val neo: Neo4jRowRDD = Neo4jRowRDD(sc, cypher)

    println(neo.count())
    sc.stop()
  }
}

// Variant 2: use the Neo4j builder and load the result as an RDD[Row]
object Neo4j_Test4 {
  def main(args: Array[String]): Unit = {

    val conf: SparkConf = new SparkConf().setAppName("InitSpark").setMaster("local[*]")
    conf.set("spark.neo4j.bolt.url", "bolt://192.168.72.143:7687")
    conf.set("spark.neo4j.bolt.user", "neo4j")
    conf.set("spark.neo4j.bolt.password", "1a2b3c4d")
    val sc = new SparkContext(conf)

    val neo4j = new Neo4j(sc)

    val new_neo4j: Neo4j = neo4j.cypher("match (n:Person) return n.pid")
    val rdd: RDD[Row] = new_neo4j.loadRowRdd

    println(rdd.count())
    sc.stop()
  }
}
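
The test objects in this section all repeat the same three connector settings. As a minimal sketch, they could be collected in one place and the password read from the environment instead of being hard-coded. The object name `Neo4jSettings`, the method `boltSettings` and the environment variable `NEO4J_PASSWORD` are hypothetical names introduced for illustration. Only the three `spark.neo4j.bolt.*` keys come from the examples.

```scala
object Neo4jSettings {

  /** Returns the three connector settings used in the examples as key/value pairs. */
  def boltSettings(url: String, user: String, password: String): Map[String, String] =
    Map(
      "spark.neo4j.bolt.url"      -> url,
      "spark.neo4j.bolt.user"     -> user,
      "spark.neo4j.bolt.password" -> password
    )

  def main(args: Array[String]): Unit = {
    // NEO4J_PASSWORD is a hypothetical variable name; fall back to the value used in the examples.
    val password = sys.env.getOrElse("NEO4J_PASSWORD", "1a2b3c4d")
    val settings = boltSettings("bolt://192.168.72.143:7687", "neo4j", password)

    // Print the settings with the password masked, sorted by key for stable output.
    val masked = settings.map { case (k, v) => k -> (if (k.endsWith(".password")) "****" else v) }
    masked.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

With Spark on the classpath, the resulting map can be applied in one call, for example `new SparkConf().setAppName("InitSpark").setMaster("local[*]").setAll(settings)`.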

Query data and get a DataFrame:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.neo4j.spark.Neo4j

object Neo4j_Test5 {

  def main(args: Array[String]): Unit = {

    val conf: SparkConf = new SparkConf().setAppName("InitSpark").setMaster("local[*]")
    conf.set("spark.neo4j.bolt.url", "bolt://192.168.72.143:7687")
    conf.set("spark.neo4j.bolt.user", "neo4j")
    conf.set("spark.neo4j.bolt.password", "1a2b3c4d")
    val sc = new SparkContext(conf)

    val neo4j = new Neo4j(sc)

    // Each returned column becomes a column of the DataFrame
    val new_neo4j: Neo4j = neo4j.cypher("match (n:Movies) return n.movieId, n.title")
    val dataFrame: DataFrame = new_neo4j.loadDataFrame

    dataFrame.show()
    sc.stop()
  }
}

Graph algorithms can be run the same way, and the result can be returned as either an RDD or a DataFrame:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.DataFrame
import org.neo4j.spark.Neo4j

object Neo4j_Test6 {
  def main(args: Array[String]): Unit = {

    val conf: SparkConf = new SparkConf().setAppName("InitSpark").setMaster("local[*]")
    conf.set("spark.neo4j.bolt.url", "bolt://192.168.72.143:7687")
    conf.set("spark.neo4j.bolt.user", "neo4j")
    conf.set("spark.neo4j.bolt.password", "1a2b3c4d")
    val sc = new SparkContext(conf)

    val neo4j = new Neo4j(sc)

    // The algo.* procedures must be installed on the Neo4j server (Graph Algorithms plugin)
    val new_neo4j: Neo4j = neo4j.cypher("CALL algo.pageRank.stream('Page', 'LINKS', {iterations:20, dampingFactor:0.85}) " +
      "YIELD nodeId, score " +
      "RETURN algo.asNode(nodeId).name AS page, score " +
      "ORDER BY score DESC")
    val frame: DataFrame = new_neo4j.loadDataFrame

    frame.show()
    sc.stop()
  }
}
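
The PageRank call above is assembled by concatenating string fragments, which makes it easy to lose a space between clauses. A minimal sketch of building the same query text with a multi-line string and named arguments follows. `PageRankQuery` and `build` are hypothetical helper names, not part of the connector, and the values are interpolated directly into the query text, so only trusted input should be passed in.

```scala
object PageRankQuery {

  /** Builds the streaming PageRank query used in Neo4j_Test6 for a given label and relationship type. */
  def build(label: String, relType: String, iterations: Int, dampingFactor: Double): String =
    s"""CALL algo.pageRank.stream('$label', '$relType', {iterations:$iterations, dampingFactor:$dampingFactor})
       |YIELD nodeId, score
       |RETURN algo.asNode(nodeId).name AS page, score
       |ORDER BY score DESC""".stripMargin

  def main(args: Array[String]): Unit = {
    // Same arguments as in the example: label 'Page', relationship 'LINKS'.
    println(build("Page", "LINKS", 20, 0.85))
  }
}
```

The resulting string can be passed to `neo4j.cypher(...)` exactly as in Neo4j_Test6.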

(3) For details on how to use this jar, see:

https://github.com/neo4j-contrib/neo4j-spark-connector

(4) Possible problems
Running the test code as-is may fail with:

ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.neo4j.driver.v1.exceptions.AuthenticationException: Unsupported authentication token, scheme 'none' is only allowed when auth is disabled.

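In that case, edit Neo4j's configuration file conf/neo4j.conf and append the following line at the end of the file (this disables authentication entirely, so use it only on a test instance):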

dbms.security.auth_enabled=false

Restart Neo4j and run the test again; the connection now succeeds.
