RDD action fails in the Spark shell

While working with Spark today, I ran into the following error:

scala> val work = sc.textFile("file:///tmp/input")

work: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[17] at textFile at <console>:27


scala> work.count()
16/01/13 23:01:52 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 6, sparkworker1): java.io.FileNotFoundException: File file:/tmp/input does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
                                                          

But when I check on the master, /tmp/input clearly exists. Why?

[root@sparkmaster tmp]# ll /tmp/input 
-rw-r--r--. 1 root root 38692 Jan 11 07:57 /tmp/input
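
Two things are going on here. First, textFile() is lazy: defining the RDD never touches the file, so a bad path only blows up once an action like count() runs. Second, a file:// URI is resolved against the local filesystem of whichever node runs the task, so it is the workers, not just the master, that need to see /tmp/input. A quick sketch of the laziness (output abbreviated; the path is a made-up example):

scala> val bogus = sc.textFile("file:///no/such/path")
bogus: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[...] at textFile at <console>:27

scala> bogus.count()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/no/such/path
...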

Could it be that the file has to exist under the same path on all the workers? Since I have another worker (sparkworker1), I scp'd the input file to its /tmp directory:

[root@sparkmaster tmp]# scp input sparkworker1:/tmp/
input                                         100%   38KB  37.8KB/s   00:00    
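
With only two workers this is manageable, but on a bigger cluster you would script the copy. A sketch, assuming hostnames sparkworker1 and sparkworker2:

[root@sparkmaster tmp]# for w in sparkworker1 sparkworker2; do scp /tmp/input $w:/tmp/; done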

Then I ran the RDD action again in the Spark shell, and it succeeded!

scala> work.count()

res9: Long = 874


Yeah! Haha!
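
Postscript: copying the file to every worker's local disk works, but the cleaner fix is to put the file on storage that every node shares, such as HDFS, so all executors read the same copy. A sketch, assuming the namenode runs on sparkmaster at port 9000 (adjust the URI to your cluster):

[root@sparkmaster tmp]# hdfs dfs -put /tmp/input /tmp/input

scala> val work = sc.textFile("hdfs://sparkmaster:9000/tmp/input")
scala> work.count()

For a file this small, reading it on the driver and distributing the lines with sc.parallelize would also avoid the problem.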
