Hadoop Essentials (Part 2)

Reposted from: http://blog.csdn.net/careefour/article/details/51461415

1. Starting Spark with ./bin/spark-shell throws: java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries!

Solution: add export SPARK_LOCAL_IP="127.0.0.1" to spark-env.sh.

2. Java Kafka producer error: ERROR kafka.utils.Utils$ - fetching topic metadata for topics [Set(words_topic)] from broker [ArrayBuffer(id:0, host: xxxxxx, port:9092)] failed

Solution: set 'advertised.host.name' in the Kafka broker's server.properties to the server's real IP (the same address used in the producer's 'metadata.broker.list' property); see the sketch below.
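
A minimal server.properties sketch (the address below is a placeholder for your broker's real, routable IP):

# server.properties on the Kafka broker
advertised.host.name=192.168.1.10
port=9092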

3. java.net.NoRouteToHostException: No route to host

Solution: make sure the ZooKeeper IP addresses are configured correctly.

4. Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) java.net.UnknownHostException: linux-pic4.site

Solution: add your hostname to /etc/hosts: 127.0.0.1 localhost linux-pic4.site

5. org.apache.spark.SparkException: A master URL must be set in your configuration

Solution: set the master explicitly, e.g. SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount").setMaster("local");

6. Failed to locate the winutils binary in the hadoop binary path

Solution: install Hadoop properly first.

7. When starting Spark: Failed to get database default, returning NoSuchObjectException

Solution: 1) copy winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin to a folder such as C:\Hadoop\bin, and set HADOOP_HOME to C:\Hadoop; 2) open an administrator command prompt and run C:\Hadoop\bin\winutils.exe chmod 777 /tmp/hive

8. org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.

Solution: use the constructor JavaStreamingContext(JavaSparkContext sparkContext, Duration batchDuration) instead of new JavaStreamingContext(sparkConf, Durations.seconds(5)); as sketched below.
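
A minimal sketch of reusing the one existing context (variable names are illustrative):

JavaSparkContext sc = new JavaSparkContext(sparkConf);  // the single SparkContext in this JVM
JavaStreamingContext jssc = new JavaStreamingContext(sc, Durations.seconds(5));  // wraps it instead of creating a second one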

9. Reconnect due to socket error: java.nio.channels.ClosedChannelException

Solution: make sure the Kafka broker IP is written correctly.

10. java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute

Solution: the DStream produced by the final transformation must have a corresponding output operation (action), such as messages.print(); see the sketch below.
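
A minimal sketch (the stream name messages is illustrative):

JavaDStream<String> upper = messages.map(String::toUpperCase);  // transformation only: lazily defined
upper.print();  // output operation; without one, "No output operations registered" is thrown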

11. Tip: in Spark, writes to Elasticsearch must be performed inside an action, at RDD granularity.

12. Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use

Solution: caused by configuring the master and a slave with the same IP; give them different IPs.

13. CALL TO LOCALHOST/127.0.0.1:9000

Solution: configure the hosts correctly: /etc/sysconfig/network, /etc/hosts, /etc/sysconfig/network-scripts/ifcfg-eth0

13. The namenode:50070 page shows only one node under Datanode Information

Solution: caused by an SSH misconfiguration; hostnames must match exactly. Reconfigure passwordless SSH login.

14. Tip: when building a cluster, configure the hostnames first and reboot the machines so the new hostnames take effect.

15. INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host

Solution: if the master and slave nodes can ping each other, turn off the firewall: service iptables stop

16. Tip: do not format HDFS casually; it causes many problems such as inconsistent data versions. Clear the data directories before formatting.

17. namenode1: ssh: connect to host namenode1 port 22: Connection refused

Solution: sshd is stopped or not installed. Check whether it is installed with which sshd; if it is installed, restart sshd and then ssh to the local hostname to verify the connection succeeds.

18. Log aggregation has not completed or is not enabled.

Solution: add the corresponding settings to yarn-site.xml to enable log aggregation, as sketched below.
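
A minimal yarn-site.xml sketch (the retention period is an illustrative choice):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value><!-- keep aggregated logs for 7 days -->
</property>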

19. failed to launch org.apache.spark.deploy.history.HistoryServer, full log in ...

Solution: correctly configure spark-defaults.conf and the SPARK_HISTORY_OPTS setting in spark-env.sh.

20. Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

Solution: an exception seen in yarn-client mode; no solution for now.

21. Hadoop files cannot be downloaded, and the YARN Tracking UI cannot reach the history logs

Solution: Windows cannot resolve the cluster hostnames; copy the hostname entries from the cluster's hosts file into the Windows hosts file.

22. Tip: HDFS file paths are written as hdfs://master:9000/path, where master is the namenode's hostname and 9000 is the HDFS port.

23. Yarn JobHistory Error: Failed redirect for container

Solution: configure <value>http://<LOG_SERVER_HOSTNAME>:19888/jobhistory/logs</value> in yarn-site.xml, then restart YARN and the JobHistoryServer; a sketch follows.
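
A sketch of the full property (the property name yarn.log.server.url matches the value quoted above; replace <LOG_SERVER_HOSTNAME> with your JobHistoryServer host):

<property>
  <name>yarn.log.server.url</name>
  <value>http://<LOG_SERVER_HOSTNAME>:19888/jobhistory/logs</value>
</property>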

24. Accessing HDFS folders through the Hadoop UI shows Permission denied: user=dr.who

Solution: run on the namenode: hdfs dfs -chmod -R 755 /

25. Tip: the Spark driver only receives results when an action runs.
26. Tip: when Spark needs a globally aggregated variable, use an accumulator (Accumulator); see the sketch below.
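
A minimal sketch using the Spark 1.x Java API (lines is an illustrative JavaRDD<String>):

Accumulator<Integer> errorCount = sc.accumulator(0);  // created on the driver
lines.foreach(line -> { if (line.contains("ERROR")) errorCount.add(1); });  // updated on executors
System.out.println("errors: " + errorCount.value());  // value is only readable on the driver
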
27. Tip: Kafka organizes consumption by topic and consumer group: each subscribing consumer group receives all of a topic's messages. If you want one consumer to see every message of a topic, put only that consumer in its group. A group's consumer count must not exceed the topic's partition count, otherwise the extra consumers will have nothing to consume.

28. java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;

Solution: stop writing code with a single-machine mindset. (This NoSuchMethodError usually also indicates a Guava version conflict on the classpath, since directExecutor() only exists in Guava 18+.)

29. Returned Bad Request(400) - failed to parse; Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes; Bailing out..

Solution: correct the format of the data being written to ES.

30. java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds

Solution: ensure passwordless SSH login works between all nodes.

31. In cluster mode, Spark fails to write data to Elasticsearch

Solution: write with the form that passes the ES configuration as a Map parameter: results.foreachRDD(javaRDD -> { JavaEsSpark.saveToEs(javaRDD, esSchema, cfg); return null; }); the sketch below shows building cfg.
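
A minimal sketch of the cfg map (the node address and esSchema value are illustrative):

Map<String, String> cfg = new HashMap<>();
cfg.put("es.nodes", "192.168.1.10");  // an ES node reachable from every executor
cfg.put("es.port", "9200");
String esSchema = "myindex/mytype";   // target index/type
JavaEsSpark.saveToEs(javaRDD, esSchema, cfg);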

32. Tip: all custom classes must implement the Serializable interface, otherwise they will not work on the cluster.
33. Tip: read resource files on the Spark driver side and pass them to closure functions as local variables.

34. When reading resource files via NIO: java.nio.file.FileSystemNotFoundException at com.sun.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:171)

Solution: caused by the URI changing once the application is packaged as a jar, taking the form jar:file:/C:/path/to/my/project.jar!/my-folder; parse it as follows:

// The in-jar URI has the form jar:file:/...!/entry, so split it at "!"
final Map<String, String> env = new HashMap<>();
final String[] array = uri.toString().split("!");
// Mount the jar itself as a zip FileSystem, then resolve the entry inside it
final FileSystem fs = FileSystems.newFileSystem(URI.create(array[0]), env);
final Path path = fs.getPath(array[1]);
35. Tip: a DStream transformation only produces a new temporary stream object; to keep using it, hold a reference to that object (see the sketch below).
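
A minimal sketch (stream names are illustrative):

// Wrong: the result of the transformation is discarded
// lines.map(String::trim);
// Right: keep a reference to the new stream the transformation returns
JavaDStream<String> trimmed = lines.map(String::trim);
trimmed.print();
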
36. Tip: jobs submitted in yarn-cluster mode cannot print directly to the console; use log4j to write to log files instead.

37. java.io.NotSerializableException: org.apache.log4j.Logger

Solution: a serializable class must not contain non-serializable members. Keep the logger instance out of default serialization by making it either transient or static; static final is the preferred option. If you make it transient, the logger will be null after deserialization, and any logger.debug() call will then throw a NullPointerException, because neither the constructor nor the instance initializer block runs during deserialization. Making it static and final also ensures thread safety and lets all instances of the Customer class share the same logger; this error is one more reason why a Logger should be declared static and final in Java programs. See the sketch below.
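
A minimal sketch of the preferred fix (Customer is the example class named above):

import org.apache.log4j.Logger;

public class Customer implements java.io.Serializable {
    // static final: excluded from serialization, shared by all instances, thread-safe
    private static final Logger LOG = Logger.getLogger(Customer.class);

    public void process() {
        LOG.debug("processing customer");  // safe even after deserialization
    }
}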

38. log4j:WARN Unsupported encoding

Solution: 1) change UTF to lowercase utf-8; 2) check for a stray space on the line that sets the encoding.

39. MapperParsingException[Malformed content, must start with an object

Solution: use the JavaEsSpark.saveJsonToEs interface, because saveToEs can only handle objects, not strings; see the sketch below.
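
A minimal sketch (the document content and index name are illustrative):

// each element is already a complete JSON document string
JavaRDD<String> json = sc.parallelize(Arrays.asList(
    "{\"reason\":\"business\",\"airport\":\"SFO\"}"));
JavaEsSpark.saveJsonToEs(json, "myindex/mytype");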

40. ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application

Solution: do not over-allocate resources, or you forgot to remove .setMaster("local[*]") before submitting to the cluster.

41. WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

Solution: the broker ID in the configuration file must be correct, and commands must use the real IP.

42. User class threw exception: org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't find leaders for Set([mywaf,7], [mywaf,1])

Solution: configure Kafka correctly and recreate the topic.

43. The ES UI shows some nodes with no shards displayed

Solution: that node's disk is out of space; clean up the disk to free capacity.

44. The method updateStateByKey(Function2<List<String>,Optional<S>,Optional<S>>, int) in the type JavaPairDStream<String,String> is not applicable for the arguments (Function2<List<String>,Optional<String>,Optional<String>>, int)

Solution: Spark's Java API here uses com.google.common.base.Optional, not the JDK's java.util.Optional; see the sketch below.
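
A minimal sketch of a matching update function using the Guava Optional that Spark 1.x expects (names and value types are illustrative):

import com.google.common.base.Optional;  // NOT java.util.Optional

Function2<List<Integer>, Optional<Integer>, Optional<Integer>> updateFn =
    (values, state) -> {
        int sum = state.or(0);      // previous state for this key, defaulting to 0
        for (Integer v : values) sum += v;
        return Optional.of(sum);    // new state
    };
JavaPairDStream<String, Integer> counts = pairs.updateStateByKey(updateFn);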

45. NativeCrc32.nativeComputeChunkedSumsByteArray

Solution: configure the Hadoop home in Eclipse, and put the 64-bit hadoop.dll from Hadoop 2.6 into the bin and system32 folders.

46. Tip: Spark Streaming has three computation modes: stateless, stateful, and windowed.

47. The YARN ResourceManager is a single point of failure

Solution: set up YARN HA with a three-node ZooKeeper cluster plus yarn-site.xml configuration, as sketched below.
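
A minimal yarn-site.xml sketch (RM ids, hostnames, cluster id, and ZooKeeper addresses are illustrative):

<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.cluster-id</name><value>yarn-cluster</value></property>
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>master1</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>master2</value></property>
<property><name>yarn.resourcemanager.zk-address</name><value>zk1:2181,zk2:2181,zk3:2181</value></property>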

48. Tip: Kafka can use its bundled ZooKeeper via its configuration files.
49. Tip: every Spark operation ultimately comes down to operations on RDDs.

50. How to guarantee strict ordering of a Kafka message queue

Solution: give any topic that requires strict ordering a single partition only; see the command sketch below.
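
A sketch using the classic Kafka CLI (topic name and ZooKeeper address are illustrative):

# one partition => Kafka preserves total message order within the topic
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic ordered_topic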
