Spark Performance Tuning (1): General Tuning

1. Allocate more resources

# --driver-memory: memory for the driver (has little impact)
# --num-executors: number of executors
# --executor-memory: memory for each executor
# --executor-cores: number of CPU cores for each executor
bin/spark-submit \
  --class cn.spark.sparktest.core.WordCountCluster \
  --driver-memory 100m \
  --num-executors 3 \
  --executor-memory 100m \
  --executor-cores 3 \
  /usr/local/SparkTest-0.0.1-SNAPSHOT-jar-with-dependencies.jar
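
To make the effect of these flags concrete, the sketch below multiplies them out into the totals the application can use. The class and method names (`ResourceEstimate`, `totalCores`, `totalExecutorMemoryMb`) are hypothetical and not part of Spark, and the calculation assumes the default of one CPU core per task.

```java
/** Hypothetical helper (not part of Spark): totals implied by the spark-submit flags. */
public class ResourceEstimate {

    /** Upper bound on tasks running at the same time, assuming one core per task. */
    public static int totalCores(int numExecutors, int executorCores) {
        return numExecutors * executorCores;
    }

    /** Total heap requested across all executors, in MB (driver and overhead excluded). */
    public static int totalExecutorMemoryMb(int numExecutors, int executorMemoryMb) {
        return numExecutors * executorMemoryMb;
    }

    public static void main(String[] args) {
        // Values from the command: --num-executors 3, --executor-cores 3, --executor-memory 100m
        System.out.println("cores=" + totalCores(3, 3));
        System.out.println("executorMemoryMb=" + totalExecutorMemoryMb(3, 100));
    }
}
```

With these settings at most 9 tasks run concurrently and the executors hold 300 MB of heap in total, so raising any of the three executor flags increases either the degree of concurrency or the memory available for caching and shuffles.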


2. Set the parallelism of the Spark application

SparkConf conf = new SparkConf().set("spark.default.parallelism", "500");
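
As a rough way to choose this value, the Spark tuning guide suggests about 2 to 3 tasks per CPU core in the cluster. The sketch below applies that rule of thumb; `ParallelismEstimate` and its method are hypothetical names, and the executor figures are illustrative assumptions for a small cluster of 3 executors with 3 cores each.

```java
/** Hypothetical helper (not part of Spark): rule-of-thumb value for spark.default.parallelism. */
public class ParallelismEstimate {

    /**
     * Returns numExecutors * executorCores * tasksPerCore.
     * tasksPerCore is assumed to be 2 or 3, following the rule of thumb above.
     */
    public static int suggestedParallelism(int numExecutors, int executorCores, int tasksPerCore) {
        if (numExecutors <= 0 || executorCores <= 0 || tasksPerCore <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        return numExecutors * executorCores * tasksPerCore;
    }

    public static void main(String[] args) {
        // Illustrative values: 3 executors with 3 cores each.
        System.out.println("low=" + suggestedParallelism(3, 3, 2));
        System.out.println("high=" + suggestedParallelism(3, 3, 3));
    }
}
```

By the same rule, a setting of 500 corresponds to a cluster of roughly 170 to 250 cores; on a much smaller cluster the extra tasks mostly add scheduling overhead.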


3. Refactor and optimize the RDD architecture


4. Broadcast large variables


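// On the driver: wrap the large map in a broadcast variable.
final Broadcast<Map<String, Map<String, List<Integer>>>> dateHourExtractMapBroadcast = sc.broadcast(dateHourExtractMap);

// Inside a task: read the executor-local copy instead of capturing the map in the closure.
Map<String, Map<String, List<Integer>>> dateHourExtractMap = dateHourExtractMapBroadcast.value();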
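
The saving from broadcasting can be estimated with simple arithmetic: a variable captured in a task closure ends up as one copy per task, while a broadcast variable is cached once per executor. The sketch below compares the two. The class name, the task and executor counts, and the 10 MB size are all hypothetical figures chosen for illustration.

```java
/** Hypothetical helper (not part of Spark): copies of a variable held with and without broadcast. */
public class BroadcastEstimate {

    /** Without broadcast, every task works on its own copy of the variable. */
    public static long withoutBroadcastMb(int numTasks, long variableSizeMb) {
        return numTasks * variableSizeMb;
    }

    /** With broadcast, each executor keeps one copy shared by all of its tasks. */
    public static long withBroadcastMb(int numExecutors, long variableSizeMb) {
        return numExecutors * variableSizeMb;
    }

    public static void main(String[] args) {
        // Assumed figures: 1000 tasks, 50 executors, a 10 MB map.
        System.out.println(withoutBroadcastMb(1000, 10) + " MB without broadcast");
        System.out.println(withBroadcastMb(50, 10) + " MB with broadcast");
    }
}
```

Under these assumed figures the total drops from 10000 MB to 500 MB, which is why the technique pays off mainly for large, read-only lookup structures such as the map above.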

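5. Use Kryo serialization in the project

// Switch the serializer on the SparkConf from the default Java serializer to Kryo.
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");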


6. Use the fastutil framework in the project


import java.util.HashMap;
import java.util.List;
import java.util.Map;

import it.unimi.dsi.fastutil.ints.IntArrayList;
import it.unimi.dsi.fastutil.ints.IntList;

        // Convert Map<date, Map<hour, List<Integer>>> into the same structure backed by
        // fastutil IntList, which stores primitive ints instead of boxed Integer objects.
        Map<String, Map<String, IntList>> fastutilDateHourExtractMap = new HashMap<String, Map<String, IntList>>();
        for (Map.Entry<String, Map<String, List<Integer>>> dateHourExtractEntry : dateHourExtractMap.entrySet()) {
            String date = dateHourExtractEntry.getKey();
            Map<String, List<Integer>> hourExtractMap = dateHourExtractEntry.getValue();
            Map<String, IntList> fastutilHourExtractMap = new HashMap<String, IntList>();
            for (Map.Entry<String, List<Integer>> hourExtractEntry : hourExtractMap.entrySet()) {
                String hour = hourExtractEntry.getKey();
                List<Integer> extractList = hourExtractEntry.getValue();

                // Pre-size the list and unbox each element so that add(int) is used.
                IntList fastutilExtractList = new IntArrayList(extractList.size());
                for (int value : extractList) {
                    fastutilExtractList.add(value);
                }
                fastutilHourExtractMap.put(hour, fastutilExtractList);
            }
            fastutilDateHourExtractMap.put(date, fastutilHourExtractMap);
        }

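7. Adjust the locality wait time

SparkConf conf = new SparkConf()
                        .setAppName(Constants.SPARK_APP_NAME_SESSION)
                        .setMaster("local")
                        .set("spark.default.parallelism", "500")
                        // A bare number is read as milliseconds, so the unit is given explicitly;
                        // 10 seconds is assumed to be the intended wait (the default is 3s).
                        .set("spark.locality.wait", "10s")
                        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");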

