Using a Redis connection pool in Spark

I ran into a similar problem over the past few days and looked up some material online. These are my notes.

1. Master

Collect all of the data back to the master and process it there in one place.

Connection pool code:

public class TestRedisPool {
	private JedisPool pool = null;
	public TestRedisPool(String ip, int port, String passwd, int database) {
		if (pool == null) {
			JedisPoolConfig config = new JedisPoolConfig();
			config.setMaxTotal(500);
			config.setMaxIdle(30);
			config.setMinIdle(5);
			config.setMaxWaitMillis(1000 * 10);
			config.setTestWhileIdle(false);
			config.setTestOnBorrow(false);
			config.setTestOnReturn(false);
			pool = new JedisPool(config, ip, port, 10000, passwd, database);
			Logs.debug("init:" + pool);
		}
	}
	public JedisPool getRedisPool() {
		return pool;
	}
	public String set(String key,String value){
		Jedis jedis = null;
		try {
			jedis = pool.getResource();
			return jedis.set(key, value);
		} catch (Exception e) {
			e.printStackTrace();
			return "0";
		} finally {
			// getResource() may have failed, leaving jedis null
			if (jedis != null) {
				jedis.close();
			}
		}
	}
}
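As a side note on the `set` method above: a connection borrowed from the pool has to be handed back even when the write fails. The sketch below shows the same borrow/use/return shape with try-with-resources. `FakeConnection` is a hypothetical stand-in so that the example runs without a Redis server; a real `Jedis` object also implements `Closeable`, so it can be used in the same position.

```java
class TryWithResourcesDemo {
    // Hypothetical stand-in for a pooled connection (not part of Jedis).
    static class FakeConnection implements AutoCloseable {
        static int open = 0; // number of connections currently borrowed

        FakeConnection() { open++; }

        String set(String key, String value) {
            if (key == null) {
                throw new IllegalArgumentException("key must not be null");
            }
            return "OK";
        }

        @Override
        public void close() { open--; } // "returns" the connection
    }

    // Same contract as the set() method above: the reply on success, "0" on failure.
    static String set(String key, String value) {
        try (FakeConnection conn = new FakeConnection()) {
            return conn.set(key, value);
        } catch (IllegalArgumentException e) {
            return "0";
        }
    }

    public static void main(String[] args) {
        System.out.println(set("a", "1"));
        System.out.println(set(null, "1"));
        System.out.println("still borrowed: " + FakeConnection.open);
    }
}
```

The resource is closed before the `catch` block runs, so nothing stays borrowed on either path.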

Usage:

List<String> list = Arrays.asList("a","b","c","d", "e");
JavaRDD<String> javaRDD = new JavaSparkContext(spark.sparkContext()).parallelize(list, 3);
TestRedisPool testRedisPool = new TestRedisPool(redisIp, port, passwd, dbNum);
List<String> lst = javaRDD.collect();
for(String s:lst) {
	testRedisPool.set(s, getDateString2(0));
}

2. Worker

Initialize the connection pool on the worker while iterating over the elements.

javaRDD.foreach(new VoidFunction<String>() {
	@Override
	public void call(String s) throws Exception {
		TestRedisPool testRedisPool = new TestRedisPool(redisIp, port, passwd, dbNum);
		Logs.debug(testRedisPool.getRedisPool());
		testRedisPool.set(s, getDateString2(0));
	}
});

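This iterates over every element, and TestRedisPool does not need to be serializable. However, a new Redis connection pool is created for every single element of the RDD. Even with short-lived connections this puts heavy pressure on Redis, and it is also extremely inefficient.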

3. Create on the Master, iterate on the Worker

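Create a single instance on the master and use that instance while iterating on the workers. In principle this works, provided the class is serializable. The instance can also be shipped as a broadcast variable so that it is kept on each worker, which reduces the network transfer needed to use it.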

TestRedisPool testRedisPool = new TestRedisPool(redisIp, port, passwd, dbNum);
javaRDD.foreach(new VoidFunction<String>() {
	@Override
	public void call(String s) throws Exception {
		Logs.debug(testRedisPool.getRedisPool());
		testRedisPool.set(s, getDateString2(0));
	}
});

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
...
Serialization stack:
	- object not serializable (class: redis.clients.jedis.JedisPool, value: redis.clients.jedis.JedisPool@3e4f80cb)

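The error says that JedisPool cannot be serialized. Even if TestRedisPool itself implements Serializable, its member variable of type JedisPool does not support serialization, so this approach is unusable whenever a member variable cannot be serialized.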
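The failure can be reproduced without Spark or Redis, using plain Java serialization (the mechanism Spark uses by default to ship closures to the workers). This is a minimal sketch: `FakePool` and `PoolHolder` are hypothetical stand-ins for `JedisPool` and `TestRedisPool`, not the real classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {
    // Hypothetical stand-in for JedisPool: deliberately NOT Serializable.
    static class FakePool { }

    // Stand-in for the wrapper: Serializable itself, but holding a non-serializable field.
    static class PoolHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        private final FakePool pool = new FakePool();
    }

    // Returns true if Java serialization accepts the object, false if it rejects it.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("PoolHolder serializable? " + canSerialize(new PoolHolder()));
        System.out.println("String serializable? " + canSerialize("a"));
    }
}
```

Marking the wrapper `Serializable` is not enough: every non-static, non-transient field has to be serializable as well.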

4. Iterate by partition on the Worker

javaRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
	@Override
	public void call(Iterator<String> stringIterator) throws Exception {
		TestRedisPool testRedisPool = new TestRedisPool(redisIp, port, passwd, dbNum);
		while (stringIterator.hasNext()) {
			Logs.debug(testRedisPool.getRedisPool());
			testRedisPool.set(stringIterator.next(), getDateString2(0));
		}
	}
});

TestRedisPool does not need to be serializable, and only one Redis connection pool is created per partition.
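To make the difference between approach 2 and approach 4 concrete, the following sketch counts how many pools each one constructs. It does not use Spark: the partitions are modelled as plain lists (an illustrative split of the five elements into three partitions), and `FakePool` is a hypothetical stand-in that only records that it was constructed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PartitionPoolDemo {
    // Counts how many stand-in pools have been constructed.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for TestRedisPool; it only records its construction.
    static class FakePool {
        FakePool() { CREATED.incrementAndGet(); }
        void set(String key, String value) { /* a real pool would write to Redis here */ }
    }

    // Approach 2: a new pool for every element.
    static int perElement(List<List<String>> partitions) {
        CREATED.set(0);
        for (List<String> partition : partitions) {
            for (String s : partition) {
                new FakePool().set(s, "v");
            }
        }
        return CREATED.get();
    }

    // Approach 4: one pool per partition, reused for all of its elements.
    static int perPartition(List<List<String>> partitions) {
        CREATED.set(0);
        for (List<String> partition : partitions) {
            FakePool pool = new FakePool();
            for (String s : partition) {
                pool.set(s, "v");
            }
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a"), Arrays.asList("b", "c"), Arrays.asList("d", "e"));
        System.out.println("per element:   " + perElement(partitions));   // 5
        System.out.println("per partition: " + perPartition(partitions)); // 3
    }
}
```

The per-element cost grows with the size of the data, while the per-partition cost grows only with the number of partitions.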

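5. Use a static field, iterate by partition

With the approach above we create one connection pool per partition, but a machine usually handles several partitions. How can the number of pools be reduced further? A static field exists only once per JVM, so if the Redis connection pool is declared static, each worker creates only one Redis connection pool.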

public class TestRedisPool {
	private static JedisPool pool = null;
	...
}

Incorrect usage:

TestRedisPool testRedisPool = new TestRedisPool(redisIp, port, passwd, dbNum);
javaRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
	@Override
	public void call(Iterator<String> stringIterator) throws Exception {
		while (stringIterator.hasNext()) {
			Logs.debug(testRedisPool.getRedisPool());
			testRedisPool.set(stringIterator.next(), getDateString2(0));
		}
	}
});

When the TestRedisPool instance is created on the master like this, the pool is not available on the workers, and a java.lang.NullPointerException is thrown.

Correct usage:

javaRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
	@Override
	public void call(Iterator<String> stringIterator) throws Exception {
		TestRedisPool testRedisPool = new TestRedisPool(redisIp, port, passwd, dbNum);
		while (stringIterator.hasNext()) {
			Logs.debug(testRedisPool.getRedisPool());
			testRedisPool.set(stringIterator.next(), getDateString2(0));
		}
	}
});

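TestRedisPool does not need to be serializable here either. In this case an instance is created for each partition, and each partition is processed by its own task thread, so in effect three threads try to obtain the JedisPool at the same time and it is initialized three times in total. Turning it into a singleton solves the repeated initialization.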

6. Use the singleton pattern, iterate by partition

Connection pool code:

package com.project.uitl;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

/**
 * Redis connection pool utility class
 *
 */
public class JedisPoolUtil {
    
    private static final String HOST = "132.232.6.208";
    private static final int PORT = 6381;
    
    private static volatile JedisPool jedisPool = null;
    
    private JedisPoolUtil() {}
    
    /**
     * Get the JedisPool instance (singleton)
     * @return the JedisPool instance
     */
    public static JedisPool getJedisPoolInstance() {
        if (jedisPool == null) {
            synchronized (JedisPoolUtil.class) {
                if (jedisPool == null) {
                    
                    JedisPoolConfig poolConfig = new JedisPoolConfig();
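                    poolConfig.setMaxTotal(1000);           // maximum number of connections
                    poolConfig.setMaxIdle(32);              // maximum number of idle connections
                    poolConfig.setMaxWaitMillis(100*1000);  // maximum wait time in milliseconds
                    poolConfig.setTestOnBorrow(true);       // validate connections on borrow so that the returned instance is usable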
                    
                    jedisPool = new JedisPool(poolConfig, HOST, PORT);
                }
            }
        }
        
        return jedisPool;
    }
    
    /**
     * Get a Jedis instance (connection) from the pool
     * @return a Jedis instance
     */
    public static Jedis getJedisInstance() {
        
        return getJedisPoolInstance().getResource();
    }
    
    /**
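     * Return a Jedis object (connection) to the pool
     * @param jedisPool the connection pool (no longer used; kept so existing callers still compile)
     * @param jedis the connection object
     */
    public static void release(JedisPool jedisPool, Jedis jedis) {
        
        if (jedis != null) {
            jedis.close();  // returns a pooled connection to its pool; replaces the deprecated returnResourceObject()
        }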
        }
    }
}

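In the code above, volatile guarantees that other threads can never observe a jedisPool that has not been fully initialized, and synchronized prevents several threads from creating the pool at the same time. Together, the two keywords implement lazy initialization with double-checked locking.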
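The effect of the double-checked locking can be checked with a small concurrency test. This is a sketch under assumptions: `FakePool` is a hypothetical stand-in for `JedisPool` that only counts constructions, and the thread count is arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LazySingletonDemo {
    // Counts how many times the stand-in pool has been constructed.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    // Hypothetical stand-in for JedisPool.
    static class FakePool {
        FakePool() { INIT_COUNT.incrementAndGet(); }
    }

    private static volatile FakePool pool;

    // Same double-checked locking shape as getJedisPoolInstance().
    static FakePool getInstance() {
        FakePool local = pool;
        if (local == null) {
            synchronized (LazySingletonDemo.class) {
                local = pool;
                if (local == null) {
                    local = new FakePool();
                    pool = local;
                }
            }
        }
        return local;
    }

    // Calls getInstance() from several threads at once and returns the construction count.
    static int initCountAfterConcurrentAccess(int threads) {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        try {
            for (int i = 0; i < threads; i++) {
                exec.execute(() -> {
                    try {
                        start.await(); // line all threads up before they race
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    getInstance();
                });
            }
            start.countDown();
            exec.shutdown();
            if (!exec.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return INIT_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("pool constructed " + initCountAfterConcurrentAccess(8) + " time(s)");
    }
}
```

However many threads race on the first call, only the thread that wins the lock constructs the pool; the others see the published instance on their second check.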

javaRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
	@Override
	public void call(Iterator<String> stringIterator) throws Exception {
		TestRedisPool testRedisPool = new TestRedisPool(redisIp, port, passwd, dbNum);
		while (stringIterator.hasNext()) {
			Logs.debug("class:" + testRedisPool);
			Logs.debug("pool:" + testRedisPool.getRedisPool());
			testRedisPool.set(stringIterator.next(), getDateString2(0));
		}
	}
});

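Now the JedisPool is initialized only once, and there is only one JedisPool per JVM. However, several TestRedisPool objects are still created. Defining the object on the master and distributing it to the workers as a broadcast variable solves this; in that case TestRedisPool must be serializable.

7. Use the singleton pattern, define on the Driver, iterate by partition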

TestRedisPool testRedisPool = new TestRedisPool(redisIp, port, passwd, dbNum);
final Broadcast<TestRedisPool> broadcastRedis = new JavaSparkContext(spark.sparkContext()).broadcast(testRedisPool);
javaRDD.foreachPartition(new VoidFunction<Iterator<String>>() {
	@Override
	public void call(Iterator<String> stringIterator) throws Exception {
		TestRedisPool redisClient1 = broadcastRedis.getValue();
		while (stringIterator.hasNext()) {
			Logs.debug("class:" + redisClient1);
			Logs.debug("pool:" + redisClient1.getRedisPool());
			redisClient1.set(stringIterator.next(), getDateString2(0));
		}
	}
});

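Now TestRedisPool is defined on the master and broadcast to every worker, and each worker still only ever holds a single JedisPool instance.

Some readers may wonder why the JedisPool no longer runs into the serialization problem (approach 3), or the problem of the workers not being able to get the pool once it is static (the first case in approach 5).

In approach 3 the JedisPool is an ordinary instance field, so it is serialized together with the object. Because JedisPool does not support serialization, the job fails.

In approach 5 the pool is static. A static field belongs to the class rather than to any instance, so serializing TestRedisPool succeeds, but the pool that was created and initialized on the master is not transferred to the worker nodes. The pool seen on the nodes is null, hence the error.

Approach 7 uses lazy initialization: the JedisPool is only created when it is first used, so the initialization actually happens on the worker node and neither problem occurs.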
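The difference between the eager static pool (approach 5, incorrect usage) and the lazily created one (approach 7) can be reproduced with plain Java serialization. This is only a sketch: `FakePool`, `EagerHolder` and `LazyHolder` are hypothetical stand-ins, and the worker JVM is simulated by resetting the static field to null before deserializing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class StaticFieldDemo {
    // Hypothetical stand-in for JedisPool (not Serializable).
    static class FakePool { }

    // Like approach 5: the static pool is created eagerly in the constructor.
    static class EagerHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        static FakePool pool;
        EagerHolder() {
            if (pool == null) pool = new FakePool();
        }
        FakePool getPool() { return pool; }
    }

    // Like approach 7: the static pool is created lazily on first use.
    static class LazyHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        static volatile FakePool pool;
        FakePool getPool() {
            if (pool == null) {
                synchronized (LazyHolder.class) {
                    if (pool == null) pool = new FakePool();
                }
            }
            return pool;
        }
    }

    static byte[] toBytes(Serializable o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o); // static fields are not written
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return buf.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject(); // does not run the Serializable class's constructor
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean eagerPoolAvailableAfterTransfer() {
        byte[] bytes = toBytes(new EagerHolder());
        EagerHolder.pool = null; // simulate a worker JVM where the constructor never ran
        EagerHolder copy = (EagerHolder) fromBytes(bytes);
        return copy.getPool() != null;
    }

    static boolean lazyPoolAvailableAfterTransfer() {
        byte[] bytes = toBytes(new LazyHolder());
        LazyHolder.pool = null; // simulate a worker JVM with no pool yet
        LazyHolder copy = (LazyHolder) fromBytes(bytes);
        return copy.getPool() != null;
    }

    public static void main(String[] args) {
        System.out.println("eager pool available: " + eagerPoolAvailableAfterTransfer());
        System.out.println("lazy pool available:  " + lazyPoolAvailableAfterTransfer());
    }
}
```

Both holders serialize without error because the static pool is skipped, but only the lazy holder can rebuild the pool on the receiving side.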
