Flink State Management (2): State Data Structures and the Registration Process

Background

This started on Flink 1.6.2, when a DingTalk contact found a bug: while snapshotting TTL state, the following code throws an IllegalArgumentException:

protected CompositeSerializer<TtlValue<T>> createSerializerInstance(
	PrecomputedParameters precomputed,
	TypeSerializer<?> ... originalSerializers) {
	Preconditions.checkNotNull(originalSerializers);
	// Fails here: length is 1; the cause is explained in the comment just below
	Preconditions.checkArgument(originalSerializers.length == 2);
	// duplicate() passed along only the field serializer and dropped the LongSerializer for the TTL timestamp
	return new TtlSerializer<>(precomputed, (TypeSerializer<T>) originalSerializers[1]); 
}

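This bug was fixed in Flink 1.6.3. While tracking it down I repeatedly traced and read the source code for registering and using State, and these are my notes.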
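
The failure mode described above can be reproduced in miniature without Flink. The sketch below is an illustration only: `CompositeSketch`, `createInstance`, `duplicateBuggy` and `duplicateFixed` are hypothetical names, plain strings stand in for serializers, and `duplicateFixed` is just one plausible shape of a correct duplicate, not the actual 1.6.3 patch.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a composite serializer; not Flink's actual class. */
final class CompositeSketch {
    // Index 0 plays the role of the timestamp serializer, index 1 the user value serializer.
    final String[] fieldSerializers;

    private CompositeSketch(String[] fieldSerializers) {
        this.fieldSerializers = fieldSerializers;
    }

    /** Mirrors the precondition in createSerializerInstance: exactly two fields are expected. */
    static CompositeSketch createInstance(String... originalSerializers) {
        if (originalSerializers.length != 2) {
            throw new IllegalArgumentException(
                "expected 2 field serializers, got " + originalSerializers.length);
        }
        return new CompositeSketch(originalSerializers);
    }

    /** Buggy duplicate: forwards only the user value serializer, so the length check fails. */
    CompositeSketch duplicateBuggy() {
        return createInstance(fieldSerializers[1]);
    }

    /** One possible correct duplicate: forwards a copy of every field serializer. */
    CompositeSketch duplicateFixed() {
        return createInstance(Arrays.copyOf(fieldSerializers, fieldSerializers.length));
    }
}
```

Calling `duplicateBuggy()` on an instance created with two serializers throws an IllegalArgumentException, while `duplicateFixed()` returns a new instance that still has both fields.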

State storage structure

The storage layers are as follows:

  1. AbstractKeyedStateBackend stores each State keyed by the name of its StateDescriptor:
    // From AbstractKeyedStateBackend.java
    private final HashMap<String, InternalKvState<K, ?, ?>> keyValueStatesByName;
    
  2. InternalKvState holds the data as (key, namespace, state) triples. The exact layout depends on the type of store; taking the heap backend as an example:
    //From AbstractHeapState.java
    protected final StateTable<K, N, SV> stateTable;
    
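  3. StateTable has two implementations, CopyOnWriteStateTable and NestedMapsStateTable. The main difference is that the former stores entries in a flat layout while the latter uses nested maps. Here is how CopyOnWriteStateTable holds its data: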
    // From CopyOnWriteStateTable.java
    /**
     * This is the primary entry array (hash directory) of the state table. If no incremental rehash is ongoing, this
     * is the only used table.
     **/
    private StateTableEntry<K, N, S>[] primaryTable;
    
    /**
     * We maintain a secondary entry array while performing an incremental rehash. The purpose is to slowly migrate
     * entries from the primary table to this resized table array. When all entries are migrated, this becomes the new
     * primary table.
     */
    private StateTableEntry<K, N, S>[] incrementalRehashTable;
    
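    There are two StateTableEntry arrays here. This is a detail of the CopyOnWriteStateTable implementation (the incremental rehash) and does not need much attention.
  4. StateTableEntry is the wrapper class for the state itself. It also carries the hash, the next pointer of the collision chain, and similar fields. It is tightly coupled to the CopyOnWriteStateTable implementation, which works much like a HashMap.

Note that the concrete implementation differs somewhat between State types.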
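
The flat layout described above can be sketched as a small chained hash table. This is a simplified illustration under stated assumptions: `FlatStateTableSketch` and its `Entry` class are hypothetical names, the table has a fixed size, and the copy-on-write and incremental rehash logic of the real CopyOnWriteStateTable is left out entirely.

```java
import java.util.Objects;

/** Hypothetical, simplified flat state table: one entry per (key, namespace) pair. */
final class FlatStateTableSketch<K, N, S> {

    /** Holds the triple plus its hash and the next entry in the collision chain. */
    static final class Entry<K, N, S> {
        final K key;
        final N namespace;
        S state;
        final int hash;
        Entry<K, N, S> next;

        Entry(K key, N namespace, S state, int hash, Entry<K, N, S> next) {
            this.key = key;
            this.namespace = namespace;
            this.state = state;
            this.hash = hash;
            this.next = next;
        }
    }

    // Fixed-size hash directory; the length is a power of two so masking yields a valid index.
    @SuppressWarnings("unchecked")
    private final Entry<K, N, S>[] table = (Entry<K, N, S>[]) new Entry[16];

    private Entry<K, N, S> find(K key, N namespace, int hash) {
        for (Entry<K, N, S> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key) && e.namespace.equals(namespace)) {
                return e;
            }
        }
        return null;
    }

    /** Inserts or overwrites the state for the given key and namespace. */
    void put(K key, N namespace, S state) {
        int hash = Objects.hash(key, namespace);
        Entry<K, N, S> existing = find(key, namespace, hash);
        if (existing != null) {
            existing.state = state;
            return;
        }
        int index = hash & (table.length - 1);
        // New entries are linked in at the head of the collision chain.
        table[index] = new Entry<>(key, namespace, state, hash, table[index]);
    }

    /** Returns the state for the given key and namespace, or null if absent. */
    S get(K key, N namespace) {
        Entry<K, N, S> e = find(key, namespace, Objects.hash(key, namespace));
        return e == null ? null : e.state;
    }
}
```

A nested-map layout would instead look up the namespace first and then the key in an inner map; the flat variant hashes both together into a single directory.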

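State registration process

1. When registration happens

Before running, a StreamTask first opens all of its operators. In an operator's open method we usually initialize State through a StateDescriptor, and that is exactly when the State is registered. The call stack for registering a MapState:
[Figure: the MapState registration process]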

2. Key steps

  1. Static registration of the StateFactory

    //From TtlStateFactory
    @SuppressWarnings("deprecation")
    private Map<Class<? extends StateDescriptor>, SupplierWithException<IS, Exception>> createStateFactories() {
    	return Stream.of(
    		Tuple2.of(ValueStateDescriptor.class, (SupplierWithException<IS, Exception>) this::createValueState),
    		Tuple2.of(ListStateDescriptor.class, (SupplierWithException<IS, Exception>) this::createListState),
    		Tuple2.of(MapStateDescriptor.class, (SupplierWithException<IS, Exception>) this::createMapState),
    		Tuple2.of(ReducingStateDescriptor.class, (SupplierWithException<IS, Exception>) this::createReducingState),
    		Tuple2.of(AggregatingStateDescriptor.class, (SupplierWithException<IS, Exception>) this::createAggregatingState),
    		Tuple2.of(FoldingStateDescriptor.class, (SupplierWithException<IS, Exception>) this::createFoldingState)
    	).collect(Collectors.toMap(t -> t.f0, t -> t.f1));
    }
    
  2. Creating the keyed state

    // From AbstractKeyedStateBackend.java
    public <N, S extends State, V> S getOrCreateKeyedState(
    		final TypeSerializer<N> namespaceSerializer,
    		StateDescriptor<S, V> stateDescriptor) throws Exception {
    	checkNotNull(namespaceSerializer, "Namespace serializer");
    	checkNotNull(keySerializer, "State key serializer has not been configured in the config. " +
    			"This operation cannot use partitioned state.");
    	
    	InternalKvState<K, ?, ?> kvState = keyValueStatesByName.get(stateDescriptor.getName());
    	if (kvState == null) {
    		if (!stateDescriptor.isSerializerInitialized()) {
    			// Convert the TypeInformation into a serializer; the isSerializerInitialized() check above is probably redundant
    			stateDescriptor.initializeSerializerUnlessSet(executionConfig);
    		}
    		// Create the KvState through the KeyedStateFactory
    		// It is wrapped by TtlStateFactory; when TTL is not enabled, `this` is used directly
    		kvState = TtlStateFactory.createStateAndWrapWithTtlIfEnabled(
    			namespaceSerializer, stateDescriptor, this, ttlTimeProvider);
    		// Keyed by the descriptor name
    		keyValueStatesByName.put(stateDescriptor.getName(), kvState);
    		publishQueryableStateIfEnabled(stateDescriptor, kvState);
    	}
    	return (S) kvState;
    } 
    
  3. The concrete creation code, taking MapState as an example

    private <UK, UV> IS createMapState() throws Exception {
    	MapStateDescriptor<UK, UV> mapStateDesc = (MapStateDescriptor<UK, UV>) stateDesc;
    	// The descriptor is re-wrapped here because, besides the map state itself, a timestamp must be recorded to evaluate the TTL
    	MapStateDescriptor<UK, TtlValue<UV>> ttlDescriptor = new MapStateDescriptor<>(
    		stateDesc.getName(),
    		mapStateDesc.getKeySerializer(),
    		new TtlSerializer<>(mapStateDesc.getValueSerializer()));
    	// Create the MapState
    	return (IS) new TtlMapState<>(
    		originalStateFactory.createInternalState(namespaceSerializer, ttlDescriptor, getSnapshotTransformFactory()),
    		ttlConfig, timeProvider, mapStateDesc.getSerializer());
    }
    
  4. For the heap backend, the code that creates the internal state:

    // From HeapKeyedStateBackend.java
    public <N, SV, SEV, S extends State, IS extends S> IS createInternalState(
    	@Nonnull TypeSerializer<N> namespaceSerializer,
    	@Nonnull StateDescriptor<S, SV> stateDesc,
    	@Nonnull StateSnapshotTransformFactory<SEV> snapshotTransformFactory) throws Exception {
    	// STATE_FACTORIES maps each StateDescriptor class to the factory for the matching heap state
    	StateFactory stateFactory = STATE_FACTORIES.get(stateDesc.getClass());
    	if (stateFactory == null) {
    		String message = String.format("State %s is not supported by %s",
    			stateDesc.getClass(), this.getClass());
    		throw new FlinkRuntimeException(message);
    	}
    	StateTable<K, N, SV> stateTable = tryRegisterStateTable(
    		namespaceSerializer, stateDesc, getStateSnapshotTransformer(stateDesc, snapshotTransformFactory));
    	return stateFactory.createState(stateDesc, stateTable, keySerializer);
    }
    
    // Register the state
    private <N, V> StateTable<K, N, V> tryRegisterStateTable(
    		TypeSerializer<N> namespaceSerializer,
    		StateDescriptor<?, V> stateDesc,
    		StateSnapshotTransformer<V> snapshotTransformer) throws StateMigrationException {
    
    	@SuppressWarnings("unchecked")
    	// Keyed by the descriptor name, with the StateTable as the value
    	StateTable<K, N, V> stateTable = (StateTable<K, N, V>) registeredKVStates.get(stateDesc.getName());
    
    	RegisteredKeyValueStateBackendMetaInfo<N, V> newMetaInfo;
    	if (stateTable != null) {
    		@SuppressWarnings("unchecked")
    		StateMetaInfoSnapshot restoredMetaInfoSnapshot =
    			restoredStateMetaInfo.get(
    				StateUID.of(stateDesc.getName(), StateMetaInfoSnapshot.BackendStateType.KEY_VALUE));
    
    		Preconditions.checkState(
    			restoredMetaInfoSnapshot != null,
    			"Requested to check compatibility of a restored RegisteredKeyedBackendStateMetaInfo," +
    				" but its corresponding restored snapshot cannot be found.");
    
    		newMetaInfo = RegisteredKeyValueStateBackendMetaInfo.resolveKvStateCompatibility(
    			restoredMetaInfoSnapshot,
    			namespaceSerializer,
    			stateDesc,
    			snapshotTransformer);
    
    		stateTable.setMetaInfo(newMetaInfo);
    	} else {
    		newMetaInfo = new RegisteredKeyValueStateBackendMetaInfo<>(
    			stateDesc.getType(),
    			stateDesc.getName(),
    			namespaceSerializer,
    			stateDesc.getSerializer(),
    			snapshotTransformer);
    		// Create the StateTable
    		stateTable = snapshotStrategy.newStateTable(newMetaInfo);
    		registeredKVStates.put(stateDesc.getName(), stateTable);
    	}
    
    	return stateTable;
    }
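
Steps 1, 2 and 4 share one pattern: look the state up by descriptor name, and on a miss pick a factory by descriptor class, create the state, and cache it. The sketch below models only that pattern. Everything in it is hypothetical (`RegistrationFlowSketch`, the `Descriptor` subclasses, the placeholder state objects); serializers, namespaces and meta info are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, simplified model of the registration flow; none of these types are Flink classes. */
final class RegistrationFlowSketch {

    // Stand-ins for StateDescriptor and its subclasses.
    static class Descriptor {
        final String name;
        Descriptor(String name) { this.name = name; }
    }
    static final class ValueDescriptor extends Descriptor {
        ValueDescriptor(String name) { super(name); }
    }
    static final class MapDescriptor extends Descriptor {
        MapDescriptor(String name) { super(name); }
    }
    static final class UnsupportedDescriptor extends Descriptor {
        UnsupportedDescriptor(String name) { super(name); }
    }

    // One factory per descriptor class, registered up front.
    private final Map<Class<? extends Descriptor>, Supplier<Object>> factories = new HashMap<>();
    // Created states are cached under the descriptor name.
    private final Map<String, Object> statesByName = new HashMap<>();

    RegistrationFlowSketch() {
        // Placeholder objects stand in for real state implementations.
        factories.put(ValueDescriptor.class, () -> new StringBuilder());
        factories.put(MapDescriptor.class, () -> new HashMap<String, String>());
    }

    /** Returns the cached state for the descriptor name, creating it on first use. */
    Object getOrCreate(Descriptor descriptor) {
        Object state = statesByName.get(descriptor.name);
        if (state == null) {
            Supplier<Object> factory = factories.get(descriptor.getClass());
            if (factory == null) {
                throw new IllegalStateException(
                    "State " + descriptor.getClass().getSimpleName() + " is not supported");
            }
            state = factory.get();
            statesByName.put(descriptor.name, state);
        }
        return state;
    }
}
```

In this sketch the cache lookup uses the name alone, so two descriptors with the same name resolve to the same state object, and an unregistered descriptor class is rejected only on a cache miss.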
    
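Step 3 wraps every user value in a TtlValue so that a timestamp travels with it. A minimal sketch of that idea follows, with hypothetical names throughout (`TtlMapSketch`, its nested `TtlValue`, and the `timeProvider` parameter). The expiry rule `timestamp + ttl <= now` and the removal of expired entries on read are assumptions made for the illustration, not a statement of Flink's exact behaviour.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical TTL wrapper around a plain map; a simplified model, not Flink's TtlMapState. */
final class TtlMapSketch<K, V> {

    /** The user value together with the timestamp of its last write. */
    static final class TtlValue<V> {
        final V userValue;
        final long lastWriteTimestamp;

        TtlValue(V userValue, long lastWriteTimestamp) {
            this.userValue = userValue;
            this.lastWriteTimestamp = lastWriteTimestamp;
        }
    }

    // The "original" state stores wrapped values, which is why the descriptor is re-wrapped.
    private final Map<K, TtlValue<V>> original = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier timeProvider;

    TtlMapSketch(long ttlMillis, LongSupplier timeProvider) {
        this.ttlMillis = ttlMillis;
        this.timeProvider = timeProvider;
    }

    /** Stores the value together with the current time. */
    void put(K key, V value) {
        original.put(key, new TtlValue<>(value, timeProvider.getAsLong()));
    }

    /** Returns the user value, or null if it is absent or has expired. */
    V get(K key) {
        TtlValue<V> wrapped = original.get(key);
        if (wrapped == null) {
            return null;
        }
        // Assumed expiry rule: expired once ttlMillis have passed since the last write.
        if (wrapped.lastWriteTimestamp + ttlMillis <= timeProvider.getAsLong()) {
            original.remove(key);
            return null;
        }
        return wrapped.userValue;
    }
}
```

Because the stored type is the wrapped value rather than the user value, the underlying state needs a serializer for both the timestamp and the user value, which is where the two-field TtlSerializer from the beginning of this post comes in.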

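The call chain is fairly long and takes some patience to follow. Hopefully these notes can serve as a roadmap when reading the code for this process.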
