A Detailed Analysis of HpBandSter's Existing Code and Data Structures

Result

hpbandster.core.result.Result#__init__

self.data[0].keys()
Out[32]: dict_keys([(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4), (0, 0, 5), (0, 0, 6), (0, 0, 7), (0, 0, 8), (0, 0, 9), (0, 0, 10), (0, 0, 11), (0, 0, 12), (0, 0, 13), (0, 0, 14), (0, 0, 15), (0, 0, 16), (0, 0, 17), (0, 0, 18), (0, 0, 19), (0, 0, 20), (0, 0, 21), (0, 0, 22), (0, 0, 23), (0, 0, 24), (0, 0, 25), (0, 0, 26)])

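Each key (the run_id, called config_id further below) is a triple (iteration, stage, config). This was evaluated at a breakpoint in Result.__init__ before _merge_results runs, so self.data is still a list and self.data[0] is the dict for iteration 0. Yet the stage element is 0 in every key. Why?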

for k,v in self[0].items():
    print(v.budget)
    
9.0
...
27.0
27.0
27.0
27.0
27.0
27.0
243.0
81.0
9.0
81.0

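At construction time, Result receives (HB_iteration_data, HB_config). HB_iteration_data is a list with one entry per BaseIteration, and each entry is that iteration's data dict, keyed by config_id. config_id is the triple (iteration, stage, config). Note that stage is always 0, presumably because new configurations are only sampled in the first stage of an iteration; later stages re-evaluate the surviving configurations under their existing ids with larger budgets.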
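To make the "stage is always 0" observation concrete, here is a toy simulation of one successive-halving iteration. It assumes, as I read add_configuration, that ids are only allocated in stage 0 as (iteration, stage, counter) and that promoted configurations keep their id while their budget grows. simulate_sh_iteration, the random configs and the "lower x is better" loss are all hypothetical.

```python
import random


def simulate_sh_iteration(iteration, num_configs, budgets, seed=0):
    """Toy successive halving: num_configs[s] configs are run at budgets[s]."""
    rng = random.Random(seed)
    data = {}  # config_id -> Datum-like dict
    # Only stage 0 creates new records, so the middle element of every id is 0.
    for n in range(num_configs[0]):
        data[(iteration, 0, n)] = {"config": {"x": rng.random()}, "budget": budgets[0], "results": {}}
    active = list(data)
    for stage, budget in enumerate(budgets):
        for config_id in active:
            datum = data[config_id]
            datum["budget"] = budget  # ends up as the largest budget this config reached
            datum["results"][budget] = {"loss": datum["config"]["x"]}  # toy objective
        if stage + 1 < len(budgets):
            # Promote the best configs to the next stage under their existing ids.
            active = sorted(active, key=lambda c: data[c]["results"][budget]["loss"])[: num_configs[stage + 1]]
    return data


data = simulate_sh_iteration(0, [9, 3, 1], [9.0, 27.0, 81.0])
print(sorted({config_id[1] for config_id in data}))  # → [0]
print(sorted(d["budget"] for d in data.values()))
```

Six configs stop at budget 9.0, two at 27.0 and one reaches 81.0, which is why the budget values printed above differ from config to config.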

Now let's look at the value stored under each key:

>>> result.data[(0,0,24)].config
Out[13]: {'x': 0.008481607331294692}
>>> result.data[(0,0,24)].config_info
Out[14]: {'model_based_pick': True}
>>> result.data[(0,0,24)].results
Out[15]: 
{9.0: {'loss': 0.012722410996942038, 'info': 0.012722410996942038},
 27.0: {'loss': 0.004240803665647346, 'info': 0.004240803665647346},
 81.0: {'loss': 0.012722410996942038, 'info': 0.012722410996942038}}
>>> result.data[(0,0,24)].time_stamps
Out[16]: 
{9.0: {'submitted': 6.348446846008301,
  'started': 6.348571300506592,
  'finished': 6.866404056549072},
 27.0: {'submitted': 9.460097312927246,
  'started': 9.460161447525024,
  'finished': 9.97724461555481},
 81.0: {'submitted': 10.497020959854126,
  'started': 10.497111558914185,
  'finished': 11.015310049057007}}
>>> result.data[(0,0,24)].status
Out[17]: 'COMPLETED'
>>> result.data[(0,0,24)].budget
Out[18]: 81.0
>>> result.data[(0,0,24)].exceptions
Out[19]: {9.0: None, 27.0: None, 81.0: None}

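For a given config_id the config never changes; config is a dict mapping hyperparameter names to values. results is a dict keyed by budget whose values hold loss and info. budget is the largest budget the configuration has been evaluated on (81.0 here, the largest key in results).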

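Each such record (a Datum) can in turn be split into one or more Run objects, one per budget.

Unlike the Datum, a Run has no config_info (and it keeps only the config_id, not the config itself).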

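As the output shows, the same config may be evaluated on several budgets. Can we turn this into one object per (config, budget) pair, so that the config above, with its 3 budgets, becomes 3 objects?

That is what the Run data structure is for: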

class Run(object):
	def __init__(self, config_id, budget, loss, info, time_stamps, error_logs):
		self.config_id   = config_id
		self.budget      = budget
		self.error_logs  = error_logs
		self.loss        = loss
		self.info        = info
		self.time_stamps = time_stamps
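Result.get_runs_by_id does this splitting in HpBandSter. The sketch below is my own version of the idea, not the library code: Datum is a minimal stand-in with only the fields used here, runs_from_datum is a hypothetical name, and the values are the (0, 0, 24) record shown earlier (time stamps omitted for brevity).

```python
from dataclasses import dataclass, field


@dataclass
class Datum:
    """Minimal stand-in for HpBandSter's Datum (only the fields used here)."""
    config: dict
    config_info: dict
    results: dict = field(default_factory=dict)      # budget -> {'loss', 'info'} or None
    time_stamps: dict = field(default_factory=dict)  # budget -> {'submitted', 'started', 'finished'}
    exceptions: dict = field(default_factory=dict)   # budget -> error string or None


class Run(object):
    def __init__(self, config_id, budget, loss, info, time_stamps, error_logs):
        self.config_id = config_id
        self.budget = budget
        self.error_logs = error_logs
        self.loss = loss
        self.info = info
        self.time_stamps = time_stamps


def runs_from_datum(config_id, d):
    """Split one Datum into one Run per evaluated budget, sorted by budget."""
    runs = []
    for budget, res in d.results.items():
        loss, info = (None, None) if res is None else (res["loss"], res["info"])
        runs.append(Run(config_id, budget, loss, info, d.time_stamps.get(budget), d.exceptions.get(budget)))
    runs.sort(key=lambda r: r.budget)
    return runs


d = Datum(
    config={"x": 0.008481607331294692},
    config_info={"model_based_pick": True},
    results={
        9.0: {"loss": 0.012722410996942038, "info": 0.012722410996942038},
        27.0: {"loss": 0.004240803665647346, "info": 0.004240803665647346},
        81.0: {"loss": 0.012722410996942038, "info": 0.012722410996942038},
    },
    exceptions={9.0: None, 27.0: None, 81.0: None},
)
runs = runs_from_datum((0, 0, 24), d)
print([(r.budget, r.loss) for r in runs])
```

The config_info and config are dropped on purpose: a Run only needs the config_id to be joined back to its Datum.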

WarmStartIteration

In the whole HpBandSter code base, WarmStartIteration is instantiated in exactly one place, hpbandster.core.master.Master#__init__:

		if previous_result is None:
			self.warmstart_iteration = []

		else:
			self.warmstart_iteration = [WarmStartIteration(previous_result, self.config_generator)]

Now look at the constructor of the warm-start iteration:

delta_t = - max(map(lambda r: r.time_stamps['finished'], Result.get_all_runs()))

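This constructs a start-time offset. max(map(lambda r: r.time_stamps['finished'], Result.get_all_runs())) is the latest finish time among all previous runs, i.e. the total time span of the previous optimization (its timestamps are relative to its own time_ref). Negating it gives a negative start time, so the old runs are shifted to end exactly at 0, which makes sense.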
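A quick check of the arithmetic, treating the three time stamps of configuration (0, 0, 24) shown earlier as if they were the whole previous run (a simplification: HpBandSter takes the max over all runs). After the shift, every stamp is non-positive and the last finish lands exactly on 0.

```python
# Time stamps relative to the previous run's time_ref (from the (0, 0, 24) example).
runs_time_stamps = [
    {"submitted": 6.348446846008301, "started": 6.348571300506592, "finished": 6.866404056549072},
    {"submitted": 9.460097312927246, "started": 9.460161447525024, "finished": 9.97724461555481},
    {"submitted": 10.497020959854126, "started": 10.497111558914185, "finished": 11.015310049057007},
]

# Same formula as the constructor: minus the latest finish time.
delta_t = -max(ts["finished"] for ts in runs_time_stamps)

# Shift every stamp of every run, as the loop j.timestamps[k] = v + delta_t does.
shifted = [{k: v + delta_t for k, v in ts.items()} for ts in runs_time_stamps]
print(delta_t, shifted[-1]["finished"])
```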

super().__init__(-1, [len(id2conf)], [None], None)

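The parameters, from left to right, are: HPB_iter, num_configs, budgets, config_sampler, logger=None, result_logger=None.
So every configuration in this iteration gets iteration index -1, the single stage holds all len(id2conf) previous configurations, its budget is None, and no config sampler is attached.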

		for i, id in enumerate(id2conf):
			new_id = self.add_configuration(config=id2conf[id]['config'], config_info=id2conf[id]['config_info'])

>>> id
Out[11]: (0, 0, 0)
>>> new_id
Out[12]: (-1, 0, 0)

			for r in Result.get_runs_by_id(id):
				j = Job(new_id, config=id2conf[id]['config'], budget=r.budget)
				j.result = {'loss': r.loss, 'info': r.info}
				j.error_logs = r.error_logs
				for k,v in r.time_stamps.items():
					j.timestamps[k] = v + delta_t

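This packs the config and budget (the job's inputs) and the loss and info (its outputs) into a Job,
and shifts the job's timestamps by delta_t (worth checking later whether these timestamps get recalibrated).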

				self.register_result(j , skip_sanity_checks=True)

new_id is obtained from self.add_configuration. Calling this method implicitly creates a record in the member variable self.data, with config_id as the key and Datum(config=config, config_info=config_info, budget = self.budgets[self.stage]) as the value.
self.register_result then updates that Datum object.

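One thing to note: this Datum's budget is None, because the constructor passed budgets=[None]. This is presumably why register_result is called with skip_sanity_checks=True: its sanity checks (which, as far as I can tell, compare the Datum's budget with the job's budget, among other things) would otherwise fail.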
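A stripped-down sketch of that bookkeeping, assuming the id scheme (iteration, stage, counter) and a budget-equality check. MiniWarmStartIteration is hypothetical: the real register_result takes a Job and also updates status and exceptions, which are left out here. The values are again the (0, 0, 24) record.

```python
class MiniWarmStartIteration:
    """Hypothetical, stripped-down stand-in for an iteration's bookkeeping."""

    def __init__(self, HPB_iter, num_configs, budgets):
        self.HPB_iter = HPB_iter
        self.num_configs = num_configs
        self.budgets = budgets              # [None] for the warm-start iteration
        self.stage = 0
        self.actual_num_configs = [0] * len(num_configs)
        self.data = {}                      # config_id -> Datum-like dict

    def add_configuration(self, config, config_info):
        # The id is (iteration, stage, running counter); creating it also creates the record.
        config_id = (self.HPB_iter, self.stage, self.actual_num_configs[self.stage])
        self.data[config_id] = {
            "config": config,
            "config_info": config_info,
            "budget": self.budgets[self.stage],  # None for warm starts
            "results": {},
            "time_stamps": {},
        }
        self.actual_num_configs[self.stage] += 1
        return config_id

    def register_result(self, config_id, budget, result, timestamps, skip_sanity_checks=False):
        d = self.data[config_id]
        if not skip_sanity_checks and d["budget"] != budget:
            # With budgets=[None] this check can never pass for a real budget.
            raise ValueError("Budgets differ: %r != %r" % (d["budget"], budget))
        d["results"][budget] = result
        d["time_stamps"][budget] = timestamps


it = MiniWarmStartIteration(-1, [1], [None])
new_id = it.add_configuration({"x": 0.008481607331294692}, {"model_based_pick": True})
for budget, loss in [(9.0, 0.012722410996942038), (27.0, 0.004240803665647346), (81.0, 0.012722410996942038)]:
    it.register_result(new_id, budget, {"loss": loss, "info": loss}, {}, skip_sanity_checks=True)
print(new_id, it.data[new_id]["budget"])  # → (-1, 0, 0) None
```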

				config_generator.new_result(j, update_model=(i==len(id2conf)-1))

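The last argument, (i==len(id2conf)-1), means the model is only updated while the runs of the last configuration are being fed in, instead of after every single run.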
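The effect of this flag can be seen with a toy generator that only counts refits. CountingConfigGenerator and the budgets below are hypothetical; only the update_model=(i==len(id2conf)-1) pattern comes from the code above. Because new_result sits inside the loop over runs, the flag is True for every run of the last configuration, not just for one call.

```python
class CountingConfigGenerator:
    """Hypothetical config generator that only counts how often it would refit."""

    def __init__(self):
        self.observations = []
        self.fits = 0

    def new_result(self, job, update_model=True):
        self.observations.append(job)
        if update_model:
            self.fits += 1  # a real model-based generator would refit its model here


# Previous results: config index -> budgets it was evaluated on (toy data).
id2budgets = {0: [9.0], 1: [9.0, 27.0], 2: [9.0, 27.0, 81.0]}
gen = CountingConfigGenerator()
for i, cid in enumerate(id2budgets):
    for budget in id2budgets[cid]:
        gen.new_result((cid, budget), update_model=(i == len(id2budgets) - 1))
print(len(gen.observations), gen.fits)  # → 6 3
```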

Next, how does the warm-start iteration recalibrate these timestamps?
See hpbandster.core.base_iteration.WarmStartIteration#fix_timestamps.

It is called from hpbandster.core.master.Master#run:

		for i in self.warmstart_iteration:
			i.fix_timestamps(self.time_ref)

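That is, the timestamps of every warm-start iteration are recalibrated: judging by the code, fix_timestamps adds the master's time_ref to each of them, turning the negative offsets into absolute times just before the new run started.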

Finally, _merge_results in the Result constructor subtracts time_ref from every timestamp, so the warm-start timestamps become negative again.
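The full round trip can be sketched with plain dicts. The two function bodies follow my reading of fix_timestamps and _merge_results, with dicts in place of Datum objects; time_ref is a hypothetical wall-clock value chosen so the float arithmetic is exact.

```python
def fix_timestamps(data, time_ref):
    """Sketch of WarmStartIteration.fix_timestamps: add time_ref to every stamp."""
    for datum in data.values():
        for stamps in datum["time_stamps"].values():
            for key in stamps:
                stamps[key] += time_ref


def merge_results(iteration_data, time_ref):
    """Sketch of Result._merge_results: merge per-iteration dicts, then subtract time_ref."""
    merged = {}
    for it in iteration_data:
        merged.update(it)
    for datum in merged.values():
        for stamps in datum["time_stamps"].values():
            for key in stamps:
                stamps[key] -= time_ref
    return merged


time_ref = 1_700_000_000.0  # hypothetical wall-clock start of the new run
# Warm-start stamps already shifted by delta_t; new-run stamps are absolute times.
warm = {(-1, 0, 0): {"time_stamps": {9.0: {"submitted": -4.5, "finished": 0.0}}}}
new = {(0, 0, 0): {"time_stamps": {9.0: {"submitted": time_ref + 1.0, "finished": time_ref + 2.0}}}}

fix_timestamps(warm, time_ref)  # done in Master.run
merged = merge_results([warm, new], time_ref)  # done in Result.__init__
print(merged[(-1, 0, 0)]["time_stamps"][9.0], merged[(0, 0, 0)]["time_stamps"][9.0])
```

Old runs end at 0 and new runs start after it, so both can share one time axis.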

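A bug worth recording: the computation max(map(lambda r: r.time_stamps['finished'], Result.get_all_runs())) needs to change, because it does not account for chained warm starts, i.e. a run that was itself warm-started, whose results were stored and then used to warm-start another run.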

What do the plots look like after a warm start?

json_result_logger

While reading the result code, a class with a lowercase name, json_result_logger, caught my attention.

Set a breakpoint inside hpbandster.core.result.json_result_logger#__call__ and step in with the debugger.

Caller: hpbandster.core.master.Master#job_callback

>>> job
Out[2]: 
job_id: (0, 0, 1)
kwargs: {'config': {'x': 0.3506573720892966}, 'budget': 9.0, 'working_directory': '.'}
result: {'loss': 0.3601942269139056, 'info': 0.3601942269139056}
exception: None

The debugger shows that the job object carries complete information about the task's inputs and outputs.
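Since the job already carries everything, a logger only needs to serialize it. As far as I can tell, json_result_logger appends one JSON array per finished job to a results.json file; the sketch below mimics that with a hypothetical JobStub and log_job writing to an in-memory buffer, and the field order should be treated as an assumption. The values are the ones from the debugger output above.

```python
import io
import json


class JobStub:
    """Hypothetical stand-in with the attributes shown in the debugger output."""

    def __init__(self, job_id, kwargs, result, exception=None, timestamps=None):
        self.id = job_id
        self.kwargs = kwargs
        self.result = result
        self.exception = exception
        self.timestamps = timestamps or {}


def log_job(fh, job):
    """Append one finished job as a single JSON line (tuples become JSON arrays)."""
    record = [job.id, job.kwargs["budget"], job.timestamps, job.result, job.exception]
    fh.write(json.dumps(record) + "\n")


job = JobStub(
    (0, 0, 1),
    {"config": {"x": 0.3506573720892966}, "budget": 9.0, "working_directory": "."},
    {"loss": 0.3601942269139056, "info": 0.3601942269139056},
)
buf = io.StringIO()
log_job(buf, job)
print(buf.getvalue(), end="")
```

One line per job keeps the file appendable while the optimizer is still running, and each line can be parsed independently.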

It would be worth considering storing job information in a database.
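A minimal sketch of that idea using Python's built-in sqlite3 module with an in-memory database. The table layout and store_job are my own assumptions, not part of HpBandSter. The primary key includes the budget because one config_id can be evaluated on several budgets, so each row corresponds to one Run.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE jobs (
        iteration INTEGER, stage INTEGER, config_num INTEGER,
        budget REAL, config TEXT, loss REAL, info TEXT, exception TEXT,
        PRIMARY KEY (iteration, stage, config_num, budget)
    )
    """
)


def store_job(conn, job_id, kwargs, result, exception=None):
    """Insert one finished job; config and info are stored as JSON text."""
    loss = None if result is None else result["loss"]
    info = None if result is None else json.dumps(result["info"])
    conn.execute(
        "INSERT INTO jobs VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (*job_id, kwargs["budget"], json.dumps(kwargs["config"]), loss, info, exception),
    )


store_job(
    conn,
    (0, 0, 1),
    {"config": {"x": 0.3506573720892966}, "budget": 9.0},
    {"loss": 0.3601942269139056, "info": 0.3601942269139056},
)
row = conn.execute("SELECT budget, loss FROM jobs WHERE iteration = 0 AND config_num = 1").fetchone()
print(row)
```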
