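Let's move straight to writing code.
The Go bot we are going to build must do the following:
1. Track every move played so far.
2. Track the progress of the game. When the bot plays against itself, the code tracks the game differently from when a human plays against the bot.
3. Given the current board position, search among the possible moves and evaluate which one is best.
4. Convert games into data that can be used to train a network.
We go from easy to hard: solve the small problems first and lay a solid foundation before tackling the more complex ones. First we need code for the board, the players, moves and related objects. We start with the player: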
import enum


class Player(enum.Enum):
    black = 1
    white = 2

    # Return the opponent's color; if this side is white, return Player.black
    @property
    def other(self):
        if self == Player.white:
            return Player.black
        else:
            return Player.white
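As a quick check of the `other` property, the sketch below alternates turns for a few moves. `Player` is repeated from above only so that the snippet runs on its own; the `turns` list is a name introduced for illustration.

```python
import enum


class Player(enum.Enum):
    black = 1
    white = 2

    @property
    def other(self):
        return Player.black if self == Player.white else Player.white


# Black moves first in Go, then the two players alternate.
player = Player.black
turns = []
for _ in range(4):
    turns.append(player.name)
    player = player.other

print(turns)
```

Because `other` is a property, it is read as `player.other` without parentheses.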
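As discussed in the previous section, a Go board is a grid of horizontal and vertical lines, and stones must be placed on the intersections. We represent an intersection with the code below.
We use namedtuple from Python's standard library to improve readability. The Point class holds two integer members named row and col, which we access as point.row and point.col. Without namedtuple we would have to write point[0] and point[1], which is much harder to read.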
from collections import namedtuple


class Point(namedtuple('Point', 'row col')):
    def neighbors(self):
        '''
        Return the points adjacent to this one: the four points
        above, below, to the left and to the right.
        '''
        return [
            Point(self.row - 1, self.col),
            Point(self.row + 1, self.col),
            Point(self.row, self.col - 1),
            Point(self.row, self.col + 1),
        ]
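Two properties of `Point` matter later on, and the sketch below illustrates them: `neighbors()` knows nothing about the board size, so callers have to filter out off-board points, and points compare by value and are hashable, so they can serve as dictionary keys. The `board_size` value and the `grid` dictionary are assumptions made for this example.

```python
from collections import namedtuple


class Point(namedtuple('Point', 'row col')):
    def neighbors(self):
        return [
            Point(self.row - 1, self.col),
            Point(self.row + 1, self.col),
            Point(self.row, self.col - 1),
            Point(self.row, self.col + 1),
        ]


board_size = 9  # assumed size, for illustration only
corner = Point(row=1, col=1)

# A corner point has four candidate neighbors, but only two are on the board.
on_board = [p for p in corner.neighbors()
            if 1 <= p.row <= board_size and 1 <= p.col <= board_size]

# Points with equal row and col are equal, so a fresh Point finds the entry.
grid = {Point(3, 4): 'black'}
print(on_board, grid[Point(3, 4)])
```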
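Next we need code to represent a move:
In Go a move takes one of three forms: placing a stone on a point; passing, which lets the opponent play next, much like saying "pass" in a card game; and resigning. The code covers all three cases.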
import copy


class Move():
    def __init__(self, point=None, is_pass=False, is_resign=False):
        # A move should be exactly one of: play, pass, resign
        assert (point is not None) ^ is_pass ^ is_resign
        self.point = point
        # True when this move places a stone on the board
        self.is_play = (self.point is not None)
        self.is_pass = is_pass
        self.is_resign = is_resign

    @classmethod
    def play(cls, point):
        return Move(point=point)

    @classmethod
    def pass_turn(cls):
        # Pass and let the opponent play
        return Move(is_pass=True)

    @classmethod
    def resign(cls):
        # Resign the game
        return Move(is_resign=True)
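A short usage sketch for the three kinds of move. The class is repeated in compact form so the snippet is self-contained, and `Point` is reduced to a plain namedtuple; `kinds` and `rejected` are names introduced for this example. The last part shows that constructing a move with no arguments trips the assertion (as long as Python is not run with assertions disabled).

```python
from collections import namedtuple

Point = namedtuple('Point', 'row col')


class Move:
    def __init__(self, point=None, is_pass=False, is_resign=False):
        assert (point is not None) ^ is_pass ^ is_resign
        self.point = point
        self.is_play = point is not None
        self.is_pass = is_pass
        self.is_resign = is_resign

    @classmethod
    def play(cls, point):
        return cls(point=point)

    @classmethod
    def pass_turn(cls):
        return cls(is_pass=True)

    @classmethod
    def resign(cls):
        return cls(is_resign=True)


moves = [Move.play(Point(3, 3)), Move.pass_turn(), Move.resign()]
kinds = [(m.is_play, m.is_pass, m.is_resign) for m in moves]

# A move that is neither a play, a pass nor a resignation is rejected.
try:
    Move()
    rejected = False
except AssertionError:
    rejected = True

print(kinds, rejected)
```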
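The code above only represents the basic concepts of play and contains no logic. Next we write the rules and logic of Go. We start with the board. After every move the board must check whether any opposing stones have been captured, that is, whether all liberties of the adjacent stones have been filled. Because many stones may be connected together, this step can be time-consuming. We first write code to represent a group of connected stones:
The merged_with function is not easy to follow; read the comment inside it carefully to understand the logic. The figure below also helps:
Imagine placing a black stone between the two separate black groups on the second row. The single black stone on the left and the two black stones on the right then join into one group. When the left stone connects to the new stone, the point taken by the new stone must be removed from its liberty set; likewise that point must be removed from the liberties of the two stones on the right. That is why the code evaluates (self.liberties | go_string.liberties) - combined_stones.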
# A string: several stones connected into one group
class GoString():
    def __init__(self, color, stones, liberties):
        self.color = color                # black / white
        self.stones = set(stones)         # the stones in the string
        self.liberties = set(liberties)   # the liberties of the string

    def remove_liberty(self, point):
        self.liberties.remove(point)

    def add_liberty(self, point):
        self.liberties.add(point)

    def merged_with(self, go_string):
        # After a move, two adjacent strings may merge into one
        '''
        Let * be a black stone, o a white stone and x an empty point.
        Suppose the board looks like this:
        x  x  x  x  x  x
        x  *  x! *  o  *
        x  x  x  *  o  x
        x  x  *  x  o  x
        x  x  *  o  x  x
        Look at the x marked with !. If Black plays there, the black stone to
        the left of x! and the new stone are merged by this function. The x
        above x! and the x below it become liberties of the merged string.
        The point x! used to be a liberty of the black stone on its left, but
        it is now occupied by a black stone, so the code below removes it
        from the liberty set.
        '''
        assert go_string.color == self.color
        combined_stones = self.stones | go_string.stones
        return GoString(self.color, combined_stones,
                        (self.liberties | go_string.liberties) - combined_stones)

    @property
    def num_liberties(self):
        # Number of liberties
        return len(self.liberties)

    def __eq__(self, other):
        # Equality test
        return isinstance(other, GoString) and self.color == other.color \
            and self.stones == other.stones and self.liberties == other.liberties
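To see the liberty arithmetic of merged_with in isolation, here is a minimal sketch on plain Python sets. The helper `merged` and the coordinates are assumptions chosen for illustration: a lone black stone at (2, 2), a two-stone black string at (2, 4) and (3, 4), and a new black stone at (2, 3) joining them, with the liberties of each written out by hand.

```python
def merged(stones_a, liberties_a, stones_b, liberties_b):
    """Same set arithmetic as GoString.merged_with, on plain sets."""
    stones = stones_a | stones_b
    return stones, (liberties_a | liberties_b) - stones


# (stones, liberties) for each string before the merge
left = ({(2, 2)}, {(1, 2), (3, 2), (2, 1), (2, 3)})
right = ({(2, 4), (3, 4)}, {(1, 4), (2, 3), (3, 3), (4, 4)})
new = ({(2, 3)}, {(1, 3), (3, 3)})

# Merge the new stone with each neighboring string in turn.
stones, liberties = merged(*new, *left)
stones, liberties = merged(stones, liberties, *right)

print(sorted(stones))
print(sorted(liberties))
```

The point (2, 3) appears in the liberty sets of both old strings, but after the merge it is a stone of the combined string, so subtracting the combined stones removes it.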
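Next we implement the board. The logic of _remove_string needs some explanation.
When a black stone is played on the right, the surrounded white stones in the middle are captured and must be taken off the board. We then mark each point that held a removed stone as unoccupied, look up the strings on the four sides of that point, and add the point to each of those strings as a liberty.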
# The board
class Board():
    def __init__(self, num_rows, num_cols):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self._grid = {}

    def place_stone(self, player, point):
        # The point must be on the board
        assert self.is_on_grid(point)
        # The point must not be occupied
        assert self._grid.get(point) is None

        adjacent_same_color = []
        adjacent_opposite_color = []
        liberties = []

        for neighbor in point.neighbors():
            # Examine the points above, below, left and right of the move
            if not self.is_on_grid(neighbor):
                continue

            neighbor_string = self._grid.get(neighbor)
            if neighbor_string is None:
                # An unoccupied neighbor is a liberty of the new stone
                liberties.append(neighbor)
            elif neighbor_string.color == player:
                if neighbor_string not in adjacent_same_color:
                    # Record adjacent strings of the same color
                    adjacent_same_color.append(neighbor_string)
            else:
                if neighbor_string not in adjacent_opposite_color:
                    # Record adjacent strings of the opposite color
                    adjacent_opposite_color.append(neighbor_string)

        # Merge the new stone with the adjacent strings of its own color
        new_string = GoString(player, [point], liberties)
        for same_color_string in adjacent_same_color:
            new_string = new_string.merged_with(same_color_string)

        for new_string_point in new_string.stones:
            # Looking up any point of the string returns the whole string
            self._grid[new_string_point] = new_string

        for other_color_string in adjacent_opposite_color:
            # Before the move this point was a liberty of the opposing
            # string; once occupied it no longer is
            other_color_string.remove_liberty(point)

        for other_color_string in adjacent_opposite_color:
            # An opposing string with no liberties left is captured
            if other_color_string.num_liberties == 0:
                self._remove_string(other_color_string)

    def is_on_grid(self, point):
        return 1 <= point.row <= self.num_rows and 1 <= point.col <= self.num_cols

    def get(self, point):
        string = self._grid.get(point)
        if string is None:
            return None
        return string.color

    def get_go_string(self, point):
        string = self._grid.get(point)
        if string is None:
            return None
        return string

    def _remove_string(self, string):
        # Remove a whole string of connected stones from the board
        for point in string.stones:
            for neighbor in point.neighbors():
                neighbor_string = self._grid.get(neighbor)
                if neighbor_string is None:
                    continue
                if neighbor_string is not string:
                    neighbor_string.add_liberty(point)
            # Drop the entry so that emptied points look the same as
            # points that were never occupied
            self._grid.pop(point, None)

    def __eq__(self, other):
        # Boards are compared by content; the ko check below relies on this
        return isinstance(other, Board) and \
            self.num_rows == other.num_rows and \
            self.num_cols == other.num_cols and \
            self._grid == other._grid
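The board above keeps liberties up to date incrementally. A useful way to cross-check that bookkeeping is to recompute a group's liberties from scratch with a flood fill. The sketch below is an alternative technique, not the approach used in this article; the dictionary-based `grid`, the color codes 'b' and 'w', and the function name `group_and_liberties` are all assumptions made for the example.

```python
def group_and_liberties(grid, start, size):
    """Flood-fill from `start` and return (stones, liberties) of its group.

    grid maps (row, col) to 'b' or 'w'; empty points are simply absent.
    """
    color = grid[start]
    stones, liberties, frontier = {start}, set(), [start]
    while frontier:
        row, col = frontier.pop()
        for nb in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            if not (1 <= nb[0] <= size and 1 <= nb[1] <= size):
                continue                      # off the board
            if nb not in grid:
                liberties.add(nb)             # empty neighbor is a liberty
            elif grid[nb] == color and nb not in stones:
                stones.add(nb)                # same color: part of the group
                frontier.append(nb)
    return stones, liberties


# A white stone at (2, 2) with black stones on three sides.
grid = {(2, 2): 'w', (1, 2): 'b', (3, 2): 'b', (2, 1): 'b'}
stones, liberties = group_and_liberties(grid, (2, 2), size=5)

# Black fills the last liberty; the white stone now has none and is captured.
grid[(2, 3)] = 'b'
stones_after, liberties_after = group_and_liberties(grid, (2, 2), size=5)

print(liberties, liberties_after)
```

Recomputing from scratch costs more per move than the incremental update, which is why it is better suited to tests than to the main game loop.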
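Moves and the board are now in place. Every time a stone is placed, the state of the game changes, so next we implement game-state tracking and move-legality checks. The game state tells the program where every stone sits, whose turn it is, what the state was before the move, what the last move was, and how the board changes after a move: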
# Game-state tracking and move checking
class GameState():
    def __init__(self, board, next_player, previous, move):
        self.board = board
        self.next_player = next_player
        self.previous_state = previous
        self.last_move = move

    def apply_move(self, move):
        if move.is_play:
            next_board = copy.deepcopy(self.board)
            next_board.place_stone(self.next_player, move.point)
        else:
            next_board = self.board

        return GameState(next_board, self.next_player.other, self, move)

    @classmethod
    def new_game(cls, board_size):
        if isinstance(board_size, int):
            board_size = (board_size, board_size)
        board = Board(*board_size)
        return GameState(board, Player.black, None, None)

    def is_over(self):
        if self.last_move is None:
            return False
        if self.last_move.is_resign:
            return True

        second_last_move = self.previous_state.last_move
        if second_last_move is None:
            return False

        # The game ends when both players pass in succession
        return self.last_move.is_pass and second_last_move.is_pass
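Next we need to determine whether a move is legal.
There are three conditions to check: the point is unoccupied; the move is not a self-capture; and the move does not violate the ko rule.
The first check is simple, so let's look at the second:
Look at the figure above. The string of three black stones has only one liberty, the point marked with a small square. Whether or not Black fills that point, the three stones will eventually be captured. Black must not play on the square: doing so forms a string of four black stones with no liberties at all, so Black only loses more stones. This is called self-capture, and most Go competitions do not allow it.
The following position is different:
When Black plays on the square it captures the two white stones in the middle, so this is not self-capture: once the two white stones are removed, the black stones have liberties again. The program must therefore check for self-capture only after the stone has been placed and all captured stones have been removed: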
    # Method of GameState
    def is_move_self_capture(self, player, move):
        if not move.is_play:
            return False

        next_board = copy.deepcopy(self.board)
        # Place the stone and resolve captures first, then test for self-capture
        next_board.place_stone(player, move.point)
        new_string = next_board.get_go_string(move.point)
        return new_string.num_liberties == 0
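The order of operations (place the stone, remove captured opposing stones, only then count liberties) can be tried out on a tiny stand-alone model. The sketch below does not use the Board class; it assumes a dictionary grid with color codes 'b' and 'w', and the helper names `neighbors`, `group_and_liberties` and `is_self_capture` are hypothetical.

```python
def neighbors(point, size):
    row, col = point
    candidates = ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1))
    return [(r, c) for r, c in candidates if 1 <= r <= size and 1 <= c <= size]


def group_and_liberties(grid, start, size):
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    color = grid[start]
    stones, liberties, frontier = {start}, set(), [start]
    while frontier:
        for nb in neighbors(frontier.pop(), size):
            if nb not in grid:
                liberties.add(nb)
            elif grid[nb] == color and nb not in stones:
                stones.add(nb)
                frontier.append(nb)
    return stones, liberties


def is_self_capture(grid, point, color, size):
    trial = dict(grid)            # work on a copy, never on the real board
    trial[point] = color
    # Step 1: remove opposing groups that the new stone leaves without liberties.
    for nb in neighbors(point, size):
        if nb in trial and trial[nb] != color:
            stones, liberties = group_and_liberties(trial, nb, size)
            if not liberties:
                for stone in stones:
                    del trial[stone]
    # Step 2: only now check the liberties of the stone just played.
    return not group_and_liberties(trial, point, size)[1]


# Black playing the corner while both white neighbors still have liberties.
suicide = is_self_capture({(1, 2): 'w', (2, 1): 'w'}, (1, 1), 'b', size=5)

# Same corner, but the white stones are in atari, so Black captures them.
position = {(1, 2): 'w', (2, 1): 'w', (1, 3): 'b', (2, 2): 'b', (3, 1): 'b'}
capture = is_self_capture(position, (1, 1), 'b', size=5)

print(suicide, capture)
```

In the second position the black stone has no liberties at the moment it is placed, so checking before the captures are resolved would wrongly reject the move.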
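Next comes ko detection: after your opponent's move, you may not play a move that restores the board to the position before the opponent's move. Because the GameState class above keeps a link to the previous state, for each new move we compare the resulting position against the earlier states; if one of them matches, the move violates ko.
The does_move_violate_ko implemented here is inefficient, however. It runs on every move and walks through all past board states. Once a few hundred moves have been played, every new move requires a few hundred comparisons, which is very slow. We will see a way to speed it up later.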
    # Methods of GameState
    @property
    def situation(self):
        return (self.next_player, self.board)

    def does_move_violate_ko(self, player, move):
        if not move.is_play:
            return False

        next_board = copy.deepcopy(self.board)
        next_board.place_stone(player, move.point)
        next_situation = (player.other, next_board)
        past_state = self.previous_state
        # Ko is not only about returning to the previous board: we check
        # whether the move recreates any earlier board state
        while past_state is not None:
            if past_state.situation == next_situation:
                return True
            # Step back one state; without this the loop never terminates
            past_state = past_state.previous_state
        return False

    def is_valid_move(self, move):
        if self.is_over():
            return False

        if move.is_pass or move.is_resign:
            return True

        return (self.board.get(move.point) is None and
                not self.is_move_self_capture(self.next_player, move) and
                not self.does_move_violate_ko(self.next_player, move))
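One common way to make repeated-position checks cheap is Zobrist hashing: give every (point, color) pair a random 64-bit number and keep a running XOR of the stones on the board, so that comparing positions becomes comparing integers. Whether this is the improvement the article has in mind is not stated in this section, so treat the following as an independent sketch; `BOARD_SIZE`, `ZOBRIST`, `toggle` and `seen` are hypothetical names.

```python
import random

BOARD_SIZE = 9            # assumed board size for this sketch
COLORS = ('black', 'white')

rng = random.Random(0)    # fixed seed so the table is reproducible
ZOBRIST = {
    (row, col, color): rng.getrandbits(64)
    for row in range(1, BOARD_SIZE + 1)
    for col in range(1, BOARD_SIZE + 1)
    for color in COLORS
}
EMPTY_BOARD = 0


def toggle(board_hash, row, col, color):
    """XOR a stone in or out; the same call both places and removes it."""
    return board_hash ^ ZOBRIST[(row, col, color)]


h0 = EMPTY_BOARD
h1 = toggle(h0, 3, 3, 'black')      # Black plays (3, 3)
h2 = toggle(h1, 3, 4, 'white')      # White plays (3, 4)
h3 = toggle(h2, 3, 4, 'white')      # the white stone is removed again

# Store (next player, board hash) for every position reached so far.
seen = {('white', h1)}
repeats = ('white', h3) in seen     # set lookup instead of a linear scan

print(h3 == h1, repeats)
```

Two different positions can in principle share a hash, so a careful implementation may still want to confirm a match by comparing the boards themselves.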
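Finally, we must stop the bot from filling in its own eyes, as in the position below:
If the bot plays White, it must not fill points A and B itself, because Black could then capture all the white stones. We therefore need logic to detect this case. We define an eye as an empty point whose adjacent points are all occupied by friendly stones, and for which at least three of the four diagonal points are occupied by friendly stones. If the point lies on the edge of the board, every diagonal point that is on the board must be occupied by a friendly stone. The code is as follows: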
def is_point_an_eye(board, point, color):
    if board.get(point) is not None:
        return False

    for neighbor in point.neighbors():
        # All adjacent points must hold friendly stones
        if board.is_on_grid(neighbor):
            neighbor_color = board.get(neighbor)
            if neighbor_color != color:
                return False

    # At least three of the four diagonal points must hold friendly stones
    friendly_corners = 0
    off_board_corners = 0
    corners = [
        Point(point.row - 1, point.col - 1),
        Point(point.row - 1, point.col + 1),
        Point(point.row + 1, point.col - 1),
        Point(point.row + 1, point.col + 1),
    ]
    for corner in corners:
        if board.is_on_grid(corner):
            corner_color = board.get(corner)
            if corner_color == color:
                friendly_corners += 1
        else:
            off_board_corners += 1

    if off_board_corners > 0:
        # On the edge or in a corner: every on-board diagonal must be friendly
        return off_board_corners + friendly_corners == 4
    return friendly_corners >= 3
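The eye rule can be exercised without the full Board class. The sketch below restates the same rule against `TinyBoard`, a hypothetical stand-in that offers only `is_on_grid` and `get`; the positions are made up for illustration and `Point` is reduced to a plain namedtuple.

```python
from collections import namedtuple

Point = namedtuple('Point', 'row col')


class TinyBoard:
    """Hypothetical stand-in for Board: just is_on_grid and get."""

    def __init__(self, size, stones):
        self.size = size
        self.stones = stones              # dict: Point -> color string

    def is_on_grid(self, p):
        return 1 <= p.row <= self.size and 1 <= p.col <= self.size

    def get(self, p):
        return self.stones.get(p)


def is_point_an_eye(board, point, color):
    if board.get(point) is not None:
        return False
    r, c = point
    adjacent = [Point(r - 1, c), Point(r + 1, c), Point(r, c - 1), Point(r, c + 1)]
    if any(board.is_on_grid(p) and board.get(p) != color for p in adjacent):
        return False
    corners = [Point(r - 1, c - 1), Point(r - 1, c + 1),
               Point(r + 1, c - 1), Point(r + 1, c + 1)]
    off_board = sum(1 for p in corners if not board.is_on_grid(p))
    friendly = sum(1 for p in corners
                   if board.is_on_grid(p) and board.get(p) == color)
    if off_board > 0:
        return off_board + friendly == 4
    return friendly >= 3


def white(*coords):
    return {Point(r, c): 'white' for r, c in coords}


# Corner point: both neighbors and the single on-board diagonal are white.
corner_eye = is_point_an_eye(
    TinyBoard(5, white((1, 2), (2, 1), (2, 2))), Point(1, 1), 'white')
# Same corner without the diagonal stone: not an eye.
corner_false = is_point_an_eye(
    TinyBoard(5, white((1, 2), (2, 1))), Point(1, 1), 'white')
# Center point with four neighbors and three of the four diagonals.
center_eye = is_point_an_eye(
    TinyBoard(5, white((2, 3), (4, 3), (3, 2), (3, 4), (2, 2), (2, 4), (4, 2))),
    Point(3, 3), 'white')

print(corner_eye, corner_false, center_eye)
```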
Complete listing
AlphaGo.py
import copy
import enum
from collections import namedtuple


class Player(enum.Enum):
    black = 1
    white = 2

    # Return the opponent's color; if this side is white, return Player.black
    @property
    def other(self):
        if self == Player.white:
            return Player.black
        else:
            return Player.white


class Point(namedtuple('Point', 'row col')):
    def neighbors(self):
        # Return the adjacent points: above, below, left and right
        return [
            Point(self.row - 1, self.col),
            Point(self.row + 1, self.col),
            Point(self.row, self.col - 1),
            Point(self.row, self.col + 1),
        ]


class Move():
    def __init__(self, point=None, is_pass=False, is_resign=False):
        # A move should be exactly one of: play, pass, resign
        assert (point is not None) ^ is_pass ^ is_resign
        self.point = point
        # True when this move places a stone on the board
        self.is_play = (self.point is not None)
        self.is_pass = is_pass
        self.is_resign = is_resign

    @classmethod
    def play(cls, point):
        return Move(point=point)

    @classmethod
    def pass_turn(cls):
        # Pass and let the opponent play
        return Move(is_pass=True)

    @classmethod
    def resign(cls):
        # Resign the game
        return Move(is_resign=True)


# A string: several stones connected into one group
class GoString():
    def __init__(self, color, stones, liberties):
        self.color = color                # black / white
        self.stones = set(stones)         # the stones in the string
        self.liberties = set(liberties)   # the liberties of the string

    def remove_liberty(self, point):
        self.liberties.remove(point)

    def add_liberty(self, point):
        self.liberties.add(point)

    def merged_with(self, go_string):
        # After a move, two adjacent strings may merge into one
        '''
        Let * be a black stone, o a white stone and x an empty point.
        Suppose the board looks like this:
        x  x  x  x  x  x
        x  *  x! *  o  *
        x  x  x  *  o  x
        x  x  *  x  o  x
        x  x  *  o  x  x
        Look at the x marked with !. If Black plays there, the black stone to
        the left of x! and the new stone are merged by this function. The x
        above x! and the x below it become liberties of the merged string.
        The point x! used to be a liberty of the black stone on its left, but
        it is now occupied by a black stone, so the code below removes it
        from the liberty set.
        '''
        assert go_string.color == self.color
        combined_stones = self.stones | go_string.stones
        return GoString(self.color, combined_stones,
                        (self.liberties | go_string.liberties) - combined_stones)

    @property
    def num_liberties(self):
        # Number of liberties
        return len(self.liberties)

    def __eq__(self, other):
        # Equality test
        return isinstance(other, GoString) and self.color == other.color \
            and self.stones == other.stones and self.liberties == other.liberties


# The board
class Board():
    def __init__(self, num_rows, num_cols):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self._grid = {}

    def place_stone(self, player, point):
        # The point must be on the board
        assert self.is_on_grid(point)
        # The point must not be occupied
        assert self._grid.get(point) is None

        adjacent_same_color = []
        adjacent_opposite_color = []
        liberties = []

        for neighbor in point.neighbors():
            # Examine the points above, below, left and right of the move
            if not self.is_on_grid(neighbor):
                continue

            neighbor_string = self._grid.get(neighbor)
            if neighbor_string is None:
                # An unoccupied neighbor is a liberty of the new stone
                liberties.append(neighbor)
            elif neighbor_string.color == player:
                if neighbor_string not in adjacent_same_color:
                    # Record adjacent strings of the same color
                    adjacent_same_color.append(neighbor_string)
            else:
                if neighbor_string not in adjacent_opposite_color:
                    # Record adjacent strings of the opposite color
                    adjacent_opposite_color.append(neighbor_string)

        # Merge the new stone with the adjacent strings of its own color
        new_string = GoString(player, [point], liberties)
        for same_color_string in adjacent_same_color:
            new_string = new_string.merged_with(same_color_string)

        for new_string_point in new_string.stones:
            # Looking up any point of the string returns the whole string
            self._grid[new_string_point] = new_string

        for other_color_string in adjacent_opposite_color:
            # Before the move this point was a liberty of the opposing
            # string; once occupied it no longer is
            other_color_string.remove_liberty(point)

        for other_color_string in adjacent_opposite_color:
            # An opposing string with no liberties left is captured
            if other_color_string.num_liberties == 0:
                self._remove_string(other_color_string)

    def is_on_grid(self, point):
        return 1 <= point.row <= self.num_rows and 1 <= point.col <= self.num_cols

    def get(self, point):
        string = self._grid.get(point)
        if string is None:
            return None
        return string.color

    def get_go_string(self, point):
        string = self._grid.get(point)
        if string is None:
            return None
        return string

    def _remove_string(self, string):
        # Remove a whole string of connected stones from the board
        for point in string.stones:
            for neighbor in point.neighbors():
                neighbor_string = self._grid.get(neighbor)
                if neighbor_string is None:
                    continue
                if neighbor_string is not string:
                    neighbor_string.add_liberty(point)
            # Drop the entry so that emptied points look the same as
            # points that were never occupied
            self._grid.pop(point, None)

    def __eq__(self, other):
        # Boards are compared by content; the ko check relies on this
        return isinstance(other, Board) and \
            self.num_rows == other.num_rows and \
            self.num_cols == other.num_cols and \
            self._grid == other._grid


# Game-state tracking and move checking
class GameState():
    def __init__(self, board, next_player, previous, move):
        self.board = board
        self.next_player = next_player
        self.previous_state = previous
        self.last_move = move

    def apply_move(self, move):
        if move.is_play:
            next_board = copy.deepcopy(self.board)
            next_board.place_stone(self.next_player, move.point)
        else:
            next_board = self.board

        return GameState(next_board, self.next_player.other, self, move)

    @classmethod
    def new_game(cls, board_size):
        if isinstance(board_size, int):
            board_size = (board_size, board_size)
        board = Board(*board_size)
        return GameState(board, Player.black, None, None)

    def is_over(self):
        if self.last_move is None:
            return False
        if self.last_move.is_resign:
            return True

        second_last_move = self.previous_state.last_move
        if second_last_move is None:
            return False

        # The game ends when both players pass in succession
        return self.last_move.is_pass and second_last_move.is_pass

    def is_move_self_capture(self, player, move):
        if not move.is_play:
            return False

        next_board = copy.deepcopy(self.board)
        # Place the stone and resolve captures first, then test for self-capture
        next_board.place_stone(player, move.point)
        new_string = next_board.get_go_string(move.point)
        return new_string.num_liberties == 0

    @property
    def situation(self):
        return (self.next_player, self.board)

    def does_move_violate_ko(self, player, move):
        if not move.is_play:
            return False

        next_board = copy.deepcopy(self.board)
        next_board.place_stone(player, move.point)
        next_situation = (player.other, next_board)
        past_state = self.previous_state
        # Ko is not only about returning to the previous board: we check
        # whether the move recreates any earlier board state
        while past_state is not None:
            if past_state.situation == next_situation:
                return True
            # Step back one state; without this the loop never terminates
            past_state = past_state.previous_state
        return False

    def is_valid_move(self, move):
        if self.is_over():
            return False

        if move.is_pass or move.is_resign:
            return True

        return (self.board.get(move.point) is None and
                not self.is_move_self_capture(self.next_player, move) and
                not self.does_move_violate_ko(self.next_player, move))


def is_point_an_eye(board, point, color):
    if board.get(point) is not None:
        return False

    for neighbor in point.neighbors():
        # All adjacent points must hold friendly stones
        if board.is_on_grid(neighbor):
            neighbor_color = board.get(neighbor)
            if neighbor_color != color:
                return False

    # At least three of the four diagonal points must hold friendly stones
    friendly_corners = 0
    off_board_corners = 0
    corners = [
        Point(point.row - 1, point.col - 1),
        Point(point.row - 1, point.col + 1),
        Point(point.row + 1, point.col - 1),
        Point(point.row + 1, point.col + 1),
    ]
    for corner in corners:
        if board.is_on_grid(corner):
            corner_color = board.get(corner)
            if corner_color == color:
                friendly_corners += 1
        else:
            off_board_corners += 1

    if off_board_corners > 0:
        # On the edge or in a corner: every on-board diagonal must be friendly
        return off_board_corners + friendly_corners == 4
    return friendly_corners >= 3
References: