raft理論與實踐[4]-lab2b

準備工作
1、閱讀raft論文
2、閱讀raft理論與實踐[1]-理論篇
3、閱讀raft理論與實踐[2]-lab2a
4、閱讀raft理論與實踐[3]-lab2a講解
5、查看我寫的這篇文章：模擬RPC遠程過程調用

執行日誌

我們需要執行日誌中的命令，因此在make函數中，新開一個協程:applyLogEntryDaemon()

func Make(peers []*labrpc.ClientEnd, me int,
    persister *Persister, applyCh chan ApplyMsg) *Raft {
    ...
    go rf.applyLogEntryDaemon() // start apply log
    DPrintf("[%d-%s]: newborn election(%s) heartbeat(%s) term(%d) voted(%d)\n",
        rf.me, rf, rf.electionTimeout, rf.heartbeatInterval, rf.CurrentTerm, rf.VotedFor)
    return rf
}

一個死循環
1、如果rf.lastApplied == rf.commitIndex, 意味着commit log entry命令都已經被執行了，這時用信號量陷入等待。
一旦收到信號，說明需要執行命令。這時會把最後執行的log entry之後，一直到最後一個commit log entry的所有log都傳入通道apply中進行執行。
由於是測試，處理apply的邏輯會在測試代碼中。

// applyLogEntryDaemon exit when shutdown channel is closed
func (rf *Raft) applyLogEntryDaemon() {
    for {
        var logs []LogEntry
        // wait
        rf.mu.Lock()
        for rf.lastApplied == rf.commitIndex {
            rf.commitCond.Wait()
            select {
            case <-rf.shutdownCh:
                rf.mu.Unlock()
                DPrintf("[%d-%s]: peer %d is shutting down apply log entry to client daemon.\n", rf.me, rf, rf.me)
                close(rf.applyCh)
                return
            default:
            }
        }
        last, cur := rf.lastApplied, rf.commitIndex
        if last < cur {
            rf.lastApplied = rf.commitIndex
            logs = make([]LogEntry, cur-last)
            copy(logs, rf.Logs[last+1:cur+1])
        }
        rf.mu.Unlock()
        for i := 0; i < cur-last; i++ {
            // current command is replicated, ignore nil command
            reply := ApplyMsg{
                CommandIndex: last + i + 1,
                Command:      logs[i].Command,
                CommandValid: true,
            }
            // reply to outer service
            // DPrintf("[%d-%s]: peer %d apply %v to client.\n", rf.me, rf, rf.me)
            DPrintf("[%d-%s]: peer %d apply to client.\n", rf.me, rf, rf.me)
            // Note: must in the same goroutine, or may result in out of order apply
            rf.applyCh <- reply
        }
    }
}

新增 Start函數，此函數爲leader執行從client發送過來的命令。
當client發送過來之後，首先需要做的就是新增entry 到leader的log中。並且將自身的nextIndex 與matchIndex 更新。

func (rf *Raft) Start(command interface{}) (int, int, bool) {
    index := -1
    term := 0
    isLeader := false

    // Your code here (2B).
    select {
    case <-rf.shutdownCh:
        return -1, 0, false
    default:
        rf.mu.Lock()
        defer rf.mu.Unlock()
        // Your code here (2B).
        if rf.state == Leader {
            log := LogEntry{rf.CurrentTerm, command}
            rf.Logs = append(rf.Logs, log)

            index = len(rf.Logs) - 1
            term = rf.CurrentTerm
            isLeader = true

            //DPrintf("[%d-%s]: client add new entry (%d-%v), logs: %v\n", rf.me, rf, index, command, rf.logs)
            DPrintf("[%d-%s]: client add new entry (%d)\n", rf.me, rf, index)
            //DPrintf("[%d-%s]: client add new entry (%d-%v)\n", rf.me, rf, index, command)

            // only update leader
            rf.nextIndex[rf.me] = index + 1
            rf.matchIndex[rf.me] = index
        }
    }

    return index, term, isLeader
}

接下來最重要的部分涉及到日誌複製，這是通過AppendEntries實現的。我們知道leader會不時的調用consistencyCheck(n)進行一致性檢查。
在給第n號節點一致性檢查時，首先獲取pre = rf.nextIndex，pre至少要爲1。代表要給n節點發送的log index。因此AppendEntriesArgs參數中，PrevLogIndex 與 prevlogTerm 都爲pre - 1位置。
代表leader相信PrevLogIndex及其之前的節點都是與leader相同的。
將pre及其之後的entry 加入到AppendEntriesArgs參數中。這些log entry可能是與leader不相同的，或者是follower根本就沒有的。

func (rf *Raft) consistencyCheck(n int) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    pre := max(1,rf.nextIndex[n])
    var args = AppendEntriesArgs{
        Term:         rf.CurrentTerm,
        LeaderID:     rf.me,
        PrevLogIndex: pre - 1,
        PrevLogTerm:  rf.Logs[pre - 1].Term,
        Entries:      nil,
        LeaderCommit: rf.commitIndex,
    }

    if rf.nextIndex[n] < len(rf.Logs){
        args.Entries = append(args.Entries, rf.Logs[pre:]...)
    }

    go func() {
        DPrintf("[%d-%s]: consistency Check to peer %d.\n", rf.me, rf, n)
        var reply AppendEntriesReply
        if rf.sendAppendEntries(n, &args, &reply) {
            rf.consistencyCheckReplyHandler(n, &reply)
        }
    }()
}

接下來查看follower執行AppendEntries時的反應。
AppendEntries會新增兩個返回參數：
ConflictTerm代表可能發生衝突的term
FirstIndex 代表可能發生衝突的第一個index。

type AppendEntriesReply struct {
    CurrentTerm int  // currentTerm, for leader to update itself
    Success     bool // true if follower contained entry matching prevLogIndex and prevLogTerm
    // extra info for heartbeat from follower
    ConflictTerm int // term of the conflicting entry
    FirstIndex   int // the first index it stores for ConflictTerm
}

如果args.PrevLogIndex < len(rf.Logs), 表明至少當前節點的log長度是合理的。
令preLogIdx 與 args.PrevLogIndex相等。prelogTerm爲當前follower節點preLogIdx位置的term。
如果擁有相同的term，說明follower與leader 在preLogIdx之前的log entry都是相同的。因此請求是成功的。
此時會截斷follower的log，將傳遞過來的entry加入到follower的log之後，執行此步驟後，強制要求與leader的log相同了。
請求成功後，reply的ConflictTerm爲最後一個log entry的term,reply的FirstIndex爲最後一個log entry的index。

否則說明leader與follower的日誌是有衝突的，衝突的原因可能是：
1、leader認爲的match log entry超出了follower的log個數，或者follower 還沒有任何log entry（除了index爲0的entry是每一個節點都有的）。
2、log在相同的index下，leader的term 與follower的term確是不同的。
這時找到follower衝突的term即爲ConflictTerm。
獲取此term的第一個entry的index即爲FirstIndex。
所以最後，AppendEntries會返回衝突的term以及第一個可能衝突的index。

// AppendEntries handler, including heartbeat, must backup quickly
func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
    ...
    preLogIdx, preLogTerm := 0, 0
    if args.PrevLogIndex < len(rf.Logs) {
        preLogIdx = args.PrevLogIndex
        preLogTerm = rf.Logs[preLogIdx].Term
    }

    // last log is match
    if preLogIdx == args.PrevLogIndex && preLogTerm == args.PrevLogTerm {
        reply.Success = true
        // truncate to known match
        rf.Logs = rf.Logs[:preLogIdx+1]
        rf.Logs = append(rf.Logs, args.Entries...)
        var last = len(rf.Logs) - 1

        // min(leaderCommit, index of last new entry)
        if args.LeaderCommit > rf.commitIndex {
            rf.commitIndex = min(args.LeaderCommit, last)
            // signal possible update commit index
            go func() { rf.commitCond.Broadcast() }()
        }
        // tell leader to update matched index
        reply.ConflictTerm = rf.Logs[last].Term
        reply.FirstIndex = last

        if len(args.Entries) > 0 {
            DPrintf("[%d-%s]: AE success from leader %d (%d cmd @ %d), commit index: l->%d, f->%d.\n",
                rf.me, rf, args.LeaderID, len(args.Entries), preLogIdx+1, args.LeaderCommit, rf.commitIndex)
        } else {
            DPrintf("[%d-%s]: <heartbeat> current logs: %v\n", rf.me, rf, rf.Logs)
        }
    } else {
        reply.Success = false

        // extra info for restore missing entries quickly: from original paper and lecture note
        // if follower rejects, includes this in reply:
        //
        // the follower's term in the conflicting entry
        // the index of follower's first entry with that term
        //
        // if leader knows about the conflicting term:
        //      move nextIndex[i] back to leader's last entry for the conflicting term
        // else:
        //      move nextIndex[i] back to follower's first index
        var first = 1
        reply.ConflictTerm = preLogTerm
        if reply.ConflictTerm == 0 {
            // which means leader has more logs or follower has no log at all
            first = len(rf.Logs)
            reply.ConflictTerm = rf.Logs[first-1].Term
        } else {
            i := preLogIdx
            // term的第一個log entry
            for ; i > 0; i-- {
                if rf.Logs[i].Term != preLogTerm {
                    first = i + 1
                    break
                }
            }
        }
        reply.FirstIndex = first
        if len(rf.Logs) <= args.PrevLogIndex {
            DPrintf("[%d-%s]: AE failed from leader %d, leader has more logs (%d > %d), reply: %d - %d.\n",
                rf.me, rf, args.LeaderID, args.PrevLogIndex, len(rf.Logs)-1, reply.ConflictTerm,
                reply.FirstIndex)
        } else {
            DPrintf("[%d-%s]: AE failed from leader %d, pre idx/term mismatch (%d != %d, %d != %d).\n",
                rf.me, rf, args.LeaderID, args.PrevLogIndex, preLogIdx, args.PrevLogTerm, preLogTerm)
        }
    }
}

leader調用AppendEntries後，會執行回調函數consistencyCheckReplyHandler。
如果調用是成功的，那麼正常的跟新matchIndex，nextIndex即下一個要發送的index應該爲matchIndex + 1。

如果調用失敗，說明有衝突。
如果confiicting term等於0，說明了leader認爲的match log entry超出了follower的log個數，或者follower 還沒有任何log entry（除了index爲0的entry是每一個節點都有的）。
此時簡單的讓nextIndex 爲reply.FirstIndex即可。

如果confiicting term不爲0，獲取leader節點confiicting term 的最後一個log index，此時nextIndex 應該爲此index與reply.FirstIndex的最小值。
檢查最小值是必須的：
假設
s1: 0-0 1-1 1-2 1-3 1-4 1-5
s2: 0-0 1-1 1-2 1-3 1-4 1-5
s3: 0-0 1-1

此時s1爲leader，並一致性檢查s3, 從1-5開始檢查，此時由於leader有更多的log，因此檢查不成功，返回confict term 1， firstindex：2
如果只是獲取confiicting term 的最後一個log index，那麼nextIndex又是1-5，陷入了死循環。

func (rf *Raft) consistencyCheckReplyHandler(n int, reply *AppendEntriesReply) {
    rf.mu.Lock()
    defer rf.mu.Unlock()

    if rf.state != Leader {
        return
    }
    if reply.Success {
        // RPC and consistency check successful
        rf.matchIndex[n] = reply.FirstIndex
        rf.nextIndex[n] = rf.matchIndex[n] + 1
        rf.updateCommitIndex() // try to update commitIndex
    } else {
        // found a new leader? turn to follower
        if rf.state == Leader && reply.CurrentTerm > rf.CurrentTerm {
            rf.turnToFollow()
            rf.resetTimer <- struct{}{}
            DPrintf("[%d-%s]: leader %d found new term (heartbeat resp from peer %d), turn to follower.",
                rf.me, rf, rf.me, n)
            return
        }

        // Does leader know conflicting term?
        var know, lastIndex = false, 0
        if reply.ConflictTerm != 0 {
            for i := len(rf.Logs) - 1; i > 0; i-- {
                if rf.Logs[i].Term == reply.ConflictTerm {
                    know = true
                    lastIndex = i
                    DPrintf("[%d-%s]: leader %d have entry %d is the last entry in term %d.",
                        rf.me, rf, rf.me, i, reply.ConflictTerm)
                    break
                }
            }
            if know {
                rf.nextIndex[n] = min(lastIndex, reply.FirstIndex)
            } else {
                rf.nextIndex[n] = reply.FirstIndex
            }
        } else {
            rf.nextIndex[n] = reply.FirstIndex
        }
        rf.nextIndex[n] = min(rf.nextIndex[n], len(rf.Logs))
        DPrintf("[%d-%s]: nextIndex for peer %d  => %d.\n",
            rf.me, rf, n, rf.nextIndex[n])
    }
}

當調用AppendEntry成功後，說明follower與leader的log是匹配的。此時leader會找到commited的log並且執行其命令。
這裏有一個比較巧妙的方法，對matchIndex排序後取最中間的數。
由於matchIndex代表follower有多少log與leader的log匹配，因此中間的log index意味着其得到了大部分節點的認可。
因此會將此中間的index之前的所有log entry都執行了。
rf.Logs[target].Term == rf.CurrentTerm 是必要的：
這是由於當一個entry出現在大多數節點的log中，並不意味着其一定會成爲commit。考慮下面的情況：

  S1: 1 2     1 2 4
  S2: 1 2     1 2
  S3: 1   --> 1 2
  S4: 1       1
  S5: 1       1 3

s1在term2成爲leader，只有s1，s2添加了entry2.
s5變成了term3的leader，之後s1變爲了term4的leader，接着繼續發送entry2到s3中。
此時，如果s5再次變爲了leader，那麼即便沒有S1的支持，S5任然變爲了leader，並且應用entry3，覆蓋掉entry2。
所以一個entry要變爲commit，必須：
1、在其term週期內，就複製到大多數。
2、如果隨後的entry被提交。在上例中，如果s1持續成爲term4的leader，那麼entry2就會成爲commit。

這是由於以下原因造成的：
更高任期爲最新的投票規則，以及leader將其日誌強加給follower。

// updateCommitIndex find new commit id, must be called when hold lock
func (rf *Raft) updateCommitIndex() {
    match := make([]int, len(rf.matchIndex))
    copy(match, rf.matchIndex)
    sort.Ints(match)

    DPrintf("[%d-%s]: leader %d try to update commit index: %v @ term %d.\n",
        rf.me, rf, rf.me, rf.matchIndex, rf.CurrentTerm)

    target := match[len(rf.peers)/2]
    if rf.commitIndex < target {
        //fmt.Println("target:",target,match)
        if rf.Logs[target].Term == rf.CurrentTerm {
            //DPrintf("[%d-%s]: leader %d update commit index %d -> %d @ term %d command:%v\n",
            //  rf.me, rf, rf.me, rf.commitIndex, target, rf.CurrentTerm,rf.Logs[target].Command)

            DPrintf("[%d-%s]: leader %d update commit index %d -> %d @ term %d\n",
                rf.me, rf, rf.me, rf.commitIndex, target, rf.CurrentTerm)

            rf.commitIndex = target
            go func() { rf.commitCond.Broadcast() }()
        } else {
            DPrintf("[%d-%s]: leader %d update commit index %d failed (log term %d != current Term %d)\n",
                rf.me, rf, rf.me, rf.commitIndex, rf.Logs[target].Term, rf.CurrentTerm)
        }
    }
}

參考

講義
 講義新

raft理論與實踐[4]-lab2b

執行日誌

參考

【原創】golang快速入門[9.3]-精深奧妙的切片功夫

【原創】golang快速入門[9.3]-精深奧妙的切片功夫

golang快速入門[8.4]-常量與隱式類型轉換

golang快速入門[8.3]-深入理解IEEE754浮點數

golang快速入門[8.2]-自動類型推斷的祕密

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結