[Data Structures] Trees (1): Lexicographic Search Trees, Disjoint Sets, and Huffman Coding (C++ Implementation)

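As a third-year undergrad I've realized just how fast I forget what I've learned, so starting today I'm going to review properly. I'm basically using this blog as my review notebook, since reading English textbooks is a real slog =^= and I need some motivation. Everyone is welcome to study along~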

Trees

I. Basic Concepts

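Tree: a structure made up of a set of points (vertices) and a set of edges (branches) that satisfies two conditions: (1) any two nodes are connected by a sequence of edges (a path); (2) the structure contains no circuits, i.e., no path leads from a node back to itself.
- Root: the node with no parent, e.g., node A.
- Child node & parent node: the root of a node's subtree is that node's child, e.g., the children of node A are B, C, and D. A node that has children is called the parent of those children, e.g., node A is the parent of nodes B, C, and D.
- Branch node: a node whose degree is not 0.
- Leaf node: a node with no children, e.g., G, H, F, and D.
- Sibling node: nodes with the same parent are siblings of one another, e.g., nodes G and H are siblings.
- Ancestor: every node on the path from the root to the given node.
- Descendant: every node in the subtree rooted at the given node is a descendant of it.
- Subtree: let T be a rooted tree and a a vertex of T. The subgraph induced by a and all of its descendants is called a subtree of T, and a is the root of that subtree.
- Forest: an unordered set of trees; the trees in a forest are usually assumed to be rooted.
- Orchard: also called an ordered forest; either the empty set or an ordered set of ordered trees.
- Path: a path from node N1 to node Nk is a sequence of nodes N1, N2, …, Nk in which Ni is the parent of Ni+1 for every 1 ≤ i < k. The length of the path is the number of edges on it, namely k-1.
- Depth: there is a unique path from the root to any node, and the length of that path is the node's depth. The root has depth 0.
- Height: the length of the longest path from the node down to a leaf. A leaf has height 0, and the height of the tree is the height of its root.
- Degree: the number of children of a node; the degree of a tree is the largest degree among its nodes.
- Level: counted from the root, which is on level 0; the root's children are on level 1, and so on.
- Full binary tree: every node other than the leaves has exactly two children.
- Complete binary tree: if the binary tree has h levels (numbered 1 to h in this definition), every level except level h (levels 1 to h-1) holds the maximum number of nodes, and the nodes on level h are packed contiguously from the left.
- Perfect binary tree: every node above the last level has two children, and the nodes on the last level have none. Equivalently, a binary tree of height h (counting from 0) that contains 2^(h+1)-1 nodes.
- Optimal binary tree (Huffman tree): given n weights as n leaf nodes, build a binary tree; if its weighted path length is the minimum possible, it is called an optimal binary tree, also known as a Huffman tree.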
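To make the definitions of height and degree concrete, here is a minimal sketch. The `Node` type with a `children` vector and the helper names `height`, `degree`, and `perfectNodeCount` are assumptions made for illustration; they are not part of the original notes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical general tree node: each node holds pointers to its children.
struct Node {
    char label;
    std::vector<Node*> children;
};

// Height of a node: length (in edges) of the longest path down to a leaf.
// A leaf has height 0.
int height(const Node* node) {
    int best = -1;  // so that a node without children ends up with height 0
    for (const Node* child : node->children)
        best = std::max(best, height(child));
    return best + 1;
}

// Degree of the tree rooted at node: the largest child count of any node.
std::size_t degree(const Node* node) {
    std::size_t best = node->children.size();
    for (const Node* child : node->children)
        best = std::max(best, degree(child));
    return best;
}

// Number of nodes in a perfect binary tree of height h: 2^(h+1) - 1.
long perfectNodeCount(int h) {
    return (1L << (h + 1)) - 1;
}
```

For a root A with children B, C, D, where B in turn has children G and H, `height` gives 2 for A and `degree` gives 3.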

II. Representations of Trees

1. Parent representation (parent method)

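/* Node and tree definitions for the parent representation */
#define MAX_TREE_SIZE 100
typedef char TElemType; /* data type stored in a tree node */

struct PTNode /* node structure */
{
    TElemType data; /* node data */
    int parent;     /* index of the parent */
};

struct PTree        /* tree structure */
{
    PTNode nodes[MAX_TREE_SIZE]; /* array of nodes */
    int r, n;       /* index of the root and number of nodes */
};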

This representation makes a node's parent very easy to obtain, but its children are hard to get (the whole table has to be scanned).
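The asymmetry between the two lookups can be sketched as follows. The helper names `parentOf` and `childrenOf`, and the convention that the root stores -1 as its parent, are assumptions made for this example.

```cpp
#include <vector>

const int MAX_TREE_SIZE = 100;

struct PTNode {
    char data;
    int parent;  // index of the parent in nodes[]; -1 for the root (assumed)
};

struct PTree {
    PTNode nodes[MAX_TREE_SIZE];
    int r, n;    // index of the root and number of nodes
};

// Parent lookup is a single array access.
int parentOf(const PTree& t, int i) {
    return t.nodes[i].parent;
}

// Child lookup has to scan every node and test its parent field.
std::vector<int> childrenOf(const PTree& t, int i) {
    std::vector<int> result;
    for (int k = 0; k < t.n; k++)
        if (t.nodes[k].parent == i) result.push_back(k);
    return result;
}
```

`parentOf` takes constant time, while `childrenOf` takes time proportional to the number of nodes, which is the trade-off described above.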
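2. Multiple-links representation (multiple links)
(1) Every node contains d pointers, where d is the maximum degree of any node in the tree.

(2) An alternative: each node stores a number d giving its own degree, and its pointer field holds d pointers.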
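A rough sketch of both variants is given below. The type names `FixedNode` and `VarNode`, the constant `MAX_DEGREE`, and the helper `wastedSlots` are hypothetical; the helper only illustrates why variant (1) can waste space on null pointers.

```cpp
#include <vector>

const int MAX_DEGREE = 3;  // assumed maximum degree d of the tree

// Variant (1): every node reserves MAX_DEGREE child pointers;
// unused slots stay null.
struct FixedNode {
    char data;
    FixedNode* child[MAX_DEGREE];
};

// Variant (2): each node records its own degree and holds exactly
// that many child pointers.
struct VarNode {
    char data;
    int degree;
    std::vector<VarNode*> child;  // child.size() == degree
};

// Count the null (unused) pointer slots in a variant (1) tree.
int wastedSlots(const FixedNode* node) {
    int wasted = 0;
    for (int i = 0; i < MAX_DEGREE; i++) {
        if (node->child[i] == nullptr) wasted++;
        else wasted += wastedSlots(node->child[i]);
    }
    return wasted;
}
```

A tree with n nodes has n-1 edges, so variant (1) always leaves n·d - (n-1) slots empty; variant (2) avoids that at the cost of nodes of different sizes.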

3. Child-list representation (child-link)

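/* Structure definitions for the child-list representation */
#define MAX_TREE_SIZE 100
typedef char TElemType; /* data type stored in a tree node */
typedef struct CTNode   /* one entry in a child list */
{
    int child;          /* index of the child in the node array */
    CTNode *next;       /* next entry in the same child list */
} *ChildPtr;
struct CTBox        /* list header */
{
    TElemType data;         /* node data */
    ChildPtr firstchild;    /* head of this node's child list */
};
struct CTree        /* tree structure */
{
    CTBox nodes[MAX_TREE_SIZE]; /* array of nodes */
    int r, n;   /* index of the root and number of nodes */
};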
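With this layout a node's children are found by walking its linked list instead of scanning the whole table. The sketch below restates the structures so that it compiles on its own; `addChild` and `childrenOf` are hypothetical helpers, and list entries are never freed in this short example.

```cpp
#include <vector>

const int MAX_TREE_SIZE = 100;

struct CTNode {          // one entry in a child list
    int child;           // index of the child in nodes[]
    CTNode* next;
};

struct CTBox {           // header: node data plus its child list
    char data;
    CTNode* firstchild;
};

struct CTree {
    CTBox nodes[MAX_TREE_SIZE];
    int r, n;            // index of the root and number of nodes
};

// Hypothetical helper: prepend `child` to the child list of `parent`.
void addChild(CTree& t, int parent, int child) {
    t.nodes[parent].firstchild = new CTNode{child, t.nodes[parent].firstchild};
}

// Collect the children of node i by walking its linked list.
std::vector<int> childrenOf(const CTree& t, int i) {
    std::vector<int> result;
    for (const CTNode* p = t.nodes[i].firstchild; p != nullptr; p = p->next)
        result.push_back(p->child);
    return result;
}
```

Because `addChild` prepends, children come back in the reverse of the order in which they were added.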

4. Child-sibling representation (first child, next sibling)

/* Tree node definition */
struct TreeNode 
{
    TElemType data;
    TreeNode *firstChild;
    TreeNode *nextSibling;
};
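In this representation the children of a node are reached by following `firstChild` once and then `nextSibling` repeatedly. A minimal sketch, with `char` standing in for `TElemType` and `childLabels` as a hypothetical helper:

```cpp
#include <string>

struct TreeNode {
    char data;
    TreeNode* firstChild;
    TreeNode* nextSibling;
};

// Children of a node: follow firstChild once, then nextSibling repeatedly.
std::string childLabels(const TreeNode* node) {
    std::string labels;
    for (const TreeNode* c = node->firstChild; c != nullptr; c = c->nextSibling)
        labels += c->data;
    return labels;
}
```

Every node needs exactly two pointers regardless of its degree, which is what makes this layout convenient for trees of unbounded degree.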

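5. Forest traversal (forest traverse)
To traverse a forest in preorder, first convert it to its corresponding binary tree (first child becomes the left link, next sibling becomes the right link); a preorder traversal of that binary tree visits the nodes in the forest's preorder.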
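A sketch of that correspondence, assuming the forest is stored in child-sibling form with the roots of its trees linked through `nextSibling`. The function name `preorder` and the use of a string to collect the visit order are choices made for this example.

```cpp
#include <string>

struct TreeNode {
    char data;
    TreeNode* firstChild;   // plays the role of the left link
    TreeNode* nextSibling;  // plays the role of the right link
};

// Preorder over a forest: visit the root, then its subtrees, then the
// remaining trees. Reading firstChild as "left" and nextSibling as "right",
// this is the preorder traversal of the corresponding binary tree.
void preorder(const TreeNode* node, std::string& out) {
    if (node == nullptr) return;
    out += node->data;
    preorder(node->firstChild, out);
    preorder(node->nextSibling, out);
}
```

For a forest of two trees, A with children B and C, and D with child E, calling `preorder` on A appends the labels in the order A, B, C, D, E.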

III. Some Simple Applications

1. Lexicographic search trees: the trie (the name is taken from "retrieval"; also called a dictionary tree) [1]

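Every node except the root stores one character, and the characters along the path from the root to a node A spell out the string that A represents. Nodes with the same parent store different characters. Tries are mainly used for problems involving common prefixes of strings.
The advantage is that shared prefixes cut down query time and keep needless string comparisons to a minimum, so lookups can be more efficient than with a hash tree. The drawback is that if the system holds a large number of strings with almost no common prefixes, the trie consumes a great deal of memory.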

#include <iostream>
#include <string>   // std::string is used by insert() and search()
using namespace std;
#define num_chars 26
struct TrieNode{
    int count;
    TrieNode *branches[num_chars];
};

class Trie{
    public:
        // constructor: start with an empty root node
        Trie(){
            root = new TrieNode;
            root->count = 0;
            for(int i=0; i<num_chars; i++){
                root->branches[i] = NULL;
            }
        }
        // destructor
        ~Trie(){
            deleteNode(root);
            root = NULL;
        }

        // create or insert new string (lowercase 'a'-'z' only)
        void insert(const string &str){
            if(str.empty()) return;
            // reject input that would index outside branches[]
            for(size_t i=0; i<str.length(); i++)
                if(str[i]<'a' || str[i]>'z') return;

            TrieNode *recNode = root;
            for(size_t i=0; i<str.length(); i++){
                int idx = str[i]-'a';
                if(recNode->branches[idx]==NULL){
                    TrieNode *tmp = new TrieNode;
                    for(int j=0; j<num_chars; j++)
                        tmp->branches[j] = NULL;
                    tmp->count = 0;
                    recNode->branches[idx] = tmp;
                }
                recNode = recNode->branches[idx];
            }
            recNode->count++; // times this exact string has been inserted
        } 

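        // Check whether a string exists in the trie
        bool search(const string &str){
            // insert() ignores empty strings, so they are never present
            if(str.empty()) return false;
            TrieNode *recNode = root;
            for(size_t i=0; i<str.length(); i++){
                // characters outside 'a'-'z' cannot be in the trie
                if(str[i]<'a' || str[i]>'z') return false;
                if(recNode->branches[str[i]-'a']==NULL)
                    return false;
                recNode = recNode->branches[str[i]-'a'];
            }
            // a prefix of a stored word only counts if it was inserted itself
            return recNode->count>0;
        }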

    protected:
        void deleteNode(TrieNode *node){
            for(int i=0; i<num_chars; i++){
                if(node->branches[i]!=NULL){
                    deleteNode(node->branches[i]);
                }
            }
            delete node;
        }

    private:
        TrieNode *root;
};

void Test(){
    Trie trie;
    trie.insert("hello");
    trie.insert("he");
    trie.insert("her");
    trie.insert("world");
    trie.insert("word");
    if(trie.search("hello")) cout << "YES" << endl;
    else cout << "NO" << endl; 
    if(trie.search("hel")) cout << "YES" << endl;
    else cout << "NO" << endl;
    if(trie.search("helloooo")) cout << "YES" << endl;
    else cout << "NO" << endl;
}

int main(){
    Test(); 
    return 0;
}
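Since tries are mainly used for common-prefix problems, a prefix query is a natural extension of the class above. The following is a separate, minimal sketch rather than a change to that class: `PrefixNode`, `PrefixTrie`, `passCount`, and `countPrefix` are hypothetical names, and `std::unique_ptr` is used so that no hand-written destructor is needed.

```cpp
#include <memory>
#include <string>

// Hypothetical trie variant: each node counts how many inserted words
// pass through it, so a prefix query is a single walk down the tree.
struct PrefixNode {
    int passCount = 0;                      // words sharing this prefix
    std::unique_ptr<PrefixNode> next[26];   // one branch per 'a'..'z'
};

class PrefixTrie {
public:
    // Insert a word; assumes it contains only lowercase 'a'..'z'.
    void insert(const std::string& word) {
        PrefixNode* node = &root;
        node->passCount++;  // every word shares the empty prefix
        for (char ch : word) {
            std::unique_ptr<PrefixNode>& slot = node->next[ch - 'a'];
            if (!slot) slot.reset(new PrefixNode());
            node = slot.get();
            node->passCount++;
        }
    }

    // Number of inserted words that start with `prefix`.
    int countPrefix(const std::string& prefix) const {
        const PrefixNode* node = &root;
        for (char ch : prefix) {
            if (ch < 'a' || ch > 'z') return 0;
            node = node->next[ch - 'a'].get();
            if (node == nullptr) return 0;
        }
        return node->passCount;
    }

private:
    PrefixNode root;
};
```

With the five words from `Test()` inserted (hello, he, her, world, word), `countPrefix("he")` yields 3 and `countPrefix("wor")` yields 2.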

2. Disjoint sets (disjoint-set forest)

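A disjoint-set forest represents disjoint sets with trees: every element stores a reference to its parent (the parent representation). It is mainly used to merge disjoint sets and to query which set an element belongs to.
Optimization (path compression): during each find, if the path to the root is long, update the stored links so that later finds are faster (every node on the search path is changed to point directly at the root).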

#include <iostream>
using namespace std;
#define MAX 50005
int father[MAX];

// Find the father of node x
int findFather(int x){
    if(father[x] != x) 
        return father[x] = findFather(father[x]); // path compression
    else return x;
}

// a and b are in the same set. The trees 
// that a and b belong to should be combined
void combineTree(int a, int b){  // returns nothing, so it is declared void
    father[findFather(a)] = findFather(b);
}

int main(){
    int numNode, numEdge, numQuery, tmp1, tmp2;
    cin >> numNode >> numEdge >> numQuery;

    // Initialize the father of each node as itself
    for(int i=1; i<=numNode; i++){
        father[i]=i;
    } 
    // if two trees are connected, combine them
    for(int i=1; i<=numEdge; i++){
        cin >> tmp1 >> tmp2;
        combineTree(tmp1, tmp2);
    }
    // check if two nodes are in one set
    for(int i=1; i<=numQuery; i++){
        cin >> tmp1 >> tmp2;
        if(findFather(tmp1)==findFather(tmp2)) cout << "YES" << endl;
        else cout << "NO" << endl;
    }
}
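The same idea can be packaged without global state or standard input, which makes it easier to try out. In the sketch below `DisjointSet`, `find`, `unite`, and `same` are hypothetical names, elements are numbered from 0 rather than 1, and `find` applies path compression iteratively instead of recursively so that long chains do not deepen the call stack.

```cpp
#include <vector>

// Hypothetical wrapper around the same parent-array idea.
class DisjointSet {
public:
    explicit DisjointSet(int n) : parent(n) {
        for (int i = 0; i < n; i++) parent[i] = i;  // each node is its own root
    }

    // Find the root of x, then point every node on the path at that root.
    int find(int x) {
        int root = x;
        while (parent[root] != root) root = parent[root];
        while (parent[x] != root) {   // second pass: path compression
            int next = parent[x];
            parent[x] = root;
            x = next;
        }
        return root;
    }

    // Merge the sets containing a and b.
    void unite(int a, int b) {
        parent[find(a)] = find(b);
    }

    // True if a and b are currently in the same set.
    bool same(int a, int b) {
        return find(a) == find(b);
    }

private:
    std::vector<int> parent;
};
```

After `unite(0, 1)` and `unite(1, 2)` on a six-element set, `same(0, 2)` is true while `same(0, 3)` is still false.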

3. Huffman coding

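Huffman coding uses the probability with which each character occurs to build prefix-free codewords (written in 0s and 1s) whose average length is as short as possible.
Prefix code: no character's codeword is a prefix of another character's codeword.
A Huffman code is usually represented by a full binary tree, and the resulting tree is called a Huffman tree. A simple example is shown in the figure: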

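Each leaf represents the codeword of one value. Let f(c) be the frequency of character c and d_T(c) the depth, in the Huffman tree T, of the leaf that holds c's codeword. The following function gives the number of bits needed to encode a file, which is called the cost of the Huffman tree.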

B(T) = \sum_{c \in C} f(c) \, d_T(c)
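As a small worked example of this formula, the sketch below sums frequency times depth over all characters. The function name `treeCost` and the sample numbers are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// B(T) = sum over characters c of f(c) * d_T(c).
// freq[i] and depth[i] describe the leaf of the i-th character.
long treeCost(const std::vector<int>& freq, const std::vector<int>& depth) {
    long total = 0;
    for (std::size_t i = 0; i < freq.size() && i < depth.size(); i++)
        total += static_cast<long>(freq[i]) * depth[i];
    return total;
}
```

For hypothetical frequencies 5, 2, 1, 1 whose leaves sit at depths 1, 2, 3, 3, the cost is 5·1 + 2·2 + 1·3 + 1·3 = 15 bits.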

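Huffman algorithm: a greedy algorithm with the following steps
1. Create a forest of s nodes, one per character, with no links between them. Each node carries a value, the frequency of its character. The nodes are placed in a priority queue keyed on that frequency.
2. Then repeat the following s-1 times:
(1) Remove the two nodes with the smallest values, L and R, from the priority queue and create a new node as the parent of L and R.
(2) Set the new node's value to the sum of the frequencies of L and R, and insert the new node into the priority queue.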

#include <iostream>
#include <vector>
#include <queue>
using namespace std;

struct Node{
    int freq;
    Node* left;
    Node* right;
};

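// Orders the priority_queue as a min-heap on freq. The comparison must be
// strict (>, not >=): std::priority_queue requires a strict weak ordering.
struct cmpNode{
    bool operator()(const Node* a, const Node* b) const{
        return a->freq > b->freq;
    }
}; 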

Node* mergeTree(Node* &small1, Node* &small2){
    Node* newNode = new Node();
    newNode->freq = small1->freq + small2->freq;
    newNode->left = small1;
    newNode->right = small2;
    return newNode;
} 

void level_traversal(Node* node){
    Node* curNode = node;
    queue<Node*> q;
    if(curNode != NULL) q.push(curNode);
    while(!q.empty()){
        curNode = q.front();
        q.pop();
        cout << curNode->freq << " ";
        if(curNode->left != NULL) q.push(curNode->left);
        if(curNode->right != NULL) q.push(curNode->right);
    }
}

int main(){
    int n, freq; 
    Node *less1, *less2, *root = NULL;
    cin >> n;
    // Construct MinHeap
    priority_queue<Node*, vector<Node*>, cmpNode> Q;
    for(int i=0; i<n; i++){
        cin >> freq;
        Node* newNode = new Node();
        newNode->freq = freq;
        newNode->left = NULL;
        newNode->right = NULL;
        Q.push(newNode); // Put the leaf node into the heap
    }
    // Repeatedly merge the two lowest-frequency trees
    while(Q.size() > 1){
        less1 = Q.top();
        Q.pop();
        less2 = Q.top();
        Q.pop();
        Q.push(mergeTree(less1, less2));
    }
    // The last node left in the heap is the root; this also covers
    // n == 1, and leaves root as NULL when n == 0
    if(!Q.empty()) root = Q.top();
    level_traversal(root);
    cout << "END" << endl;

    return 0;
}
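The program above builds the tree and prints node frequencies level by level, but it does not produce the codewords themselves. One way to derive them, sketched here under assumptions, is to label each left edge 0 and each right edge 1 and read off the path to every leaf. The names `HNode`, `ByFreq`, `assignCodes`, and `huffmanCodes` are hypothetical, the `symbol` field is an addition not present in the original `Node`, and nodes are deliberately not freed to keep the example short.

```cpp
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct HNode {
    char symbol;   // meaningful only at leaves
    int freq;
    HNode* left;
    HNode* right;
};

struct ByFreq {    // min-heap order: smaller frequency has higher priority
    bool operator()(const HNode* a, const HNode* b) const {
        return a->freq > b->freq;
    }
};

// Walk the tree, appending '0' for a left edge and '1' for a right edge.
void assignCodes(const HNode* node, const std::string& prefix,
                 std::map<char, std::string>& codes) {
    if (node == nullptr) return;
    if (node->left == nullptr && node->right == nullptr) {
        // A lone symbol has an empty path, so give it the one-bit code "0".
        codes[node->symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    assignCodes(node->left, prefix + "0", codes);
    assignCodes(node->right, prefix + "1", codes);
}

// Build the Huffman tree greedily and return each symbol's codeword.
std::map<char, std::string> huffmanCodes(
        const std::vector<std::pair<char, int> >& freqs) {
    std::priority_queue<HNode*, std::vector<HNode*>, ByFreq> heap;
    for (const auto& sf : freqs)
        heap.push(new HNode{sf.first, sf.second, nullptr, nullptr});
    while (heap.size() > 1) {
        HNode* a = heap.top(); heap.pop();
        HNode* b = heap.top(); heap.pop();
        heap.push(new HNode{'\0', a->freq + b->freq, a, b});
    }
    std::map<char, std::string> codes;
    if (!heap.empty()) assignCodes(heap.top(), "", codes);
    return codes;
}
```

The exact bit patterns depend on how ties between equal frequencies are broken, so only the codeword lengths should be relied on. For made-up frequencies a:5, b:2, c:1, d:1 the lengths come out as 1, 2, 3, 3.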

References and Code Appendix

[1] Trie (dictionary tree)
[2] Code demo download

熱愛改名阿呆呆
