目錄
哈夫曼壓縮與解壓縮(java版)
一哈夫曼樹以及文件壓縮原理:
1.哈夫曼樹 :
給定N個權值作爲N個葉子結點,構造一棵二叉樹,若該樹的帶權路徑長度達到最小,稱這樣的二叉樹爲最優二叉樹,也稱爲哈夫曼樹。哈夫曼樹是帶權路徑長度最短的樹,權值較大的結點離根較近(頻率越高的結點離根越進)。
以 下數組爲例,構建哈夫曼樹
int a[] = {0,1,2,3,4,5,6,7,8}
我們可以發現以下規律
1:9個數構成的哈夫曼樹一共有17個結點,也就是可以n個數可以生產2*n-1個結點
2:數字越大的數離根節點越近,越小的數離根節點越近。
2.如何利用haffman編碼實現文件壓縮:
比如abc.txt文件中有以下字符:aaaabbbccde,
1.進行字符統計
aaaabbbccde
a : 4次
b : 3次
c : 2次
d : 1次
e : 1次
2.用統計結果構建哈夫曼樹
3.用哈夫曼樹生成哈夫曼編碼(從根結點開始,路徑左邊記爲0,右邊記爲1):
a的編碼:1
b的編碼:01
c的編碼:000
d的編碼:0011
e的編碼:0010
4.哈夫曼編碼代替字符,進行壓縮。
源文件內容爲:aaaabbbccde
將源文件用對應的哈夫曼編碼(haffman code)替換,則有:11110101 01000000 00110010 (總共3個字節)
由此可見,源文件一共有11個字符,佔11字節的內存,但是經過用haffman code替換之後,只佔3個字節,這樣就能達到壓縮的目的
二主要技術點:
1.哈夫曼樹算法(哈夫曼壓縮的基本算法)
2.哈希算法(字符統計時候會用到,也可以直接用HashMap統計)
3.位運算(涉及到將指定位,置0或置1)
4.java文件操作,以及緩衝操作。
5.存儲模式(大端存儲,小端存儲,能看懂文件16進制的形式)
7.設置壓縮密碼,解壓輸入密碼解壓(小編自己加的內容)
三實現過程:
以上述aaaabbbccde爲例
1.字符統計:
public class FreqHuf {
public static int BUFFER_SIZE = 1 << 18;
int freq[] = new int[256];
File file;
int count;
List<HuffmanFreq> list;
FreqHuf(String pathname) throws Exception {
list = new ArrayList<>();
this.file = new File(pathname);
if(!file.exists()){
throw new Exception("文件不存在");
}
System.out.println("進行字符統計中");
CensusChar();
System.out.println("字符統計完畢");
}
public void CensusChar() throws IOException{
int intchar;
FileInputStream fis = new FileInputStream(file);
System.out.println("統計中");
//這種統計處理方案,速度極慢,不建議使用,以下采用緩存讀數據。
// while((intchar = fis.read()) != -1){
// freq[intchar]++;
// }
//這裏採用緩存機制,一次讀1 << 18個字節,大大提高效率。
byte[] bytes = new byte[BUFFER_SIZE];
while((intchar = fis.read(bytes))!= -1){
for(int i = 0; i < intchar;i++){
int temp = bytes[i]& 0xff;
freq[temp]++;
}
}
fis.close();
for(int i = 0; i < 256; i++){
if(freq[i] != 0){
this.count++;
}
}
int index = 0;
for(int i = 0; i < 256; i++){
if(freq[i] != 0){
HuffmanFreq huffman = new HuffmanFreq();
huffman.character = (char)i;
huffman.freq = freq[i];
list.add(index, huffman);
}
}
}
}
//統計每個字符和其頻率的類
public class HuffmanFreq {
char character;
int freq;
HuffmanFreq() {
}
HuffmanFreq(int character,int freq) {
this.character = (char)character;
this.freq = freq;
}
char getCharacter() {
return character;
}
void setCharacter(int character) {
this.character = (char)character;
}
int getFreq() {
return freq;
}
void setFreq(int freq) {
this.freq = freq;
}
byte[] infoToByte(){
byte[] bt = new byte[6];
byte[] b1 = ByteAnd8Types.charToByte(character);
for(int i= 0; i < b1.length;i++){
bt[i] = b1[i];
}
byte[] b2 = ByteAnd8Types.intToBytes2(freq);
int index = 2;
for(int i= 0; i < b2.length;i++){
bt[index++] = b2[i];
}
return bt;
}
@Override
public String toString() {
return "Huffman [character=" + character + ", freq=" + freq + "]";
}
}
2.用統計結果構建哈夫曼樹:
//treeSize爲總節點數
private void creatTree(int treeSize){
int temp;
treeList = new ArrayList<HuffTreeNode>();
for(int i = 0; i < treeSize; i++){
HuffTreeNode node = new HuffTreeNode();
treeList.add(i, node);
}
for(int i = 0; i < charCount; i++){
HuffTreeNode node = treeList.get(i);
node.freq.freq = charList.get(i).getFreq();
node.freq.character = charList.get(i).getCharacter();
node.left = -1;
node.right = -1;
node.use = 0;
}
for(int i = charCount; i < treeSize; i++){
int index = i;
HuffTreeNode node = treeList.get(i);
node.use = 0;
node.freq.character = '#';
node.right = searchmin(index);
node.left = searchmin(index);
node.freq.freq = treeList.get(node.right).freq.freq + treeList.get(node.left).freq.freq;
temp = searchmin(++index);
if(temp == -1){
break;
}
treeList.get(temp).use = 0;
}
}
private int searchmin(int count){
int minindex = -1;
for(int i = 0; i < count; i++){
if(treeList.get(i).use == 0){
minindex = i;
break;
}
}
if(minindex == -1){
return -1;
}
for(int i = 0; i < count; i++){
if((treeList.get(i).freq.freq <= treeList.get(minindex).freq.freq) && treeList.get(i).use == 0){
minindex = i;
}
}
treeList.get(minindex).use = 1;
return minindex;
}
3.用哈夫曼樹生成哈夫曼編碼(從根結點開始,路徑左邊記爲0,右邊記爲1):
private void bulidhuftreecode(int root, String str){
if(treeList.get(root).getLeft() != -1 && treeList.get(root).getRight() != -1){
bulidhuftreecode(treeList.get(root).getLeft(), str+"0");
bulidhuftreecode(treeList.get(root).getRight(), str + "1");
}
else{
treeList.get(root).code = str;
}
}
4.哈夫曼編碼代替字符,進行壓縮,壓縮前首先要將文件頭(文件標誌,字符數量,最後一個字節有效位,密碼)字符和其頻率的那張表格寫入文件,以便於解壓縮
public void creatCodeFile(String path) throws Exception{
byte value = 0;
int index = 0;
int arr[] = new int[256];
int intchar;
for(int i = 0; i < charCount; i++){
arr[treeList.get(i).freq.character] = i;
}
File file = new File(path);
if(!file.exists()){
if(!file.createNewFile()){
throw new Exception("創建文件失敗");
}
}
int count = charList.size();
HuffmanHead head = new HuffmanHead(count, howlongchar(count), password);
//將文件頭信息寫入文件
this.write = new RandomAccessFile(file, "rw");
write.write(head.InfoToByte());
//將字符及其頻率的表寫入文件
for(HuffmanFreq freq : charList){
byte[] bt = freq.infoToByte();
write.write(bt);
}
//將字符用哈夫曼編碼進行壓縮,這裏讀寫都是採用緩存機制
byte[] readBuffer = new byte[BUFFER_SIZE];
while((intchar = read.read(readBuffer))!= -1){
ProgressBar.SetCurrent(read.getFilePointer());
for(int i = 0; i < intchar;i++){
int temp = readBuffer[i]& 0xff;
String code = treeList.get(arr[temp]).code;
char[] chars = code.toCharArray();
for(int j = 0; j < chars.length; j++){
if(chars[j] == '0'){
value = CLR_BYTE(value, index);
}
if(chars[j] == '1'){
value = SET_BYTE(value, index);
}
if(++index >= 8){
index = 0;
writeInBuffer(value);
}
}
}
}
//此方法速度較慢
// while((intchar = is.read()) != -1){
// String code = treeList.get(arr[intchar]).code;
// char[] chars = code.toCharArray();
//
// for(int i = 0; i < chars.length; i++){
// if(chars[i] == '0'){
// value = CLR_BYTE(value, index);
// }
// if(chars[i] == '1'){
// value = SET_BYTE(value, index);
// }
// if(++index >= 8){
// index = 0;
// oos.write(value);
// }
// }
// }
if(index != 0){
writeInBuffer(value);
}
byte[] Data = Arrays.copyOfRange(writeBuffer, 0, writeBufferSize);
write.write(Data);
write.close();
read.close();
}
//指定位,置1
byte SET_BYTE(byte value, int index){
return (value) |= (1 << ((index) ^ 7));
}
//指定位,置0
byte CLR_BYTE(byte value, int index){
return (value) &= (~(1 << ((index) ^ 7)));
}
//判斷指定位是否爲0,0爲false,1爲true
boolean GET_BYTE(byte value, int index){
return ((value) & (1 << ((index) ^ 7))) != 0;
}
如果一個字節一個字節往文件裏寫,速度會極慢,爲了提高效率,寫也採用緩存,先寫到緩存區,緩存區滿了後寫入文件,
private void writeInBuffer(byte value) throws Exception {
if(writeBufferSize < BUFFER_SIZE){
writeBuffer[writeBufferSize] = value;
if(++writeBufferSize >= BUFFER_SIZE){
write.write(writeBuffer);
writeBufferSize = 0;
}
} else{
throw new Exception("寫入文件出錯");
}
}
到這裏壓縮就完成了,以下爲解壓縮方法
1.從寫入文件中的字符統計的表讀出放入list裏
public void init() throws Exception{
char isHUf = read.readChar();
//驗證文件頭信息
if(isHUf != '哈'){
throw new Exception("該文件不是HUFFMAN壓縮文件");
}
this.charCount = read.readChar();
this.treeSize = 2*charCount -1;
this.lastIndex = read.readChar();
int password = read.readInt();
if(password != this.password.hashCode()){
System.out.println("密碼錯誤");
} else{
System.out.println("密碼正確,正在解壓");
}
//從文件中將字符統計的表讀出
byte[] buffer = new byte[charCount * 6];
read.seek(10);
read.read(buffer, 0, charCount * 6);
ProgressBar.SetCurrent(read.getFilePointer());
for(int i = 0; i < buffer.length; i+=6){
byte[] buff = Arrays.copyOfRange(buffer, i, i+2);
ByteBuffer bb = ByteBuffer.allocate (buff.length);
bb.put (buff);
bb.flip ();
CharBuffer cb = cs.decode (bb);
byte[] buff1 = Arrays.copyOfRange(buffer, i+2, i+6);
int size = ByteAnd8Types.bytesToInt2(buff1, 0);
HuffmanFreq freq = new HuffmanFreq(cb.array()[0], size);
charList.add(freq);
}
}
2.用統計結果構建哈夫曼樹(和以上代碼一樣)
3.用哈夫曼樹生成哈夫曼編碼(從根結點開始,路徑左邊記爲0,右邊記爲1)(和以上代碼一樣)
4.遍歷文件每個字節,根據哈夫曼編碼找到對應的字符,將字符寫入新文件
public void creatsourcefile(String pathname) throws Exception{
int root = treeList.size() - 1;
int fininsh = 1;
long len;
File file = new File(pathname);
if(!file.exists()){
if(!file.createNewFile()){
throw new Exception("創建文件失敗");
}
}
write = new RandomAccessFile(file, "rw");
int intchar;
byte[] bytes = new byte[1<<18];
int index = 0;
while((intchar = read.read(bytes))!= -1){
len = read.getFilePointer();
ProgressBar.SetCurrent(len);
for(int i = 0; i < intchar;i++){
for(;index < 8 && fininsh != 0;){
if(GET_BYTE(bytes[i], index)){
root = treeList.get(root).right;
} else{
root = treeList.get(root).left;
}
if(treeList.get(root).right== -1 && treeList.get(root).left == -1){
byte temp = (byte)treeList.get(root).freq.character;
writeInBuffer(temp);
root = treeList.size() - 1;
}
index++;
if(len == this.goalfilelenth && i == intchar-1){
if(index >= this.lastIndex){
fininsh = 0;
}
}
}
index = 0;
}
}
byte[] Data = Arrays.copyOfRange(writeBuffer, 0, writeBufferSize);
write.write(Data);
write.close();
write.close();
read.close();
}
四運行展示:
以上爲哈夫曼壓縮,需要具體代碼的,可以私信我,謝謝閱讀。下方也可以下載資源