Speex Manual: Using the Speex Encoder/Decoder API (libspeex)

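Preface: The Speex website is http://speex.org/; the English manual is available there under Documentation, as a PDF or as online HTML. My English is limited and I am not a specialist in speech coding, so the translation may contain mistakes. For that reason every translated passage is accompanied by the original English paragraph. If you spot an error, please point it out so we can all improve.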

 

PS: 1) If you repost this, please credit the source. 2) If this infringes your copyright, let me know and I will remove it promptly.

 

5.1 Encoding

5.2 Decoding

5.3 Codec Options (speex_*_ctl)

5.4 Mode Queries

5.5 Packing and In-band Signalling

Supplement

Afterword

 

The libspeex library contains all the functions for encoding and decoding speech with the Speex codec. When linking on a UNIX system, one must add -lspeex -lm to the compiler command line. One important thing to know is that libspeex calls are reentrant, but not thread-safe. That means that it is fine to use calls from many threads, but calls using the same state from multiple threads must be protected by mutexes. Examples of code can also be found in Appendix A and the complete API documentation is included in the Documentation section of the Speex website (http://www.speex.org/).

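The libspeex library contains all the functions for encoding and decoding speech with the Speex codec. When linking on a UNIX system, add -lspeex -lm to the compiler command line. Note that libspeex calls are reentrant but not thread-safe: calling them from several threads is fine, but calls that use the same state from more than one thread must be protected by a mutex. Appendix A has code examples, and the complete API documentation is in the Documentation section of the Speex website (http://www.speex.org/).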

 

 

5.1 Encoding

In order to encode speech using Speex, one first needs to:
#include <speex/speex.h>
Then in the code, a Speex bit-packing struct must be declared, along with a Speex encoder state:
SpeexBits bits;

void *enc_state;
The two are initialized by:
speex_bits_init(&bits);
enc_state = speex_encoder_init(&speex_nb_mode);
For wideband coding, speex_nb_mode will be replaced by speex_wb_mode. In most cases, you will need to know the frame size used at the sampling rate you are using. You can get that value in the frame_size variable (expressed in samples, not bytes) with:
speex_encoder_ctl(enc_state,SPEEX_GET_FRAME_SIZE,&frame_size);
In practice, frame_size will correspond to 20 ms when using 8, 16, or 32 kHz sampling rate. There are many parameters that can be set for the Speex encoder, but the most useful one is the quality parameter that controls the quality vs bit-rate tradeoff.
This is set by:
speex_encoder_ctl(enc_state,SPEEX_SET_QUALITY,&quality);
where quality is an integer value ranging from 0 to 10 (inclusive). The mapping between quality and bit-rate is described in Table 9.2 for narrowband.
Once the initialization is done, for every input frame:
speex_bits_reset(&bits);
speex_encode_int(enc_state, input_frame, &bits);
nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);
where input_frame is a (short *) pointing to the beginning of a speech frame, byte_ptr is a (char *) where the encoded frame will be written, MAX_NB_BYTES is the maximum number of bytes that can be written to byte_ptr without causing an overflow and nbBytes is the number of bytes actually written to byte_ptr (the encoded size in bytes). Before calling speex_bits_write, it is possible to find the number of bytes that need to be written by calling speex_bits_nbytes(&bits), which returns a number of bytes.
It is still possible to use the speex_encode() function, which takes a (float *) for the audio. However, this would make an eventual port to an FPU-less platform (like ARM) more complicated. Internally, speex_encode() and speex_encode_int() are processed in the same way. Whether the encoder uses the fixed-point version is only decided by the compile-time flags, not at the API level.
After you’re done with the encoding, free all resources with:
speex_bits_destroy(&bits);
speex_encoder_destroy(enc_state);
That’s about it for the encoder.
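
The 20 ms frame duration mentioned above fixes the number of samples per frame at each sampling rate, and together with a bit-rate it also gives a rough size for the buffer behind byte_ptr. Here is a minimal sketch of that arithmetic in plain C. It makes no libspeex calls, the helper names samples_per_frame and bytes_per_frame are invented for this illustration, and it assumes constant bit-rate with exactly 20 ms frames.

```c
#include <assert.h>

/* Frame duration at 8, 16 and 32 kHz, as stated in the text above. */
#define FRAME_MS 20

/* Number of samples in one 20 ms frame at the given sampling rate (Hz). */
int samples_per_frame(int rate_hz)
{
    assert(rate_hz > 0);
    return rate_hz * FRAME_MS / 1000;
}

/* Bytes needed for one encoded frame at a constant bit-rate given in
 * bits per second, rounded up to a whole byte. */
int bytes_per_frame(int bitrate_bps)
{
    assert(bitrate_bps >= 0);
    int bits = bitrate_bps * FRAME_MS / 1000;
    return (bits + 7) / 8;
}
```

For example, 8000 Hz gives 160 samples per frame, and 15000 bit/s gives 300 bits, which rounds up to 38 bytes. In real code the values reported by SPEEX_GET_FRAME_SIZE and speex_bits_nbytes(&bits) are the ones to trust.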

To encode speech with Speex, first:

#include <speex/speex.h>

Then declare a Speex bit-packing struct in the code, along with a Speex encoder state:

SpeexBits bits;

void * enc_state;

Initialize the two variables:

speex_bits_init( &bits );

enc_state = speex_encoder_init( &speex_nb_mode );

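Replace speex_nb_mode with speex_wb_mode for wideband coding. In most cases you will need to know the frame size at the sampling rate you are using. It is returned in the frame_size variable (in samples, not bytes) by: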

speex_encoder_ctl( enc_state, SPEEX_GET_FRAME_SIZE, &frame_size );

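In practice, frame_size corresponds to 20 ms at 8, 16, or 32 kHz sampling rates. The Speex encoder has many other settable parameters; the most useful one is the quality parameter, which controls the trade-off between quality and bit-rate. It is set by: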
speex_encoder_ctl( enc_state, SPEEX_SET_QUALITY, &quality );

quality is an integer from 0 to 10 (inclusive). The mapping between quality and bit-rate for narrowband is given in Table 9.2.

Once initialization is done, for every input frame:

speex_bits_reset( &bits );

speex_encode_int( enc_state, input_frame, &bits );

nbBytes = speex_bits_write( &bits, byte_ptr, MAX_NB_BYTES );

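Here input_frame is a (short *) pointing to the start of a speech frame, byte_ptr is a (char *) where the encoded frame will be written, MAX_NB_BYTES is the maximum number of bytes that can be written to byte_ptr without overflowing, and nbBytes is the number of bytes actually written to byte_ptr (the encoded size in bytes). Before calling speex_bits_write, you can find out how many bytes need to be written by calling speex_bits_nbytes(&bits). You can also still use speex_encode(), which takes a (float *) for the audio, but that makes a later port to a platform without an FPU (such as ARM) more complicated. Internally, speex_encode() and speex_encode_int() are processed the same way; whether the encoder uses fixed-point arithmetic is decided by compile-time flags, not at the API level.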

When encoding is finished, free all resources:
speex_bits_destroy( &bits );

speex_encoder_destroy( enc_state );

That covers the encoder.
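
If the audio arrives as floats, one way to stay on the speex_encode_int() path is to convert to 16-bit PCM first. The sketch below assumes the input is normalized to roughly -1.0..1.0 and that short is 16 bits wide. The function name float_to_pcm16 is made up for this example and is not part of libspeex.

```c
#include <assert.h>
#include <stddef.h>

/* Convert normalized float samples (nominally -1.0..1.0) to 16-bit PCM,
 * clipping anything that falls outside the representable range. */
void float_to_pcm16(const float *in, short *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        float v = in[i] * 32768.0f;
        if (v > 32767.0f)
            v = 32767.0f;
        if (v < -32768.0f)
            v = -32768.0f;
        /* Round to nearest; the cast itself truncates toward zero. */
        out[i] = (short)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}
```

The converted buffer can then be passed as input_frame, one frame_size block at a time.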

  

5.2 Decoding

In order to decode speech using Speex, you first need to:
#include <speex/speex.h>
You also need to declare a Speex bit-packing struct
SpeexBits bits;
and a Speex decoder state
void *dec_state;
The two are initialized by:
speex_bits_init(&bits);
dec_state = speex_decoder_init(&speex_nb_mode);
For wideband decoding, speex_nb_mode will be replaced by speex_wb_mode. If you need to obtain the size of the frames that will be used by the decoder, you can get that value in the frame_size variable (expressed in samples, not bytes) with:
speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);
There is also a parameter that can be set for the decoder: whether or not to use a perceptual enhancer. This can be set by:
speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
where enh is an int with value 0 to have the enhancer disabled and 1 to have it enabled. As of 1.2-beta1, the default is now to enable the enhancer.
Again, once the decoder initialization is done, for every input frame:
speex_bits_read_from(&bits, input_bytes, nbBytes);
speex_decode_int(dec_state, &bits, output_frame);
where input_bytes is a (char *) containing the bit-stream data received for a frame, nbBytes is the size (in bytes) of that bit-stream, and output_frame is a (short *) and points to the area where the decoded speech frame will be written. A NULL value as the second argument indicates that we don’t have the bits for the current frame. When a frame is lost, the Speex decoder will do its best to "guess" the correct signal.
As for the encoder, the speex_decode() function can still be used, with a (float *) as the output for the audio. After you’re done with the decoding, free all resources with:
speex_bits_destroy(&bits);
speex_decoder_destroy(dec_state);

To decode speech with Speex, first include the speex.h header:

#include <speex/speex.h>

Declare a Speex bit-packing struct and a Speex decoder state:

SpeexBits bits;

void* dec_state;

Initialize them:
speex_bits_init( &bits );

dec_state = speex_decoder_init( &speex_nb_mode );

Replace speex_nb_mode with speex_wb_mode for wideband decoding. The frame size used by the decoder is returned in the frame_size variable (in samples, not bytes) by:

speex_decoder_ctl( dec_state, SPEEX_GET_FRAME_SIZE, &frame_size );

You can also choose whether to use the perceptual enhancer:

speex_decoder_ctl( dec_state, SPEEX_SET_ENH, &enh );

Here enh is an int: 0 disables the enhancer and 1 enables it. As of 1.2-beta1, the enhancer is enabled by default.

Once initialization is done, do the following for every input frame:

speex_bits_read_from( &bits, input_bytes, nbBytes );

speex_decode_int( dec_state, &bits, output_frame );

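Here input_bytes is a (char *) holding the bit-stream data received for one frame, nbBytes is the size of that bit-stream in bytes, and output_frame is a (short *) pointing to the area where the decoded speech frame will be written. Passing NULL as the second argument of speex_decode_int means the bits for the current frame are not available, that is, the frame was lost; the Speex decoder then does its best to guess the correct signal.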

As with the encoder, speex_decode() can still be used, with a (float *) as the audio output.

When decoding is finished, free all resources:

speex_bits_destroy( &bits );

speex_decoder_destroy( dec_state );
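
The NULL convention for lost frames can be driven from packet sequence numbers. The following is only a sketch of that bookkeeping: decode_fn is a hypothetical callback standing in for the real decoder call, struct packet is an invented container, and the packets are assumed to arrive sorted by sequence number without duplicates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a decoder call: payload is NULL when the
 * frame was lost, mirroring the NULL argument described above. */
typedef void (*decode_fn)(void *ctx, const unsigned char *payload, size_t len);

/* One received packet: a sequence number plus one encoded frame. */
struct packet {
    unsigned seq;
    const unsigned char *data;
    size_t len;
};

/* Example callback that only counts what it is given. */
struct counts {
    size_t decoded;
    size_t concealed;
};

void count_frames(void *ctx, const unsigned char *payload, size_t len)
{
    struct counts *c = ctx;
    (void)len;
    if (payload != NULL)
        c->decoded++;
    else
        c->concealed++;
}

/* Feed packets to decode, calling it once with NULL for every sequence
 * number that is missing. Returns the number of lost frames. */
size_t feed_decoder(const struct packet *pkts, size_t n, unsigned first_seq,
                    decode_fn decode, void *ctx)
{
    assert(decode != NULL);
    size_t lost = 0;
    unsigned expected = first_seq;
    for (size_t i = 0; i < n; i++) {
        while (expected < pkts[i].seq) {   /* gap: this frame never arrived */
            decode(ctx, NULL, 0);
            lost++;
            expected++;
        }
        decode(ctx, pkts[i].data, pkts[i].len);
        expected = pkts[i].seq + 1;
    }
    return lost;
}
```

In a real program the callback would wrap speex_bits_read_from and speex_decode_int, passing NULL instead of &bits when payload is NULL.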

 

5.3 Codec Options (speex_*_ctl)

The Speex encoder and decoder support many options and requests that can be accessed through the speex_encoder_ctl and speex_decoder_ctl functions. These functions are similar to the ioctl system call and their prototypes are:
void speex_encoder_ctl(void *encoder, int request, void *ptr);
void speex_decoder_ctl(void *encoder, int request, void *ptr);
Despite those functions, the defaults are usually good for many applications and optional settings should only be used when one understands them and knows that they are needed. A common error is to attempt to set many unnecessary settings.
Here is a list of the values allowed for the requests. Some only apply to the encoder or the decoder. Because the last argument is of type void *, the _ctl() functions are not type safe, and should thus be used with care. The type spx_int32_t is the same as the C99 int32_t type.
SPEEX_SET_ENH‡ Set perceptual enhancer to on (1) or off (0) (spx_int32_t, default is on)
SPEEX_GET_ENH‡ Get perceptual enhancer status (spx_int32_t)
SPEEX_GET_FRAME_SIZE Get the number of samples per frame for the current mode (spx_int32_t)
SPEEX_SET_QUALITY† Set the encoder speech quality (spx_int32_t from 0 to 10, default is 8)
SPEEX_GET_QUALITY† Get the current encoder speech quality (spx_int32_t from 0 to 10)
SPEEX_SET_MODE† Set the mode number, as specified in the RTP spec (spx_int32_t)
SPEEX_GET_MODE† Get the current mode number, as specified in the RTP spec (spx_int32_t)
SPEEX_SET_VBR† Set variable bit-rate (VBR) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_VBR† Get variable bit-rate (VBR) status (spx_int32_t)
SPEEX_SET_VBR_QUALITY† Set the encoder VBR speech quality (float 0.0 to 10.0, default is 8.0)
SPEEX_GET_VBR_QUALITY† Get the current encoder VBR speech quality (float 0 to 10)
SPEEX_SET_COMPLEXITY† Set the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)
SPEEX_GET_COMPLEXITY† Get the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)
SPEEX_SET_BITRATE† Set the bit-rate to use the closest value not exceeding the parameter (spx_int32_t in bits per second)
SPEEX_GET_BITRATE Get the current bit-rate in use (spx_int32_t in bits per second)
SPEEX_SET_SAMPLING_RATE Set real sampling rate (spx_int32_t in Hz)
SPEEX_GET_SAMPLING_RATE Get real sampling rate (spx_int32_t in Hz)
SPEEX_RESET_STATE Reset the encoder/decoder state to its original state, clearing all memories (no argument)
SPEEX_SET_VAD† Set voice activity detection (VAD) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_VAD† Get voice activity detection (VAD) status (spx_int32_t)
SPEEX_SET_DTX† Set discontinuous transmission (DTX) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_DTX† Get discontinuous transmission (DTX) status (spx_int32_t)
SPEEX_SET_ABR† Set average bit-rate (ABR) to a value n in bits per second (spx_int32_t in bits per second)
SPEEX_GET_ABR† Get average bit-rate (ABR) setting (spx_int32_t in bits per second)
SPEEX_SET_PLC_TUNING† Tell the encoder to optimize encoding for a certain percentage of packet loss (spx_int32_t in percent)
SPEEX_GET_PLC_TUNING† Get the current tuning of the encoder for PLC (spx_int32_t in percent)
SPEEX_SET_VBR_MAX_BITRATE† Set the maximum bit-rate allowed in VBR operation (spx_int32_t in bits per second)
SPEEX_GET_VBR_MAX_BITRATE† Get the current maximum bit-rate allowed in VBR operation (spx_int32_t in bits per second)
SPEEX_SET_HIGHPASS Set the high-pass filter on (1) or off (0) (spx_int32_t, default is on)
SPEEX_GET_HIGHPASS Get the current high-pass filter status (spx_int32_t)
† applies only to the encoder
‡ applies only to the decoder
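
To make the type-safety warning concrete, here is a small mock of an ioctl-style interface. Everything in it (demo_state, demo_ctl, the DEMO_* request codes) is hypothetical and exists only for illustration; none of it comes from <speex/speex.h>. It shows why a void * argument defeats compiler checks and how thin typed wrappers can put the check back at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request codes; NOT the SPEEX_* constants. */
enum { DEMO_SET_QUALITY = 1, DEMO_SET_VBR_QUALITY = 2 };

struct demo_state {
    int32_t quality;      /* integer, 0 to 10 */
    float   vbr_quality;  /* float, 0.0 to 10.0 */
};

/* ioctl-style entry point: what ptr points to depends entirely on
 * request, so the compiler cannot check the pointed-to type. */
int demo_ctl(struct demo_state *st, int request, void *ptr)
{
    assert(st != NULL);
    switch (request) {
    case DEMO_SET_QUALITY:
        st->quality = *(int32_t *)ptr;
        return 0;
    case DEMO_SET_VBR_QUALITY:
        st->vbr_quality = *(float *)ptr;
        return 0;
    default:
        return -1;  /* unknown request */
    }
}

/* Typed wrappers: passing a float where an int32_t is expected now gets
 * converted or diagnosed by the compiler instead of being misread. */
int demo_set_quality(struct demo_state *st, int32_t q)
{
    return demo_ctl(st, DEMO_SET_QUALITY, &q);
}

int demo_set_vbr_quality(struct demo_state *st, float q)
{
    return demo_ctl(st, DEMO_SET_VBR_QUALITY, &q);
}
```

The same pattern could be applied around speex_encoder_ctl, with one wrapper per request that the application actually uses.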

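The Speex encoder and decoder support many options and requests, accessed through the speex_encoder_ctl and speex_decoder_ctl functions, which are similar to the ioctl system call. Their prototypes are:

void speex_encoder_ctl( void* encoder, int request, void* ptr );

void speex_decoder_ctl( void* decoder, int request, void* ptr );

Even though these functions exist, the defaults are good enough for most applications. Change a setting only when you understand it and know you need it; do not set options at random.

The values allowed for request are listed below; some apply only to the encoder or only to the decoder. Because the last argument is a void pointer, the _ctl() functions are not type-safe and should be used with care. The spx_int32_t type is the same as int32_t in C99.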

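SPEEX_SET_ENH‡: Turn the perceptual enhancer on (1) or off (0) (spx_int32_t, default is on)
SPEEX_GET_ENH‡: Get the perceptual enhancer status (spx_int32_t)
SPEEX_GET_FRAME_SIZE: Get the number of samples per frame for the current mode (spx_int32_t)
SPEEX_SET_QUALITY†: Set the encoder speech quality (spx_int32_t from 0 to 10, default is 8)
SPEEX_GET_QUALITY†: Get the current encoder speech quality (spx_int32_t from 0 to 10)
SPEEX_SET_MODE†: Set the mode number, as specified in the RTP spec (spx_int32_t)
SPEEX_GET_MODE†: Get the current mode number, as specified in the RTP spec (spx_int32_t)
SPEEX_SET_VBR†: Turn variable bit-rate (VBR) on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_VBR†: Get the variable bit-rate (VBR) status (spx_int32_t)
SPEEX_SET_VBR_QUALITY†: Set the encoder VBR speech quality (float from 0.0 to 10.0, default is 8.0)
SPEEX_GET_VBR_QUALITY†: Get the current encoder VBR speech quality (float from 0.0 to 10.0)
SPEEX_SET_COMPLEXITY†: Set the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)
SPEEX_GET_COMPLEXITY†: Get the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)
SPEEX_SET_BITRATE†: Set the bit-rate to the closest value not exceeding the parameter (spx_int32_t, in bits per second)
SPEEX_GET_BITRATE: Get the bit-rate currently in use (spx_int32_t, in bits per second)
SPEEX_SET_SAMPLING_RATE: Set the real sampling rate (spx_int32_t, in Hz)
SPEEX_GET_SAMPLING_RATE: Get the real sampling rate (spx_int32_t, in Hz)
SPEEX_RESET_STATE: Reset the encoder/decoder state to its original state, clearing all memories (no argument)
SPEEX_SET_VAD†: Turn voice activity detection (VAD) on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_VAD†: Get the voice activity detection (VAD) status (spx_int32_t)
SPEEX_SET_DTX†: Turn discontinuous transmission (DTX) on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_DTX†: Get the discontinuous transmission (DTX) status (spx_int32_t)
SPEEX_SET_ABR†: Set the average bit-rate (ABR) (spx_int32_t, in bits per second)
SPEEX_GET_ABR†: Get the average bit-rate (ABR) setting (spx_int32_t, in bits per second)
SPEEX_SET_PLC_TUNING†: Tell the encoder to optimize encoding for a given percentage of packet loss (spx_int32_t, in percent)
SPEEX_GET_PLC_TUNING†: Get the encoder's current tuning for PLC (spx_int32_t, in percent)
SPEEX_SET_VBR_MAX_BITRATE†: Set the maximum bit-rate allowed in VBR operation (spx_int32_t, in bits per second)
SPEEX_GET_VBR_MAX_BITRATE†: Get the current maximum bit-rate allowed in VBR operation (spx_int32_t, in bits per second)
SPEEX_SET_HIGHPASS: Turn the high-pass filter on (1) or off (0) (spx_int32_t, default is on)
SPEEX_GET_HIGHPASS: Get the current high-pass filter status (spx_int32_t)

† applies only to the encoder
‡ applies only to the decoder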
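
The SPEEX_SET_BITRATE rule ("closest value not exceeding the parameter") can be pictured as a lookup in a table of available bit-rates. The sketch below is not libspeex code. The narrowband values in nb_bitrates are quoted from memory and should be checked against Table 9.2 of the manual, and the function name is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Narrowband bit-rates in bit/s, ascending. Assumed values; verify them
 * against the manual before relying on them. */
static const int nb_bitrates[] = {
    2150, 3950, 5950, 8000, 11000, 15000, 18200, 24600
};

/* Return the highest listed bit-rate that does not exceed target_bps,
 * or -1 if even the lowest one is too high. */
int closest_bitrate_not_exceeding(int target_bps)
{
    assert(target_bps >= 0);
    int best = -1;
    for (size_t i = 0; i < sizeof nb_bitrates / sizeof nb_bitrates[0]; i++) {
        if (nb_bitrates[i] <= target_bps)
            best = nb_bitrates[i];
    }
    return best;
}
```

With this table, a request for 12000 bit/s selects 11000 bit/s. What the real encoder does when the target is below its lowest rate is not covered by this sketch.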

 

5.4 Mode Queries

Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Since modes are read-only, it is only possible to get information about a particular mode. The function used to do that is:
void speex_mode_query(SpeexMode *mode, int request, void *ptr);

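Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Because modes are read-only, you can only get information about a particular mode. The function to use is: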

void speex_mode_query( SpeexMode* mode, int request, void* ptr );

 

The admissible values for request are (unless otherwise noted, the values are returned through ptr):
SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode
SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through ptr (integer in bps).

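The admissible values for request are (unless otherwise noted, values are returned through ptr):

SPEEX_MODE_FRAME_SIZE: Get the frame size (in samples) for the mode

SPEEX_SUBMODE_BITRATE: Get the bit-rate for the submode number specified through ptr (integer, in bps)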

 

5.5 Packing and In-band Signalling

Sometimes it is desirable to pack more than one frame per packet (or other basic unit of storage). The proper way to do it is to call speex_encode N times before writing the stream with speex_bits_write. In cases where the number of frames is not determined by an out-of-band mechanism, it is possible to include a terminator code. That terminator consists of the code 15 (decimal) encoded with 5 bits, as shown in Table 9.2. Note that as of version 1.0.2, calling speex_bits_write automatically inserts the terminator so as to fill the last byte. This doesn’t involve any overhead and makes sure Speex can always detect when there are no more frames in a packet.

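Sometimes you want to pack more than one frame into a packet (or other basic unit of storage). The correct way is to call speex_encode N times before writing the stream with speex_bits_write. When the number of frames is not determined by an out-of-band mechanism, a terminator code can be included. As shown in Table 9.2, the terminator is the code 15 (decimal) encoded with 5 bits. Note that since version 1.0.2, speex_bits_write inserts the terminator automatically so as to fill the last byte. This adds no overhead and ensures that Speex can always detect when a packet contains no more frames.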
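
The terminator can be pictured at the bit level with a tiny MSB-first bit buffer. This is a sketch, not the SpeexBits implementation. struct bitbuf and its functions are invented here, and the padding rule (a 0 bit followed by 1 bits up to the byte boundary) is my understanding of what speex_bits_write does, stated as an assumption.

```c
#include <assert.h>
#include <string.h>

/* Minimal MSB-first bit buffer; a hypothetical stand-in for SpeexBits. */
struct bitbuf {
    unsigned char bytes[64];
    int nbits;               /* number of bits written so far */
};

void bitbuf_init(struct bitbuf *b)
{
    memset(b, 0, sizeof *b);
}

/* Append the low n bits of value, most significant bit first. */
void bitbuf_pack(struct bitbuf *b, unsigned value, int n)
{
    assert(n >= 0 && b->nbits + n <= (int)sizeof b->bytes * 8);
    for (int i = n - 1; i >= 0; i--) {
        if ((value >> i) & 1u)
            b->bytes[b->nbits / 8] |= (unsigned char)(0x80u >> (b->nbits % 8));
        b->nbits++;
    }
}

/* Pad to a byte boundary with a 0 followed by 1s. When at least five
 * bits are free, the padding starts with 01111, i.e. code 15 on 5 bits. */
void bitbuf_terminate(struct bitbuf *b)
{
    if (b->nbits % 8 != 0)
        bitbuf_pack(b, 0, 1);
    while (b->nbits % 8 != 0)
        bitbuf_pack(b, 1, 1);
}
```

If the data already ends on a byte boundary, nothing is added, which matches the claim that the terminator costs no extra bytes.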

 

It is also possible to send in-band “messages” to the other side. All these messages are encoded as “pseudo-frames” of mode 14 which contain a 4-bit message type code, followed by the message. Table 5.1 lists the available codes, their meaning and the size of the message that follows. Most of these messages are requests that are sent to the encoder or decoder on the other end, which is free to comply or ignore them. By default, all in-band messages are ignored.

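It is also possible to send in-band "messages" to the other side. All of these messages are encoded as mode 14 "pseudo-frames", which contain a 4-bit message type code followed by the message. Table 5.1 lists the available codes, their meaning, and the size of the message that follows. Most of these messages are requests to the encoder or decoder at the other end, which is free to comply with or ignore them. By default, all in-band messages are ignored.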

Table 5.1: In-band signalling codes
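
The pseudo-frame layout described above (5 bits of mode 14, a 4-bit message type code, then the message) can be sketched as plain bit arithmetic. The function inband_message is invented for this illustration. The payload width is supplied by the caller because it depends on the message type listed in Table 5.1, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INBAND_MODE 14u   /* 5-bit mode id used for in-band messages */

/* Build one in-band pseudo-frame as a right-aligned bit string:
 * 5 bits of mode 14, a 4-bit message type code, then the payload.
 * Stores the bits in *out and returns the total length in bits. */
int inband_message(unsigned code, uint32_t payload, int payload_bits,
                   uint64_t *out)
{
    assert(out != NULL);
    assert(code < 16u);
    assert(payload_bits >= 0 && payload_bits <= 32);
    uint64_t mask = ((uint64_t)1 << payload_bits) - 1;
    uint64_t v = INBAND_MODE;                    /* 01110 */
    v = (v << 4) | code;                         /* 4-bit message type */
    v = (v << payload_bits) | (payload & mask);  /* message body */
    *out = v;
    return 5 + 4 + payload_bits;
}
```

A receiver that ignores in-band messages still has to know each message's size in order to step over it and reach the next frame.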

Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so that the decoder can skip it if it doesn’t know how to interpret it.

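Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so a decoder that does not know how to interpret the message can skip it.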
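
Skipping an unknown mode 13 message only requires reading the 5-bit mode id and the 5-bit byte count. The reader below is a minimal sketch: struct bitreader, br_read and skip_custom_message are hypothetical helpers, not part of the libspeex API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal MSB-first bit reader; hypothetical, not the SpeexBits API. */
struct bitreader {
    const unsigned char *data;
    size_t nbits;   /* total bits available */
    size_t pos;     /* index of the next bit to read */
};

/* Read n bits (0 <= n <= 16) as an unsigned value; returns -1 if the
 * buffer runs out first. */
int br_read(struct bitreader *r, int n)
{
    assert(n >= 0 && n <= 16);
    if (r->pos + (size_t)n > r->nbits)
        return -1;
    int v = 0;
    for (int i = 0; i < n; i++) {
        int bit = (r->data[r->pos / 8] >> (7 - r->pos % 8)) & 1;
        v = (v << 1) | bit;
        r->pos++;
    }
    return v;
}

/* If the next frame is a custom message (mode 13), skip over it using
 * its 5-bit byte count. Returns 1 if a message was skipped, 0 if the
 * next frame is something else (position left unchanged), and -1 if
 * the buffer is truncated. */
int skip_custom_message(struct bitreader *r)
{
    size_t start = r->pos;
    int mode = br_read(r, 5);
    if (mode < 0)
        return -1;
    if (mode != 13) {
        r->pos = start;
        return 0;
    }
    int size = br_read(r, 5);
    if (size < 0)
        return -1;
    if (r->pos + (size_t)size * 8 > r->nbits)
        return -1;
    r->pos += (size_t)size * 8;
    return 1;
}
```

Because the count is 5 bits wide, a custom message can carry at most 31 bytes.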

 

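Supplement:

The following figure and table come from Chapter 9 (Speex narrowband mode), but this chapter refers to them, so they are included here.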

Figure 9.2: Analysis-by-synthesis closed-loop optimization on a sub-frame.

 

Table 9.2: Quality versus bit-rate

 

Afterword

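Because time is short (my English is poor, so translating is hard work for me), I went straight to Chapter 5. Chapter 3 (Compiling and Porting) and Chapter 4 (Command-line Encoder/Decoder) will be translated and tidied up later.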
