NAPI (Part 1): Principles and Implementation

Overview

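NAPI is Linux's newer API for processing packets received by network cards. Reportedly nobody could come up with a better name, so it was simply called NAPI (New API). It was introduced after the 2.5 kernel series.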

In short, NAPI is a technique that combines interrupt-driven and polling-based processing.

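The advantage of interrupts is timely response: when traffic is light, they consume little CPU time. The drawback is that heavy traffic generates too many interrupts, and each interrupt costs a fair amount of CPU time, so efficiency ends up lower than with polling. Polling is the opposite. It is better suited to heavy traffic, because each poll does not cost much CPU time; its drawback is that it keeps consuming CPU time even when little or no data arrives.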

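NAPI combines the two: interrupts when traffic is light, polling when traffic is heavy. Normally the device runs in interrupt mode. When data arrives, the interrupt handler is triggered; it disables the device's receive interrupt and starts processing. If more data arrives in the meantime, there is no need to raise another interrupt, because the polling routine scheduled by the handler keeps processing data, and the interrupt is re-enabled only once there is no new data.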

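Clearly, when traffic is very light or very heavy, NAPI gets the benefit of interrupts or polling respectively, and performs well. If traffic is unstable and hovers at a moderate level, NAPI spends a fair amount of time switching between the two modes, and efficiency is somewhat lower.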
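The switching behaviour described above can be illustrated with a small user-space toy model. This is only a sketch under simplifying assumptions: time advances in discrete ticks, one poll pass runs per tick, and all names (napi_sim_run, napi_sim_stats) are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counters collected by the toy model (hypothetical names, not kernel code). */
struct napi_sim_stats {
    int interrupts;   /* how many times the "hardware interrupt" fired */
    int polls;        /* how many poll passes ran */
    int processed;    /* total packets handed to the stack */
};

/*
 * Replay arrivals[0..ticks-1] (packets arriving per tick) through a
 * NAPI-like receiver that handles at most `budget` packets per poll pass.
 */
static struct napi_sim_stats napi_sim_run(const int *arrivals, size_t ticks,
                                          int budget)
{
    struct napi_sim_stats st = {0, 0, 0};
    bool irq_enabled = true;  /* interrupt mode while idle */
    int ring = 0;             /* packets waiting in the receive ring */

    assert(budget > 0);

    for (size_t t = 0; t < ticks; t++) {
        ring += arrivals[t];

        /* A packet raises an interrupt only while interrupts are enabled. */
        if (irq_enabled && arrivals[t] > 0) {
            st.interrupts++;
            irq_enabled = false;  /* handler masks the IRQ, switches to polling */
        }

        if (!irq_enabled) {
            int work = ring < budget ? ring : budget;

            ring -= work;
            st.processed += work;
            st.polls++;
            if (ring == 0)
                irq_enabled = true;  /* ring drained: back to interrupt mode */
        }
    }
    return st;
}
```

With a sparse pattern such as {1, 0, 0, 1, 0, 0} every packet costs one interrupt, while a burst such as {8, 8, 8, 8, 0, 0, 0, 0} with a budget of 4 is drained by polling after a single interrupt.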

Implementation

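Let's look at the differences between NAPI and non-NAPI:

(1) A NIC driver that supports NAPI must provide a polling method, poll().

(2) The kernel entry point for non-NAPI is netif_rx(); for NAPI it is napi_schedule().

(3) Non-NAPI uses the shared per-CPU queue softnet_data->input_pkt_queue; NAPI uses device memory (or the device driver's receive ring).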

(1) The NAPI device structure

/* Structure for NAPI scheduling similar to tasklet but with weighting */
struct napi_struct {
    /* The poll_list must only be managed by the entity which changes the
     * state of the NAPI_STATE_SCHED bit. This means whoever atomically
     * sets that bit can add this napi_struct to the per-cpu poll_list, and
     * whoever clears that bit can remove from the list right before clearing the bit.
     */
    struct list_head poll_list; /* links the device into the list of devices being polled */
    unsigned long state; /* device state */
    int weight; /* max packets handled per poll; defaults to 64 for non-NAPI */
    int (*poll) (struct napi_struct *, int); /* the device's poll method; process_backlog() for non-NAPI */
#ifdef CONFIG_NETPOLL
    ...
#endif
    unsigned int gro_count;
    struct net_device *dev;
    struct list_head dev_list;
    struct sk_buff *gro_list;
    struct sk_buff *skb;
};

(2) Initialization

Initialize a napi_struct instance.

void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
                    int (*poll) (struct napi_struct *, int), int weight)
{
    INIT_LIST_HEAD(&napi->poll_list);
    napi->gro_count = 0;
    napi->gro_list = NULL;
    napi->skb = NULL;
    napi->poll = poll; /* the device's poll function */
    napi->weight = weight; /* upper limit on packets the device may process per poll */
    list_add(&napi->dev_list, &dev->napi_list); /* add to the device's napi_list */
    napi->dev = dev; /* owning device */
#ifdef CONFIG_NETPOLL
    spin_lock_init(&napi->poll_lock);
    napi->poll_owner = -1;
#endif
    /* set NAPI_STATE_SCHED: the instance cannot be scheduled until napi_enable() clears this bit */
    set_bit(NAPI_STATE_SCHED, &napi->state);
}

(3) Scheduling

A NIC driver uses NAPI by calling napi_schedule() from its interrupt handler.

/**
 * napi_schedule - schedule NAPI poll
 * @n: napi context
 * Schedule NAPI poll routine to be called if it is not already running.
 */
static inline void napi_schedule(struct napi_struct *n)
{
    /* check whether NAPI can be scheduled */
    if (napi_schedule_prep(n))
        __napi_schedule(n);
}

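Check whether NAPI can be scheduled. Scheduling is allowed only if NAPI has not been disabled and this instance is not already scheduled, because only one NAPI poll instance may run at any given time.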

/**
 * napi_schedule_prep - check if napi can be scheduled
 * @n: napi context
 * Test if NAPI routine is already running, and if not mark it as running.
 * This is used as a condition variable insure only one NAPI poll instance runs.
 * We also make sure there is no pending NAPI disable.
 */
static inline int napi_schedule_prep(struct napi_struct *n)
{
    return !napi_disable_pending(n) && !test_and_set_bit(NAPI_STATE_SCHED, &n->state);
}

static inline int napi_disable_pending(struct napi_struct *n)
{
    return test_bit(NAPI_STATE_DISABLE, &n->state);
}

enum {
    NAPI_STATE_SCHED, /* Poll is scheduled */
    NAPI_STATE_DISABLE, /* Disable pending */
    NAPI_STATE_NPSVC, /* Netpoll - don't dequeue from poll_list */
};
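To see why the atomic test-and-set guarantees a single poll instance, here is a minimal user-space sketch of the same check built on C11 atomics. It is not the kernel implementation: the sim_* helpers and SIM_STATE_* constants are hypothetical stand-ins for test_bit(), test_and_set_bit(), clear_bit() and the state bits shown above.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers mirroring the enum above (hypothetical names). */
enum { SIM_STATE_SCHED = 0, SIM_STATE_DISABLE = 1 };

struct sim_napi {
    atomic_ulong state;
};

/* Atomically set bit `nr` and report whether it was already set. */
static bool sim_test_and_set_bit(int nr, atomic_ulong *addr)
{
    unsigned long mask;

    assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
    mask = 1UL << nr;
    return (atomic_fetch_or(addr, mask) & mask) != 0;
}

static bool sim_test_bit(int nr, atomic_ulong *addr)
{
    return (atomic_load(addr) & (1UL << nr)) != 0;
}

static void sim_clear_bit(int nr, atomic_ulong *addr)
{
    atomic_fetch_and(addr, ~(1UL << nr));
}

/*
 * Same shape as napi_schedule_prep(): refuse if a disable is pending,
 * otherwise let exactly one caller win the SCHED bit.
 */
static bool sim_napi_schedule_prep(struct sim_napi *n)
{
    return !sim_test_bit(SIM_STATE_DISABLE, &n->state) &&
           !sim_test_and_set_bit(SIM_STATE_SCHED, &n->state);
}
```

The first caller gets true and owns the instance; every later caller gets false until the owner clears the SCHED bit. Because the disable check short-circuits, a pending disable leaves the SCHED bit untouched.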

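The NAPI scheduling function. It adds the device's napi_struct instance to the poll_list of the current CPU's softnet_data so that the device will be polled next, and then sets the NET_RX_SOFTIRQ flag to raise the softirq.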

void __napi_schedule(struct napi_struct *n)
{
    unsigned long flags;

    local_irq_save(flags);
    ____napi_schedule(&__get_cpu_var(softnet_data), n);
    local_irq_restore(flags);
}

static inline void ____napi_schedule(struct softnet_data *sd, struct napi_struct *napi)
{
    /* add the napi_struct to the poll_list of softnet_data */
    list_add_tail(&napi->poll_list, &sd->poll_list);
    __raise_softirq_irqoff(NET_RX_SOFTIRQ); /* set the softirq pending flag */
}
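The effect of the scheduling step can be sketched in user space: devices are appended to a per-CPU list in FIFO order and a pending bit is set for the receive softirq. This is an illustrative model only. The sim_* types and functions are hypothetical, the list is a stripped-down imitation of the kernel's list_head, and the bit number chosen for SIM_NET_RX_SOFTIRQ is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct sim_list {
    struct sim_list *next, *prev;
};

#define sim_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define SIM_NET_RX_SOFTIRQ 3  /* bit number; assumed value for this sketch */

struct sim_softnet {               /* stands in for the per-CPU softnet_data */
    struct sim_list poll_list;
    unsigned int softirq_pending;
};

struct sim_napi_dev {              /* stands in for napi_struct */
    struct sim_list poll_list;
    int id;
};

static void sim_list_init(struct sim_list *head)
{
    head->next = head;
    head->prev = head;
}

static void sim_list_add_tail(struct sim_list *node, struct sim_list *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Queue the device for polling and mark the receive softirq as pending. */
static void sim_napi_schedule(struct sim_softnet *sd, struct sim_napi_dev *napi)
{
    assert(sd != NULL && napi != NULL);
    sim_list_add_tail(&napi->poll_list, &sd->poll_list);
    sd->softirq_pending |= 1u << SIM_NET_RX_SOFTIRQ;
}

/* Remove and return the first scheduled device, or NULL if none is queued. */
static struct sim_napi_dev *sim_poll_list_pop(struct sim_softnet *sd)
{
    struct sim_list *first = sd->poll_list.next;

    if (first == &sd->poll_list)
        return NULL;
    first->prev->next = first->next;
    first->next->prev = first->prev;
    sim_list_init(first);
    return sim_container_of(first, struct sim_napi_dev, poll_list);
}
```

Scheduling two devices and then popping them returns them in the order they were scheduled, which is the order in which the softirq handler would visit them in this model.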

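(4) The poll method

In NAPI mode the poll method is provided by the driver and is specified when the napi_struct is registered through netif_napi_add().

In the driver's poll(), after an sk_buff has been taken from the driver's own queue, napi_gro_receive() is called to process the skb if GRO is enabled on the NIC; otherwise netif_receive_skb() is called directly.

The poll method should look broadly like process_backlog(), with some additional device-specific parts.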
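The usual shape of a driver poll method can be sketched as a user-space mock. It assumes the common NAPI contract: poll() handles at most `budget` packets, returns the amount of work done, and leaves polling mode (re-enabling the receive interrupt) only when it did less work than the budget. The sim_nic structure and its fields are hypothetical; a real driver would read its receive ring and call napi_gro_receive() or netif_receive_skb() where the mock calls sim_deliver().

```c
#include <assert.h>
#include <stdbool.h>

/* Mock device state (hypothetical; a real driver keeps a DMA receive ring). */
struct sim_nic {
    int rx_pending;    /* packets waiting in the driver's own queue */
    bool irq_enabled;  /* is the receive interrupt unmasked? */
    bool scheduled;    /* mirrors the NAPI_STATE_SCHED bit */
    int delivered;     /* packets handed to the stack so far */
};

/* Stand-in for passing one skb up the stack. */
static void sim_deliver(struct sim_nic *nic)
{
    nic->delivered++;
}

/* Process at most `budget` packets and return how many were handled. */
static int sim_nic_poll(struct sim_nic *nic, int budget)
{
    int work = 0;

    assert(nic->scheduled);  /* poll only runs for a scheduled instance */

    while (work < budget && nic->rx_pending > 0) {
        nic->rx_pending--;
        sim_deliver(nic);
        work++;
    }

    /* Queue drained before the budget ran out: leave polling mode. */
    if (work < budget) {
        nic->scheduled = false;   /* what completing the poll would clear */
        nic->irq_enabled = true;  /* unmask the receive interrupt again */
    }
    return work;
}
```

With 100 packets pending and a budget of 64, the first call handles 64 packets and stays in polling mode; the second call handles the remaining 36 and switches back to interrupt mode.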

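(5) Comparing the non-NAPI and NAPI processing flows

Figure: comparison of the packet receive flows of non-NAPI and NAPI devices.

In NAPI mode, the top half leaves the sk_buffs in the driver's own queue; during softirq processing the driver's poll method calls netif_receive_skb() to process each skb directly and hand it to the upper layers.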

/**
 * netif_receive_skb - process receive buffer from network
 * @skb: buffer to process
 * netif_receive_skb() is the main receive data processing function.
 * It always succeeds. The buffer may be dropped during processing
 * for congestion control or by the protocol layers.
 * This function may only be called from softirq context and interrupts
 * should be enabled.
 * Return values (usually ignored):
 * NET_RX_SUCCESS: no congestion
 * NET_RX_DROP: packet was dropped
 */
int netif_receive_skb(struct sk_buff *skb)
{
    /* record the receive time in skb->tstamp */
    if (netdev_tstamp_prequeue)
        net_timestamp_check(skb);
    if (skb_defer_rx_timestamp(skb))
        return NET_RX_SUCCESS;
#ifdef CONFIG_RPS
    ...
#else
    return __netif_receive_skb(skb);
#endif
}

__netif_receive_skb() was analyzed in the previous post; from that point on, the network layer takes over processing of the received packet.

Author: 何進哥哥
