Linux Shell Advanced Programming (Part 2)

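1. Overview

Earlier chapters covered shell structured commands, functions, regular expressions, and the basics of sed and gawk. This part covers advanced usage of sed and gawk. sed is a stream editor and is fast: it processes the input one line at a time and runs the script commands against each line. gawk is a programming language that processes data and can generate reports from it.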

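2. Advanced sed usage

(1) The difference between n and N

n: lowercase n tells sed to move on to the next line of input and continue with the remaining commands, instead of going back to the start of the script.

N: uppercase N appends the next line of input to the pattern space. The two lines are joined in one pattern space, still separated by a newline character, and sed treats them as a single piece of text.

Example: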

[root@localhost chapter18]# cat <data1
this is the header line

this is a data line.

this is the last line.

In data1 the text lines are separated by blank lines.

Lowercase n:

[root@localhost chapter18]# sed '/header/{
n
d
}' data1
this is the header line
this is a data line.

this is the last line.

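The braces {} group the commands that run on lines matching header.

The first line of data1 matches, n moves to the next line, and d deletes that line, so the blank line after the header is removed.

Uppercase N: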

[root@localhost chapter18]# sed '/header/{
N
d
}' data1
this is a data line.

this is the last line.

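The first line of data1 matches and N appends the next line to the pattern space. sed now treats both lines as one piece of text, even though a newline separates them, so d deletes two lines: the first line and the blank line.

With {} versus without {}: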

[root@localhost chapter18]# sed '/header/n;d;' data1
this is the header line

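This looks like the earlier command, with ; separating the commands, but without the braces only n is tied to /header/, and d runs on every line.

On the header line, n prints it and reads the next line, which d then deletes. Every later line is deleted by d as well, so only the first line remains.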

[root@localhost chapter18]# sed '/header/N;d;' data1

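With N, everything is deleted.

On the header line, N appends the next line to the pattern space and d deletes the whole pattern space. None of the following lines match /header/, so N is skipped and d simply deletes each of them.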
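The difference between n and N can be checked on a small inline sample. The following is a minimal sketch: the three-line input and the variable names are made up for illustration and are not part of the chapter's data files.

```shell
# Hypothetical sample: a header line, a blank line, and a data line.
sample=$(printf 'header\n\ndata')

# n prints the header, loads the blank line, and d deletes only that line.
out_n=$(printf '%s\n' "$sample" | sed '/header/{
n
d
}')

# N joins the header and the blank line, and d deletes both.
out_N=$(printf '%s\n' "$sample" | sed '/header/{
N
d
}')

echo "with n:"; echo "$out_n"
echo "with N:"; echo "$out_N"
```

With n the header survives and only the blank line disappears; with N both are removed and only the data line is left.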

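(2) The difference between D and d

d deletes the entire pattern space.

D deletes only the first line of the pattern space: every character up to and including the first newline.

d Delete pattern space. Start next cycle.

D Delete up to the first embedded newline in the pattern space. Start next cycle, but skip reading from the input if there is still data in the pattern space.

To contrast the two:

d deletes the pattern space and starts the next cycle, that is, it moves on to the next input line.

D deletes the first line of the pattern space. If the pattern space still contains data, sed goes back to the start of the script without reading new input.

Example 1: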

[root@localhost chapter18]# cat <data5

This is the header line.
this is the second line.

this is the last line.

[root@localhost chapter18]# sed '/^$/{
> N
> /header/D}' data5
This is the header line.
this is the second line.

this is the last line.

Example 2:

[root@localhost chapter18]# sed '/^$/N;/header/D' data5
this is the second line.

this is the last line.

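This is a classic example.

Analysis:

Goal: delete the blank line before the first line of text.

Example 1:

/^$/ finds the blank line and N appends the next line to the pattern space. The pattern space now matches /header/, so D deletes its first line, up to and including the newline.

The pattern space still holds the header line, so the script restarts without reading input. That line is not blank and does not match /^$/, so the commands inside the braces are not run again.

In the end only the first line of the pattern space, the blank line, is deleted.

Example 2:

The first steps are the same: D deletes the first line, up to the newline. The pattern space still has content, so the script runs again on it. It does not match /^$/, so N is skipped, but /header/D is outside any braces and still matches. With no newline left in the pattern space, D behaves like d and deletes the header line as well.

The key point is that commands inside {} run only when the address in front of the braces matches.

Take care to distinguish D, d, n, N, and {}.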
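A short experiment makes the d versus D contrast concrete. This is a sketch on a made-up three-line input; the variable names are illustrative only.

```shell
# Hypothetical three-line input.
input=$(printf 'one\ntwo\nthree')

# After N the pattern space is "one\ntwo"; d discards all of it.
out_d=$(printf '%s\n' "$input" | sed '1{
N
d
}')

# D discards only "one\n"; "two" stays and the script restarts on it.
out_D=$(printf '%s\n' "$input" | sed '1{
N
D
}')

echo "with d:"; echo "$out_d"
echo "with D:"; echo "$out_D"
```

With d only the third line is printed, while with D the second line survives because it was still in the pattern space when the cycle restarted.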

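(3) Hold space and pattern space

The pattern space is the active buffer that holds the text being examined while sed runs its commands. It is not the only place where text can be stored.

The hold space is a second buffer that can keep lines temporarily while the pattern space is being processed.

sed commands for the hold space:

h copy the pattern space to the hold space

H append the pattern space to the hold space

g copy the hold space to the pattern space

G append the hold space to the pattern space

x exchange the contents of the pattern space and the hold space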

[root@localhost chapter18]# cat <data2
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

[root@localhost chapter18]# sed -n '/first/{
> h
> p
> n
> p
> g
> p
> }' data2
this is the first data line.
this is the second data line.
this is the first data line.

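Analysis:

sed -n '/first/ selects the line containing first.

h copies that line to the hold space.

p prints the pattern space, which holds the first data line.

n moves to the next line, the second data line.

p prints the pattern space, now the second data line.

g copies the hold space back to the pattern space.

p prints the first data line again.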

[root@localhost chapter18]# sed -n '/first/{
h
n
p
g
p
}' data2
this is the second data line.
this is the first data line.

Here the second data line is printed before the first data line.
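The same commands can swap two adjacent lines. The following sketch uses a made-up three-line input: h parks line 1 in the hold space, d suppresses it, and G appends it after line 2.

```shell
# Hypothetical input a, b, c; lines 1 and 2 are swapped.
swapped=$(printf 'a\nb\nc\n' | sed '1{
h
d
}
2G')

echo "$swapped"
```

The output order is b, a, c: line 1 is only printed once it has been appended to line 2 from the hold space.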

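(4) Reversing an entire file

! negates a command: the command applies to the lines that its address does not match. For example, /header/!p prints every line except those matching header.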

[root@localhost chapter18]# sed -n '/header/!p' data2
this is the first data line.
this is the second data line.
this is the last line.

Lines that match header are not printed.

Reversing the whole file:

[root@localhost chapter18]# cat <data2
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

[root@localhost chapter18]# sed -n '1!G;h;$p' data2
this is the last line.
this is the second data line.
this is the first data line.
this is the header line.

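How it works:

(1) Put a line in the hold space.

(2) Read the next line of text into the pattern space.

(3) Append the hold space to the pattern space.

(4) Copy the pattern space to the hold space.

(5) Repeat steps 2 to 4 until all lines are in the hold space in reverse order.

(6) Retrieve and print them.

$p prints only when the last line is reached.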
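To watch the buffer grow step by step, the reversal can be run next to a tracing variant. This is a sketch on a made-up three-line input; the trace command (printing on every line, with embedded newlines shown as commas) is an addition for illustration, not part of the original script.

```shell
# Hypothetical three-line input.
lines=$(printf 'a\nb\nc')

# The reversal script from the text: print only at the last line.
reversed=$(printf '%s\n' "$lines" | sed -n '1!G;h;$p')

# Trace: after h has saved the buffer, show the pattern space on every line,
# with the embedded newlines replaced by commas for readability.
trace=$(printf '%s\n' "$lines" | sed -n '1!G;h;s/\n/,/g;p')

echo "$reversed"
echo "$trace"
```

The trace shows the pattern space as a, then b,a, then c,b,a, which is the reversed order that $p finally prints.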

(5) Branching

The branch command makes it possible to run commands on only a specific subset of the data stream.

[address] b [label]

address selects the line or lines that trigger the branch.

label names the place to branch to. If label is omitted, the branch goes to the end of the script.

[root@localhost chapter18]# sed '2,3b;s/this is/is this/' data2
is this the header line.
this is the first data line.
this is the second data line.
is this the last line.

The branch fires on lines 2 and 3. With no label it jumps to the end of the script, so the substitution is skipped on those two lines.

[root@localhost chapter18]# sed '/first/b jump1;s/ is/might be/;s/line/test/;:jump1 s/data/text/' data2
thismight be the header test.
this is the first text line.
thismight be the second text test.
thismight be the last test.

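If a line matches first, sed jumps to jump1 and runs only the substitution after the label. Otherwise all three substitutions run, including the one after jump1.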

[root@localhost chapter18]# echo "this,is,a,test,to,remove,commas."|sed -n ':start s/,/ /p;/,/b start'
this is,a,test,to,remove,commas.
this is a,test,to,remove,commas.
this is a test,to,remove,commas.
this is a test to,remove,commas.
this is a test to remove,commas.
this is a test to remove commas.

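:start defines the label start.

s/,/ /p replaces the first comma with a space and prints the result.

/,/b start branches back to start while a comma remains.

The loop replaces every comma in the text with a space.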
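A branch without a label is a convenient way to exempt some lines from the rest of a script. The sketch below uses a made-up two-line input in which lines starting with # are left untouched; the input and the variable name are illustrative assumptions.

```shell
# Lines starting with # branch to the end of the script and skip the substitution.
branch_out=$(printf '# foo stays\nfoo changes\n' | sed '/^#/b
s/foo/bar/')

echo "$branch_out"
```

Only the second line is rewritten, because the first one jumped past the s command.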

(6) Testing

[address]t [label]

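If a substitute command has successfully matched and replaced a pattern, the test command branches to the given label. As with b, the branch goes to the end of the script when no label is given.

For example: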

[root@localhost chapter18]# cat <data2
this is the header line.
this is the first data line.
this is the second data line.
this is the last line.

[root@localhost chapter18]# sed 's/first/starting/;t;s/line/test/' data2
this is the header test.
this is the starting data line.
this is the second data test.
this is the last test.

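On the line containing first the substitution succeeds, so t skips the rest of the script.

That is why line in the second line is not replaced.

sed then reads the next line and runs the script again, so line in the third line is replaced.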

[root@localhost chapter18]# echo "this,is,a,test,to,remove,commas."|sed -n ':start s/,/ /p;t start'
this is,a,test,to,remove,commas.
this is a,test,to,remove,commas.
this is a test,to,remove,commas.
this is a test to,remove,commas.
this is a test to remove,commas.
this is a test to remove commas.

As long as a substitution succeeds, t branches back to start.
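The same t loop works for any substitution that has to be repeated until nothing is left to replace. As a sketch on a made-up input, the following squeezes runs of spaces down to single spaces; the label name again and the variable name are arbitrary.

```shell
# Replace one pair of spaces with a single space, and repeat while that succeeds.
squeezed=$(printf 'a    b  c\n' | sed ':again
s/  / /
t again')

echo "$squeezed"
```

The loop ends on the first pass in which s finds no double space, because t only branches after a successful substitution.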

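(7) Pattern replacement

The ampersand (&) stands for the text matched by the pattern in a substitute command. Whatever the pattern matched, & reuses it in the replacement.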

[root@localhost chapter18]# echo "the cat sleeps in his hat"|sed -n 's/.at/"&"/p'
the "cat" sleeps in his hat

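This puts double quotes around cat. There is no g flag, so the later hat is not replaced.

sed uses parentheses to define substring components inside the pattern, and special symbols in the replacement to refer back to them. The number gives the position of the component: \1 is the first parenthesized group and \2 is the second.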

[root@localhost chapter18]# echo "1234567"|sed ':start s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/p; t start'
1234,567
1,234,567
1,234,567

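Analysis:

t tests the preceding substitute command.

.* matches any number of characters, so .*[0-9] is a run of text ending in a digit.

[0-9]\{3\} matches the three digits that follow.

\1 refers to the first group.

\2 refers to the second group.

The substitution inserts a comma between the first group and the second group.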
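Wrapped in a shell function, the same loop can format any whole number. This is a sketch: the function name add_commas is hypothetical, and the p flag is dropped so that only the final result is printed.

```shell
# Hypothetical helper: insert thousands separators into the number given as $1.
add_commas() {
    printf '%s\n' "$1" | sed ':start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start'
}

add_commas 1234567
add_commas 999
```

Numbers with fewer than four digits pass through unchanged, since the pattern needs a digit followed by three more digits.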

(8) Line spacing

a. Double spacing

[root@localhost chapter18]# sed '$!G' data2
this is the header line.

this is the first data line.

this is the second data line.

this is the last line.

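G appends the hold space to the pattern space. The hold space is empty by default, so G adds a newline followed by nothing, which shows up as a blank line. $!G skips the last line.

b. Double spacing a file that may already contain blank lines

Delete all blank lines first, then apply double spacing.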

[root@localhost chapter18]# cat <data6
This is line one.
This is line two.

This is line three.

This is line four.

[root@localhost chapter18]# sed '/^$/d;$!G' data6
This is line one.

This is line two.

This is line three.

This is line four.

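(9) Numbering the lines of a file

= prints the line number on its own line. N then appends the following line to the pattern space, and the \n between them is replaced with a space.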

[root@localhost chapter18]# sed '=' data2|sed 'N;s/\n/ /'
1 this is the header line.
2 this is the first data line.
3 this is the second data line.
4 this is the last line.

(10) Printing the last few lines of a file

Printing the last few lines works like a rolling window over the file.

[root@localhost chapter18]# sed ':start $q;N;5,$D;b start' /etc/passwd
samba1:x:503:503::/home/samba1:/bin/bash
testuser:x:507:507::/home/testuser:/bin/bash
root2:x:508:508::/home/root2:/bin/bash
chenjinzhong:x:509:509::/home/chenjinzhong:/bin/bash

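:start begins a loop. On the last line ($), q quits and prints the pattern space.

From line 5 onward, D deletes the first line of the pattern space.

The pattern space therefore never keeps more than four lines, and the last 4 lines are displayed.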
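The line number in the D address is always one more than the number of lines to keep. A sketch of a general helper follows; the function name last_lines and the temporary test file are assumptions made for illustration.

```shell
# Hypothetical helper: print the last $1 lines of file $2 using only sed.
last_lines() {
    sed -e ':start' -e '$q' -e 'N' -e "$(( $1 + 1 )),\$D" -e 'b start' "$2"
}

# Build a throwaway eight-line file and keep its last three lines.
tmp=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 > "$tmp"
tail3=$(last_lines 3 "$tmp")
echo "$tail3"
rm -f "$tmp"
```

For everyday use tail -n does the same job; the sed version is mainly useful for seeing how N and D cooperate.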

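(11) Deleting lines

a. Deleting consecutive blank lines

The range from a non-blank line to the next blank line is kept; every line outside such a range is deleted.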

[root@localhost chapter18]# sed '/./,/^$/!d' data6
This is line one.
This is line two.

This is line three.

This is line four.

b. Deleting leading blank lines

Everything from the first line that contains any character to the end of the file is kept.

[root@localhost chapter18]# cat <data7

this is the first line

this is the second line.
[root@localhost chapter18]# sed '/./,$!d' data7
this is the first line

this is the second line.

c. Deleting trailing blank lines

[root@localhost chapter18]# cat <data8
this is the first line
this is the second line

[root@localhost chapter18]# sed ':start /^\n*$/ {$d;N;b start}' data8
this is the first line
this is the second line

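Analysis: /^\n*$/ matches a pattern space made up of newlines only. \n* is zero or more newlines and ^$ is a single blank line, so the whole expression matches one or more blank lines.

If the last line has been reached, $d deletes the pattern space. Otherwise N appends the next line to the pattern space and the loop repeats.

3. Advanced gawk usage

(1) Variables

a. Field and record separators in gawk

FIELDWIDTHS a space-separated list of numbers giving the exact width of each data field

FS input field separator

RS input record separator

OFS output field separator

ORS output record separator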

[root@localhost chapter19]# cat data1
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35
[root@localhost chapter19]# gawk 'BEGIN {FS=",";OFS="-"} { print $1,$2,$3}' data1
data11-data12-data13
data21-data22-data23
data31-data32-data33

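FS sets the input field separator and OFS sets the output field separator.

RS sets the input record separator. Setting RS to an empty string makes each block of lines separated by blank lines one record, and with FS set to \n each line of the block becomes one field.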

[root@localhost chapter19]# cat <data2
Riley Mullen
123 Main Street
Chicago, IL 60601
(312)555-1234

Frank Williams
456 Oak Street
Indianapolis, IN 46201
(317)555-9876

Haley Snell
4231 ELM street
Detroit,MI 48201
(313)555-4938
[root@localhost chapter19]# gawk 'BEGIN{FS="\n";RS=""}{printf "%s%s\n",$1,$4}' data2
Riley Mullen(312)555-1234
Frank Williams(317)555-9876
Haley Snell(313)555-4938
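ORS is listed above but not demonstrated. The sketch below sets it together with FS and OFS on a made-up two-line input. It calls the program as awk, on the assumption that gawk or another POSIX awk is installed under that name; the variable name joined is illustrative.

```shell
# FS splits the input on commas, OFS joins the printed fields with "-",
# and ORS ends each output record with " | " instead of a newline.
joined=$(printf 'a,b,c\nd,e,f\n' |
    awk 'BEGIN { FS = ","; OFS = "-"; ORS = " | " } { print $1, $3 }')

echo "$joined"
```

Both records end up on one output line because ORS no longer contains a newline.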

b. Associative arrays

An associative array uses text, not numbers, as its index.

[root@localhost chapter19]# gawk 'BEGIN{print ENVIRON["PATH"];print ENVIRON["HOME"]}'
/usr/lib/qt-3.3/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/opt/real/RealPlayer:/root/bin:/opt/real/RealPlayer:/opt/real/RealPlayer:/opt/real/RealPlayer:/opt/real/RealPlayer
/root

ENVIRON is an associative array that exposes the shell environment variables.

[root@localhost chapter19]# gawk 'BEGIN{FS=":";OFS=":"}{print $1,$NF}' /etc/passwd
root:/bin/bash
bin:/sbin/nologin
daemon:/sbin/nologin
adm:/sbin/nologin
lp:/sbin/nologin
sync:/bin/sync
shutdown:/sbin/shutdown
halt:/sbin/halt
mail:/sbin/nologin

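FS sets the input field separator to : and OFS sets the output field separator to : as well.

NF is the number of fields in the record, so $NF is the value of the last field.

c. User-defined variables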

[root@localhost chapter19]# gawk 'BEGIN{testing="this is a test";print testing;}'
this is a test

This defines a variable testing and prints it.

[root@localhost chapter19]# cat <script1
BEGIN{FS=","}
{print $n}
[root@localhost chapter19]# gawk -f script1 n=2 data1
data12
data22
data32

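-f runs the script file, and n=2 on the command line sets the variable n. There is one problem: a value assigned this way is not yet available in the BEGIN block. The -v option solves this.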

[root@localhost chapter19]# cat <script2
BEGIN{ print "the starting value is",n;FS=","}
{ print $n }
[root@localhost chapter19]# gawk -f script2 n=3 data1
the starting value is
data13
data23
data33

[root@localhost chapter19]# gawk -f script2 -v n=3 data1
the starting value is 3
data13
data23
data33

Now the value is available in the BEGIN block.

(2) Arrays

Defining an array variable:

var[index]=element

#!/bin/bash
gawk '
BEGIN{
var[0]=1
var[2]=3
total=var[0]+var[2]
print total
}'
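Array indexes can also be strings, which makes counting straightforward. The following is a sketch on made-up input. It assumes gawk or another POSIX awk is installed as awk, and it pipes the result through sort because the order of a for (k in array) loop is not defined.

```shell
# Count how often each word appears; the word itself is the array index.
counts=$(printf 'bash\nsh\nbash\n' |
    awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' | sort)

echo "$counts"
```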

(3) Using patterns

The matching operator: ~

#!/bin/bash
gawk '
BEGIN{
FS=":"
}
{
if ($1 ~/root/)
print $1,$NF

}' /etc/passwd

$1 ~/root/ tests whether the first field matches root.
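The operator can be negated with !~, and the test can be written as a pattern in front of the action instead of an if statement. This is a sketch on two made-up passwd-style lines, assuming gawk or another POSIX awk is available as awk; the variable name logins is illustrative.

```shell
# Print the first field of every record whose second field does not match nologin.
logins=$(printf 'root:/bin/bash\nbin:/sbin/nologin\n' |
    awk -F: '$2 !~ /nologin/ { print $1 }')

echo "$logins"
```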

(4) Structured commands

The if statement:

[root@localhost chapter19]# cat <data4
10
5
13
50
34

#!/bin/bash
gawk '{
if($1>20){
x=$1*2
print x
}
}' data4

The while statement:

[root@localhost chapter19]# cat <data5
100 110 120
130 140 150
160 170 180

#!/bin/bash
gawk '{
total=0
i=1
while(i<4){
total+=$i
i++

}
avg=total/3
print "Average:",avg

}' data5

Result:

[root@localhost chapter19]# ./script8
Average: 110
Average: 140
Average: 170

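Once the while loop has computed the average for the first record, gawk reads the next record and computes again. Records are separated by newlines.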
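The hard-coded field count can be replaced with NF so that records of any length are handled. The sketch below uses a for loop, which is standard awk but not covered in this chapter. It runs on made-up input and assumes gawk or another POSIX awk is installed as awk.

```shell
# Average all fields of each record, however many there are.
averages=$(printf '100 110 120\n10 20\n' |
    awk '{ total = 0; for (i = 1; i <= NF; i++) total += $i; print "Average:", total / NF }')

echo "$averages"
```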

(5) Formatted printing

printf "format string",var1,var2.....

#!/bin/bash
gawk '
BEGIN{
FS="\n"
RS=""
}
{
printf "%s-%s\n",$1,$4
}' data2

Result:

[root@localhost chapter19]# ./script10
Riley Mullen-(312)555-1234
Frank Williams-(317)555-9876
Haley Snell-(313)555-4938
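Format specifiers can also control width and alignment. The following is a sketch with made-up values, assuming gawk or another POSIX awk is installed as awk: %-10s left-aligns a string in 10 columns, %5d right-aligns an integer in 5 columns, and %6.2f prints a number with two decimals in 6 columns.

```shell
# Build one aligned row; the | characters only make the column edges visible.
row=$(awk 'BEGIN { printf "%-10s|%5d|%6.2f\n", "name", 42, 3.14159 }')

echo "$row"
```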

That concludes this introduction to advanced programming with awk and sed.
