How File Writes Work

Original

aaabbbbttt

2019-07-19 13:03

Summary

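Many people who work in operations have probably run into the following problem:

A program starts several threads or processes, and all of them write to the same file. This can corrupt the file: when several threads or processes write to it at the same time, some lines end up inserted into the middle of other lines.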

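Many people's first idea is to use a file lock, and that does solve the problem. But did you know that the corruption only happens under certain conditions? If each individual write to the file is small enough, the file will not be corrupted.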
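A minimal sketch of the file-lock approach, assuming a Linux system with the util-linux `flock(1)` command. Each writer takes an exclusive lock before appending and releases it when its subshell exits, so lines cannot interleave even though the 10,000-character lines used here are deliberately larger than any small-write threshold. The names `OUTPUT_FILE`, `LOCK_FILE` and `append_locked`, and the paths under `/tmp`, are hypothetical.

```bash
#!/bin/bash
# Sketch: serialize appends from many processes with flock(1) (util-linux).
# OUTPUT_FILE, LOCK_FILE and append_locked are hypothetical names.
OUTPUT_FILE=${OUTPUT_FILE:-/tmp/locked_out.tmp}
LOCK_FILE=${LOCK_FILE:-/tmp/locked_out.lock}

# Append one line while holding an exclusive lock on LOCK_FILE.
append_locked() {
    (
        flock -x 9                              # block until this process owns the lock
        printf '%s\n' "$1" >> "$OUTPUT_FILE"    # may take several write() calls; still safe
    ) 9>"$LOCK_FILE"                            # fd 9 is closed (lock released) when the subshell exits
}

rm -f "$OUTPUT_FILE"

# 20 background writers, each appending 50 lines of 10000 identical characters
for w in $(seq 1 20); do
    (
        ch=$(printf "\\$(printf '%03o' $((w + 64)))")   # 1 -> A, 2 -> B, ...
        line=$(printf '%10000s' '' | tr ' ' "$ch")
        for i in $(seq 1 50); do
            append_locked "$line"
        done
    ) &
done
wait

# Every valid line consists of a single repeated character
bad=$(grep -cv '^\(.\)\1*$' "$OUTPUT_FILE")
echo "lines: $(wc -l < "$OUTPUT_FILE"), corrupted: $bad"
```

The lock is held on a separate lock file rather than on the output file itself, which keeps the locking independent of how the output file is opened or rotated.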

Details

The operating system's minimum atomic write size

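On Linux there is a threshold for the smallest atomic write operation: on some systems it is 1,024 bytes, on others 4,096 bytes. If a single write does not exceed this threshold, it will not be interleaved with other processes' writes, and the file will not be corrupted.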
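The article does not name the variable it has in mind. One plausible candidate is `PIPE_BUF`: POSIX guarantees that writes of at most `PIPE_BUF` bytes to a pipe or FIFO are atomic, requires it to be at least 512 bytes, and Linux sets it to 4096. The sketch below, which assumes that interpretation, prints `PIPE_BUF` and the page size on the current machine with `getconf`.

```bash
#!/bin/bash
# Print limits commonly associated with atomic writes.
# PIPE_BUF is a pathconf value, so getconf needs a path argument.
pipe_buf=$(getconf PIPE_BUF /)
page_size=$(getconf PAGESIZE)

echo "PIPE_BUF:  ${pipe_buf} bytes"
echo "Page size: ${page_size} bytes"

# POSIX requires PIPE_BUF >= 512 (_POSIX_PIPE_BUF)
if [ "$pipe_buf" -ge 512 ]; then
    echo "PIPE_BUF meets the POSIX minimum"
fi
```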

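Below is a shell script, test_appends.sh, that simulates this situation. Running it with 4096-byte lines and then with 4097-byte lines gives the following output: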

```
# ./test_appends.sh 4096
Launching 20 worker processes
Each line will be 4096 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
All's good! The output file had no corrupted lines.
# ./test_appends.sh 4097
Launching 20 worker processes
Each line will be 4097 characters long
Waiting for processes to exit
Testing output file
.......................[snip]....
Found 27 instances of corrupted lines
```

The full script, test_appends.sh:

```bash
#!/bin/bash
#############################################################################
# This script aims to test/prove that you can append to a single file from
# multiple processes with buffers up to a certain size, without causing one
# process' output to corrupt the other's.
#
# The script takes one parameter, the length of the buffer. It then creates
# 20 worker processes which each write 50 lines of the specified buffer
# size to the same file. When all processes are done outputting, it tests
# the output file to ensure it is in the correct format.
#############################################################################

NUM_WORKERS=20
LINES_PER_WORKER=50
OUTPUT_FILE=/tmp/out.tmp

# Each worker will output $LINES_PER_WORKER lines to the output file
run_worker() {
    worker_num=$1
    buf_len=$2

    # Each line will be a specific character, multiplied by the line length.
    # The character changes based on the worker number (1 -> A, 2 -> B, ...).
    filler_len=$((buf_len - 1)) # -1 -> leave room for \n
    filler_char=$(printf "\\$(printf '%03o' $((worker_num + 64)))")

    line=$(for i in $(seq 1 "$filler_len"); do echo -n "$filler_char"; done)
    for i in $(seq 1 "$LINES_PER_WORKER"); do
        # Append one line: filler + newline = buf_len bytes
        echo "$line" >> "$OUTPUT_FILE"
    done
}

# Worker mode: the script re-invokes itself as "$0 worker <num> <buf_len>"
if [ "$1" = "worker" ]; then
    run_worker "$2" "$3"
    exit
fi

buf_len=$1
if [ "$buf_len" = "" ]; then
    echo "Buffer length not specified, defaulting to 4096"
    buf_len=4096
fi

rm -f "$OUTPUT_FILE"

echo Launching $NUM_WORKERS worker processes
for i in $(seq 1 $NUM_WORKERS); do
    "$0" worker "$i" "$buf_len" &
    pids[$i]=$!
done

echo Each line will be $buf_len characters long
echo Waiting for processes to exit
for i in $(seq 1 $NUM_WORKERS); do
    wait "${pids[$i]}"
done

# Now we want to test the output file. Each line should be the same letter
# repeated buf_len-1 times (remember the \n takes up one byte). If we had
# workers writing over each other's lines, then there will be mixed characters
# and/or longer/shorter lines.
echo Testing output file

# Make sure the file is the right size (ensures processes didn't write over
# each other's lines)
expected_file_size=$((NUM_WORKERS * LINES_PER_WORKER * buf_len))
actual_file_size=$(wc -c < "$OUTPUT_FILE")
if [ "$expected_file_size" -ne "$actual_file_size" ]; then
    echo Expected file size of $expected_file_size, but got $actual_file_size
else
    # File size is OK, test the actual content.
    # Only use grep >= 2.16, because older versions are way too slow with
    # backreferences. Compare major and minor numerically (so 3.x counts too).
    [[ $(grep --version) =~ [^[:digit:]]*([[:digit:]]+)\.([[:digit:]]+) ]]
    grep_major=${BASH_REMATCH[1]:-0}
    grep_minor=${BASH_REMATCH[2]:-0}
    if (( grep_major > 2 || (grep_major == 2 && grep_minor >= 16) )); then
        # A valid line is one character followed by buf_len-2 copies of it
        num_lines=$(grep -v "^\(.\)\1\{$((buf_len - 2))\}$" "$OUTPUT_FILE" | wc -l)
    else
        # Scan line by line in bash, which isn't that speedy, but is good enough
        # Note: Doesn't work on cygwin for lines < 255
        line_length=$((buf_len - 1))
        num_lines=0
        for line in $(cat "$OUTPUT_FILE"); do
            if ! [[ $line =~ ^${line:0:1}{$line_length}$ ]]; then
                num_lines=$((num_lines + 1))
            fi
            echo -n .
        done
        echo
    fi

    if [ "$num_lines" -gt 0 ]; then
        echo "Found $num_lines instances of corrupted lines"
    else
        echo "All's good! The output file had no corrupted lines."
    fi
fi

rm -f "$OUTPUT_FILE"
```
