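A recent project at work required collecting logs from remote servers. To avoid widening our technology stack, I implemented it in PHP.
A few small points came up during the implementation that are worth noting: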
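1. Pull, don't push. We have many servers, and an architecture like Flume would require installing software on every one of them, which adds operations cost. So the collector pulls the logs itself, and nothing needs to be installed on the producers (the servers).
2. SSH connection. Every server already grants SSH access, so PHP's ssh2 extension is enough to connect remotely and read files on the server.
3. Uniform log layout. Log files follow the same directory convention on every server, which keeps the program logic simple.
4. Run from the CLI. The collector is a long-running program, so it runs in CLI mode. Be aware that the CLI may load a different INI file from the web server; `php --ini` shows which one is in use.
5. SSH connection failures. Network problems occasionally make the SSH connection or the authentication fail. Waiting and retrying is enough.
6. Log rotation and compression. Our operations team rotates and compresses the logs at a fixed time every day, so there are two kinds of file to read, compressed and uncompressed, and each needs its own handling.
7. Timestamps in the log. A timestamp in whole seconds is not enough to tell requests apart, so we added $msec to get millisecond resolution. Entries with the same millisecond, the same source IP and the same UA can be treated as one request.
8. Reading the directory. opendir and readdir can read a remote directory through the ssh2.sftp wrapper, e.g. opendir("ssh2.sftp://......"). After filtering out the files that are not needed, sort the rest by creation date and process them one by one.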
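9. Reading compressed files. file_get_contents leaves the console unresponsive for a long time, so I read step by step with fopen and fread. Each read returns about 8K (asking for more makes no difference), and a progress line is printed after every fixed number of reads.
10. Caching compressed files. After a successful download, the file is saved to a cache directory, both as a backup and for later use. If the program fails or is restarted, it checks the cache directory first and skips the network read when a cached file exists.
11. Decompression. gzdecode does the job, but it makes PHP's memory use jump sharply, so raise the memory limit in php.ini. (The listing below reads the cached file line by line with gzopen and gzgets instead, which avoids holding the whole decompressed file in memory.)
12. Recording processed compressed logs. Once a compressed file has been processed, that fact is recorded in the database, so later runs of the program do not process it again.
13. Uncompressed logs. An uncompressed log is still growing, so it is not cached. The database records the current file pointer (using ftell and fseek) and the file's creation date.
14. Detecting a new uncompressed log. If the file's date differs from the recorded date (the listing below compares the timestamp of the first log line), or the file is smaller than the recorded size, the file has been replaced and the file pointer must be reset. Otherwise, seek (fseek) straight to the recorded position and continue from where the last run stopped.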
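15. Splitting a log line. A regular expression that splits on spaces and delimiters is enough; a third-party library such as logParser also works. To save memory, return the lines one at a time from a generator (an Iterator built with yield).
16. De-duplication. Before processing, load each server's last stored log entry: its timestamp (in milliseconds), IP and UA.
17. Saving the logs. I store the logs in MySQL. Running one INSERT per log line wastes a great deal of time, so accumulate 4000 rows and insert them in one statement.
18. Error handling. Besides SSH connection failures, the reader can also pick up half a line from a log that is still being written, which makes parsing fail. That case throws an exception too; the main program catches it and simply runs again.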
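Points 13, 14 and 18 can be sketched in isolation. The following is a minimal sketch and not the code from the full source further down: `tailLines` and `resumeOffset` are hypothetical names, and a local temporary file stands in for the remote log. It resumes from a saved offset and holds back a last line that has no trailing newline yet, so a half-written line is never handed to the parser.

```php
<?php
/**
 * Yield complete lines from $path, starting at byte $offset.
 * Each item is [line without its newline, offset just after that line].
 * A final line with no trailing "\n" is assumed to be still in progress and is held back.
 */
function tailLines(string $path, int $offset = 0): Generator
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        throw new RuntimeException("cannot open $path");
    }
    try {
        fseek($f, $offset);
        while (($line = fgets($f)) !== false) {
            if (substr($line, -1) !== "\n") {
                break; // half-written line: leave it for the next run
            }
            yield [rtrim($line, "\r\n"), ftell($f)];
        }
    } finally {
        fclose($f);
    }
}

/** Start from 0 when the file shrank (it was replaced), else from the saved offset. */
function resumeOffset(int $savedOffset, int $savedSize, int $currentSize): int
{
    return $currentSize < $savedSize ? 0 : $savedOffset;
}

// A temporary file stands in for the remote log (17 bytes, last line unfinished).
$path = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($path, "first\nsecond\nhalf");
$offset = 0;
foreach (tailLines($path, $offset) as [$line, $next]) {
    echo $line, "\n";
    $offset = $next; // in real use, persist this after each inserted batch
}
// The writer completes the third line; the next run resumes from the saved offset.
file_put_contents($path, " done\n", FILE_APPEND);
$start = resumeOffset($offset, 17, filesize($path));
foreach (tailLines($path, $start) as [$line, $next]) {
    echo $line, "\n";
}
unlink($path);
```

Run as written, this prints `first`, `second` and then `half done`. Holding the partial line back is a design choice of the sketch; the listing instead lets the parse fail and relies on the main program to run again.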
The source code follows:
<?php
/**
* Created by IcePHP Framework.
* User: 藍冰大俠
* Date: 2018/4/11
* Time: 15:09
*/
class MLogImport
{
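/**
* Name of the site currently being processed
* @var string
*/
private $feed;
/**
* Host address of the site currently being processed
* @var string
*/
private $host;
/**
* Login name of the current site (also used as the site name)
* @var string
*/
private $user;
/**
* Login password of the current site
* @var string
*/
private $pass;
/**
* Directory that holds the current site's logs
* @var string
*/
private $logPath;
//Local directory for copies of the server logs; only compressed logs are backed up
const CACHE_PATH = DIR_ROOT . 'run/serverLog/';
/**
* SFTP handle of the current site
* @var resource
*/
private $sftp;
/**
* Name of the file currently being processed
* @var string
*/
private $file;
/**
* Last log row stored for this site
* @var array
*/
private $lastRow;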
private $agentPatterns;
private $agentReplaces;
public function __construct()
{
$maps = [
'/AppleWebKit\/[\d\.]*/i' => 'AppleWebKit/...',
'/Mobile\/[\d\w]*/i' => 'Mobile/...',
'/Safari\/[\d\.]*/i' => 'Safari/...',
'/CriOS\/[\d\.]*/i' => 'CriOS/...',
'/GSA\/[\d\.]*/i' => 'GSA/...',
'/Version\/[\d\.]*/i' => 'Version/...',
'/Chrome\/[\d\.]*/i' => 'Chrome/...',
'/Edge\/[\d\.]*/i' => 'Edge/...',
'/Firefox\/[\d\.]*/i' => 'Firefox/...',
'/SamsungBrowser\/[\d\.]*/i' => 'SamsungBrowser/...',
'/build\/[\d\w\-\.]*/i' => 'build/...',
'/Silk\/[\d\.]*/i' => 'Silk/...',
'/Crosswalk\/[\d\.]*/i' => 'Crosswalk/...',
'/Gecko\/[\d\.]*/i' => 'Gecko/...',
'/NTENTBrowser\/[\.\d]*/i' => 'NTENTBrowser/...',
'/Snapchat\/[\d\w\-\.]*/i' => 'Snapchat/...',
'/Java\/[\d\.]*/i' => 'Java/...',
'/UCBrowser\/[\d\.]*/i' => 'UCBrowser',
'/\(Linux[^\)]*SAMSUNG[^\)]*\)/i' => 'SAMSUNG...',
'/\([^\)]*IPAD[^\)]*\)/i' => 'IPAD...',
'/\([^\)]*SM-[^\)]*\)/i' => 'SM...',
'/LG\-[\w\d]*/i' => 'LG...',
'/LGL\d[\w\d]*/i' => 'LGL...',
'/itel it\d*/i' => 'itel...',
'/XT\d*/i' => 'XT...',
'/TECNO\-[\w\d]*/i' => 'TECNO...',
'/RCT[\d\w]*/i' => 'RCT...',
'/Micromax\s[\w\d]*/i' => 'Micromax...',
'/LGMS[\d]*/i' => 'LGMS...',
'/GT\-[\w\d]*/i' => 'GT...',
'/HUAWEI\s[\-\w\d]*/i' => 'HUAWEI...',
'/Lenovo\s[\-\w\d]*/i' => 'Lenovo...',
'/SCH\-[\w\d]*/i' => 'SCH...',
'/rv\:[\d\.]*/i' => 'rv:...',
'/Lumia\s\d+/i' => 'Lumia...',
'/Instagram\s[\d\.]*/i' => 'Instagram...',
'/iPhone OS 5[_\d]*/i' => 'iOS 5...',
'/iPhone OS 6[_\d]*/i' => 'iOS 6...',
'/iPhone OS 7[_\d]*/i' => 'iOS 7...',
'/iPhone OS 8[_\d]*/i' => 'iOS 8...',
'/iPhone OS 9[_\d]*/i' => 'iOS 9...',
'/iPhone OS 10[_\d]*/i' => 'iOS 10...',
'/iPhone OS 11[_\d]*/i' => 'iOS 11...',
'/iOS 5[_\d]*/i' => 'iOS 5...',
'/iOS 6[_\d]*/i' => 'iOS 6...',
'/iOS 7[_\d]*/i' => 'iOS 7...',
'/iOS 8[_\d]*/i' => 'iOS 8...',
'/iOS 9[_\d]*/i' => 'iOS 9...',
'/iOS 10[_\d]*/i' => 'iOS 10...',
'/iOS 11[_\d]*/i' => 'iOS 11...',
'/Android 2[\.\d]*/i' => 'Android 2...',
'/Android 3[\.\d]*/i' => 'Android 3...',
'/Android 4[\.\d]*/i' => 'Android 4...',
'/Android 5[\.\d]*/i' => 'Android 5...',
'/Android 6[\.\d]*/i' => 'Android 6...',
'/Android 7[\.\d]*/i' => 'Android 7...',
'/Android 8[\.\d]*/i' => 'Android 8...',
'/QuantcastSDK[^\s]*(\s\(\d+\))?/i' => 'QuantcastSDK...',
];
$this->agentPatterns = array_keys($maps);
$this->agentReplaces = array_values($maps);
}
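/**
* Set a site's host, credentials and log path, then import its logs
* @param string $host Host address
* @param string $user Login name, also used as the site name
* @param string $pass Login password
* @param string $logPath Path of the log directory
* @throws Exception If the SFTP subsystem or the log directory cannot be opened
*/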
public function site(string $host, string $user, string $pass, string $logPath): void
{
$this->feed = $user;
$this->host = $host;
$this->user = $user;
$this->pass = $pass;
$this->logPath = $logPath;
//Seconds to wait before reconnecting
$interval = 1;
$connect = null;
while (true) {
//Connect to the host
$connect = ssh2_connect($this->host, 22);
//Leave the loop only when both the connection and the password login succeeded
if ($connect !== false && ssh2_auth_password($connect, $this->user, $this->pass)) {
break;
}
//Back off: 2 seconds, then 4, 8, ...
$interval *= 2;
echo "connect or auth failed at $this->host, retry after $interval seconds\r\n";
//Wait for the interval, then reconnect
sleep($interval);
}
//Logged in
echo "\r\nlogin $this->feed\r\n";
//Open the SFTP subsystem, then read the file list
$this->sftp = ssh2_sftp($connect);
if(!$this->sftp){
throw new Exception('ssh2_sftp fail.');
}
$handle = opendir("ssh2.sftp://{$this->sftp}{$this->logPath}"); //ssh2.sftp://Resource #33/home/.....
if (!$handle) {
throw new Exception('open dir ssh2.sftp fail.');
}
$zippedFiles = [];
$unzippedFile = '';
while (false !== ($file = readdir($handle))) {
$filePath = "ssh2.sftp://{$this->sftp}{$this->logPath}/$file";
//Must be a regular file; skip directories
if (!is_file($filePath)) continue;
//Must be an access log
if (left($file, 10) !== 'access.log') continue;
//Compressed file
if (substr($file, -3) === '.gz') {
//Skip files from before 5 April 2018 (the log format changed that day)
if (substr($file, 11, 8) < '20180405') continue;
$zippedFiles[] = $file;
} else {
$unzippedFile = $file;
}
}
closedir($handle);
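//Last log row stored for this site
$this->lastRow = table('log')->row('*', ['feedName' => $this->feed], 'id desc')->toArray();
//Sort ascending by file name, which carries the rotation date
asort($zippedFiles);
//Process the compressed files one by one
foreach ($zippedFiles as $file) {
$this->file = $file;
$this->zipped();
}
//Then process the uncompressed log, if there is one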
if ($unzippedFile) {
$this->file = $unzippedFile;
$this->unzipped();
}
}
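/**
* Read a remote uncompressed file line by line
* @param string $indicator Stream URL of the remote file
* @param int $size File size in bytes
* @return Iterator Yields one line per iteration
*/
private function readUnzipped(string $indicator, int $size): Iterator
{
echo "Begin read File:$this->file:" . STool::kmgt($size) . "\r\n";
//Open the file and move to the position reached last time
$f = fopen($indicator, 'r');
if (!$f) {
return;
}
if ($this->offset) {
fseek($f, $this->offset);
echo "Seek to $this->offset\r\n";
}
//Number of lines read
$lines = 0;
//Read line by line
while (!feof($f)) {
$line = fgets($f);
//fgets returns false when nothing is left to read
if ($line === false) {
break;
}
$lines++;
//Update the offset
$this->offset = ftell($f);
//Hand the line to the caller
yield $line;
//Print progress every 500 lines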
if ($lines % 500 == 0) {
echo "read $this->feed $this->file Lines:$lines\r\n";
}
}
fclose($f);
echo "read $this->feed $this->file Lines:$lines\r\n";
echo "End.\r\n";
}
/**
* Download a remote compressed file into the local cache
* @return string Path of the cached file
*/
private function readZipped(): string
{
//Build the remote file URL
$indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";
//File size
$fileSize = filesize($indicator);
$size = STool::kmgt($fileSize);
//Use the cached file if it exists and its size matches
$cacheFile = self::CACHE_PATH . $this->feed . '/' . $this->file;
if (is_file($cacheFile) and filesize($cacheFile) == $fileSize) {
echo "Read Zipped File From Cache:" . $this->file . ' ' . $size . "\r\n";
return $cacheFile;
}
//Read the file from the server
echo "Begin read File:{$this->file}:" . $size . "\r\n";
$fileHandle = fopen($indicator, 'rb');
if (!$fileHandle) {
dump($indicator, 'OPEN FAIL');
exit;
}
//Read the remote file content
$content = '';
$i = 0;
while (!feof($fileHandle)) {
//Ask for 64K; each read of the remote stream returns only about 8K
$content .= fread($fileHandle, 65536);
//Show progress every 16 reads (about 128K)
$i++;
if ($i % 16 == 0) {
echo "$this->feed $this->file Reading :" . STool::kmgt(strlen($content)) . "/$size\r\n";
}
}
fclose($fileHandle);
//Save to the cache file
echo "Save to cache:" . $cacheFile . " \r\n";
makeDir(dirname($cacheFile));
file_put_contents($cacheFile, $content);
//Return the path of the cached file
return $cacheFile;
}
/**
* Split a string into lines
* @param string $content
* @return Iterator
*/
public function explode(string $content): Iterator
{
$size = strlen($content);
$pointer = 0;
while ($pointer < $size) {
$next = strpos($content, "\n", $pointer);
if ($next === false) {
$line = substr($content, $pointer);
$next = $size;
} else {
$line = substr($content, $pointer, $next - $pointer);
}
yield $line;
$pointer = $next + 1;
}
}
private function valid(string $url): bool
{
return false !== strpos($url, '/?s=') or false !== strpos($url, '/?ss=') or preg_match('/^\/.*\/.*\/$/i', $url);
}
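/**
* Process one compressed log file
* @throws Exception If the cached file cannot be opened
*/
private function zipped(): void
{
//Skip files that have already been processed
$fileTable = table('zipped');
if ($fileTable->exist(['feedName' => $this->feed, 'fileName' => $this->file])) return;
//Open the cached copy for line-by-line decompression
$gz=gzopen($this->readZipped(),'r');
if (!$gz) {
throw new Exception('gzopen fail.');
}
echo "\r\nBegin Process File\r\n";
//$memTable = $this->createTemporaryTable(uniqid('tmp_'));
//Log table to insert into
$logTable = table('log');
//Buffer of rows waiting to be inserted
$rows = [];
$insertRowsCount = 0;
$key=0;
while(!gzeof($gz)) {
$line=gzgets($gz);
if ((++$key) % 30000 == 0) {
echo "Analysis LINES:$key\r\n";
}
//Skip empty lines (gzgets returns false at the end of the file)
$line = trim((string)$line);
if (!$line) continue;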
//Split the line into fields
$parts = $this->explodeLine($line);
if (!$parts) continue;
//Keep only search requests
if (!$this->valid($parts['url'])) continue;
//Skip rows that were already imported
if ($this->lastRow) {
if ($parts['timestamp'] < $this->lastRow['timestamp']) continue;
if ($parts['timestamp'] == $this->lastRow['timestamp'] and $parts['url'] == $this->lastRow['url'] and $parts['ip'] == $this->lastRow['ip']) {
continue;
}
}
//Add to the buffer
$parts['feedName'] = $this->feed;
$rows[] = $parts;
//Insert every 4000 rows; more than that runs into too many placeholders
if (count($rows) >= 4000) {
$logTable->inserts($rows);
$insertRowsCount += count($rows);
SDebug::clearMsgs();
$rows = [];
}
}
//Close the decompression stream
gzclose($gz);
//Insert the remaining rows
if (count($rows)) {
$logTable->inserts($rows);
$insertRowsCount += count($rows);
SDebug::clearMsgs();
}
echo "insert LINES:$insertRowsCount\r\n";
//Mark this file as processed
//$fileTable->begin();
//$this->move($memTable);
$fileTable->insert(['feedName' => $this->feed, 'fileName' => $this->file]);
//$fileTable->commit();
}
/**
* Move the logs from the temporary table into the main log table
* @param STable $memTable Temporary table object
*/
private function move(STable $memTable)
{
$fields = ['feedName', 'accessTime', 'timestamp', 'ip', 'requestTime', 'responseTime', 'method', 'url', 'code', 'length', 'referrer', 'agentId', 'created', 'updated', 'forward'];
$fieldsStr = implode(',', $fields);
$memTable->execute("Insert" . " Into log($fieldsStr) select $fieldsStr from " . $memTable->name());
$memTable->deleteAll();
}
/**
* Offset reached in the current file
* @var int
*/
private $offset;
/**
* Process one uncompressed log file
*/
private function unzipped(): void
{
//Look up how far the last run got
$fileTable = table('unzipped');
//If there is no record yet, create an initial one
if ($fileTable->notExist(['feedName' => $this->feed])) {
$fileTable->insert(['feedName' => $this->feed, 'offset' => 0, 'size' => 0, 'timestamp' => 0]);
}
//Load the record: offset (file pointer last time), size (file size last time), timestamp (timestamp of the last row handled)
$info = $fileTable->row('*', ['feedName' => $this->feed]);
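//Build the remote file URL
$indicator = "ssh2.sftp://$this->sftp$this->logPath/$this->file";
//File size
$fileSize = filesize($indicator);
//The file shrank, so it is a new file
if ($fileSize < $info['size']) {
$this->offset = 0;
} else {
//Read the first line
$f = fopen($indicator, 'r');
$firstLine = $f ? fgets($f) : false;
if ($f) {
fclose($f);
}
if ($firstLine === false || trim($firstLine) === '') {
//Empty or unreadable file: nothing to compare, start from the beginning
$this->offset = 0;
} else {
$first = $this->explodeLine($firstLine);
//A first line newer than the last row handled means the log was rotated
if ($first['timestamp'] > $info['timestamp']) {
$this->offset = 0;
} else {
$this->offset = $info['offset'];
}
}
}
echo "\r\nBegin Process File\r\n";
//Log table to insert into
$logTable = table('log');
//Buffer of rows waiting to be inserted
$rows = [];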
$insertedRowsCount = 0;
$iterator = $this->readUnzipped($indicator, $fileSize);
$lastTime = 0;
foreach ($iterator as $key => $line) {
//Skip empty lines
$line = trim($line);
if (!$line) continue;
//Split the line into fields
$parts = $this->explodeLine($line);
if (!$parts) continue;
//Keep only search requests
if (!$this->valid($parts['url'])) continue;
//Skip rows that were already imported
if ($this->lastRow and (floatval($parts['timestamp']) < floatval($this->lastRow['timestamp']))) continue;
$rows[] = array_merge($parts, [
'feedName' => $this->feed
]);
//Largest timestamp so far
$lastTime = $parts['timestamp'];
//Batch insert
if (count($rows) >= 100) {
$insertedRowsCount += count($rows);
$logTable->inserts($rows);
$fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
echo "Insert LINES:$insertedRowsCount\r\n";
SDebug::clearMsgs();
$rows = [];
}
}
//Insert the remaining rows
if (count($rows)) {
$insertedRowsCount += count($rows);
$logTable->inserts($rows);
$fileTable->update(['size' => $fileSize, 'offset' => $this->offset, 'timestamp' => $lastTime], ['feedName' => $this->feed]);
echo "Insert LINES:$insertedRowsCount\r\n";
SDebug::clearMsgs();
}
}
/**
* Split one log line into fields
* @param string $line
* @return array
* @throws Exception If the line does not match the log format
*/
private function explodeLine(string $line): array
{
//Sample line (referrer query string shortened here): [08/Apr/2018:03:30:17 +0800] 1523129417.075 72.178.128.43 - 0.114 - "GET /index.php/blog/search/?s=lowering%20ldl%20cholesterol&subid=tgr_zhen_BFX0J1ILON6N__rmlwuf_73004751 HTTP/1.1" 499 0 "http://168634854.keywordblocks.com/Cholesterol_Hdl_Ldl_Ratio.cfm?..." "Mozilla/5.0 (iPhone; CPU iPhone OS 11_2_6 like Mac OS X) AppleWebKit/604.5.6 (KHTML, like Gecko) Version/11.0 Mobile/15D100 Safari/604.1" "-"
$ns = '([^\s]*)';
$str = '"([^"]*)"';
$datetime = '\[([^\]]*)\]';
//Match the line against the log format
$matched = preg_match("/$datetime $ns $ns \- $ns $ns $str $ns $ns $str $str $str/i", $line, $matches);
if (!$matched) {
throw new Exception('NOT MATCH');
}
//The request field is "METHOD URL PROTOCOL", separated by spaces; pad so a malformed request yields empty strings
list($mode, $url, $protocol) = array_pad(explode(' ', $matches[6], 3), 3, '');
return [
'accessTime' => datetime(strtotime($matches[1])), //Access time (seconds)
'timestamp' => floatval($matches[2]),//Access timestamp (with milliseconds)
'ip' => $matches[3], //Client IP
'requestTime' => floatval($matches[4]), //Time Nginx spent handling the request
'responseTime' => floatval($matches[5]), //Time Nginx took to complete the whole response
'method' => $mode, //GET/POST/...
'url' => $url, //Request URL
'code' => $matches[7], //Response status code
'length' => intval($matches[8]), //Response body length
'referrer' => left($matches[9], 250), //Referrer
'agentId' => $this->getAgentId($matches[10]), //User agent
'forward' => $matches[11] //Real client IP
];
}
//Load the mapping from agent string to ID
private function getAgentMap()
{
$rows = table('agent')->select('id,agent', null, 'agent')->toArray();
return array_column($rows, 'id', 'agent');
}
//Return the ID for an agent string, creating the mapping if it does not exist
private function getAgentId($agent)
{
//Empty UA
if (!$agent) {
return 0;
}
//Static in-memory cache
static $maps;
if (!$maps) {
$maps = $this->getAgentMap();
}
//Collapse [FBAN/FBIOS;...]
$sub = mid($agent, '[', ']');
if ($sub) {
$agent = str_replace('[' . $sub . ']', '[...]', $agent);
}
$agent = str_replace(' (KHTML, like Gecko)', '', $agent);
//Merge variants of the same agent
$agent = preg_replace($this->agentPatterns, $this->agentReplaces, $agent);
//Truncate the agent to 191 characters
$agent = left($agent, 191);
if (!isset($maps[$agent])) {
$id = table('agent')->insertIgnore(['agent' => $agent]);
$maps[$agent] = $id;
}
return $maps[$agent];
}
}
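The regular expression in explodeLine can be tried on its own. The sketch below is a simplified, standalone variant under stated assumptions: `parseAccessLine` is a hypothetical name, the sample line is shortened, the pattern is anchored at both ends, and a line that does not match returns null instead of throwing. It is meant for experimenting with the field layout, not as a replacement for the method above.

```php
<?php
/**
 * Split one access-log line into named fields.
 * Returns null when the line does not match, so the caller decides how to react.
 */
function parseAccessLine(string $line): ?array
{
    $ns  = '(\S*)';         // a run of non-space characters
    $str = '"([^"]*)"';     // a double-quoted field
    $dt  = '\[([^\]]*)\]';  // the bracketed date
    $pattern = "/^$dt $ns $ns - $ns $ns $str $ns $ns $str $str $str\$/";
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    // The request field is "METHOD URL PROTOCOL"; pad so short requests do not fail.
    [$method, $url] = array_pad(explode(' ', $m[6], 3), 3, '');
    return [
        'accessTime'  => $m[1],
        'timestamp'   => (float) $m[2],
        'ip'          => $m[3],
        'requestTime' => (float) $m[4],
        'method'      => $method,
        'url'         => $url,
        'code'        => (int) $m[7],
        'length'      => (int) $m[8],
        'referrer'    => $m[9],
        'agent'       => $m[10],
        'forward'     => $m[11],
    ];
}

// Shortened sample in the same field order as the log format used above.
$sample = '[08/Apr/2018:03:30:17 +0800] 1523129417.075 72.178.128.43 - 0.114 - '
    . '"GET /index.php/blog/search/?s=test HTTP/1.1" 499 0 "-" "Mozilla/5.0" "-"';
$row = parseAccessLine($sample);
echo $row['ip'], ' ', $row['method'], ' ', $row['url'], ' ', $row['code'], "\n";

// A half-written line does not match and yields null.
var_dump(parseAccessLine('[08/Apr/2018:03:30:17 +0800] 15231'));
```

Returning null for a partial line lets a caller skip it and try again on the next run, which is one way to handle the half-line case from point 18 without restarting the whole import.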
————————————————
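Copyright notice: this is an original article by CSDN blogger 「藍冰大俠」, licensed under CC 4.0 BY-SA. When reposting, please include the link to the original source and this notice.
Original link: https://blog.csdn.net/bluehire/article/details/79985203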