vesoft's Yu Xinlin (於新林): Open Source Makes the Nebula Graph Database Better

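{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Internet of Everything is driving exponential data growth, and traditional relational databases are no longer a good fit for some massive, high-concurrency scenarios. As demand for data processing grows rapidly and graph database technology matures, people are finding that trees and topology graphs express a real world full of relationships more naturally."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Nebula Graph is an open-source distributed graph database positioned for large-scale property graphs. Designed to handle massive data volumes, it is billed as the only graph database solution in the world that can hold hundreds of billions of vertices and trillions of edges while delivering millisecond-level query latency. Below, we talk with Yu Xinlin (於新林), CTO of vesoft, about Nebula Graph. A text version follows the video."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"embedcomp","attrs":{"type":"video","data":{"id":"450455","name":"對話vesoft於新林","poster":"","url":"https:\/\/media001.geekbang.org\/8684473dfdf943848f555a9fae60e4be\/d388e25ff6f745e29817cd6abd6eb73b-86a3651cc50e0adc54f801c2de18f55f-sd.m3u8"}}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":12}}],"text":"vesoft CTO Yu Xinlin"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: Could you introduce yourself, starting with your work history?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" My previous job was at Alibaba, where I spent about fourteen years. For the first ten I was chief architect at Alipay, and for the last four I was at Alibaba Cloud IoT, responsible for the foundational IoT PaaS. I joined vesoft (歐若數網) around May or June 2021 and now work in the graph data field."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: Why did the company decide to build Nebula Graph in the first place?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" Our CEO has deep experience in graph databases. He worked on them at Facebook and later at Ant (螞蟻). With that technical background and many years in the field, he set out on his own to build graph database technology, and Nebula Graph is the result."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: As we understand it, Nebula Graph is an open-source distributed graph database. Why did you choose open source from the start?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" First, the company was founded around October 2018, when open source was clearly gaining momentum across the industry; it was a major trend. Second, we build a graph database, which is an infrastructure-level product. For this kind of product, as long as you have found a business model, open source is a good approach: it puts you in touch with customers and developers quickly, and lets them raise requirements, report bugs, and even contribute code. That suits the product's development and supports fast iteration, so we chose open source."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: As CTO, what were your considerations for Nebula Graph's architecture?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" For architecture design, we start from a few high-level principles for the product:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, we chose open source."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, because data volumes keep growing and we need to support massive, high-concurrency workloads, the architecture has to be distributed."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Third, we wanted a relatively flexible architecture, so we separated the compute layer from the storage layer."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fourth, Kubernetes-based cloud native is increasingly popular and is an inevitable trend, so we now support cloud native and embrace it fully."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Fifth, our main effort so far has gone into OLTP, but graph computation and analytics are certainly a future trend as well, so we will also work toward HTAP."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Those are the high-level design principles behind Nebula Graph. Beyond that, a graph database's growth depends on its whole ecosystem, including the surrounding tools and products. We hope to work with the community to build an ecosystem around Nebula Graph, with SDKs, tools, and platforms, so that it can thrive. Those are our basic considerations."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: What are the core technologies in Nebula Graph's development?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" We are a distributed system with storage and compute separated, and graph databases are hard to begin with. In low-level infrastructure, operating systems are generally considered the hardest, and databases come next. So the first core problem is distribution: once data lives on different physical machines, you inevitably face consistency for distributed transactions. The second is handling high-concurrency, real-time, massive data. The third is stability: a database has to be stable, fast, and secure, so we need a range of measures to keep it stable and provide customers with continuous, reliable service. As for speed, the question is how to optimize. In the query layer, components such as the optimizer, scheduler, and parser all need deep, careful work before performance really improves."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many customers also have high security requirements, covering user permissions, network communication, and even encrypted storage of the underlying data. Just looking at a distributed database with compute and storage separated, the technical challenges are large and numerous. These are the points I consider hardest."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: Compared with other graph databases on the market, what are Nebula Graph's advantages?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" First, Nebula Graph is an open-source graph database developed entirely in-house, which gives us much more control. In-house here includes the work we build together with the community. Second, that control lets us optimize aggressively. Our largest scenarios reach hundreds of billions of vertices and trillions of edges, and some customers are already at that scale; Nebula Graph is a good fit for such massive, high-concurrency scenarios. Under high concurrency, customers also have strict response-time requirements. Nebula Graph is an OLTP (online transaction processing) database, and it can return results in milliseconds under massive, high-concurrency load. These are the areas where we feel we currently do relatively well. There are also shortcomings that we need to keep optimizing and improving."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: What were the bigger challenges you faced while developing Nebula Graph, and how did you address them?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" Development certainly involves many challenges. Take stability: customers generally hold database products to a high standard. Keeping a database stable has many aspects, including how compatibility is handled, how memory is managed, and whether the service crashes under high concurrency. How does the distributed system cope with network problems and failures? To address stability we have done a lot of work, including chaos testing, stress testing, and collecting special cases from customer scenarios, and we have made many optimizations in these areas."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are also challenges inherent to the product. Nebula Graph is a distributed database, and a distributed database inevitably has to deal with ACID. That is a distributed-transaction problem, and ensuring that transactions reach eventual consistency is quite challenging. Another set of challenges arises in clustered deployments: how to keep data synchronized between clusters, and how to handle the fault and fail over quickly when a cluster goes down."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Going into more detail, how well the query-layer optimizer works is also a major challenge. After a statement goes through syntactic and semantic analysis, an execution plan is produced, which raises questions such as index selection. Optimization strategies include rule-based optimization, cost-based optimization, and possibly something like intelligent optimization later on. The scheduler is another big challenge. For example, a statement could be executed on a single machine, but that machine's compute resources are limited, and if other machines sit idle the cluster's advantage goes largely unused. In that case, some of that machine's tasks can be scheduled onto other machines so that they run concurrently and finish sooner. That is the optimization work for the scheduler and executor."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For the storage layer, one challenge is the distributed transactions I just mentioned. Another is using large memory, so that the graph database's entire topology can be moved into memory. Once it is in memory, you have to deal with consistency between memory and disk, and with how to swap with disk when memory runs short. In addition, graphs inevitably involve multi-hop queries, and each hop can involve interaction with other processes in the cluster. How can memory and storage optimizations reduce the network overhead or performance loss from these frequent multi-hop interactions? There are many challenges we need to solve. We also hope members of the community will contribute when they have time, or join us to solve these problems together."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: What business scenarios is Nebula Graph suited to today? Could you give some typical examples?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" Typical scenarios include financial fraud detection, such as anti-money laundering; companion and tailing analysis and criminal-gang discovery in public security, where national-security and public-security workloads use relationships to uncover criminal groups; knowledge graphs; data lineage and the link and scheduling relationships between systems that developers deal with day to day; and some less familiar ones, such as EDA software for chip design. All of these use graph databases. I think graph scenarios will become richer and reach into every industry, so there is a lot of room to grow."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: What is next for Nebula Graph?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" Step one is to bring Nebula Graph to the cloud as a DBaaS offering, similar to a PaaS service. Customers will not need to think about how to set up, deploy, and operate a database; they simply use the database service. We are already working with several cloud vendors to move our graph database onto the cloud. Step two is to increase our investment in AP. We already have AP-related services, but they still fall short of what we have in mind. With more investment in AP we want to truly merge TP and AP and achieve HTAP. Step three is to bring graph computation and graph analytics to the cloud as a SaaS service. Some customers would then not need to build their own graph computation and analytics platform; they would upload data to the SaaS service, run graph computation and analytics, and get results. This enables pay-as-you-go pricing based on compute units and time. Those are our main plans for what comes next."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"InfoQ: Was there anything during development that left a deep impression on you?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yu Xinlin:"},{"type":"text","text":" What impressed me most was a time when, after we released a version, a customer said their performance had dropped sharply. We had tested it ourselves and had never seen that problem, so we were surprised. We contacted the customer right away, hoping to resolve it as soon as possible, and quickly set up a conference call. After investigating, we found that the customer's scenario was one our test cases had never covered. This is also part of how customers and we build the community together, and cases like this improve the robustness and stability of the product. Even under that urgency, by working together we quickly found the problem and fixed it. We shipped a patch that same day, which resolved the customer's issue."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are many similar stories, but that one stands out because we responded to the customer quickly and solved the problem fast. We are also grateful to the customer for contributing a scenario we had not thought of or encountered before. That is the episode that left the deepest impression on me."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" "}]}
]}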