William McKnight on Columnar Databases

Original

omg2012

2020-06-23 02:34

http://www.infoq.com/news/2011/09/nosqlnow-columnar-databases

For certain business use cases, columnar databases offer better data storage capabilities than traditional relational database management systems (RDBMS).

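In some business scenarios, columnar databases provide better data storage capabilities than traditional relational database management systems.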

William McKnight spoke at the NoSQL Now 2011 Conference last week about columnar databases and how they can be effective for certain data storage needs.

He said that queries against RDBMS solutions, which are based on a row-wise design, bring back a lot of data. Data input/output (I/O) has become the true bottleneck in data processing today, and when you do I/O, it is better to get more data while you are there. The real way to avoid this problem is to do only the I/O that you really need. Columnar databases let a query pick just the columns it needs, instead of retrieving whole rows and leaving the other columns unused as overhead. They are the better solution when the workload needs only a small percentage of the overall column bytes.

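The bottleneck in data processing is data I/O. A columnar database can retrieve only the columns it needs, which reduces the volume of I/O and improves efficiency.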
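The I/O argument can be made concrete with a small back-of-the-envelope model. The sketch below is only an illustration: the schema, the column widths, and the row count are invented for the example and do not come from the talk. It compares the bytes a query must read when it needs a single column from a row layout versus a column layout.

```python
# Hypothetical fixed-width schema: column name -> bytes per value.
# All names and sizes here are assumptions made for illustration.
SCHEMA = {"order_id": 8, "customer_id": 8, "amount": 8, "note": 200}
NUM_ROWS = 1_000_000


def row_store_bytes(schema, num_rows):
    """A row layout reads every column of each row, wanted or not."""
    return num_rows * sum(schema.values())


def column_store_bytes(schema, num_rows, wanted):
    """A column layout reads only the columns the query asks for."""
    return num_rows * sum(schema[name] for name in wanted)


wanted = ["amount"]  # e.g. a query that only sums the amounts
row_bytes = row_store_bytes(SCHEMA, NUM_ROWS)
col_bytes = column_store_bytes(SCHEMA, NUM_ROWS, wanted)

print(f"row layout:    {row_bytes:,} bytes")
print(f"column layout: {col_bytes:,} bytes")
print(f"fraction read: {col_bytes / row_bytes:.1%}")
```

In this toy model the gap grows with the width of the columns the query does not need, which matches the point that columnar storage pays off when the workload touches a small share of the total column bytes.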

In columnar databases, data is stored column by column, with every column keeping its values in the same row order. William discussed the data page layout of a relational database record and compared it with that of a columnar database table. The row page design used by RDBMS databases involves some overhead, because queries rely on a row scan or an index scan, which can be expensive given all the data involved. He showed an example use case in which a query took 500,000 I/Os on a row-based database versus 235 I/Os on a columnar database.
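As a rough sketch of what storing columns in the same order means in practice, the snippet below decomposes a few rows into per-column lists. The table and its column names are hypothetical. Because position i in every column belongs to the same logical row, an aggregate can scan one column alone, while a full row can still be rebuilt by position when it is needed.

```python
# Hypothetical sample table; the names and values are illustrative only.
NAMES = ("id", "customer", "amount")
rows = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "carol", 99.75),
]

# Decompose the rows into one list per column, preserving row order.
columns = {name: [row[k] for row in rows] for k, name in enumerate(NAMES)}


def column_total(columns, name):
    """Aggregate over a single column without touching the others."""
    return sum(columns[name])


def fetch_row(columns, position):
    """Rebuild one logical row by reading the same position in each column."""
    return tuple(columns[name][position] for name in NAMES)


print(column_total(columns, "amount"))
print(fetch_row(columns, 1))
```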

There are different columnar data storage options, such as Decomposed Storage Model, Positional Representation, Modified B-Tree/Run-Length Encoding, and Bitmap. He also talked about materialization strategies, which include Function of 'projection' and Early and Late Materialization.
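Two of the techniques in that list, run-length encoding and bitmaps, are easy to illustrate on a single column. The sketch below is a minimal, generic version of each and is not taken from the talk or from any particular product; the sample `region` column and the function names are assumptions. Run-length encoding collapses consecutive repeats into (value, count) pairs, and a bitmap keeps one bit per row position for each distinct value.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def bitmap_encode(values):
    """Build one integer bitmap per distinct value; bit i is set for row i."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def bitmap_positions(bitmap):
    """List the row positions whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical low-cardinality column.
region = ["east", "east", "east", "west", "west", "east"]

runs = rle_encode(region)
bitmaps = bitmap_encode(region)

print(runs)
print(bitmap_positions(bitmaps["west"]))
```

Both encodings work best on columns with few distinct values, and run-length encoding benefits further when the column is sorted so that equal values sit next to each other.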

Columnar database vendors include Vertica, ParAccel, Sybase IQ, InfoBright, Exasol, and VectorWise, along with open source products such as MonetDB and InfiniDB.

William said that relational row-based data warehouses and data marts will still be there. Besides the data warehouse and Hadoop, you will have column databases to process the data a lot faster. He concluded the session by saying that database designers should start with good design principles and then decide whether to put the data in a row-based or a column-based solution.

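Database designers should first have good design principles, and then decide whether they need a row-based or a column-based data storage solution.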
