William McKnight on Columnar Databases

http://www.infoq.com/news/2011/09/nosqlnow-columnar-databases

Columnar databases offer better data storage capabilities for certain business use cases than traditional relational database management systems (RDBMS).

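In some business scenarios, columnar databases provide better data storage capabilities than traditional relational database management systems.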

William McKnight spoke at the NoSQL Now 2011 Conference last week about columnar databases and how they can be effective for certain data storage needs.

He said that queries against RDBMS solutions (which are based on a row-wise design) return a lot of data. Data input/output (I/O) has become the true bottleneck in data processing today, and when you do I/O, it’s better to get more data while you are there; but the real way to avoid the problem is to do only the I/O that you really need. Columnar databases let a query pick just the columns it needs instead of retrieving whole rows and then leaving the other columns unused as overhead. They offer a better solution in use cases where the workload needs only a small percentage of the overall column bytes.

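The bottleneck in data processing is data I/O. A columnar database can retrieve only the columns that are needed, which reduces the amount of I/O and improves efficiency.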
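To make the column-picking idea concrete, here is a minimal in-memory sketch, not a real storage engine. The table, its column names (`id`, `region`, `amount`), and the "values read" counter are all assumptions made up for illustration; they only stand in for the bytes a real engine would pull off disk.

```python
# Hypothetical three-column table; the names and values are illustrative only.
rows = [
    {"id": 1, "region": "east", "amount": 120.0},
    {"id": 2, "region": "west", "amount": 75.5},
    {"id": 3, "region": "east", "amount": 210.25},
]

# Row-wise layout: each record is stored (and read) as a whole.
row_store = [(r["id"], r["region"], r["amount"]) for r in rows]

# Column-wise layout: one list per column, all in the same row order.
column_store = {
    "id": [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def total_amount_row_store(store):
    """Sum 'amount' from a row store; returns (total, values_read)."""
    total, values_read = 0.0, 0
    for record in store:
        values_read += len(record)  # the whole record comes along
        total += record[2]
    return total, values_read


def total_amount_column_store(store):
    """Sum 'amount' from a column store; returns (total, values_read)."""
    column = store["amount"]  # only this one column is touched
    return sum(column), len(column)


row_total, row_values = total_amount_row_store(row_store)
col_total, col_values = total_amount_column_store(column_store)
```

Both functions return the same total, but the row-wise version touches every value of every record, while the column-wise version touches only the one column the query asked for.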

 
In columnar databases, the data is stored by column, with all columns keeping their values in the same row order. William discussed the data page layout of a relational database record and compared it with that of a column database table. The row page design (in RDBMS databases) carries some overhead because queries rely on a row scan or an index scan, which can be expensive given all the data involved. He showed an example of a use case where the data query took 500,000 I/Os for a row-based database versus 235 I/Os for a columnar database.
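A back-of-the-envelope calculation shows why the page counts can differ so much. The sketch below uses hypothetical figures (an 8 KB page, 1,000,000 rows, 200-byte rows, one 4-byte column); none of them come from the talk, and the estimate ignores page headers, indexes, and compression.

```python
import math


def pages_to_scan(num_rows, bytes_per_entry, page_size=8192):
    """Pages needed to scan num_rows entries of bytes_per_entry bytes each."""
    entries_per_page = page_size // bytes_per_entry
    return math.ceil(num_rows / entries_per_page)


# Hypothetical table: 1,000,000 rows, 200-byte rows, and a query that
# needs a single 4-byte column. None of these figures come from the talk.
NUM_ROWS = 1_000_000
ROW_BYTES = 200
COLUMN_BYTES = 4

# A full scan of the row store reads whole rows.
row_store_pages = pages_to_scan(NUM_ROWS, ROW_BYTES)

# A scan of the column store reads only the one column the query needs.
column_store_pages = pages_to_scan(NUM_ROWS, COLUMN_BYTES)
```

Under these assumptions the row store scan reads 25,000 pages and the single-column scan reads 489, because a page that holds 40 whole rows can hold 2,048 values of one narrow column.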

 
 
There are different columnar data storage options, such as Decomposed Storage Model, Positional Representation, Modified B-Tree/Run-Length Encoding, and Bitmap. He also talked about materialization strategies, including the function of 'projection' and early versus late materialization.
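The difference between early and late materialization can be sketched on a small in-memory column store. This is an illustration under assumed data and function names, not how any particular product implements it: early materialization rebuilds full rows before filtering, while late materialization filters one column into a list of positions and fetches the other columns only at those positions.

```python
# Hypothetical column store: one list per column, all in the same row order.
columns = {
    "id": [1, 2, 3, 4, 5],
    "region": ["east", "west", "east", "north", "east"],
    "amount": [120.0, 75.5, 210.25, 18.0, 64.0],
}


def early_materialization(cols, region):
    """Stitch every row back together first, then filter and project."""
    tuples = list(zip(cols["id"], cols["region"], cols["amount"]))
    return [(rid, amount) for rid, reg, amount in tuples if reg == region]


def late_materialization(cols, region):
    """Filter one column into a position list, then fetch only those positions."""
    positions = [i for i, reg in enumerate(cols["region"]) if reg == region]
    return [(cols["id"][i], cols["amount"][i]) for i in positions]


early = early_materialization(columns, "east")
late = late_materialization(columns, "east")
```

Both strategies produce the same result; the late variant simply avoids building tuples for rows that the filter would discard anyway.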

Some of the columnar database vendors are Vertica, ParAccel, Sybase IQ, InfoBright, Exasol, VectorWise and open source products like MonetDB and InfiniDB.

William said that relational row-based data warehouses and data marts will still be there. Besides the data warehouse and Hadoop, you will have column databases to process the data a lot faster. He concluded the session by saying that database designers should start with good design principles and then decide whether to put the data in a row-based or a column-based solution.

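Database designers should first have sound design principles in place, and only then decide whether they need a row-based or a column-based data storage solution.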
