Bigtable: A Distributed Storage System for Structured Data: Part 1, Abstract and Introduction

Abstract
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. 
Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance.
These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).
Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. 
In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.


1 Introduction
Over the last two and a half years we have designed, implemented, and deployed a distributed storage system for managing structured data at Google called Bigtable.
Bigtable is designed to reliably scale to petabytes of data and thousands of machines. 
Bigtable has achieved several goals: wide applicability, scalability, high performance, and high availability. 
Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth. 
These products use Bigtable for a variety of demanding workloads, which range from throughput-oriented batch-processing jobs to latency-sensitive serving of data to end users.
The Bigtable clusters used by these products span a wide range of configurations, from a handful to thousands of servers, and store up to several hundred terabytes of data.
In many ways, Bigtable resembles a database: it shares many implementation strategies with databases. 
Parallel databases and main-memory databases have achieved scalability and high performance, but Bigtable provides a different interface than such systems. 
Bigtable does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format, and allows clients to reason about the locality properties of the data represented in the underlying storage. 
Data is indexed using row and column names that can be arbitrary strings. 
Bigtable also treats data as uninterpreted strings, although clients often serialize various forms of structured and semi-structured data into these strings. 
Clients can control the locality of their data through careful choices in their schemas. Finally, Bigtable schema parameters let clients dynamically control whether to serve data out of memory or from disk.
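
The ideas in this paragraph can be illustrated with a small in-memory sketch. It is not Bigtable's client API: `ToyTable`, `scan_rows`, and `reversed_host_key` are hypothetical names invented for this example, and the reversed-hostname row key is only one possible schema choice. The sketch assumes only what the paragraph states: rows and columns are named by arbitrary strings, values are uninterpreted bytes that the client serializes itself, and the choice of row names determines which data sits close together.

```python
import json
from bisect import bisect_left


class ToyTable:
    """Toy in-memory stand-in for the data model; not Bigtable's client API."""

    def __init__(self):
        self._cells = {}      # (row, column) -> bytes
        self._row_keys = []   # distinct row keys, kept in sorted order

    def put(self, row: str, column: str, value: bytes) -> None:
        # The table never interprets values; it only stores bytes.
        if not isinstance(value, bytes):
            raise TypeError("values are stored as uninterpreted bytes")
        i = bisect_left(self._row_keys, row)
        if i == len(self._row_keys) or self._row_keys[i] != row:
            self._row_keys.insert(i, row)
        self._cells[(row, column)] = value

    def get(self, row: str, column: str) -> bytes:
        return self._cells[(row, column)]

    def scan_rows(self, prefix: str):
        """Return the row keys that start with prefix; they are adjacent in sort order."""
        i = bisect_left(self._row_keys, prefix)
        result = []
        while i < len(self._row_keys) and self._row_keys[i].startswith(prefix):
            result.append(self._row_keys[i])
            i += 1
        return result


def reversed_host_key(host: str, path: str) -> str:
    """Hypothetical row-key scheme: reverse the hostname so one domain's pages sort together."""
    return ".".join(reversed(host.split("."))) + path


table = ToyTable()
table.put(reversed_host_key("maps.example.com", "/index.html"), "contents", b"<html>maps</html>")
table.put(reversed_host_key("www.example.com", "/"), "contents", b"<html>home</html>")
table.put(reversed_host_key("www.other.org", "/"), "contents", b"<html>other</html>")

# The client, not the table, decides how structured data is encoded into bytes.
meta = {"lang": "en", "fetched": 1}
table.put("com.example.www/", "meta", json.dumps(meta).encode("utf-8"))

print(table.scan_rows("com.example."))
print(json.loads(table.get("com.example.www/", "meta").decode("utf-8")))
```

Because the row keys are kept sorted, a prefix scan over `com.example.` touches only neighbouring keys. This is the sense in which a careful schema choice gives the client control over locality in this toy model.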


Section 2 describes the data model in more detail, and Section 3 provides an overview of the client API. 
Section 4 briefly describes the underlying Google infrastructure on which Bigtable depends. 
Section 5 describes the fundamentals of the Bigtable implementation, and Section 6 describes some of the refinements that we made to improve Bigtable’s performance. 
Section 7 provides measurements of Bigtable’s performance. 
We describe several examples of how Bigtable is used at Google in Section 8, and discuss some lessons we learned in designing and supporting Bigtable in Section 9.  
Finally, Section 10 describes related work, and Section 11 presents our conclusions.

