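Amazon Simple Storage Service (Amazon S3) is an Internet-facing storage service that is highly scalable, reliable, secure, fast, and inexpensive. It is designed for 99.999999999% durability and can store an unlimited amount of data, with each object holding up to 5 TB. S3 supports versioning, object lifecycle management, encryption, static website hosting, S3 Select SQL queries, and more.
S3 Java SDK
S3's architecture is independent of any programming language and exposes REST and SOAP interfaces. SOAP support over HTTP is deprecated but still available over HTTPS. New S3 features are not supported for SOAP, so the REST API is recommended.
With REST, you use standard HTTP requests to create, fetch, and delete buckets and objects. Coding directly against the REST API is complex; the AWS SDK wraps the underlying REST API and simplifies the programming work.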
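To make concrete what the SDK wraps, the sketch below builds (but does not send) the plain HTTP request behind a GET Object call, using the JDK 11+ java.net.http API. The class name RestRequestSketch and its parameters are hypothetical. It assumes a virtual-hosted-style endpoint in a standard (non-China) region and a key made only of URL-safe characters. The request is unsigned, so S3 would reject it for a private object; computing the Signature Version 4 Authorization header is a large part of what the SDK does for you.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RestRequestSketch {
    private RestRequestSketch() {
    }

    // Builds an unsigned GET Object request with a virtual-hosted-style URL.
    // Assumption: standard-region endpoint of the form
    // <bucket>.s3.<region>.amazonaws.com; China regions use a different suffix.
    static HttpRequest getObjectRequest(String bucket, String region, String key) {
        URI uri = URI.create("https://" + bucket + ".s3." + region + ".amazonaws.com/" + key);
        return HttpRequest.newBuilder(uri).GET().build();
    }
}
```

The same URL shape with the PUT or DELETE method corresponds to uploading or deleting the object.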
Configuring AWS Credentials
To use the AWS SDK you must supply AWS credentials. Create the file ~/.aws/credentials (C:\Users\USER_NAME\.aws\credentials on Windows) with the following content:
[default]
aws_access_key_id = your_access_key_id
aws_secret_access_key = your_secret_access_key
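The SDK locates and reads this file on its own, so no code is needed for it. Purely to illustrate the file's location and layout, here is a minimal standard-library sketch that resolves the default path and reads the key/value pairs of one profile. The class CredentialsFileSketch and its methods are hypothetical and are not part of the SDK.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CredentialsFileSketch {
    private CredentialsFileSketch() {
    }

    // <user home>/.aws/credentials, on Windows as well as on Unix-like systems.
    static Path defaultLocation() {
        return Paths.get(System.getProperty("user.home"), ".aws", "credentials");
    }

    // Returns the key/value pairs listed under the given [profile] header.
    static Map<String, String> readProfile(List<String> lines, String profile) {
        Map<String, String> values = new LinkedHashMap<>();
        boolean inProfile = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                // A new section starts; remember whether it is the one we want.
                inProfile = line.substring(1, line.length() - 1).trim().equals(profile);
                continue;
            }
            int eq = line.indexOf('=');
            if (inProfile && eq > 0) {
                values.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return values;
    }
}
```

Passing the lines of the file above and the profile name default yields a map with the two keys aws_access_key_id and aws_secret_access_key.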
POM
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-bom</artifactId>
<version>1.11.433</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
To use the entire SDK, no BOM is needed; declare a single dependency instead:
<dependencies>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.11.433</version>
</dependency>
</dependencies>
Basic S3 Operations
The following class demonstrates the basic S3 operations createBucket, listBuckets, putObject, getObject, listObjects, deleteObject, and deleteBucket.
package org.itrunner.aws.s3;
import com.amazonaws.HttpMethod;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;
import java.io.File;
import java.net.URL;
import java.util.Date;
import java.util.List;
public class S3Util {
private static AmazonS3 s3;
static {
s3 = AmazonS3ClientBuilder.standard().withRegion(Regions.CN_NORTH_1).build();
}
private S3Util() {
}
/*
* Create a new S3 bucket - Amazon S3 bucket names are globally unique
*/
public static Bucket createBucket(String bucketName) {
return s3.createBucket(bucketName);
}
/*
* List the buckets in your account
*/
public static List<Bucket> listBuckets() {
return s3.listBuckets();
}
/*
* List objects in your bucket
*/
public static ObjectListing listObjects(String bucketName) {
return s3.listObjects(bucketName);
}
/*
* List objects in your bucket by prefix
*/
public static ObjectListing listObjects(String bucketName, String prefix) {
return s3.listObjects(bucketName, prefix);
}
/*
* Upload an object to your bucket
*/
public static PutObjectResult putObject(String bucketName, String key, File file) {
return s3.putObject(bucketName, key, file);
}
/*
* Download an object - When you download an object, you get all of the object's metadata and a stream from which to read the contents.
* It's important to read the contents of the stream as quickly as possible since the data is streamed directly from Amazon S3 and your
* network connection will remain open until you read all the data or close the input stream.
*/
public static S3Object get(String bucketName, String key) {
return s3.getObject(bucketName, key);
}
/*
* Delete an object - Unless versioning has been turned on for your bucket, there is no way to undelete an object, so use caution when deleting objects.
*/
public static void deleteObject(String bucketName, String key) {
s3.deleteObject(bucketName, key);
}
/*
* Delete a bucket - A bucket must be completely empty before it can be deleted, so remember to delete any objects from your buckets before
* you try to delete them.
*/
public static void deleteBucket(String bucketName) {
s3.deleteBucket(bucketName);
}
}
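The comment on the download method stresses that the content stream must be read promptly and closed. One way to guarantee the close is try-with-resources, sketched here with the standard library only. DownloadSketch and saveAndClose are hypothetical names, and the method accepts any InputStream so that it can be tried without an S3 connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class DownloadSketch {
    private DownloadSketch() {
    }

    // Copies the whole stream to target and returns the number of bytes written.
    // The stream is closed even if the copy fails, which releases the
    // underlying network connection.
    static long saveAndClose(InputStream content, Path target) throws IOException {
        try (InputStream in = content) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the S3Util class above, a call would look like DownloadSketch.saveAndClose(S3Util.get(bucketName, key).getObjectContent(), target).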
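Generating a Presigned URL
By default, S3 objects are private, and only the owner has access. The owner can, however, use their own security credentials to create a presigned URL that grants time-limited permission to download an object, and share the object that way. Anyone who receives the presigned URL can access the object.
To create a presigned URL, you must provide your security credentials, the bucket name and object key, the HTTP method (GET to download an object), and an expiration time.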
public static String generatePresignedUrl(String bucketName, String key, int minutes) {
// Sets the expiration date
Date expiration = new Date();
long expTimeMillis = expiration.getTime();
expTimeMillis += 1000L * 60 * minutes; // long arithmetic avoids int overflow for large values of minutes
expiration.setTime(expTimeMillis);
// Generate the presigned URL.
GeneratePresignedUrlRequest generatePresignedUrlRequest = new GeneratePresignedUrlRequest(bucketName, key).withMethod(HttpMethod.GET).withExpiration(expiration);
URL url = s3.generatePresignedUrl(generatePresignedUrlRequest);
return url.toString();
}
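The expiration time can also be computed with java.time, which avoids manual millisecond arithmetic. The sketch below is one possible variant, not the method used above. PresignExpirySketch and expiresAt are hypothetical names, and the seven-day cap is an assumption that reflects the maximum lifetime of URLs signed with Signature Version 4; temporary credentials may expire sooner.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

final class PresignExpirySketch {
    // Assumption: Signature Version 4 presigned URLs live at most seven days.
    static final Duration MAX_LIFETIME = Duration.ofDays(7);

    private PresignExpirySketch() {
    }

    // Returns the expiration date for a URL created at 'now' that should stay
    // valid for 'lifetime'; rejects non-positive or over-long lifetimes.
    static Date expiresAt(Instant now, Duration lifetime) {
        if (lifetime.isZero() || lifetime.isNegative() || lifetime.compareTo(MAX_LIFETIME) > 0) {
            throw new IllegalArgumentException("lifetime must be positive and at most 7 days: " + lifetime);
        }
        return Date.from(now.plus(lifetime));
    }
}
```

The resulting Date can be passed to withExpiration, for example expiresAt(Instant.now(), Duration.ofMinutes(minutes)).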
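Selecting Content from Objects
With Amazon S3 Select, you can use SQL statements to filter the contents of an S3 object and retrieve only the subset of data you need. Amazon S3 Select works on objects stored in CSV or JSON format; these objects may be compressed with GZIP or BZIP2 and may use server-side encryption.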
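S3 Select Requirements and Limits
Requirements:
- You must have s3:GetObject permission for the object you query.
- If the queried object is encrypted with a customer-provided key (SSE-C), you must use https and supply the encryption key in the request.
Limits:
- The maximum length of a SQL expression is 256 KB.
- The maximum length of a record in the result is 1 MB.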
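SQL Syntax
Amazon S3 Select supports a subset of SQL, with the following syntax:
SELECT column_name FROM table_name [WHERE condition] [LIMIT number]
Here table_name is S3Object.
The SELECT clause supports *.
When the file format is CSV, columns can be referenced by column number or by column name; column numbers start at 1: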
select s._1 from S3Object s
Select s.name from S3Object s
When column names are used, the program must set FileHeaderInfo to Use.
Double quotes indicate that a column name is case-sensitive:
SELECT s."name" from S3Object s
Without double quotes, column names are case-insensitive.
For example, given a CSV file with the following content:
username,email
Jason,[email protected]
Coco,[email protected]
the SQL statement could be:
select s.email from S3Object s where s.username='Jason'
For more about the SQL dialect, see the SQL Reference for Amazon S3 Select and Amazon Glacier Select.
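To show what a query of the form select s.email from S3Object s where s.username='Jason' returns, the following sketch emulates it locally on in-memory CSV lines. It does not call S3. CsvSelectSketch is a hypothetical name, the addresses in the usage note are placeholder values, and the naive comma split assumes well-formed rows without quoted fields. The header lookup ignores case, mirroring the behavior of unquoted column names.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvSelectSketch {
    private CsvSelectSketch() {
    }

    // Emulates: select s.<column> from S3Object s where s.<filterColumn>='<filterValue>'
    // on a CSV whose first line is a header row (FileHeaderInfo set to Use).
    static List<String> select(List<String> csvLines, String column, String filterColumn, String filterValue) {
        String[] header = csvLines.get(0).split(",", -1);
        int target = indexOfIgnoreCase(header, column);
        int filter = indexOfIgnoreCase(header, filterColumn);
        List<String> result = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] fields = line.split(",", -1);
            // Values are compared exactly; only column names ignore case.
            if (fields[filter].equals(filterValue)) {
                result.add(fields[target]);
            }
        }
        return result;
    }

    // Finds a column by name, ignoring case like an unquoted SQL identifier.
    private static int indexOfIgnoreCase(String[] header, String name) {
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equalsIgnoreCase(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("unknown column: " + name);
    }
}
```

For the lines "username,email", "Jason,jason@example.com", and "Coco,coco@example.com", select(lines, "email", "username", "Jason") returns a list holding only jason@example.com.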
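Querying a CSV File
The following example saves the query result to the file at outputPath. Besides the imports shown for S3Util, it needs java.io.FileOutputStream, java.io.InputStream, java.io.OutputStream, and java.util.concurrent.atomic.AtomicBoolean; the copy helper is assumed to be the static method com.amazonaws.util.IOUtils.copy.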
public static void selectCsvObjectContent(String bucketName, String csvObjectKey, String sql, String outputPath) throws Exception {
SelectObjectContentRequest request = generateBaseCSVRequest(bucketName, csvObjectKey, sql);
final AtomicBoolean isResultComplete = new AtomicBoolean(false);
try (OutputStream fileOutputStream = new FileOutputStream(new File(outputPath));
SelectObjectContentResult result = s3.selectObjectContent(request)) {
InputStream resultInputStream = result.getPayload().getRecordsInputStream(
new SelectObjectContentEventVisitor() {
/*
* An End Event informs that the request has finished successfully.
*/
@Override
public void visit(SelectObjectContentEvent.EndEvent event) {
isResultComplete.set(true);
}
}
);
copy(resultInputStream, fileOutputStream);
}
/*
* The End Event indicates all matching records have been transmitted. If the End Event is not received, the results may be incomplete.
*/
if (!isResultComplete.get()) {
throw new Exception("S3 Select request was incomplete as End Event was not received.");
}
}
private static SelectObjectContentRequest generateBaseCSVRequest(String bucket, String key, String query) {
SelectObjectContentRequest request = new SelectObjectContentRequest();
request.setBucketName(bucket);
request.setKey(key);
request.setExpression(query);
request.setExpressionType(ExpressionType.SQL);
InputSerialization inputSerialization = new InputSerialization();
CSVInput csvInput = new CSVInput();
csvInput.setFileHeaderInfo(FileHeaderInfo.USE);
inputSerialization.setCsv(csvInput);
inputSerialization.setCompressionType(CompressionType.NONE);
request.setInputSerialization(inputSerialization);
OutputSerialization outputSerialization = new OutputSerialization();
outputSerialization.setCsv(new CSVOutput());
request.setOutputSerialization(outputSerialization);
return request;
}
References
Amazon Simple Storage Service Documentation
Working with Amazon S3 Objects
Using the SDK
Programming Examples
Generate a Pre-signed Object URL using AWS SDK for Java
AWS Java Sample