Quickstart
There are two ways to obtain Triton Inference Server:
- as a pre-built container from NVIDIA GPU Cloud (NGC), as shown in the pull command below;
- as source code on GitHub, from which a container can be built with cmake.
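For the NGC route, pulling the server image looks like the following; the <xx.yy>-py3 tag is an assumption here, chosen to match the client image tag used later in this guide:
$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3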
Run Triton Inference Server
Start the server:
$ nvidia-docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/example/model/repository:/models <docker image> tritonserver --model-repository=/models
Note: /full/path/to/example/model/repository is the host path of the folder holding the models; the -v flag mounts it into the container at /models, as sketched below.
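As a sketch of what that folder is expected to contain, the standard Triton model repository layout is one subdirectory per model, with a config.pbtxt and numbered version directories (the exact model file names depend on the backend and are placeholders here):
/full/path/to/example/model/repository/
  resnet50_netdef/
    config.pbtxt        # model configuration: name, platform, inputs/outputs
    1/                  # version 1 of the model
      <model files>     # backend-specific files, e.g. the Caffe2 netdef weights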
When the server starts successfully, it prints output like the following:
I0828 23:42:45.635957 1 main.cc:417] Starting endpoints, 'inference:0' listening on
I0828 23:42:45.649580 1 grpc_server.cc:1730] Started GRPCService at 0.0.0.0:8001
I0828 23:42:45.649647 1 http_server.cc:1125] Starting HTTPService at 0.0.0.0:8000
I0828 23:42:45.693758 1 http_server.cc:1139] Starting Metrics Service at 0.0.0.0:8002
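The last line shows that a metrics service is also started on port 8002; Triton exposes Prometheus-format metrics there, which can be spot-checked with:
$ curl localhost:8002/metrics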
Verify Inference Server Is Running Correctly
Use the server's status endpoint to verify that the server is running correctly. From the host, use curl to send an HTTP request for the server status:
$ curl localhost:8000/api/status
id: "inference:0"
version: "0.6.0"
uptime_ns: 23322988571
model_status {
  key: "resnet50_netdef"
  value {
    config {
      name: "resnet50_netdef"
      platform: "caffe2_netdef"
    }
    ...
    version_status {
      key: 1
      value {
        ready_state: MODEL_READY
      }
    }
  }
}
ready_state: SERVER_READY
The final ready_state of SERVER_READY indicates that the inference server is online and can handle requests.
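For scripted readiness checks, the same endpoint can be polled and the output grepped for the ready flag; a minimal sketch, assuming the text-format status output shown above:
$ curl -s localhost:8000/api/status | grep -q "ready_state: SERVER_READY" && echo "server is ready"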
Getting The Client Examples
Pull and run the client Docker image, where <xx.yy> is the version number:
$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-clientsdk
$ docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-clientsdk
The clients can also be built from source.
Image Classification Example
Inside the tritonserver_client container, run the image_client application, which uses the resnet50_netdef model from the example model repository.
Send a request with the C++ client:
$ /workspace/install/bin/image_client -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '../images/mug.jpg':
504 (COFFEE MUG) = 0.723991
Send a request with the Python client:
$ python /workspace/install/python/image_client.py -m resnet50_netdef -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '../images/mug.jpg':
504 (COFFEE MUG) = 0.778078556061
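Both clients can also report more than one classification per image. Assuming the -c option of the example clients (the number of class results to return), requesting the top three classes would look like:
$ python /workspace/install/python/image_client.py -m resnet50_netdef -s INCEPTION -c 3 /workspace/images/mug.jpg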