Deploy InferenceService with ONNX model
Setup
- Your ~/.kube/config should point to a cluster with KServe installed.
- Your cluster's Istio Ingress gateway must be network accessible.
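A quick way to confirm both prerequisites is to check the KServe controller pods and the ingress gateway service. The namespaces below assume a default installation:

# Sanity checks, assuming the default kserve and istio-system namespaces
kubectl get pods -n kserve
kubectl get svc istio-ingressgateway -n istio-system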
Create the InferenceService
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "style-sample"
spec:
predictor:
model:
protocolVersion: v2
modelFormat:
name: onnx
storageUri: "gs://kfserving-examples/models/onnx"
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "style-sample"
spec:
predictor:
onnx:
storageUri: "gs://kfserving-examples/models/onnx"
Note
For the default KServe installation, when using the new schema you must set protocolVersion to v2 for ONNX; otherwise you will get a "no runtime found" error.
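Save either manifest to a file and apply it (the filename onnx.yaml below is just an example):

kubectl apply -f onnx.yaml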
Expected Output
$ inferenceservice.serving.kserve.io/style-sample configured
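The predictor can take a few minutes to pull its runtime image and download the model. A minimal way to wait for it, assuming the standard Ready condition on the InferenceService:

# Wait until the InferenceService reports Ready, then inspect its status and URL
kubectl wait --for=condition=Ready inferenceservice/style-sample --timeout=300s
kubectl get inferenceservice style-sample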
Run a sample inference
- Set up env vars

The first step is to determine the ingress IP and ports, set INGRESS_HOST and INGRESS_PORT, and capture the service hostname:

export ISVC_NAME=style-sample
export SERVICE_HOSTNAME=$(kubectl get inferenceservice ${ISVC_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)

- Verify the service is healthy

The request below reaches the ingress gateway on localhost:8080 (for example via a port-forward); a couple of additional V2 endpoints you can probe are sketched after this list.

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v2/health/ready
- Install the dependencies and run the sample notebook to send an inference request

pip install -r requirements.txt
jupyter notebook
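Beyond the health endpoint used above, the V2 (Open Inference Protocol) REST API also exposes per-model readiness and metadata endpoints, which are useful for confirming that the model loaded and for discovering its input and output tensors before building a request. The sketch below assumes the same SERVICE_HOSTNAME and localhost:8080 access as the health check:

# Per-model readiness
curl -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v2/models/${ISVC_NAME}/ready

# Model metadata: input/output names, datatypes and shapes
curl -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v2/models/${ISVC_NAME}

# An inference request would then POST JSON to /v2/models/${ISVC_NAME}/infer;
# the tensors in the payload must match the metadata returned above.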
Uploading your own model
The sample model for the example in this readme is already uploaded and available for use. However, if you would like to modify the example to use your own ONNX model, all you need to do is upload it as model.onnx to S3, GCS, or Azure Blob Storage.
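For example, with a GCS bucket (the bucket name and path below are placeholders, not part of the sample), the upload and the matching storageUri would look roughly like this:

# Upload your ONNX model; the bucket and path are placeholders
gsutil cp model.onnx gs://your-bucket/models/onnx/model.onnx

# Then point the InferenceService at the containing directory, e.g.
#   storageUri: "gs://your-bucket/models/onnx"

Note that a private bucket also requires configuring storage credentials for the predictor's service account, which is not covered by this example.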