Deploy OPEA ChatQnA workload on OCP

Overview

The workload is based on the OPEA ChatQnA Application running on Intel® Gaudi Accelerator with OpenShift and OpenShift AI. Refer to the OPEA Generative AI Examples for more details about the OPEA workloads.

Note: This workload is still under heavy development and updates are expected.

Prerequisites

  • Provisioned RHOCP cluster. Follow the steps here

  • Persistent storage using NFS is ready. Refer to the documentation for details on setting it up.

    Note: Refer to the documentation for setting up other types of persistent storage.

  • Provisioned Intel Gaudi accelerator on the RHOCP cluster. Follow the steps here

  • RHOAI is installed. Follow the steps here

  • The Intel Gaudi AI accelerator is enabled with RHOAI. Follow the steps here

  • MinIO-based S3 service ready for RHOAI. Follow the steps here
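
Before proceeding, the cluster can be sanity-checked from the CLI. A minimal sketch, assuming the Gaudi devices are exposed through the habana.ai/gaudi extended resource (replace <gaudi-node> with one of your worker node names):

$ # Confirm the cluster is reachable and the nodes are Ready
$ oc get nodes
$ # Confirm Gaudi accelerators are allocatable on the node
$ oc describe node <gaudi-node> | grep habana.ai/gaudi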

Deploy Model Serving for OPEA ChatQnA Microservices with RHOAI

Create OpenShift AI Data Science Project

  • From the OCP web console, click Search -> Routes -> rhods-dashboard and launch the RHOAI dashboard.

  • In the dashboard, click Data Science Projects and create a project, for example OPEA-chatqna-modserving.
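
For reference, a data science project can also be created from the CLI. A minimal sketch, assuming the RHOAI dashboard lists projects carrying the opendatahub.io/dashboard label (namespace names must be lowercase, so the example name is lowercased here):

$ oc new-project opea-chatqna-modserving
$ # Assumed label that makes the project visible in the RHOAI dashboard
$ oc label namespace opea-chatqna-modserving opendatahub.io/dashboard=true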

Preload the models

  • Refer to the link and download the model Llama2-70b-chat-hf.

  • Refer to the link and upload the model to the MinIO/S3 storage.

  • Click OPEA-chatqna-modserving and choose the Data Connections section. In the fields, add the access and secret keys from MinIO. Follow the link.
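
For reference, the download and upload can also be scripted. A minimal sketch, assuming the model is the gated meta-llama/Llama-2-70b-chat-hf repository on Hugging Face (requires an access token) and that the MinIO endpoint and models bucket below are placeholders for your own:

$ # Download the model locally with huggingface_hub
$ huggingface-cli download meta-llama/Llama-2-70b-chat-hf --local-dir ./Llama-2-70b-chat-hf
$ # Upload it to the MinIO/S3 bucket used by the data connection
$ aws --endpoint-url http://<minio-endpoint>:9000 s3 cp ./Llama-2-70b-chat-hf s3://models/Llama-2-70b-chat-hf --recursive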

Launch the Model Serving with Intel Gaudi AI Accelerator

  • In the project OPEA-chatqna-modserving, go to the Models section and deploy the model server.

  • The model server is now being created. Once it is ready, the status turns green and the inference endpoint is displayed.
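
The serving state can also be watched from the CLI. A minimal sketch, assuming the model server is backed by a KServe InferenceService and the project namespace is opea-chatqna-modserving:

$ # The URL column shows the inference endpoint once READY is True
$ oc get inferenceservice -n opea-chatqna-modserving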

Deploy ChatQnA MegaService and Database

Create namespace

$ oc create namespace opea-chatqna

Create persistent volumes

NFS is used to create the Persistent Volumes for the ChatQnA MegaService to claim and use.

Make sure to update the NFS server IP and path in persistent_volumes.yaml before running the command below. For example:

  nfs:
    server: 10.20.1.2 # NFS server IP
    path: /my_nfs # NFS export path

$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/workloads/opea/chatqna/persistent_volumes.yaml

  • Check that the persistent volumes are created:

$ oc get pv
NAME                           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      
chatqna-megaservice-pv-0        100Mi      RWO            Retain           Available
chatqna-megaservice-pv-1        100Mi      RWO            Retain           Available
chatqna-megaservice-pv-2        100Mi      RWO            Retain           Available
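
For illustration, a claim that would bind one of these volumes might look like the sketch below. The actual claims are created by the MegaService deployment manifest, so this is not something you normally apply by hand:

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: chatqna-megaservice-pvc-0 # hypothetical name
    namespace: opea-chatqna
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Mi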

Building OPEA ChatQnA MegaService Container Image

Run the create_megaservice_container.sh script to build the container image:

$ ./create_megaservice_container.sh
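
The script's contents are not reproduced here; a rough manual equivalent is sketched below, assuming podman and a registry your cluster can pull from (the registry path and build context are placeholders, and the actual script may differ):

$ git clone https://github.com/opea-project/GenAIExamples.git
$ # Build the ChatQnA megaservice image and push it to your registry
$ podman build -t <registry>/opea/chatqna-megaservice:latest GenAIExamples/ChatQnA
$ podman push <registry>/opea/chatqna-megaservice:latest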

Deploy Redis Vector Database Service

$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/workloads/opea/chatqna/redis_deployment_service.yaml

Check that the pod and service are running:

$ oc get pods
NAME                                   READY   STATUS      RESTARTS   AGE
redis-vector-db-6b5747bf7-sl8fr        1/1     Running     0          21s
$ oc get svc
NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
redis-vector-db       ClusterIP   1.2.3.4          <none>        6379/TCP,8001/TCP   43s
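
Optionally, confirm that Redis answers before deploying the MegaService. A minimal sketch, assuming the cluster can pull the public redis image and the service listens on the default port 6379 shown above:

$ oc run redis-ping -n opea-chatqna --rm -it --restart=Never --image=redis -- redis-cli -h redis-vector-db ping
PONG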

Deploy ChatQnA MegaService

Update the inference endpoint in chatqna_megaservice_deployment.yaml with the endpoint from the model serving deployed above.
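
As an illustration, the endpoint typically ends up in the container environment of the deployment. The snippet below is a sketch with a hypothetical variable name; use whatever key chatqna_megaservice_deployment.yaml actually defines:

  env:
    - name: INFERENCE_ENDPOINT # hypothetical key name
      value: "<inference-endpoint-from-RHOAI>"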

$ oc apply -f https://raw.githubusercontent.com/intel/intel-technology-enabling-for-openshift/main/workloads/opea/chatqna/chatqna_megaservice_deployment.yaml

Check that the pod and service are running:

$ oc get pods
NAME                                   READY   STATUS      RESTARTS   AGE
chatqna-megaservice-54487649b5-sgsh2   1/1     Running     0          95s         
$ oc get svc
NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
chatqna-megaservice   ClusterIP   1.2.3.4          <none>        8000/TCP            99s

Verify the MegaService

Use the command below:

  curl <megaservice_pod_ip>/v1/rag/chat_stream \
  -X POST \
  -d '{"query":"What is a constellation?"}' \
  -H 'Content-Type: application/json'
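
Alternatively, from inside the cluster the request can go through the service instead of a pod IP. A sketch, assuming the chatqna-megaservice service on port 8000 shown earlier:

  curl http://chatqna-megaservice.opea-chatqna.svc.cluster.local:8000/v1/rag/chat_stream \
  -X POST \
  -d '{"query":"What is a constellation?"}' \
  -H 'Content-Type: application/json'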