fastchat.serve.cacheflow_worker
A model worker that executes the model using Cacheflow.
Install Cacheflow first. Then, assuming the controller is live:
1. ray start --head
2. python3 -m fastchat.serve.cacheflow_worker --model-path path_to_vicuna

Launch Gradio:
3. python3 -m fastchat.serve.gradio_web_server --concurrency-count 10000
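
The launch steps above can also be scripted. The following is a minimal sketch, not part of FastChat: the helper names build_launch_commands and launch are hypothetical, and only the three commands themselves come from the steps above. It assumes the controller is already running, that ray and python3 are on the PATH, and that "ray start --head" returns once the head node is up. It does not wait for the worker to finish loading before starting the web server, which a real script may need to do.

```python
import shlex
import subprocess


def build_launch_commands(model_path, concurrency_count=10000):
    """Return the three launch commands as argument lists (hypothetical helper)."""
    return [
        # Step 1: start the Ray head node.
        ["ray", "start", "--head"],
        # Step 2: start the Cacheflow-backed model worker.
        ["python3", "-m", "fastchat.serve.cacheflow_worker",
         "--model-path", model_path],
        # Step 3: start the Gradio web server.
        ["python3", "-m", "fastchat.serve.gradio_web_server",
         "--concurrency-count", str(concurrency_count)],
    ]


def launch(model_path, dry_run=True):
    """Print the commands (dry run) or start them; returns the started processes."""
    commands = build_launch_commands(model_path)
    if dry_run:
        for command in commands:
            print(shlex.join(command))
        return []
    # Assumption: this call returns after the head node has started.
    subprocess.run(commands[0], check=True)
    # The worker and the web server are long-running, so keep their handles.
    return [subprocess.Popen(command) for command in commands[1:]]


if __name__ == "__main__":
    # path_to_vicuna is the placeholder from the steps above; replace it
    # with a real model path before running with dry_run=False.
    launch("path_to_vicuna", dry_run=True)
```

With dry_run=True the script only prints the commands, which is a cheap way to check the arguments before starting any process.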