:py:mod:`fastchat.serve.cacheflow_worker`
=========================================

.. py:module:: fastchat.serve.cacheflow_worker

.. autoapi-nested-parse::

   A model worker that executes the model using Cacheflow.

   Install Cacheflow first. Then, assuming the controller is running:

   1. Start the Ray head node: ``ray start --head``
   2. Start the worker: ``python3 -m fastchat.serve.cacheflow_worker --model-path path_to_vicuna``

   Then launch the Gradio web server:

   3. ``python3 -m fastchat.serve.gradio_web_server --concurrency-count 10000``
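
   The steps above could be collected into a small launch script. The
   following is a minimal sketch, not part of FastChat: the ``MODEL_PATH`` and
   ``DRY_RUN`` variables and the ``run``, ``run_bg``, and ``launch_all``
   helpers are hypothetical names introduced for illustration. It assumes the
   controller is already running, and that the worker should be put in the
   background so the Gradio server can start in the same shell. By default it
   only prints the commands.

   .. code-block:: shell

      #!/bin/sh
      # Hypothetical settings for this sketch; not FastChat options.
      MODEL_PATH="${MODEL_PATH:-path_to_vicuna}"  # placeholder path from the steps above
      DRY_RUN="${DRY_RUN:-1}"                     # 1 = print only, 0 = actually run

      # Print a command, and run it in the foreground when DRY_RUN=0.
      run() {
          echo "+ $*"
          if [ "$DRY_RUN" = "0" ]; then
              "$@"
          fi
      }

      # Print a command, and run it in the background when DRY_RUN=0.
      run_bg() {
          echo "+ $* &"
          if [ "$DRY_RUN" = "0" ]; then
              "$@" &
          fi
      }

      launch_all() {
          # Step 1: start the Ray head node.
          run ray start --head
          # Step 2: start the Cacheflow worker (long-running, so backgrounded).
          run_bg python3 -m fastchat.serve.cacheflow_worker --model-path "$MODEL_PATH"
          # Step 3: start the Gradio web server in the foreground.
          run python3 -m fastchat.serve.gradio_web_server --concurrency-count 10000
      }

      launch_all

   Saved as, say, ``launch.sh``, running ``sh launch.sh`` prints the three
   commands for review, and ``DRY_RUN=0 MODEL_PATH=path_to_vicuna sh launch.sh``
   executes them in order.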