1. Install the requirements.
2. Edit run.sh to pick the model of your choice, then run it to start the vLLM server.
3. Run the client with: python dialog.py --host localhost
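A minimal sketch of the steps above, assuming dependencies are listed in a requirements.txt (the filename is an assumption; only run.sh and dialog.py are named in the source):

```shell
# Install dependencies (assumes a requirements.txt; adjust the
# filename if this repo uses something different).
pip install -r requirements.txt

# Edit run.sh first to select the model to serve, then launch it
# to start the vLLM server.
bash run.sh

# In a second terminal, once the server is up, start the dialog
# client and point it at the local server.
python dialog.py --host localhost
```

The server and client run as separate processes, so keep the run.sh terminal open while using the client.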