Grabbing the model from Hugging Face
![[Pasted image 20250623151343.png]]
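A sketch of the download step, assuming the `huggingface-cli` tool from the `huggingface_hub` package; the repo name and target directory are examples, substitute your own model:

```shell
# Install the Hugging Face CLI and pull the model repo
# (weights, tokenizer, and config) to a local directory.
pip install -U "huggingface_hub[cli]"
huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --local-dir models/tinyllama
```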

FP16 conversion
![[Pasted image 20250623152303.png]]
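The conversion can be done with llama.cpp's `convert_hf_to_gguf.py` script; the paths below are examples continuing from the download step, run from the llama.cpp repo root:

```shell
# Convert the Hugging Face checkpoint to a single GGUF file,
# keeping the weights in FP16 as an intermediate format.
python convert_hf_to_gguf.py models/tinyllama \
  --outtype f16 \
  --outfile models/tinyllama-f16.gguf
```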

Quantising the model to Q4
![[Pasted image 20250623160031.png]]
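Quantisation uses the `llama-quantize` tool built with llama.cpp; `Q4_K_M` is one of the common 4-bit variants (file names are examples):

```shell
# Quantise the FP16 GGUF down to 4-bit (Q4_K_M),
# producing a much smaller file at a modest quality cost.
./llama-quantize models/tinyllama-f16.gguf \
  models/tinyllama-q4_k_m.gguf Q4_K_M
```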

Running the llama.cpp web server

Reference for the quantisation step above: https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md
![[Pasted image 20250623162350.png]]
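A minimal invocation of `llama-server`, assuming the quantised model from the previous step; it serves a web UI and an OpenAI-compatible API on the given port:

```shell
# Start the llama.cpp HTTP server on port 8080 with a 4K context.
./llama-server -m models/tinyllama-q4_k_m.gguf \
  --port 8080 -c 4096

# Then query the OpenAI-compatible chat endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```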