100% local, on-premise, fully private LLMs with llama.cpp
2 shell commands, OpenAI-compatible!
Step 1: brew install llama.cpp
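
Before serving, you can sanity-check the install (a quick sketch; recent llama.cpp builds ship a --version flag that prints build info):

  # confirm the binary is on your PATH and print its build info
  which llama-server
  llama-server --version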
Step 2: llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf
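
If you want more control, the same command takes a few standard llama-server flags (--host/--port for the bind address, -c for context size, -ngl to offload layers to the GPU); the values below are just examples, so adjust for your hardware:

  llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf \
    --hf-file Phi-3-mini-4k-instruct-q4.gguf \
    --host 0.0.0.0 --port 8080 -c 4096 -ngl 99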
Step 3: curl localhost:8080/v1/chat/completions
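
A fuller request in the shape the OpenAI chat API expects (no API key needed; the prompt is just an example, and the host/port assume the defaults above):

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [
            {"role": "user", "content": "Write a haiku about local inference."}
          ]
        }'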
You can point --hf-repo at any GGUF on the Hugging Face Hub.
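
For example, swapping models is just a different repo and file name (the Mistral repo and quant file below are illustrative; check the model's Hub page for the exact file you want):

  llama-server --hf-repo TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
    --hf-file mistral-7b-instruct-v0.2.Q4_K_M.gguf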
That’s it.