100% local, on-premise, fully private LLMs with llama.cpp
2 shell commands, OpenAI-compatible!
Step 1: brew install llama.cpp
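
Before serving, you can sanity-check the install (a quick sketch; recent llama.cpp builds ship a --version flag that prints build info):

  # confirm the binary is on your PATH and print its build info
  which llama-server
  llama-server --version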
Step 2: llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf
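
If you want more control, the same command takes a few standard llama-server flags (--host/--port for the bind address, -c for context size, -ngl to offload layers to the GPU); the values below are just examples, so adjust for your hardware:

  llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf \
    --hf-file Phi-3-mini-4k-instruct-q4.gguf \
    --host 0.0.0.0 --port 8080 -c 4096 -ngl 99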
Step 3: curl localhost:8080/v1/chat/completions
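
A fuller request in the shape the OpenAI chat API expects (no API key needed; the prompt is just an example, and the host/port assume the defaults above):

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [
            {"role": "user", "content": "Write a haiku about local inference."}
          ]
        }'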
You can point --hf-repo at any GGUF on the Hugging Face Hub.
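
For example, swapping models is just a different repo and file name (the Mistral repo and quant file below are illustrative; check the model's Hub page for the exact file you want):

  llama-server --hf-repo TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
    --hf-file mistral-7b-instruct-v0.2.Q4_K_M.gguf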
That’s it.