llm-server-docs


by varunvasudeva1


End-to-end documentation to set up your own local & fully private LLM server on Debian. Equipped with chat, web search, RAG, model management, MCP servers, image generation, and TTS.

Installation

```bash
sudo docker run hello-world
```
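If the machine has an Nvidia GPU, it can also be worth confirming that containers can reach it once the Nvidia Container Toolkit (covered in the description below) is installed. A minimal check, assuming the toolkit is set up; the CUDA image tag is only an example:

```bash
# GPU passthrough check: requires the Nvidia Container Toolkit (see below).
# The CUDA image tag is illustrative; pick one matching your driver/CUDA version.
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```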

Description

# Local LLaMA Server Setup Documentation

_TL;DR_: End-to-end documentation to set up your own local & fully private LLM server on Debian. Equipped with chat, web search, RAG, model management, MCP servers, image generation, and TTS, along with steps for configuring SSH, firewall, and secure remote access via Tailscale.

Software Stack:

- Inference Engine ([Ollama](https://github.com/ollama/ollama), [llama.cpp](https://github.com/ggml-org/llama.cpp), [vLLM](https://github.com/vllm-project/vllm))
- Search Engine ([SearXNG](https://github.com/searxng/searxng))
- Model Server ([llama-swap](https://github.com/mostlygeek/llama-swap), `systemd` service)
- Chat Platform ([Open WebUI](https://github.com/open-webui/open-webui))
- MCP Proxy Server ([mcp-proxy](https://github.com/sparfenyuk/mcp-proxy), [MCPJungle](https://github.com/mcpjungle/MCPJungle))
- Text-to-Speech Server ([Kokoro FastAPI](https://github.com/remsky/Kokoro-FastAPI))
- Image Generation Server ([ComfyUI](https://github.com/comfyanonymous/ComfyUI))

![Software Stack Architectural Diagram](llm-server-architecture.png)
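These services run as separate containers on one host, and the guide's "Create a Network" step has them share a user-defined Docker network so they can reach each other by container name. A minimal sketch, assuming Docker is already installed; the network name `llm-net` and the abbreviated Open WebUI flags are illustrative, not the guide's exact configuration:

```bash
# User-defined bridge network; containers attached to it resolve each other by name.
# "llm-net" is an example name, not something the guide mandates.
sudo docker network create llm-net

# Example: attach Open WebUI to the network at launch (flags abbreviated).
sudo docker run -d --network llm-net --name open-webui -p 3000:8080 \
  ghcr.io/open-webui/open-webui:main
```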
## Table of Contents

- [Local LLaMA Server Setup Documentation](#local-llama-server-setup-documentation)
  - [Table of Contents](#table-of-contents)
  - [About](#about)
    - [Priorities](#priorities)
  - [Prerequisites](#prerequisites)
  - [General](#general)
    - [Schedule Startup Script](#schedule-startup-script)
    - [Configure Script Permissions](#configure-script-permissions)
    - [Configure Auto-Login (optional)](#configure-auto-login-optional)
  - [Docker](#docker)
    - [Nvidia Container Toolkit](#nvidia-container-toolkit)
    - [Create a Network](#create-a-network)
    - [Helpful Commands](#helpful-commands)
  - [HuggingFace CLI](#huggingface-cli)
    - [Manage Models](#manage-models)
      - [Download Models](#download-models)
      - [Delete Models](#delete-models)
  - [Search Engine](#search-engine)
    - [SearXNG](#searxng)
    - [Open WebUI Integration](#open-webui-integration)
  - [Inference Engine](#inference-engine)
    - [Ollama](#ollama)
    - [llama.cpp](#llamacpp)
    - [vLLM](#vllm)
    - [Open WebUI Integration](#open-webui-integration-1)
    - [Ollama vs. llama.cpp](#ollama-vs-llamacpp)
    - [vLLM vs. Ollama/llama.cpp](#vllm-vs-ollamallamacpp)
  - [Model Server](#model-server)
    - [llama-swap](#llama-swap)
    - [`systemd` Service](#systemd-service)
    - [Open WebUI Integration](#open-webui-integration-2)
      - [llama-swap](#llama-swap-1)
      - [`systemd` Service](#systemd-service-1)
  - [Chat Platform](#chat-platform)
    - [Open WebUI](#open-webui)
  - [MCP Proxy Server](#mcp-proxy-server)
    - [mcp-proxy](#mcp-proxy)
    - [MCPJungle](#mcpjungle)
    - [Comparison](#comparison)
    - [Open WebUI Integration](#open-webui-integration-3)
      - [mcp-proxy](#mcp-proxy-1)
      - [MCPJungle](#mcpjungle-1)
    - [VS Code/Claude Desktop Integration](#vs-codeclaude-desktop-integration)
  - [Text-to-Speech Server](#text-to-speech-server)
    - [Kokoro FastAPI](#kokoro-fastapi)
    - [Open WebUI Integration](#open-webui-integration-4)
  - [Image Generation Server](#image-generation-server)
    - [ComfyUI](#comfyui)
    - [Open WebUI Integration](#open-webui-integration-5)
  - [SSH](#ssh)
  - [Firewall](#firewall)
  - [Remote Access](#remote-access)
    - [Tailscale](#tailscale)
      - [Installation](#installation)
      - [Exit Nodes](#exit-nodes)
      - [Local DNS](#local-dns)
    - [Third-Party VPN Integration](#third-party-vpn-integration)
  - [Updating](#updating)
    - [General](#general-1)
    - [Nvidia Drivers \& CUDA](#nvidia-drivers--cuda)
    - [Ollama](#ollama-1)
    - [llama.cpp](#llamacpp-1)
    - [vLLM](#vllm-1)
    - [llama-swap](#llama-swap-2)
    - [Open WebUI](#open-webui-1)
    - [mcp-proxy/MCPJungle](#mcp-proxymcpjungle)
    - [Kokoro FastAPI](#kokoro-fastapi-1)
    - [ComfyUI](#comfyui-1)
  - [Troubleshooting](#troubleshooting)
    - [`ssh`](#ssh-1)
    - [Nvidia Drivers](#nvidia-drivers)
    - [Ollama](#ollama-2)
    - [vLLM](#vllm-2)
    - [Open WebUI](#open-webui-2)
  - [Monitoring](#monitoring)
  - [Notes](#notes)
    - [Software](#software)
    - [Hardware](#hardware)
  - [References](#references)
  - [Acknowledgements](#acknowledgements)

## About

This repository outlines the steps to set up and run a server for local language models. It uses Debian specifically, but the process is very similar on most Linux distributions. It aims to be a guide for Linux beginners like me who are setting up a server for the first time.

The process involves installing the requisite drivers, setting the GPU power limit, setting up auto-login, and scheduling the `init.bash` script to run at boot. All these settings are based on my ideal setup for a language model server that runs most of the day, but much of it can be customized to suit your needs.

> [!IMPORTANT]
> No part of this guide was written using AI - any hallucinations are the good old human kind. While I've done my absolute best to ensure correctness, mistakes may remain.
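As a rough picture of the boot flow described above, here is a sketch of what a startup script of this kind might look like. It is not the repository's actual `init.bash`: the wattage, service names, and scheduling method are all assumptions.

```bash
#!/usr/bin/env bash
# init.bash -- illustrative sketch only; wattage, service names, and paths are assumptions.

# Cap GPU power draw for a box that stays on most of the day (requires root).
nvidia-smi -pm 1     # persistence mode, so settings stick between workloads
nvidia-smi -pl 250   # power limit in watts; tune for your card

# Bring the containerized services back up.
docker start open-webui searxng
```

One common way to schedule such a script at boot is root's crontab (`sudo crontab -e`) with a line like `@reboot /bin/bash /path/to/init.bash`, which also gives the script the root privileges that `nvidia-smi -pl` needs.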
