Title: Run Large LLMs Locally on NVIDIA Spark (No Cloud, 100% Private)
Description: Want to run large language models locally without sending your data to the cloud? In this video, I show you exactly how to run powerful LLMs on NVIDIA Spark using Open WebUI and Ollama — completely private and hosted on your own hardware.
We walk step-by-step through setting up Open WebUI with Ollama using NVIDIA Sync and Docker, pulling models like LLaMA 70B, DeepSeek, GPT-OSS 20B, and Qwen, and configuring everything so it runs smoothly on an enterprise-class GPU. You’ll see how to install the container, configure ports, create an admin account, download models, and switch between them — all without relying on ChatGPT or any external cloud service.
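If you prefer the terminal to the Open WebUI model picker, here is a minimal sketch of pulling models from the command line. It assumes the container is named open-webui, as in the launch script further down, and that the ollama CLI is on the PATH inside the :ollama image. The helper names pull_model and list_models are hypothetical, and the model tag in the usage line is only an example, so check the Ollama library for the exact names.

```shell
# Container name taken from the launch script in this description.
NAME="${NAME:-open-webui}"

# Pull a model into the Ollama instance bundled inside the container.
# $1 is an Ollama model tag (the exact tag is an assumption; verify it first).
pull_model() {
  docker exec "$NAME" ollama pull "$1"
}

# List the models that have already been downloaded.
list_models() {
  docker exec "$NAME" ollama list
}
```

Usage would look like `pull_model gpt-oss:20b` followed by `list_models`. Models pulled this way should then show up in the Open WebUI model dropdown.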
I also cover real-world expectations like first-run delays, GPU memory usage, slower inference trade-offs, and how to update or stop containers to reclaim resources. By the end, you’ll have a fully private, local AI assistant capable of running surprisingly large models right at home.
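For the stop and update steps, here is a rough sketch. It assumes the container name and image from the launch script further down, and that the ollama CLI is available inside the container. The helper names loaded_models, stop_webui, and update_webui are hypothetical.

```shell
# Names taken from the launch script in this description.
NAME="${NAME:-open-webui}"
IMAGE="${IMAGE:-ghcr.io/open-webui/open-webui:ollama}"

# Show which models Ollama currently has loaded and how much memory they use.
loaded_models() {
  docker exec "$NAME" ollama ps
}

# Stop the container so the memory held by loaded models is released.
stop_webui() {
  docker stop "$NAME"
}

# Update: fetch the newer image, then remove the old container so the
# launch script recreates it from the new image on its next run.
# The named volumes are left alone, so chats and downloaded models stay.
update_webui() {
  docker pull "$IMAGE" || return 1
  docker stop "$NAME" >/dev/null 2>&1 || true
  docker rm "$NAME"
}
```

After `update_webui`, running the launch script again should create a fresh container from the updated image.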
If you care about data privacy, self-hosted AI, or running LLMs locally, this setup is one of the easiest and most powerful ways to get started.
👍 If this helped, like the video, subscribe, and drop a comment with models or features you’d like me to test next.
#!/usr/bin/env bash
NAME="open-webui"
IMAGE="ghcr.io/open-webui/open-webui:ollama"
cleanup() {
  echo "Signal received; stopping ${NAME}…"
  docker stop "${NAME}" >/dev/null 2>&1 || true
  exit 0
}
# Stop the container on Ctrl+C or a termination signal.
trap cleanup INT TERM HUP QUIT
# Ensure Docker CLI and daemon are available
if ! docker info >/dev/null 2>&1; then
  echo "Error: Docker daemon not reachable." >&2
  exit 1
fi
# Already running? Then there is nothing to do.
if [ -n "$(docker ps -q --filter "name=^${NAME}$" --filter "status=running")" ]; then
  echo "Container ${NAME} is already running."
else
  # Exists but stopped? Start it.
  if [ -n "$(docker ps -aq --filter "name=^${NAME}$")" ]; then
    echo "Starting existing container ${NAME}…"
    docker start "${NAME}" >/dev/null
  else
    # Not present: create and start it.
    echo "Creating and starting ${NAME}…"
    # The four -e options are experimental settings I added for speed testing;
    # remove them to get the stock command. COMMANDLINE_ARGS="--precision fp16"
    # is meant to halve VRAM usage and run much faster than FP32.
    docker run -d -p 12000:8080 --gpus=all \
      -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
      -e CUDA_VISIBLE_DEVICES=0 \
      -e NVIDIA_TF32_OVERRIDE=0 \
      -e COMMANDLINE_ARGS="--precision fp16" \
      -v open-webui:/app/backend/data \
      -v open-webui-ollama:/root/.ollama \
      --name "${NAME}" "${IMAGE}" >/dev/null
  fi
fi
echo "Running. Press Ctrl+C to stop ${NAME}."
# Keep the script alive until a signal arrives
while :; do sleep 86400; done
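Because the first start can take a while, a small polling helper can tell you when the UI is actually reachable. This is only a sketch, assuming curl is installed and the UI is published on port 12000 as in the docker run line above. The function name wait_for_ui and its default URL, attempt count, and delay are hypothetical.

```shell
# Hypothetical helper: poll the Open WebUI port until it answers or we give up.
# Arguments: URL, number of attempts, delay in seconds between attempts.
wait_for_ui() {
  url="${1:-http://localhost:12000}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: discard the body
    if curl -fs -o /dev/null "$url"; then
      echo "Open WebUI is answering at ${url} (attempt ${i})."
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Open WebUI did not answer at ${url} after ${attempts} attempts." >&2
  return 1
}
```

Run `wait_for_ui` from a second terminal on the Spark after starting the launch script. If you reach the UI through a forwarded port on another machine, pass that URL as the first argument instead.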
If you enjoy the video, please consider liking, subscribing, and sharing!
Facebook: https://www.facebook.com/madhouse74
X (Twitter): https://x.com/MadTc74
LinkedIn: https://www.linkedin.com/in/mad-tc-086046285/
GitLab: https://gitlab.com/MadTcTutorials
💡 Disclaimer: Some of these links are affiliate links. They cost you nothing but help support the channel. Thank you!
Amazon: https://amzn.to/3TEscdX
Unraid (Referral Discount): https://unraid.net/pricing?via=4e1dee
Logitech Brio PRO X 4K Webcam: https://amzn.to/410ouxO
Shure SM4 Studio Recording Microphone: Shure SM4 Studio Recording Microphone
M-Audio M-Track Duo – USB Audio: https://amzn.to/49T5Cot
HyperX Headset: https://amzn.to/4gWESpp
U7-Pro AP WiFi7 PoE+: https://amzn.to/4qVGBRg
Ubiquiti Switch Enterprise 24 PoE: https://amzn.to/4qPjyHL
Ubiquiti Enterprise Security Gateway and Network Appliance with 10G SFP+: https://amzn.to/4anvURm
NVIDIA DGX Spark: https://amzn.to/46mImyl
💸 Free Stock Through Robinhood
Robinhood: Join Robinhood with my link and we'll both pick our own free stock 🤝 https://join.robinhood.com/travisb-707fa1
00:00 Intro
00:10 Quick Demo
02:07 Getting Started
02:33 Instructions
04:04 Step One: Configure Docker Permissions
05:16 Step Two: Verify Docker Setup and Pull Container
05:39 Step Three: Open NVIDIA Sync
05:56 Step Four: Add Open WebUI Custom Port Configuration
07:25 Step Five: Launch Open WebUI
10:01 Step Six: Create Administrator Account
10:28 What is New
10:48 Step Seven: Download and Configure a Model
11:47 Step Eight: Test the Model
13:18 Change Profile Photo
13:58 Step Nine: Stop Open WebUI
14:37 Step Ten: Next Steps - Download Other Models
17:12 Testing New Model
18:24 What's Next
19:15 Like and Subscribe Bro
run LLMs locally, NVIDIA Spark LLM, Open WebUI Ollama setup, local AI server, self hosted LLM, private AI assistant, Ollama Open WebUI, run LLaMA locally, local large language model, NVIDIA Spark AI, Docker LLM setup, enterprise GPU AI, offline AI models, private ChatGPT alternative, local AI workflow, run AI without cloud, GPU LLM inference, NVIDIA Sync Open WebUI, Ollama models local, Qwen LLM local, DeepSeek local AI, GPT OSS local
#LocalLLM, #NVIDIASpark, #OpenWebUI, #Ollama, #SelfHostedAI, #PrivateAI, #RunLLMLocally, #LocalAI, #AIPrivacy, #DockerAI, #LLaMA, #HomeLabAI