local AI on NVIDIA · GPU-pipelined · zero cloud

Speak. Melt. Polished text.

The voice framework that runs hot. Whisper large-v3 + any Ollama LLM, fused into one CUDA-cooked pipeline on the GPU you already paid for. NVIDIA and Microsoft just called local, private AI on NVIDIA the future of the PC - Voxmelt did it for your voice first. Tap the always-on-top widget over any app, speak, and a local model rewrites it in the tone you pick - mirrored straight into the full workbench. Zero cloud, no per-minute meter, zero data harvesting. Your voice melts the silicon; polished text is what drips out.

VX·CORE · GPU RUNTIME
voice in
AI output · streaming
cleaned · punctuated · 0 cloud
GPU LOAD 96%
VRAM 6.2 / 8.0 GB
Whisper · STT
gemma3:4b · cleanup
TEMP 64°C
12 ms/token
NVIDIA CUDA · Ampere → Blackwell · tensor-toasted
HIPAA / GDPR-safe
4x realtime · overclocked
built for the personal AI computer · RTX Spark-ready
new · May 31, 2026 · NVIDIA + Microsoft · RTX Spark keynote

The PC just got reinvented around local, private AI on NVIDIA. NVIDIA and Microsoft unveiled RTX Spark - a new class of Windows PC built to run AI on your device, not the cloud. Voxmelt has done exactly that for your voice since day one. You don't need to wait for fall hardware to live in that future; all you need is the NVIDIA GPU already in your rig.

Voxmelt is an independent product, not affiliated with or endorsed by NVIDIA or Microsoft.

why people melt local

100% local processing - zero cloud, zero cap.

Voxmelt
COLOR THEME
Nightwing
Matrix
Cyberpunk
Record · Whisper small · Clean up · Default
AI TONE / PRESET
Clean up · Default
Coding prompt · Feature
Chat Responder · Corporate Polish
Summarize · Bullets
Custom prompt · Editable
AI MODEL · OLLAMA (selected: gemma2:27b)
gemma2:27b · 14.56 GB
gemma3:4b · 3.11 GB
gemma4:e4b · 8.95 GB
qwen3:8b · 5.20 GB
qwen3:14b · 9.20 GB
Free GPU Voice Off AI On
RECORD
COMPOSE
VOICEOVER
TEMPLATES
HISTORY
MINI
RTX 3080 Ti · GPU 20% · VRAM 4.5/12 GB · TEMP 42°C · PWR 138 W
Click to record
Ctrl+Shift+R
Microphone access
Whisper small loaded
GPU available
TRANSCRIPT · 0 words
Process · AC
Paste or type text here to process with AI, or use the mic to record…
AI OUTPUT
AC
AI output will appear here
Record or paste text, then process
MODELS
Whisper small loaded · gemma2:27b off
WHISPER · small
Model · download size · VRAM
Tiny · 75 MB · ~1.2 GB
Base · 145 MB · ~1.3 GB
Small · 480 MB · ~1.6 GB
Medium · 1.5 GB · ~2.6 GB
Large v3 · 3.0 GB · ~4.2 GB

Small is the active model: great accuracy, uses ~1.6 GB VRAM.

Load on app launch
Pre-loads the model so the first recording is instant.
Advanced
AI PROCESSING · gemma2:27b
Keep both loaded
Both share VRAM · needs 24+ GB

Performance may be unstable. Both models compete for VRAM.

Connected · 5 models installed
Ollama v0.24.0 · INSTALLED MODELS
gemma2:27b · 14.56 GB on disk
gemma3:4b · 3.11 GB on disk
gemma4:e4b · 8.95 GB on disk
qwen3:8b · 5.20 GB on disk
qwen3:14b · 9.20 GB on disk
Ready · small / float16 · RTX 3080 Ti
Ctrl+Shift+R to toggle
DISPLAY 1 · 27″
2560 × 1440 · the workbench
prompt.txt
File · Edit · View
Paste your cleaned coder prompt here…
Ln 1, Col 1 · UTF-8 · 100%
READY - TAP TO DICTATE
Tap mic to dictate
TRANSCRIPT AI
Your transcript will appear here as you speak…
DISPLAY 2 · 24″ portrait
1080 × 1920 · Notepad
100% local · Zero cloud · No API keys · Always-on-top · Real-time sync · Voice or paste · AI tone presets · Custom presets · 1-tap copy · Per-app paste · GPU-accelerated · Whisper large-v3 · Ollama-powered · Works air-gapped · 4x realtime · Auto VRAM swap · Global hotkey · 3 themes
the thermal pipeline

Two heavy AI models. One GPU. Fully orchestrated.

Voxmelt runs a speech-to-text engine and an LLM cleanup engine on the same consumer GPU, keeping both resident when VRAM permits and swapping them in and out when it doesn't. Tap a model to load it, or flip the dial to swap mode and watch only one ride the card at a time.

ENGINE 01 · STT engine
Whisper large-v3 - the recording brain
Throughput: 4x realtime
Precision: float16 · int8
VRAM: ~2 GB
Languages: 10+ auto-detect
Backend: faster-whisper · CUDA
sandboxed sidecar · stdio only
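
For the curious, here is roughly what an STT stage like this could look like in Python with the open-source faster-whisper package named on the card. This is a minimal sketch, not Voxmelt's actual code: the `pick_compute_type` helper and its 6 GB threshold are assumptions made up for illustration, and `transcribe` needs faster-whisper installed plus a CUDA GPU to run.

```python
from typing import List, Tuple


def pick_compute_type(free_vram_gb: float, threshold_gb: float = 6.0) -> str:
    """Hypothetical rule: float16 when VRAM is plentiful, int8 when it is tight."""
    return "float16" if free_vram_gb >= threshold_gb else "int8"


def transcribe(path: str, free_vram_gb: float) -> Tuple[str, List[str]]:
    """Transcribe one audio file on the GPU; returns (language, segment texts)."""
    # Imported lazily so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        "large-v3",
        device="cuda",
        compute_type=pick_compute_type(free_vram_gb),
    )
    # transcribe() returns a lazy generator of segments plus detection info;
    # the audio is only decoded as the generator is consumed.
    segments, info = model.transcribe(path)
    return info.language, [segment.text.strip() for segment in segments]
```

Dropping to int8 trades a little accuracy for a smaller memory footprint, which is one plausible way to leave room for an LLM on the same card.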
GPU orchestrator
RTX 4080 Ti · 16 GB
VRAM gauge (0 to 16 GB): 0.0 / 16 GB · STT · LLM
ENGINE 02 · LLM engine
Any Ollama model - the cleanup brain
Models: 22+ catalogued
VRAM: 1 GB to 42 GB
Output: token-stream
Presets: 12 built-in + custom
Backend: Ollama · 127.0.0.1
localhost-only · zero telemetry
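
As an illustration of the token-stream output, the sketch below talks to a local Ollama server over its HTTP API, which by default listens on 127.0.0.1:11434 and streams one JSON object per line. The prompt wording and the `stream_cleanup` and `iter_tokens` names are assumptions for this example; how Voxmelt itself calls Ollama is not documented here.

```python
import json
import urllib.request
from typing import Iterable, Iterator

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default local endpoint


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text chunks from Ollama's newline-delimited JSON stream."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break  # the final object carries stats, not text


def stream_cleanup(transcript: str, model: str = "gemma3:4b") -> Iterator[str]:
    """Send a transcript to a local model and yield cleaned text as it arrives."""
    payload = json.dumps({
        "model": model,
        # Illustrative prompt only; real presets would be more elaborate.
        "prompt": "Clean up this dictation. Fix punctuation and remove filler:\n\n" + transcript,
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_tokens(response)
```

With Ollama running and the model pulled, `for piece in stream_cleanup("um so like ship it friday"): print(piece, end="", flush=True)` would print the text piece by piece. Nothing in the request leaves the loopback interface.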
Auto memory math
Reads VRAM continuously. Decides per-run whether to coexist or evict. You touch nothing.
Zero-touch handoff
You stop talking, we swap models, polished text streams in - token by token. No buttons, no wait spinners.
Built for your GPU
12 GB card? Both warm with headroom. 24 GB? Both stay loaded forever. 48 GB? 70B alongside Whisper, no sweat.
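
The coexist-or-evict decision described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the 1 GB headroom, the three mode names, and the idea of reading VRAM from `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` are not taken from Voxmelt's implementation.

```python
from typing import Tuple


def parse_nvidia_smi(line: str) -> Tuple[float, float]:
    """Parse one 'used, total' line in MiB (nvidia-smi csv output) into GB."""
    used_mib, total_mib = (float(part) for part in line.split(","))
    return used_mib / 1024, total_mib / 1024


def plan_run(total_gb: float, used_gb: float, stt_gb: float, llm_gb: float,
             headroom_gb: float = 1.0) -> str:
    """Decide how two models share one GPU for the next run.

    Returns "coexist" if both fit at once, "swap" if only one fits at a
    time, and "too_large" if the bigger model does not fit even alone.
    """
    free_gb = total_gb - used_gb
    if stt_gb + llm_gb + headroom_gb <= free_gb:
        return "coexist"
    if max(stt_gb, llm_gb) + headroom_gb <= free_gb:
        return "swap"
    return "too_large"


# Example with the 12 GB card and 4.5 GB already in use, as in the readout above.
used, total = parse_nvidia_smi("4608, 12288")
print(plan_run(total, used, stt_gb=1.6, llm_gb=3.11))   # coexist
print(plan_run(total, used, stt_gb=4.2, llm_gb=3.11))   # swap
print(plan_run(total, used, stt_gb=1.6, llm_gb=14.56))  # too_large
```

The model sizes in the example reuse figures shown earlier on this page, but on-disk size is only a rough stand-in for the VRAM a model actually needs. In swap mode, one way to evict the LLM between runs is Ollama's `keep_alive` request field set to 0, which asks the server to unload the model right after it responds.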
see it cook

Speak. Stop. It's already done.

No "go" button, no second-guessing. The instant your mic cools, the cleanup fires and polished text streams in token by token. Here's the whole cycle, faked for the browser - the real thing runs on your GPU, offline.

0.0s
Ready
Transcript · Whisper large-v3
Click the mic. We'll fake-record a sample, transcribe it, then clean it up.
AI Cleanup · Gemma 3 · 4B
Empty
Local · 0 cloud · RTX 3080 Ti · 12 GB
not for everyone

For people who keep their data on their own machine.

If you've got a capable NVIDIA GPU and a hard rule that sensitive audio never leaves the building, this was built for you.

Developers

Talk through a commit message or a spec; clean text lands where your cursor is. Works fully air-gapped.

Creators & streamers

Caption and transcribe footage offline on the same RTX card you edit and game on. No upload, no wait.

Writers

Draft at the speed of speech. A local LLM tidies grammar and filler so the first pass reads like a third.

Clinicians

Dictate notes between patients without a single byte of PHI touching the cloud. HIPAA stays simple when nothing leaves the room.

Legal & finance

Privileged calls and filings transcribed on-box. No third-party processor, no data-residency paperwork.

Air-gapped teams

Defence, research, and secure labs run Voxmelt with the network cable unplugged. By design, not by promise.

on-ramp to the rtx spark era

The future they announced. The hardware you already own.

Jensen Huang says “the PC is being reinvented.” Satya Nadella wants “unmetered intelligence to every home and every desk.” That future ships this fall on RTX Spark Windows PCs with up to 128 GB of unified memory - and Voxmelt will run even better on it. But you don't have to wait for new silicon: the RTX card in your rig runs the whole local pipeline today.

unmetered, by design - no per-minute cloud meter
private by architecture - audio never leaves your machine
local AI on the NVIDIA GPU you already paid for
RTX Spark-ready when the new hardware lands

Voxmelt is independent and not affiliated with, sponsored by, or endorsed by NVIDIA or Microsoft. “NVIDIA,” “RTX Spark,” and “Windows” are trademarks of their respective owners, used here descriptively only.

Melt local. Keep private.

12 MB installer. Runs on every NVIDIA CUDA GPU from Ampere up - GeForce RTX, workstation, or data-center. 15-day free run of the full thermal pipeline - no card on file, no catch.

Windows 10/11 · NVIDIA CUDA · Ampere → Blackwell · ~12 MB on disk · 0 cloud calls