local AI on NVIDIA · GPU-pipelined · zero cloud

Speak. Melt. Polished text.

The voice framework that runs hot. Whisper large-v3 + any Ollama LLM, fused into one CUDA-cooked pipeline on the GPU you already paid for. NVIDIA and Microsoft just called local, private AI on NVIDIA the future of the PC - Voxmelt did it for your voice first. Tap the always-on-top widget over any app, speak, and a local model rewrites it in the tone you pick - mirrored straight into the full workbench. Zero cloud, no per-minute meter, zero data harvesting. Your voice melts the silicon; polished text is what drips out.

VX·CORE · GPU RUNTIME

voice in

AI output · streaming

cleaned · punctuated · 0 cloud

GPU LOAD 96%

VRAM 6.2 / 8.0 GB

Whisper · STT

gemma3:4b · cleanup

TEMP 64°C

12 ms/token

Download for Windows · watch it cook

NVIDIA CUDA · Ampere → Blackwell · tensor-toasted

HIPAA / GDPR-safe

4x realtime · overclocked

built for the personal AI computer · RTX Spark-ready

new · May 31, 2026 · NVIDIA + Microsoft · RTX Spark keynote

The PC just got reinvented around local, private AI on NVIDIA. NVIDIA and Microsoft unveiled RTX Spark - a new class of Windows PC built to run AI on your device, not the cloud. Voxmelt has done exactly that for your voice since day one. You don't need to wait for fall hardware to live in that future - just the NVIDIA GPU already in your rig.

Read the NVIDIA announcement · Microsoft's Windows blog

Voxmelt is an independent product, not affiliated with or endorsed by NVIDIA or Microsoft.

why people melt local

100% local processing - zero cloud, zero cap.

Voxmelt

Studio · 9 days left · Sign in

COLOR THEME

Nightwing

Matrix

Cyberpunk

Record · Whisper small → Clean up · Default

AI TONE / PRESET

Clean up · Default

Coding prompt · Feature

Chat Responder · Corporate Polish

Summarize · Bullets

Custom prompt · Editable

→gemma2:27b

AI MODEL · OLLAMA

gemma2:27b · 14.56 GB

gemma3:4b · 3.11 GB

gemma4:e4b · 8.95 GB

qwen3:8b · 5.20 GB

qwen3:14b · 9.20 GB

Free GPU · Voice Off · AI On

RECORD

COMPOSE

VOICEOVER

TEMPLATES

HISTORY

MINI

RTX 3080 Ti

GPU 20%

VRAM 4.5/12G

TEMP 42°C

PWR 138W

Click to record

Ctrl+Shift+R

Microphone access

Whisper small loaded

GPU available

TRANSCRIPT · 0 words

ProcessAC

Paste or type text here to process with AI, or use the mic to record…

AI OUTPUT

AI output will appear here

Record or paste text, then process

MODELS

Whisper small loaded · gemma2:27b off

WHISPER · small

Model      Download   VRAM
Tiny       75 MB      ~1.2 GB
Base       145 MB     ~1.3 GB
Small      480 MB     ~1.6 GB
Medium     1.5 GB     ~2.6 GB
Large v3   3.0 GB     ~4.2 GB

Small is the active model. Great accuracy · uses ~1.6 GB VRAM.

Load on app launch

Pre-loads the model so the first recording is instant.

Advanced

AI PROCESSING · gemma2:27b

Keep both loaded

Both share VRAM · needs 24+ GB

Performance may be unstable. Both models compete for VRAM.

Connected · 5 models installed · Refresh

Ollama v0.24.0 · INSTALLED MODELS

gemma2:27b · 14.56 GB on disk

gemma3:4b · 3.11 GB on disk

gemma4:e4b · 8.95 GB on disk

qwen3:8b · 5.20 GB on disk

qwen3:14b · 9.20 GB on disk

Ready · small / float16 · RTX 3080 Ti

Ctrl+Shift+R to toggle

DISPLAY 1 · 27″

2560 × 1440 · the workbench

prompt.txt

File · Edit · View

Paste your cleaned coder prompt here…

Ln 1, Col 1 · UTF-8 · 100%

READY - TAP TO DICTATE

Tap mic to dictate

TRANSCRIPT AI

Your transcript will appear here as you speak…

DISPLAY 2 · 24″ portrait

1080 × 1920 · Notepad

100% local · Zero cloud · No API keys · Always-on-top · Real-time sync · Voice or paste · AI tone presets · Custom presets · 1-tap copy · Per-app paste · GPU-accelerated · Whisper large-v3 · Ollama-powered · Works air-gapped · 4x realtime · Auto VRAM swap · Global hotkey · 3 themes

the thermal pipeline

Two heavy AI models. One GPU. Fully orchestrated.

Voxmelt runs a speech-to-text engine and an LLM cleanup engine on the same consumer GPU - swapping them in and out of VRAM as capacity permits. Tap a model to load it, or flip the dial to swap mode and watch only one ride the card at a time.

ENGINE 01

STT engine

Whisper large-v3

the recording brain

Throughput · 4x realtime

Precision · float16 · int8

VRAM · ~2 GB

Languages · 10+ auto-detect

Backend · faster-whisper · CUDA

sandboxed sidecar · stdio only
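
For the curious, here is roughly what a GPU transcription call looks like with faster-whisper, the backend named above. This is a minimal sketch, not Voxmelt's sidecar code: the function names `transcribe` and `join_segments` are made up for illustration, and only the `WhisperModel(...)` constructor and `model.transcribe(...)` call come from the faster-whisper library itself.

```python
def join_segments(texts):
    """Join segment texts into one line, dropping empty pieces."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe(path, model_size="large-v3", compute_type="float16"):
    """Transcribe one audio file on the GPU. Hypothetical helper, not Voxmelt's API."""
    # Imported lazily so this sketch still loads where faster-whisper is not installed.
    from faster_whisper import WhisperModel

    # compute_type="float16" matches the precision listed above; "int8" is the lighter option.
    model = WhisperModel(model_size, device="cuda", compute_type=compute_type)

    # With no language argument, Whisper auto-detects the spoken language.
    segments, info = model.transcribe(path)
    return join_segments(seg.text for seg in segments), info.language
```

Calling `transcribe("note.wav")` would need a CUDA-capable GPU, the faster-whisper package, and a real audio file; the file name here is a placeholder.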

GPU orchestrator

RTX 4080 Ti · 16 GB

VRAM gauge · 0 to 16 GB · 0.0/16G · STT · LLM

ENGINE 02

LLM engine

Any Ollama model

the cleanup brain

Models · 22+ catalogued

VRAM · 1 GB to 42 GB

Output · token-stream

Presets · 12 built-in + custom

Backend · Ollama · 127.0.0.1

localhost-only · zero telemetry
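
To make "localhost-only" and "token-stream" concrete, here is a minimal sketch of streaming a cleanup from a local Ollama server using only the Python standard library. It assumes Ollama is listening on its default port 11434 and uses its `/api/generate` endpoint; the helper names and the prompt wording are placeholders, not Voxmelt's presets or internals.

```python
import json
import urllib.request

# Assumption: Ollama's default local address. Nothing here leaves 127.0.0.1.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_request(model, prompt):
    """Build a streaming POST request for Ollama's generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )


def iter_tokens(lines):
    """Yield text chunks from Ollama's newline-delimited JSON stream."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break


def clean_up(model, transcript):
    """Stream a cleaned-up version of the transcript to stdout (hypothetical helper)."""
    prompt = "Clean up this dictated text. Fix punctuation and remove filler:\n\n" + transcript
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        for token in iter_tokens(resp):
            print(token, end="", flush=True)
    print()
```

With Ollama running and a model pulled, `clean_up("gemma3:4b", "um so basically the build is broken")` would print the rewrite token by token.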

Auto memory math

Reads VRAM continuously. Decides per-run whether to coexist or evict. You touch nothing.
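
The coexist-or-evict call can be pictured as simple arithmetic. The sketch below is one plausible version, not Voxmelt's actual logic: `ModelFootprint`, `plan_residency`, and the 1 GB headroom are all assumptions, and the LLM figures reuse the on-disk sizes from the model list as a rough stand-in for VRAM use.

```python
from dataclasses import dataclass


@dataclass
class ModelFootprint:
    name: str
    vram_gb: float  # estimated VRAM needed while the model is loaded


def plan_residency(total_vram_gb, used_by_others_gb, stt, llm, headroom_gb=1.0):
    """Return "coexist" if both models fit with headroom, else "swap"."""
    free_gb = total_vram_gb - used_by_others_gb
    needed_gb = stt.vram_gb + llm.vram_gb + headroom_gb
    return "coexist" if needed_gb <= free_gb else "swap"


# Figures taken from the page; LLM values are disk sizes used as rough estimates.
whisper_small = ModelFootprint("whisper-small", 1.6)
gemma_4b = ModelFootprint("gemma3:4b", 3.11)
gemma_27b = ModelFootprint("gemma2:27b", 14.56)

print(plan_residency(12.0, 0.5, whisper_small, gemma_4b))   # coexist
print(plan_residency(12.0, 0.5, whisper_small, gemma_27b))  # swap
```

On a 12 GB card the small pairing fits side by side, while the 27B model forces a swap; a real orchestrator would read live VRAM instead of fixed numbers.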

Zero-touch handoff

You stop talking, we swap models, polished text streams in - token by token. No buttons, no wait spinners.

Built for your GPU

12 GB card? Both warm with headroom. 24 GB? Both stay loaded forever. 48 GB? 70B alongside Whisper, no sweat.

see it cook

Speak. Stop. It's already done.

No "go" button, no second-guessing. The instant your mic cools, the cleanup fires and polished text streams in token by token. Here's the whole cycle, faked for the browser - the real thing runs on your GPU, offline.

0.0s

Ready

Transcript · Whisper large-v3

Click the mic. We'll fake-record a sample, transcribe it, then clean it up.

AI Cleanup · Gemma 3 · 4B

Empty

Local · 0 cloud · RTX 3080 Ti · 12 GB

not for everyone

For people who keep their data on their own machine.

If you've got a capable NVIDIA GPU and a hard rule that sensitive audio never leaves the building, this was built for you.

Developers

Talk through a commit message or a spec; clean text lands where your cursor is. Works fully air-gapped.

Creators & streamers

Caption and transcribe footage offline on the same RTX card you edit and game on. No upload, no wait.

Writers

Draft at the speed of speech. A local LLM tidies grammar and filler so the first pass reads like a third.

Clinicians

Dictate notes between patients without a single byte of PHI touching the cloud. HIPAA stays simple when nothing leaves the room.

Legal & finance

Privileged calls and filings transcribed on-box. No third-party processor, no data-residency paperwork.

Air-gapped teams

Defence, research, and secure labs run Voxmelt with the network cable unplugged. By design, not by promise.

on-ramp to the rtx spark era

The future they announced. The hardware you already own.

Jensen Huang says “the PC is being reinvented.” Satya Nadella wants “unmetered intelligence to every home and every desk.” That future ships this fall on RTX Spark Windows PCs with up to 128 GB of unified memory - and Voxmelt will run even better on it. But you don't have to wait for new silicon: the RTX card in your rig runs the whole local pipeline today.

unmetered, by design - no per-minute cloud meter

private by architecture - audio never leaves your machine

local AI on the NVIDIA GPU you already paid for

RTX Spark-ready when the new hardware lands

Voxmelt is independent and not affiliated with, sponsored by, or endorsed by NVIDIA or Microsoft. “NVIDIA,” “RTX Spark,” and “Windows” are trademarks of their respective owners, used here descriptively only.

Melt local. Keep private.

12 MB installer. Runs on every NVIDIA CUDA GPU from Ampere up - GeForce RTX, workstation, or data-center. 15-day free run of the full thermal pipeline - no card on file, no catch.

Download for Windows · peek the pricing

Windows 10/11 · NVIDIA CUDA · Ampere → Blackwell · ~12 MB on disk · 0 cloud calls