feature deep-dive

Record. Compose. Narrate. All local.

Two AI engines on a single consumer GPU, three creative surfaces, 80+ template operations, and hands-free controls to drive all of it. Fully offline, fully private. Here is how each piece works.

01 · STT engine

Whisper large-v3 - the recording brain

Engine one of two. Whisper large-v3 runs on your GPU via faster-whisper (CTranslate2 + CUDA). Five model sizes from tiny to large-v3, three compute precisions, 10+ languages with auto-detect. On an RTX 3080 Ti, large-v3 transcribes at 4x realtime in float16 - faster than you talk.

Unlimited continuous dictation, overlapping chunks with auto-dedup
Chunk duration tunable (5 / 15 / 30 / 60 s)
Auto language detect, or pin one per session
Sandboxed Python sidecar - stdio only, zero network access
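
As a rough illustration of how overlapping chunks can be stitched without repeated words, here is a minimal sketch. It is not Voxmelt's actual dedup logic: the function name `merge_chunks`, the word-level comparison, and the 30-word cap are all assumptions made for the example.

```python
def merge_chunks(previous: str, current: str, max_overlap_words: int = 30) -> str:
    """Append `current` to `previous`, dropping words repeated at the seam.

    Hypothetical helper: compares the tail of the earlier transcript with the
    head of the new one, ignoring case and trailing punctuation.
    """
    prev_words = previous.split()
    cur_words = current.split()
    limit = min(len(prev_words), len(cur_words), max_overlap_words)

    def norm(words):
        return [w.lower().strip(".,!?") for w in words]

    # Try the longest candidate overlap first so the most repetition is removed.
    for size in range(limit, 0, -1):
        if norm(prev_words[-size:]) == norm(cur_words[:size]):
            return " ".join(prev_words + cur_words[size:])

    # No shared words at the seam: plain concatenation.
    return " ".join(prev_words + cur_words)


merged = merge_chunks("the quick brown fox", "brown fox jumps over")
```

A production version would more likely align on word timestamps than on text alone, since Whisper can transcribe the same audio slightly differently in two chunks.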

02 · LLM engine

Any Ollama model - the cleanup brain

Engine two of two. Pair Whisper's raw transcript with any local LLM Ollama can serve - Gemma 3, Qwen 3, DeepSeek-R1, Llama 4 Scout, Phi-4, Mistral, whatever. 22+ in the catalog, custom prompts unlimited. The orchestrator parses param counts from model tags and estimates VRAM automatically, so 'pull a new model and it just works' is real.

14 template packs with 80+ tones (dictation, rewrite, chat, email, social, video, meeting notes, coding prompts...)
Custom system prompts for medical notes, legal briefs, meeting minutes
Token-level streaming, cancel mid-run, run-count tracking
Localhost-only (127.0.0.1:11434), zero telemetry, verifiable in netstat
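
To make "parses param counts from model tags and estimates VRAM" concrete, here is a small sketch of one way it could work. The regex, the bytes-per-parameter table, and the flat 1 GB overhead are illustrative assumptions, not the orchestrator's real numbers.

```python
import re
from typing import Optional

# Assumed bytes per parameter for a few quantization levels (illustrative only).
BYTES_PER_PARAM = {"q4": 0.5, "q8": 1.0, "f16": 2.0}


def parse_param_count(tag: str) -> Optional[float]:
    """Return the parameter count in billions from a tag like 'gemma3:4b'.

    Returns None when the tag carries no '<number>b' size after the colon.
    """
    match = re.search(r":(\d+(?:\.\d+)?)b\b", tag.lower())
    return float(match.group(1)) if match else None


def estimate_vram_gb(tag: str, quant: str = "q4", overhead_gb: float = 1.0) -> Optional[float]:
    """Rough VRAM estimate: weights plus a flat allowance for cache and buffers."""
    params_b = parse_param_count(tag)
    if params_b is None:
        return None  # unknown size: the caller should fall back to a safe policy
    # One billion parameters at N bytes each is roughly N GB of weights.
    return round(params_b * BYTES_PER_PARAM[quant] + overhead_gb, 2)


estimate = estimate_vram_gb("gemma3:4b")
```

Tags without an explicit size return None here, which is the case where an always-evict fallback makes sense.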

03 · the orchestrator

Smart VRAM swap - why both engines coexist

This is the part no other app ships. Auto mode reads your card's total VRAM, measures Whisper's actual footprint, estimates the LLM's footprint from its tag, and decides per run whether to coexist or evict. Always-evict is the safest choice for unknown models. Never-evict favors snappy recording over AI speed. Auto picks the best of both for your hardware - you never tune a thing.

Mic in. Clean text out.

end to end

Live VRAM telemetry, updated every 2 seconds
Color-coded fit chips: green (both fit), amber (tight), red (won’t run)
One-click Free GPU - dumps every model so you can game or render
Idle auto-free after 1 / 5 / 15 / 30 min. Alt-tab to a game, VRAM comes back
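
The coexist-or-evict decision and the fit chips can be sketched as below. This is a simplified model under stated assumptions: the names `Policy`, `fit_chip`, and `should_evict_whisper`, the 1 GB "tight" margin, and the idea that Whisper is the model being evicted are all hypothetical, chosen only to illustrate the three modes described above.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    AUTO = "auto"
    ALWAYS_EVICT = "always-evict"
    NEVER_EVICT = "never-evict"


def fit_chip(total_gb: float, whisper_gb: float, llm_gb: float,
             tight_margin_gb: float = 1.0) -> str:
    """Classify an LLM as 'green' (both fit), 'amber' (tight), or 'red' (won't run)."""
    if llm_gb > total_gb:
        return "red"  # too large even with the card otherwise empty
    headroom = total_gb - whisper_gb - llm_gb
    # Assumed rule: anything under the margin, including a needed eviction, is 'amber'.
    return "green" if headroom >= tight_margin_gb else "amber"


def should_evict_whisper(policy: Policy, total_gb: float, whisper_gb: float,
                         llm_gb: Optional[float]) -> bool:
    """Decide, per run, whether to unload Whisper before loading the LLM."""
    if policy is Policy.ALWAYS_EVICT:
        return True
    if policy is Policy.NEVER_EVICT:
        return False
    if llm_gb is None:
        return True  # unknown model size: evicting is the safe default
    return whisper_gb + llm_gb > total_gb


chip = fit_chip(total_gb=12.0, whisper_gb=3.0, llm_gb=3.0)
```

With a 12 GB card and a 3 GB Whisper footprint, a 3 GB model lands on green and coexists, while a 10 GB model triggers an eviction in Auto.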

VRAM Handoff · 12 GB card

Whisper 3 GB

9.0 GB free

Whisper · loaded | LLM · cold

whisper warm · recording · evicting · llm warm · reloading

$ Mic warm. LLM cold. Ready to record.

04 · air-gapped

Privacy by architecture, not by ToS

Both engines live on your machine. Whisper runs in a sandboxed sidecar that talks JSON over stdio - no sockets, no shared memory. Ollama is bound to 127.0.0.1. We don't ship telemetry, analytics, or even a license server you have to ping. Built for HIPAA-, GDPR-, and SOC 2-regulated workflows, and for air-gapped boxes where 'we promise we won't' isn't good enough.

No API keys. No account required to record
No internet after install - works on an air-gapped box
Whisper sidecar has zero network access by design
Ollama localhost-only - receipts in netstat, not promises
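
For readers curious what "JSON over stdio" looks like in practice, here is a minimal sketch of a line-delimited sidecar loop. The message shape (`cmd`, `ok`, `result`, `error`) and the `ping` command are hypothetical, invented for this example. The real sidecar protocol is not documented here.

```python
import json
import sys


def handle_request(line: str) -> str:
    """Turn one JSON request line into one JSON response line."""
    try:
        request = json.loads(line)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "invalid json"})
    if not isinstance(request, dict):
        return json.dumps({"ok": False, "error": "expected an object"})
    if request.get("cmd") == "ping":  # hypothetical command name
        return json.dumps({"ok": True, "result": "pong"})
    return json.dumps({"ok": False, "error": "unknown command"})


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read requests from stdin and answer on stdout; no socket is ever opened."""
    for line in stdin:
        if line.strip():
            stdout.write(handle_request(line) + "\n")
            stdout.flush()  # the parent process reads replies line by line
```

Because the only channel is the pair of pipes the parent process created, the sidecar has nothing to listen on and nothing to dial out with.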

beyond transcription

Three creative surfaces built on those engines.

Record is the flagship. Compose and Voiceover turn the same local AI into a text studio and an audio studio - no mic required, no cloud, same GPU.

05 · compose

Compose - the text-first AI workspace.

Not everything starts with your voice. Paste any text - a rough draft, a colleague's email, meeting notes from another tool - pick a template and a tone, and let the local LLM reshape it. Same AI engine as Record, same templates, same tier access. No recording needed, no cloud, same GPU. Think of it as a private, local alternative to pasting into ChatGPT: your text never leaves your machine.

Paste or type any text - runs through the same 80+ template tones
Decoupled template + tone selection - mix any action with any style
Streaming output with copy, re-run, and Read Aloud on the result
"Use last transcript" shortcut - bridge a recording into a different tone
Same daily run limits, same tier gating, same model picker as Record
Full AI Output panel - tone-shift chips, voice edit, WAV export

Compose, live · pick a template + tone

Templates · 17

Translate

Polyglot · Free

Translate the transcript to another language.

gemma3:4b

Tone · 6 options

Spoken input · sample

i'd like one more coffee please, and tell the chef this is the best pasta i've ever had

AI output · streaming · 0 chars

Local · 100% on GPU · 0 tokens billed

06 · voiceover studio

Voiceover - text to audio, on your GPU.

Turn any script into narration without a recording booth. Four distinct personas (Clara, Marcus, Maya, Sam) crossed with eight delivery tones (Professional, Conversational, Warm, Calm, Bright, Authoritative, Storyteller, Energetic) give you 32 voice combinations - all synthesized locally via Kokoro ONNX. No per-character billing, no cloud API, no rate limits. Export clean WAV files straight to your Downloads folder. Unlike with ElevenLabs or Play.ht, your scripts never leave your machine and there is no usage meter ticking.

4 personas x 8 tones = 32 voice combinations, all on-device
Speed control from 0.7x to 1.5x (Pro and up) - match any pacing need
WAV export to Downloads - drag into your editor, timeline, or DAW
Sentence-by-sentence playback tracking with live highlighting
"Use last AI output" shortcut - polish in Compose, narrate in Voiceover
No per-character billing - generate as much audio as your GPU can handle

Personas

Clara - clear, articulate female voice (Free)
Marcus - measured, corporate male voice (Free)
Maya - warm, expressive female voice (Pro)
Sam - relaxed, conversational male voice (Pro)

Delivery tones

Professional, Conversational, Warm (Free)
Calm, Bright, Authoritative (Pro)
Storyteller, Energetic (Pro)
Speed control 0.7x - 1.5x (Pro)
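
The 4 x 8 arithmetic, and how the tiers listed above carve it up, can be checked with a few lines. The tier labels are taken from the lists above. The gating rule (a combination is available on Free only when both its persona and its tone are Free) is an assumption for illustration.

```python
from itertools import product

PERSONAS = {"Clara": "Free", "Marcus": "Free", "Maya": "Pro", "Sam": "Pro"}
TONES = {
    "Professional": "Free", "Conversational": "Free", "Warm": "Free",
    "Calm": "Pro", "Bright": "Pro", "Authoritative": "Pro",
    "Storyteller": "Pro", "Energetic": "Pro",
}


def combos(tier: str = "Pro"):
    """List (persona, tone) pairs available to a tier under the assumed gating rule."""
    allowed = {"Free"} if tier == "Free" else {"Free", "Pro"}
    return [
        (persona, tone)
        for persona, tone in product(PERSONAS, TONES)
        if PERSONAS[persona] in allowed and TONES[tone] in allowed
    ]


all_voices = combos("Pro")    # every persona crossed with every tone
free_voices = combos("Free")  # 2 Free personas x 3 Free tones under this rule
```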

The studio, live · 4 personas x 8 tones

voice studio · 32 combinations

Free voice

Persona · 4

Delivery tone · 8

Speed

Clara · Professional

1.0x speed

Welcome to Voxmelt. Turn any script into natural narration, right here on your GPU. No cloud, no per-character billing, no usage meter ticking. Pick a voice, hit generate, and export a clean WAV file.

Kokoro-82M ONNX · CPU synth, 0 VRAM · 0 tokens billed

07 · template engine

80+ AI operations - one click, every time.

Templates are how Voxmelt turns a raw transcript or pasted text into exactly the format you need. Each template pack contains multiple tones - not just "rewrite" but "rewrite as a casual Slack reply" or "rewrite as a formal decline email" or "rewrite as a LinkedIn post." Pick the action, pick the style, get the output. Every template works in both Record and Compose.

Quick edit

Vibe Dictation (6 tones), Rewrite (5), Proofread (4)

Tone shift

Diplomatic (5 tones), Persuasive (5)

AI and code prompts

AI Prompt Architect (5), Coding Prompt (4)

Communications

Chat Responder (6), Email (5), Social Post (4), Video Script (4)

Explain and translate

Meeting Recap (4), Notes (4), Translate (6 languages)

Custom

Write your own system prompt. Free: 1 slot, Pro: 5, Studio: 30

The catalog, live · filter by persona

Vibe Dictation

Polish raw spoken thoughts, rants, or logic into structured text in any vibe.

Voice Polisher · 6 tones

Chat Responder

Turn raw spoken reactions into perfectly vibed replies for Slack, Teams, or text.

Ghostwriter · 6 tones

AI prompt

Convert spoken thought into a clean, structured LLM prompt.

Data Architect · 5 tones

Coding prompt

Developer-ready spec for AI coding assistants.

Data Architect · 4 tones

Meeting Recap

Turn a recorded meeting, call, or standup into action items, decisions, minutes, or a TL;DR.

Info Distiller · 4 tones

Distill Notes

Compress a long voice memo, lecture, or brain-dump into bullets, an outline, a TL;DR, or study notes.

Info Distiller · 4 tones

Signal Extractor

Strip away all the noise and keep only the dense nucleus - a one-line gist, a pure deliverables list, or a single atomic note.

Info Distiller · 3 tones

Translate

Default

Translate the transcript to another language.

Polyglot · 6 tones

Custom prompt

Pick a persona, add your own instructions - runs 100% on your GPU, like every built-in.

Custom · single output

Email

Turn a spoken intent into a ready-to-send email - replies, follow-ups, declines, outreach, apologies.

Ghostwriter · 5 tones

Social post

Shape a raw thought into a post that lands - a LinkedIn post, an X thread, a hook, or a caption.

Ghostwriter · 4 tones

Rewrite

The everyday rewriter - paraphrase, shorten, expand, formalize, or sharpen any text.

Voice Polisher · 5 tones

Proofread

Fix and tighten - grammar, clarity, conciseness, and direct active voice.

Voice Polisher · 4 tones

Diplomatic

Recast a blunt or risky message so it lands well - tactful, calm, and relationship-safe, without losing the point.

Voice Polisher · 5 tones

Persuasive

Recast flat text into something compelling - confident, benefit-led, and hard to say no to, with the meaning intact.

Voice Polisher · 5 tones

Explain

Make any idea click - explain it simply, step by step, by analogy, or with a worked example.

Info Distiller · 4 tones

Video script

Spoken idea to a script you can read on camera or send straight to the Voiceover page.

Ghostwriter · 4 tones

Open the full template explorer

hands-free controls

Two ways to drive it without touching the keyboard.

The engines do the thinking. These are how you run them - eyes off, hands free, never breaking flow.

08 · voice commands

No-Hands Mode - run it all by voice.

A second, always-on Whisper tiny model (~0.3 GB VRAM) listens in the background and fuzzy-matches your speech against short, phonetically distinct phrases. Say "go ahead" to start dictating, "wrap up" to stop, "copy that" to grab the clean text, "warm up" to load the AI - no hotkey, no mouse, no looking at the window. The listener steps aside the instant real dictation starts, so it never fights your main model for the mic or the GPU, then resumes the moment you stop. Say it, and it happens.

Always-on tiny listener - ~0.3 GB VRAM, runs beside large-v3
8 built-in commands, every phrase remappable to your own
2-word phrases tuned for 80%+ accuracy on the tiny model
Auto-pauses during dictation, auto-resumes the second you stop
Load or unload models by voice while idle ("warm up" / "cool down")
The command listener is local too - nothing leaves your machine

always-on · tiny listener

heard

"go ahead"

fuzzy match · 0.88

↑ 0.45 fire

waiting for a command…

tiny · int8 · ~0.3 GB VRAM · 2s window · 50% overlap · 100% local
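
The fuzzy-match step shown in the readout above can be approximated with Python's standard library. This is a sketch, not the shipped matcher: the action names are hypothetical, `difflib.SequenceMatcher` stands in for whatever similarity metric Voxmelt uses, and reading 0.45 as the firing threshold is an assumption taken from the demo.

```python
from difflib import SequenceMatcher

# Phrases come from the feature description; action names are hypothetical.
COMMANDS = {
    "go ahead": "start_dictation",
    "wrap up": "stop_dictation",
    "copy that": "copy_text",
    "warm up": "load_model",
    "cool down": "unload_model",
}
FIRE_THRESHOLD = 0.45  # assumed from the demo readout


def match_command(heard: str, threshold: float = FIRE_THRESHOLD):
    """Return (action, score) for the closest phrase, or (None, score) below threshold."""
    normalized = " ".join(heard.lower().split())
    best_phrase, best_score = None, 0.0
    for phrase in COMMANDS:
        # Character-level similarity in [0, 1]; 1.0 means an exact match.
        score = SequenceMatcher(None, normalized, phrase).ratio()
        if score > best_score:
            best_phrase, best_score = phrase, score
    if best_phrase is not None and best_score >= threshold:
        return COMMANDS[best_phrase], best_score
    return None, best_score


action, score = match_command("Go  ahead")
```

A threshold tuned for one similarity metric does not transfer to another, so the 0.45 value would need recalibrating against real tiny-model transcripts before use with this metric.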

where it earns its keep

Vibe coders

Hands on the keys in Cursor or Copilot - say "go ahead", dictate the next prompt, "wrap up", "copy that", paste. Ship without breaking flow to hunt for a hotkey.

Hands-busy & accessibility

Cooking, soldering, mid-workout, or resting your wrists from RSI - run dictation start to finish by voice, from across the room.

Streamers & presenters

Trigger capture and cleanup without alt-tabbing out of your scene. Your voice is the shortcut; the window can stay hidden.

09 · mini mode

Mini Mode - a pill that floats over everything.

Collapse Voxmelt into a tiny always-on-top pill you can drop anywhere and snap to any screen edge. It records in its own window with its own Whisper pipeline, mirrors your theme live, and shows the transcript and AI output in compact tabs - so you can dictate into any app without the full workbench in the way. Pair it with No-Hands Mode and the main window can stay closed in the tray while you run the whole flow from the pill. Out of the way, one tap away.

Always-on-top, drag anywhere, snap-to-edge, optional lock
Its own recording pipeline - keeps working with the window closed to tray
Live transcript + AI-output tabs in a 72px pill
Full theme sync - 5 themes, custom colors, custom opacity
One-tap copy, paste into any app; mic mirrors the main record button
"Ready halo" lights up when No-Hands Mode is armed

always-on-top · floats over any app

message-to-team.txt

Type here…

Voxmelt Mini · Tap mic to dictate

Floating pill - dictate into any app

theme

always-on-top · own pipeline · works closed-to-tray · 5 themes

see the full dual-monitor cinema

where it earns its keep

Voice-input power users

Park the pill in a corner and dictate into Slack, email, chat, or docs all day - no full workbench hogging the screen.

Small screens & dual-monitor

Keep your work full-screen; the pill rides the edge of a laptop panel or a second display, always one tap away.

Vibe coders

Float it over the IDE and talk your prompts straight into Copilot or Cursor - eyes on the code, not on a separate app.

Your silicon is ready to melt.

The local-AI-on-NVIDIA future that NVIDIA and Microsoft just announced is already shipping for your voice, on the GPU you already own. RTX Spark-ready when the new hardware lands.

Download for Windows · Peek the pricing