feature deep-dive

Record. Compose. Narrate. All local.

Two AI engines on a single consumer GPU, three creative surfaces, 80+ template operations, and hands-free controls to drive all of it. Fully offline, fully private. Here is how each piece works.

01 · STT engine

Whisper large-v3 - the recording brain

Engine one of two. Whisper large-v3 runs on your GPU via faster-whisper (CTranslate2 + CUDA). Five model sizes from tiny to large-v3, three compute precisions, 10+ languages with auto-detect. On an RTX 3080 Ti, large-v3 transcribes at 4x realtime in float16 - faster than you talk.

  • Unlimited continuous dictation, overlapping chunks with auto-dedup
  • Chunk duration tunable (5 / 15 / 30 / 60 s)
  • Auto language detect, or pin one per session
  • Sandboxed Python sidecar - stdio only, zero network access
02 · LLM engine

Any Ollama model - the cleanup brain

Engine two of two. Pair Whisper's raw transcript with any local LLM Ollama can serve - Gemma 3, Qwen 3, DeepSeek-R1, Llama 4 Scout, Phi-4, Mistral, whatever. 22+ in the catalog, custom prompts unlimited. The orchestrator parses param counts from model tags and estimates VRAM automatically, so 'pull a new model and it just works' is real.

  • 14 template packs with 80+ tones (dictation, rewrite, chat, email, social, video, meeting notes, coding prompts...)
  • Custom system prompts for medical notes, legal briefs, meeting minutes
  • Token-level streaming, cancel mid-run, run-count tracking
  • Localhost-only (127.0.0.1:11434), zero telemetry, verifiable in netstat
03 · the orchestrator

Smart VRAM swap - why both engines coexist

This is the part no other app ships. Auto reads your card's total VRAM, measures Whisper's actual footprint, estimates the LLM's from its tag, and decides per-run whether to coexist or evict. Always-evict is safest for unknown models. Never-evict keeps recording snappy over AI speed. Auto picks the best of both for your hardware - you never tune a thing.

Mic in. Clean text out.
end to end
MicSpeakWhisperTranscribeOllamaClean upClipboardDone
  • Live VRAM telemetry, updated every 2 seconds
  • Color-coded fit chips: green (both fit), amber (tight), red (won’t run)
  • One-click Free GPU - dumps every model so you can game or render
  • Idle auto-free after 1 / 5 / 15 / 30 min. Alt-tab to a game, VRAM comes back
VRAM Handoff · 12 GB card
Whisper 3 GB
9.0 GB free
Whisper · loadedLLM · cold
whisper warmrecordingevictingllm warmreloading
$ Mic warm. LLM cold. Ready to record.
04 · air-gapped

Privacy by architecture, not by ToS

Both engines live on your machine. Whisper runs in a sandboxed sidecar talking JSON over stdio - no sockets, no shared memory. Ollama is bolted to 127.0.0.1. We don't ship telemetry, analytics, or even a license server you have to ping. Built for HIPAA, GDPR, SOC 2, and air-gapped boxes where 'we promise we won't' isn't good enough.

  • No API keys. No account required to record
  • No internet after install - works on an air-gapped box
  • Whisper sidecar has zero network access by design
  • Ollama localhost-only - receipts in netstat, not promises
beyond transcription

Three creative surfaces built on those engines.

Record is the flagship. Compose and Voiceover turn the same local AI into a text studio and an audio studio - no mic required, no cloud, same GPU.

05 · compose

Compose - the text-first AI workspace.

Not everything starts with your voice. Paste any text - a rough draft, a colleague's email, meeting notes from another tool - pick a template and a tone, and let the local LLM reshape it. Same AI engine as Record, same templates, same tier access. No recording needed, no cloud, same GPU. Think of it as a private, local alternative to pasting into ChatGPT - except your text never leaves your machine.

  • Paste or type any text - runs through the same 80+ template tones
  • Decoupled template + tone selection - mix any action with any style
  • Streaming output with copy, re-run, and Read Aloud on the result
  • "Use last transcript" shortcut - bridge a recording into a different tone
  • Same daily run limits, same tier gating, same model picker as Record
  • Full AI Output panel - tone-shift chips, voice edit, WAV export
Compose, live · pick a template + tone
Templates · 17
Translate
PolyglotFree
Translate the transcript to another language.
gemma3:4b
Tone · 6 options
Spoken input · sample
i'd like one more coffee please, and tell the chef this is the best pasta i've ever had
AI output · streaming0 chars
Local · 100% on GPU · 0 tokens billed
06 · voiceover studio

Voiceover - text to audio, on your GPU.

Turn any script into narration without a recording booth. Four distinct personas (Clara, Marcus, Maya, Sam) crossed with eight delivery tones (Professional, Conversational, Warm, Calm, Bright, Authoritative, Storyteller, Energetic) give you 32 voice combinations - all synthesized locally on your GPU via Kokoro ONNX. No per-character billing, no cloud API, no rate limits. Export clean WAV files straight to your Downloads folder. Unlike ElevenLabs or Play.ht, your scripts never leave your machine and there is no usage meter ticking.

  • 4 personas x 8 tones = 32 voice combinations, all on-device
  • Speed control from 0.7x to 1.5x (Pro+) - match any pacing need
  • WAV export to Downloads - drag into your editor, timeline, or DAW
  • Sentence-by-sentence playback tracking with live highlighting
  • "Use last AI output" shortcut - polish in Compose, narrate in Voiceover
  • No per-character billing - generate as much audio as your GPU can handle

Personas

  • Clara - clear, articulate female voice (Free)
  • Marcus - measured, corporate male voice (Free)
  • Maya - warm, expressive female voice (Pro)
  • Sam - relaxed, conversational male voice (Pro)

Delivery tones

  • Professional, Conversational, Warm (Free)
  • Calm, Bright, Authoritative (Pro)
  • Storyteller, Energetic (Pro)
  • Speed control 0.7x - 1.5x (Pro)
The studio, live · 4 personas x 8 tones
voice studio · 32 combinations
Free voice
Persona · 4
Delivery tone · 8
Speed
Clara · Professional
1.0x speed
Welcome to Voxmelt. Turn any script into natural narration, right here on your GPU. No cloud, no per-character billing, no usage meter ticking. Pick a voice, hit generate, and export a clean WAV file.
Kokoro-82M ONNX · CPU synth, 0 VRAM · 0 tokens billed
07 · template engine

80+ AI operations - one click, every time.

Templates are how Voxmelt turns a raw transcript or pasted text into exactly the format you need. Each template pack contains multiple tones - not just "rewrite" but "rewrite as a casual Slack reply" or "rewrite as a formal decline email" or "rewrite as a LinkedIn post." Pick the action, pick the style, get the output. Every template works in both Record and Compose.

Quick edit

Vibe Dictation (6 tones), Rewrite (5), Proofread (4)

Tone shift

Diplomatic (5 tones), Persuasive (5)

AI and code prompts

AI Prompt Architect (5), Coding Prompt (4)

Communications

Chat Responder (6), Email (5), Social Post (4), Video Script (4)

Explain and translate

Meeting Recap (4), Notes (4), Translate (6 languages)

Custom

Write your own system prompt. Free: 1 slot, Pro: 5, Studio: 30

The catalog, live · filter by persona

Vibe Dictation

Polish raw spoken thoughts, rants, or logic into structured text in any vibe.

Voice Polisher6 tones

Chat Responder

Turn raw spoken reactions into perfectly vibed replies for Slack, Teams, or text.

Ghostwriter6 tones

AI prompt

Convert spoken thought into a clean, structured LLM prompt.

Data Architect5 tones

Coding prompt

Developer-ready spec for AI coding assistants.

Data Architect4 tones

Meeting Recap

Turn a recorded meeting, call, or standup into action items, decisions, minutes, or a TL;DR.

Info Distiller4 tones

Distill Notes

Compress a long voice memo, lecture, or brain-dump into bullets, an outline, a TL;DR, or study notes.

Info Distiller4 tones

Signal Extractor

Strip away all the noise and keep only the dense nucleus - a one-line gist, a pure deliverables list, or a single atomic note.

Info Distiller3 tones

Translate

Default

Translate the transcript to another language.

Polyglot6 tones

Custom prompt

Pick a persona, add your own instructions - runs 100% on your GPU, like every built-in.

Customsingle output

Email

Turn a spoken intent into a ready-to-send email - replies, follow-ups, declines, outreach, apologies.

Ghostwriter5 tones

Social post

Shape a raw thought into a post that lands - a LinkedIn post, an X thread, a hook, or a caption.

Ghostwriter4 tones

Rewrite

The everyday rewriter - paraphrase, shorten, expand, formalize, or sharpen any text.

Voice Polisher5 tones

Proofread

Fix and tighten - grammar, clarity, conciseness, and direct active voice.

Voice Polisher4 tones

Diplomatic

Recast a blunt or risky message so it lands well - tactful, calm, and relationship-safe, without losing the point.

Voice Polisher5 tones

Persuasive

Recast flat text into something compelling - confident, benefit-led, and hard to say no to, with the meaning intact.

Voice Polisher5 tones

Explain

Make any idea click - explain it simply, step by step, by analogy, or with a worked example.

Info Distiller4 tones

Video script

Spoken idea to a script you can read on camera or send straight to the Voiceover page.

Ghostwriter4 tones
hands-free controls

Two ways to drive it without touching the keyboard.

The engines do the thinking. These are how you run them - eyes off, hands free, never breaking flow.

08 · voice commands

No-Hands Mode - run it all by voice.

A second, always-on Whisper tiny model (~0.3 GB VRAM) listens in the background and fuzzy-matches your speech against short, phonetically-distinct phrases. Say "go ahead" to start dictating, "wrap up" to stop, "copy that" to grab the clean text, "warm up" to load the AI - no hotkey, no mouse, no looking at the window. The listener steps aside the instant real dictation starts so it never fights your main model for the mic or the GPU, then resumes the moment you stop. Say it, and it happens.

  • Always-on tiny listener - ~0.3 GB VRAM, runs beside large-v3
  • 8 built-in commands, every phrase remappable to your own
  • 2-word phrases tuned for ~80%+ accuracy on the tiny model
  • Auto-pauses during dictation, auto-resumes the second you stop
  • Load or unload models by voice while idle ("warm up" / "cool down")
  • The command listener is local too - nothing leaves your machine
always-on · tiny listener
heard
"go ahead"
fuzzy match0.88
↑ 0.45 fire
waiting for a command…
tiny · int8 · ~0.3 GB VRAM · 2s window · 50% overlap · 100% local
where it earns its keep

Vibe coders

Hands on the keys in Cursor or Copilot - say "go ahead", dictate the next prompt, "wrap up", "copy that", paste. Ship without breaking flow to hunt for a hotkey.

Hands-busy & accessibility

Cooking, soldering, mid-workout, or resting your wrists from RSI - run dictation start to finish by voice, from across the room.

Streamers & presenters

Trigger capture and cleanup without alt-tabbing out of your scene. Your voice is the shortcut; the window can stay hidden.

09 · mini mode

Mini Mode - a pill that floats over everything.

Collapse Voxmelt into a tiny always-on-top pill you can drop anywhere and snap to any screen edge. It records in its own window with its own Whisper pipeline, mirrors your theme live, and shows the transcript and AI output in compact tabs - so you can dictate into any app without the full workbench in the way. Pair it with No-Hands Mode and the main window can stay closed in the tray while you run the whole flow from the pill. Out of the way, one tap away.

  • Always-on-top, drag anywhere, snap-to-edge, optional lock
  • Its own recording pipeline - keeps working with the window closed to tray
  • Live transcript + AI-output tabs in a 72px pill
  • Full theme sync - 5 themes, custom colors, custom opacity
  • One-tap copy, paste into any app; mic mirrors the main record button
  • "Ready halo" lights up when No-Hands Mode is armed
always-on-top · floats over any app
message-to-team.txt
Type here…
Voxmelt MiniTap mic to dictate

Floating pill - dictate into any app

theme
always-on-top · own pipeline · works closed-to-tray · 5 themes
where it earns its keep

Voice-input power users

Park the pill in a corner and dictate into Slack, email, chat, or docs all day - no full workbench hogging the screen.

Small screens & dual-monitor

Keep your work full-screen; the pill rides the edge of a laptop panel or a second display, always one tap away.

Vibe coders

Float it over the IDE and talk your prompts straight into Copilot or Cursor - eyes on the code, not on a separate app.

Your silicon is ready to melt.

The local-AI-on-NVIDIA future NVIDIA and Microsoft just announced - shipping for your voice today, on the GPU you already own. RTX Spark-ready when the new hardware lands.

Download for Windowspeek the pricing