Descript Keyboard Shortcuts
Descript's core trick is editing audio and video by deleting words in a text transcript rather than scrubbing a waveform, and that reshapes what its shortcuts need to do compared with a traditional editor like Premiere or Audition. Delete or Backspace on selected transcript text is the primary editing action in most Descript sessions: to remove a filler word or a bad take, you select the text and press Delete, and the underlying audio/video is trimmed to match. That is a fundamentally different interaction model from positioning a playhead and razor-cutting a waveform by hand.

Descript also includes Overdub (AI voice cloning for corrections) and Studio Sound (AI audio cleanup). These AI actions are reached from the same editing interface as everything else, though, as the tables below show, through menus and panels rather than dedicated keys. That placement reflects how deeply the AI tooling is integrated into the core editing flow instead of being a bolted-on feature in a separate part of the app.

Podcasters editing a weekly interview show are probably Descript's clearest use case. Instead of hunting through a waveform to find and cut a rambling tangent or a string of 'ums', you read the transcript like a document, delete the offending sentences the way you'd edit a Word draft, and the audio follows. The result is a workflow that can turn what used to be an hour of waveform-scrubbing into something closer to copyediting a paragraph.
Transcript Editing
| Action | Windows | Mac | Description |
|---|---|---|---|
| Delete selected transcript text (and underlying media) | Delete/Backspace | Delete | Deletes the selected words from the transcript and cuts the corresponding audio/video segment to match. This is the primary editing action in Descript, replacing the waveform-scrubbing-and-razor-cutting workflow of traditional editors. |
| Undo | Ctrl+Z | Cmd+Z | Rolls back the most recent edit, whether that was a transcript-text deletion or a traditional timeline change, using the same undo convention nearly every editor shares. |
| Find and replace in transcript | Ctrl+F | Cmd+F | Searches the whole transcript for a word or phrase and jumps straight to each hit, so you can locate every instance of a filler word like 'um' and remove it without scrubbing through the recording by ear. |
| Remove filler words automatically | Edit menu > Remove Filler Words (no default key) | — | Scans the transcript for common filler words like 'um' and 'uh' and offers to remove them in a batch, an automated shortcut for a cleanup pass that would otherwise require manually finding and deleting each instance one at a time. |
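
The idea behind transcript-driven deletion can be sketched in a few lines: if every word carries a start and end timestamp, deleting words is equivalent to computing which time ranges of the media to keep. This is a minimal illustration only, not Descript's implementation; the `Word` class, the `kept_ranges` function, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A transcript word with its position in the media (hypothetical model)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def kept_ranges(words, deleted_indices, total_duration):
    """Return the (start, end) media ranges that remain after deleting words."""
    # Each deleted word contributes one cut, ordered by start time.
    cuts = sorted((words[i].start, words[i].end) for i in deleted_indices)
    ranges = []
    cursor = 0.0
    for start, end in cuts:
        if start > cursor:
            ranges.append((cursor, start))  # keep the media before this cut
        cursor = max(cursor, end)           # skip past the deleted word
    if cursor < total_duration:
        ranges.append((cursor, total_duration))
    return ranges


words = [
    Word("So", 0.0, 0.4),
    Word("um", 0.4, 0.9),
    Word("welcome", 0.9, 1.5),
    Word("back", 1.5, 2.0),
]

# Deleting "um" leaves two ranges to play back to back.
print(kept_ranges(words, {1}, 2.0))  # [(0.0, 0.4), (0.9, 2.0)]
```

Adjacent deletions merge into a single cut, which is why removing a whole sentence produces one edit point rather than one per word.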
Playback Navigation
| Action | Windows | Mac | Description |
|---|---|---|---|
| Play / pause | Space | Space | Toggles playback from the current position, a convention shared with virtually all audio/video editing software regardless of its underlying editing model. |
| Jump playback to clicked word | Click word in transcript | Click word in transcript | Moves the playhead to the exact timestamp of a clicked word in the transcript, letting you navigate the recording by reading rather than by scrubbing a waveform or timeline visually. |
| Jump to next/previous scene or clip | Ctrl+Right / Ctrl+Left (in Storyboard/Scenes) | — | Moves the playhead between distinct scene or clip boundaries in a multi-clip project, useful for reviewing a video edit sequence-by-sequence rather than scrubbing continuously through the whole timeline. |
AI Tools
| Action | Windows | Mac | Description |
|---|---|---|---|
| Apply Studio Sound (AI audio cleanup) | Effects panel > Studio Sound (no dedicated key) | — | Applies AI-powered audio cleanup to reduce background noise and improve clarity, one of Descript's built-in AI tools accessible directly from the editing interface rather than requiring a separate audio-restoration application. |
| Generate Overdub correction for selected text | Select text > Overdub menu | — | Synthesizes replacement speech from a previously trained Overdub voice model to replace a selected word or phrase, letting you fix a spoken mistake without re-recording. It is one of Descript's most distinctive and occasionally controversial AI features. |
| Apply Eye Contact correction to video | Effects panel > Eye Contact (no dedicated key) | — | Applies an AI correction that adjusts a speaker's gaze in a webcam-style recording to appear as though they're looking directly at the camera, useful when a presenter was reading from notes or a second monitor rather than looking into the lens. |
Frequently Asked Questions
Does deleting a word from the transcript permanently remove that audio, or can it be recovered?
Deletions are non-destructive within the project — Descript maintains the original recording and applies edits as an editable layer on top, so deleted words can typically be recovered through undo history or by referencing the original unedited transcript/recording, similar to how non-destructive editing works in traditional NLEs.
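
A rough way to picture non-destructive editing is a source transcript that is never modified, with deletions stored as a separate layer that can be undone. The sketch below is an assumption-laden toy model, not a description of Descript's internals; the `Composition` class and its methods are hypothetical names.

```python
class Composition:
    """Edits are stored as a layer over an immutable source transcript."""

    def __init__(self, source_words):
        self._source = tuple(source_words)  # the original is never mutated
        self._deleted = set()               # indices currently hidden
        self._history = []                  # one entry per delete, for undo

    def delete(self, indices):
        """Hide the given word indices and remember the change for undo."""
        newly_hidden = set(indices) - self._deleted
        self._deleted |= newly_hidden
        self._history.append(newly_hidden)

    def undo(self):
        """Reverse the most recent delete, if there is one."""
        if self._history:
            self._deleted -= self._history.pop()

    def text(self):
        """The edited transcript: source words minus the hidden ones."""
        return " ".join(
            word for i, word in enumerate(self._source) if i not in self._deleted
        )

    def original_text(self):
        """The untouched source transcript."""
        return " ".join(self._source)


comp = Composition(["So", "um", "welcome", "back"])
comp.delete([1])
print(comp.text())           # So welcome back
print(comp.original_text())  # So um welcome back
comp.undo()
print(comp.text())           # So um welcome back
```

Because the source words are only ever hidden, a deleted word stays recoverable for as long as the project keeps its edit layer.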
How accurate is Descript's automatic transcription that the whole editing model depends on?
Transcription accuracy is generally strong for clear speech in good recording conditions but can degrade with heavy accents, overlapping speakers, background noise, or poor audio quality. Because editing is transcript-driven, a transcription error can occasionally cause an edit to land on the wrong word; Descript lets you correct the transcript manually to fix this.
Is Overdub voice cloning something anyone can use on any voice, or does it require consent from the speaker?
Overdub requires you to record consent phrases and train a voice model from your own voice (or a voice you have explicit permission to clone) before it can be used. The safeguard is built into the feature to prevent generating synthetic speech in someone else's voice without their authorization.
Does the automatic filler-word removal ever cut something that wasn't actually a filler word?
Occasionally — because the detection is pattern-based on common filler sounds, it can sometimes flag a word that happens to sound similar in context but was meaningful to the sentence, which is why Descript surfaces the proposed removals for review rather than silently deleting them, letting you reject individual suggestions before committing to the batch cleanup.
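
The propose-then-review pattern described above can be sketched as two steps: flag candidate words, then apply only the suggestions the editor has not rejected. This is a simplified illustration, not Descript's detection logic; the `FILLERS` set (including "like", added here to show a false positive), `suggest_fillers`, and `apply_removals` are all hypothetical.

```python
import re

# Assumed filler list for illustration; "like" is included to show a false positive.
FILLERS = {"um", "uh", "like"}


def suggest_fillers(words):
    """Return indices of words that match the filler list, ignoring punctuation and case."""
    return [
        i for i, word in enumerate(words)
        if re.sub(r"\W", "", word).lower() in FILLERS
    ]


def apply_removals(words, suggestions, rejected=()):
    """Remove suggested words except those the reviewer rejected."""
    drop = set(suggestions) - set(rejected)
    return [word for i, word in enumerate(words) if i not in drop]


words = ["I", "um", "like", "really", "like", "it"]
suggestions = suggest_fillers(words)
print(suggestions)  # [1, 2, 4]

# The second "like" is the verb of the sentence, so the reviewer rejects index 4.
print(apply_removals(words, suggestions, rejected=[4]))
# ['I', 'really', 'like', 'it']
```

Accepting every suggestion blindly would have produced "I really it", which is the kind of error the review step exists to catch.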
Can Eye Contact correction be overdone or look unnatural?
Yes, particularly with larger gaze corrections or lower-quality source footage, since the effect is generating a synthetic approximation of direct eye contact rather than capturing it optically — most editors apply it conservatively and review the result at full resolution before finalizing, rather than assuming the default setting is automatically the most natural-looking option for every clip.
Does Descript work well for straightforward video editing that has nothing to do with spoken dialogue, like a music video?
Less so — Descript's transcript-driven editing model provides the biggest advantage specifically when a project is dialogue- or narration-heavy, since the transcript is the thing you're actually editing against; for music videos or heavily visual b-roll sequences with minimal speech, a traditional timeline-based editor like Premiere Pro generally offers more direct and precise control over cuts.
Does Descript have a shortcut for removing filler words automatically from a transcript?
No. The table above lists filler-word removal as a menu command (Edit menu > Remove Filler Words) with no default key. It runs an automated detection pass across the whole transcript and presents suggested removals for review before applying them, a multi-step workflow that doesn't reduce cleanly to a single keystroke.