Gemini Transcribe

https://gemini-transcribe.fly.dev/

A web application for transcribing audio and video files using Google's Gemini Flash model.

Flash is a very interesting model to explore for audio transcription because:

We can prompt for specific transcription outputs, as it processes both audio and text inputs
It has built-in speaker diarization
It can attempt to detect not only words but also silence, sentiment, and sounds beyond human voices
It can translate the transcription, in particular to languages other than English
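
As a rough illustration of the points above, the sketch below shows how a prompt and an audio file might be combined into one multimodal request. It is not this project's actual code: the function names, the option names, and the prompt wording are hypothetical, and the `text` / `inlineData` part shape is assumed to mirror what the Gemini API accepts for mixed text and media input.

```typescript
// Hypothetical part types, assumed to mirror the Gemini API's request parts:
// one text part carrying the prompt, one inline part carrying base64 media.
type TextPart = { text: string };
type InlineDataPart = { inlineData: { mimeType: string; data: string } };
type Part = TextPart | InlineDataPart;

// Illustrative options, one per capability listed above.
interface TranscriptionOptions {
  diarize?: boolean; // ask for speaker labels
  includeNonSpeech?: boolean; // ask for silences and non-voice sounds
  translateTo?: string; // target language for the transcript
}

// Build the text prompt. Because the model reads text alongside the audio,
// the output format can be steered with plain instructions.
function buildPrompt(options: TranscriptionOptions = {}): string {
  const lines = ["Transcribe the attached audio."];
  if (options.diarize) {
    lines.push("Label each speaker (Speaker 1, Speaker 2, and so on).");
  }
  if (options.includeNonSpeech) {
    lines.push("Note silences and non-speech sounds in [brackets].");
  }
  if (options.translateTo) {
    lines.push(`Translate the transcript into ${options.translateTo}.`);
  }
  return lines.join("\n");
}

// Pair the prompt with the base64-encoded audio as a list of request parts.
function buildRequestParts(
  audioBase64: string,
  mimeType: string,
  options: TranscriptionOptions = {}
): Part[] {
  return [
    { text: buildPrompt(options) },
    { inlineData: { mimeType, data: audioBase64 } },
  ];
}
```

A caller would pass the resulting parts to whichever Gemini client it uses. Keeping the prompt construction separate from the network call makes it easy to experiment with different instructions.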

Google claims that Flash 1.5 has a word error rate of 9.6% on the FLEURS benchmark (September 2024). This project now uses the experimental Flash 2.0, which does not appear to have been benchmarked yet.
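
For context on that figure, word error rate is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of words in the reference. The function below is a minimal sketch of that definition. It is not how Google scored FLEURS; benchmark scoring usually also normalizes text (case, punctuation), which is omitted here.

```typescript
// Word error rate: word-level Levenshtein distance / reference word count.
// Illustrative helper; splits on whitespace only, with no text normalization.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }

  // prev[j] holds the edit distance between the first i-1 reference words
  // and the first j hypothesis words; only two rows are kept at a time.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion: reference word missing from hypothesis
        curr[j - 1] + 1, // insertion: extra word in hypothesis
        prev[j - 1] + cost // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, comparing the reference "the cat sat on the mat" with the output "the cat sat on a mat" gives one substitution over six reference words, a rate of about 16.7%.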


mikeesto/gemini-transcribe
