AI

AI models for audio and video, on Android and Windows

1.) Multimodal Generative AI

2.) AI Audio model

2.2) AI music audio generative model

2.3) Audio dubbing AI

Just Dub it (os): 2.5 GB VRAM. A joint audio-visual model is all you need for video dubbing. It also syncs lips.

2.4) video to audio model

waveflow from Meta (os): Audio Generation in Waveform Space. High-fidelity audio synthesized directly in raw waveform space, with no VAE and no latent compression.
PrismAudio (video) (os): 6 GB VRAM

2.5) TTS Text to Audio AI model

Scenema Audio: Hindi voice works. Voice cloning also works, and a reference audio clip can be supplied. 16 GB VRAM, 32 GB system RAM.
RESEMBLE.AI DRAMA BOX: clones voices.
LongCat-AudioDiT (video) (os): voice cloning, 6-15 GB VRAM
omni voice (video) (os): voice cloning, Zero-Shot Text-to-Speech with Diffusion Language Models, 3 GB VRAM
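
The VRAM figures noted for the TTS models above can be turned into a quick "what fits on my GPU" check. This is only a rough sketch: the table copies the numbers from these notes, the function name `models_that_fit` is made up for illustration, and treating the upper end of a range (6-15 GB) as the requirement is a cautious assumption, not something the model authors state.

```python
# VRAM figures in GB, copied from the notes above as (low, high).
# A single figure is stored as (x, x). RESEMBLE.AI DRAMA BOX is left
# out because no VRAM figure is noted for it.
TTS_VRAM_GB = {
    "Scenema Audio": (16.0, 16.0),
    "LongCat-AudioDiT": (6.0, 15.0),
    "omni voice": (3.0, 3.0),
}


def models_that_fit(available_gb, table=TTS_VRAM_GB):
    """Return model names whose upper VRAM figure fits in available_gb.

    Assumption: the high end of a range is what must fit, so the
    result errs on the side of caution. Sorted by requirement, smallest first.
    """
    fits = [(high, name) for name, (low, high) in table.items()
            if high <= available_gb]
    return [name for high, name in sorted(fits)]


if __name__ == "__main__":
    for gb in (4, 8, 16):
        print(gb, "GB ->", models_that_fit(gb))
```

System RAM (32 GB is noted for Scenema Audio) is not checked here.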

3.) AI Video model

FashionChameleon from Alibaba: Towards Real-Time and Interactive Human-Garment Video Customization

3.1) AI Audio video Model

3.2) AI images + videos model

3.3) AI images + audio videos

4.) AI Image model

4.1) AI Panorama image model

PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis

5.) AI 3d art model

6.) AI gaming model

7.) AI Transcription model / Speech Recognition model / Automatic Speech Recognition Model

8.) AI model for scientific work

9.) AI Video Language Model (VLM)

Marlin VLM is a 2B video VLM tuned for the two questions developers actually ask of their videos: what is happening, and when? It produces structured Scene + Event captions with second-precise timestamps, and resolves natural-language queries to span-grounded (start, end) ranges in the video.
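
To make "Event captions with timestamps" and "query to (start, end) range" concrete, here is a toy sketch of the shape of that data. Everything in it is an assumption for illustration: the `Event` class, the `ground_query` function, and the sample captions are made up, and the keyword-overlap matching is a simple stand-in, not how the model itself grounds queries and not its real API.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One timestamped caption (hypothetical structure, not the model's schema)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    caption: str


def ground_query(query, events):
    """Return (start, end) of the event whose caption shares the most
    words with the query, or None if no caption shares any word.

    Stand-in only: a real VLM grounds queries against the video itself,
    not by counting shared words.
    """
    words = set(query.lower().split())
    best, best_score = None, 0
    for ev in events:
        score = len(words & set(ev.caption.lower().split()))
        if score > best_score:
            best, best_score = ev, score
    return (best.start, best.end) if best else None


# Made-up sample captions, only to show the data shape.
events = [
    Event(0.0, 4.0, "a person opens the door"),
    Event(4.0, 9.0, "the person pours coffee into a cup"),
]

if __name__ == "__main__":
    print(ground_query("when is coffee poured", events))
```

The point is the output shape: a list of (start, end, caption) records for "what is happening", and a single (start, end) pair for "when".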

10.) AI Live Translation Model:

Ronak