https://coda.io/@rohan-mac/honest-videogen-review-data-doesn-t-lie-ai-made-videos-get-publi
Look into audio-visual alignment models, where the soundscape (lip-sync, wind, footsteps, and ambience) is synchronized with the generated scenes.