Generative Image Dynamics

Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski

Google Research

CVPR 2024 Best Paper Award

Our approach models an image-space prior on scene dynamics that can be used to turn a single image into a seamless looping video or an interactive dynamic scene.

Our method automatically turns single still images into seamless looping videos.

Abstract

We present an approach to modeling an image-space prior on scene motion. Our prior is learned from a collection of motion trajectories extracted from real video sequences depicting natural, oscillatory dynamics such as trees, flowers, candles, and clothes swaying in the wind. We model this dense, long-term motion prior in the Fourier domain: given a single image, our trained model uses a frequency-coordinated diffusion sampling process to predict a spectral volume, which can be converted into a motion texture that spans an entire video. Along with an image-based rendering module, these trajectories can be used for a number of downstream applications, such as turning still images into seamlessly looping videos, or allowing users to realistically interact with objects in real pictures by interpreting the spectral volumes as image-space modal bases, which approximate object dynamics.
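
As a rough illustration of the conversion step mentioned in the abstract, the sketch below turns a spectral volume into a motion texture by applying an inverse FFT along the time axis. The array layout (K frequency terms, height, width, x/y components), the function name, and the normalization of NumPy's `irfft` are assumptions made for this example, not details taken from the paper's implementation.

```python
import numpy as np

def spectral_volume_to_motion_texture(spectral_volume, num_frames):
    """Convert per-pixel Fourier coefficients into per-frame displacement fields.

    spectral_volume: complex array of shape (K, H, W, 2), assumed to hold the
        K lowest temporal frequencies of each pixel's x/y trajectory
        (index 0 is the constant term).
    num_frames: number of video frames T to synthesize.
    Returns a real array of shape (T, H, W, 2).
    """
    spectral_volume = np.asarray(spectral_volume, dtype=complex)
    k = spectral_volume.shape[0]
    if k > num_frames // 2 + 1:
        raise ValueError("num_frames is too small to represent all K frequencies")
    # irfft zero-pads the missing high-frequency terms up to num_frames // 2 + 1.
    return np.fft.irfft(spectral_volume, n=num_frames, axis=0)
```

Under these assumptions, calling `spectral_volume_to_motion_texture(volume, 150)` would yield 150 displacement fields, one per output frame, which a rendering module could then use to warp the input image.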

We can simulate how an object responds to an interactive user excitation using modal analysis (Davis et al.), interpreting the generated spectral volume as an image-space modal basis.
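
To make the modal-analysis idea more concrete, here is a minimal sketch that treats each frequency term as an independent damped oscillator and evaluates its free response after the user releases a drag. The projection of the drag onto the modes, the damping ratio, the closed-form decay, and all names are assumptions chosen for illustration; the simulation used in the paper and in Davis et al. may differ.

```python
import numpy as np

def modal_response(modal_basis, freqs_hz, click_rc, drag_xy, times, damping=0.05):
    """Free response of image-space modes after a drag is released.

    modal_basis: complex array (K, H, W, 2), one mode shape per frequency.
    freqs_hz: K modal frequencies in Hz.
    click_rc: (row, col) of the dragged pixel.
    drag_xy: (dx, dy) displacement applied by the user.
    times: T time stamps in seconds after release.
    damping: assumed damping ratio shared by all modes (must be < 1).
    Returns real displacement fields of shape (T, H, W, 2).
    """
    modal_basis = np.asarray(modal_basis, dtype=complex)
    times = np.asarray(times, dtype=float)
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    row, col = click_rc
    # Assumed excitation: project the drag onto each mode at the clicked pixel.
    q0 = np.conj(modal_basis[:, row, col, :]) @ np.asarray(drag_xy, dtype=float)
    # Each modal coordinate decays as a damped complex exponential.
    omega_d = omega * np.sqrt(1.0 - damping ** 2)
    decay = np.exp((-damping * omega + 1j * omega_d)[None, :] * times[:, None])
    q = q0[None, :] * decay  # shape (T, K)
    # Superpose the modes and keep the real part as the displacement field.
    return np.einsum("tk,khwc->thwc", q, modal_basis).real
```

In this sketch a larger `damping` value makes the motion die out faster, and the displacement at `times[0] == 0` is the superposition of the modes excited by the drag.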

Try it yourself!

Click and drag a point on the image below, release to see how the scene moves!
(For speed, this demo renders using mesh-warping rather than the higher-quality rendering model shown in the paper.)

We can minify (top) or magnify (bottom) animated motions by adjusting the amplitude of motion textures.
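
A minimal sketch of this amplitude adjustment, assuming the motion texture is stored as an array of per-frame displacement fields with shape (T, H, W, 2); the function name and the `gain` parameter are hypothetical.

```python
import numpy as np

def scale_motion(motion_texture, gain):
    """Scale every displacement vector by a constant factor.

    gain < 1 attenuates (minifies) the motion; gain > 1 magnifies it.
    """
    if gain < 0:
        raise ValueError("gain is expected to be non-negative")
    return gain * np.asarray(motion_texture, dtype=float)
```

Because the Fourier transform is linear, scaling the motion texture in this way is equivalent to scaling the spectral volume by the same factor.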

Slow-motion videos can be generated by interpolating predicted motion textures.
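
One simple way to picture this is linear interpolation between consecutive displacement fields, sketched below. The page does not say which interpolation scheme is used, so the linear blend, the (T, H, W, 2) array layout, and the names here are assumptions.

```python
import numpy as np

def slow_motion(motion_texture, factor):
    """Insert linearly interpolated displacement fields between frames.

    motion_texture: real array of shape (T, H, W, 2).
    factor: integer slowdown; the result has (T - 1) * factor + 1 frames.
    """
    mt = np.asarray(motion_texture, dtype=float)
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    t_in = mt.shape[0]
    t_out = (t_in - 1) * factor + 1
    # Fractional positions of the output frames on the input time axis.
    pos = np.linspace(0.0, t_in - 1, t_out)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, t_in - 1)
    w = (pos - lo).reshape(-1, 1, 1, 1)
    return (1.0 - w) * mt[lo] + w * mt[hi]
```

Rendering the interpolated fields at the original frame rate would then play the same motion `factor` times more slowly.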

Related Work

We would like to acknowledge the following inspiring prior work, which proposed frequency-space motion representations for video processing and animation.

Animating Pictures with Stochastic Motion Textures (Yung-Yu Chuang, et al.)
Image-space Modal Bases for Plausible Manipulation of Objects in Video (Davis, Chen, and Durand)
Visual Vibration Analysis (Abe Davis)

Acknowledgements

Thanks to Abe Davis, Rick Szeliski, Andrew Liu, Qianqian Wang, Boyang Deng, Xuan Luo, and Lucy Chai for helpful proofreading, comments and discussions.

BibTeX


@inproceedings{li2024_GenerativeImageDynamics,
  title     = {Generative Image Dynamics},
  author    = {Li, Zhengqi and Tucker, Richard and Snavely, Noah and Holynski, Aleksander},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024}
}