Research Talk 056, 21 June 2023
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
Anusha Lokumarambage
This study addresses story visualization and story continuation, two tasks that existing text-to-image models handle poorly. The authors build on Stable Diffusion and propose an auto-regressive latent diffusion model (AR-LDM) that conditions each generated frame on the history of preceding descriptions. The model is trained on the PororoSV, FlintstonesSV, and VIST datasets. For story continuation, the first frame serves as the source frame, and the model generates the remaining frames from their captions. A history-aware conditioning network, consisting of CLIP and BLIP, encodes the current caption and the previous captions. For real-world applications, an adaptive version of AR-LDM has been introduced to generate unseen details.
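The auto-regressive, history-aware generation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`generate_story`, `toy_encode_caption`, `toy_encode_history`, `toy_denoise`) are hypothetical, frames are represented as plain strings, and the toy encoders only stand in for the roles that CLIP and BLIP play in the talk. The sketch assumes each frame is conditioned on its own caption plus every earlier caption and frame.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def generate_story(
    captions: Sequence[str],
    encode_caption: Callable[[str], Vector],
    encode_history: Callable[[str, str], Vector],
    denoise: Callable[[List[Vector]], str],
    source_frame: Optional[str] = None,
) -> List[str]:
    """Generate one frame per caption, conditioning on all earlier steps.

    If source_frame is given (story continuation), it is used as frame 0
    and generation starts from the second caption.
    """
    frames: List[str] = []
    start = 0
    if source_frame is not None:
        frames.append(source_frame)
        start = 1
    for j in range(start, len(captions)):
        # Current caption embedding (stand-in for the caption encoder).
        condition = [encode_caption(captions[j])]
        # One embedding per earlier (caption, frame) pair: the history.
        condition += [encode_history(captions[i], frames[i]) for i in range(j)]
        # Stand-in for the conditioned diffusion denoising process.
        frames.append(denoise(condition))
    return frames


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_encode_caption(caption: str) -> Vector:
    return [float(len(caption))]


def toy_encode_history(caption: str, frame: str) -> Vector:
    return [float(len(caption)), float(len(frame))]


def toy_denoise(condition: List[Vector]) -> str:
    return f"frame(cond={len(condition)})"


story = ["Pororo wakes up.", "Pororo goes outside.", "Pororo meets a friend."]
visualized = generate_story(story, toy_encode_caption, toy_encode_history, toy_denoise)
continued = generate_story(
    story, toy_encode_caption, toy_encode_history, toy_denoise, source_frame="src"
)
```

In this sketch the conditioning list grows by one entry per step, which is what makes later frames aware of earlier ones; a real system would replace the toy callables with learned encoders and a diffusion sampler.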