Generating the intermediate images in an image sequence


I would like to know how one would approach this problem.
I have an input image and an output image. How can I generate the intermediate images between them to turn the pair into an action sequence?

Can this be adapted to an LSTM network or to GANs? I am currently using pix2pix (a cGAN) to create images based on thresholding. How would you approach this problem?
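
To make the goal concrete, a naive baseline for "generate the images in between" would be a pixel-wise linear blend of the two images. The sketch below is only an illustration under assumptions: the function name `blend_frames`, the parameter `n_middle`, and the toy arrays are hypothetical, and both images are assumed to be `uint8` arrays of the same shape. It produces a cross-fade rather than real motion, which is the gap a learned model (LSTM, GAN) would have to fill.

```python
import numpy as np


def blend_frames(start, end, n_middle):
    """Return n_middle frames linearly interpolated between start and end.

    Hypothetical helper for illustration: a plain cross-fade, not a learned model.
    """
    if start.shape != end.shape:
        raise ValueError("start and end must have the same shape")

    # Work in float to avoid uint8 overflow while blending.
    start_f = start.astype(np.float32)
    end_f = end.astype(np.float32)

    frames = []
    for i in range(1, n_middle + 1):
        # t runs strictly between 0 and 1, so the endpoints are not repeated.
        t = i / (n_middle + 1)
        frame = (1.0 - t) * start_f + t * end_f
        frames.append(np.clip(np.rint(frame), 0, 255).astype(np.uint8))
    return frames


# Toy stand-ins for the input and output images (assumed data).
start = np.zeros((4, 4, 3), dtype=np.uint8)
end = np.full((4, 4, 3), 200, dtype=np.uint8)
middle = blend_frames(start, end, n_middle=3)
```

The full sequence would then be `[start] + middle + [end]`. A learned approach would replace `blend_frames` with a model that takes the two end images (and possibly the position `t`) and predicts each intermediate frame.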

Thanks for sharing ideas!


