AI Video Generation - General Discussion, Tips, Tricks, Frustrations and Showcases

Incredible! Truly inspiring.
Thank you Casshern2 for your work and for sharing.
Is the entire model A.I. generated or just specific parts?
Thanks again!
 
@Casshern2 I just saw this thread. I've recently started playing with ComfyUI and various workflows. I have been looking for something that produces good image-to-video results, and so far I've only had some success with WAN 2.1. It seems like you've been using WAN 2.2, right? Are you using a workflow that integrates both the high-noise and low-noise diffusion models, or a single WAN 2.2 model? From my experience, WAN 2.1 is competent but not great at adhering to text prompts; I've been using a CFG value of 1.5, as suggested in the post I downloaded the workflow from. Anyway, I still have a lot to play around with, but I'd like to try my hand at a working WAN 2.2 setup. Can you share some more insight into your workflow and the models you're currently using? Maybe I missed the post if you already did.

Thanks!
 
Hey there, sorry for the late reply, I was under the weather for a bit. To answer your question: it's all AI, made with Forge, using cyberrealisticPony_v125 as the initial text-to-image model, then resized/upscaled with a model named lucentxlPonyByKlaabu_b20 for image-to-image.
 
Hey there, sorry for the late reply, I was under the weather for a bit. Good deal on trying ComfyUI, I hope you're having as much fun with it as I am. It can seem like a lot, going strictly by the visuals of it, but once you use a workflow or two as-is, get the mentioned models in place, and watch a YouTube video here and there, it starts to get easier to manage and change. Don't be afraid to move things around so they make sense for you; it's 100% flexible, as you know, so all the workflows out there are just laid out however the author did it. I tend to rearrange things so they fit on one screen.

I highly recommend watching this video from start to finish, multiple times, maybe on one screen while you follow along on another if you have two. It's real easy to follow; the guy's voice is friendly (probably AI generated, but maybe not, I don't hear the pauses or mispronunciations you can hear in AI voices), and he takes you from start to finish creating a workflow yourself. This is what gave me better confidence and melted away any thought of "I can't do this stuff".

Wan 2.2 Image to Video for AI Commercials Comfyui GGUF

I haven't been active because of a stomach issue plus work, but as I'm typing this I have a video being generated from this image; it will most likely be done before I finish here.

(attached image: 00010-3799115673.jpg) I've attached the workflow I used. Again, I moved things around, maybe in a non-standard way, but only because I know now how things connect and in what order, from all the videos and articles and from looking at other workflows. This one uses the Wan 2.2 high- and low-noise models. I got spoiled with LTX early on, but they've languished and haven't produced things as fast as the Wan group and its community. They finally have something new about to come out, but by now I've moved on. Yes, LTX is faster, but the quality and prompt adherence are way better with Wan (maybe just for now).

Here are the models and "lightning" LoRAs I used. These are the GGUF versions, since the full models are over 20GB each and take a while just to load. If your system can handle it you can try the full ones, but you'll need to replace the GGUF Loaders with Checkpoint Loaders (there's also a small scripting sketch for the GGUF setup right after this list):

Wan2.2-I2V-A14B-HighNoise-Q4_K_M.gguf
Wan2.2-I2V-A14B-LowNoise-Q4_K_M.gguf
Wan2_2_Lightning_high_noise_model.safetensors
Wan2_2_Lightning_low_noise_model.safetensors
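
If you ever want to queue runs from a script instead of the UI, here is a minimal Python sketch of how that could look against a locally running ComfyUI. Treat it as a sketch built on assumptions: it assumes the workflow was exported with "Save (API Format)", that the GGUF loaders are the UnetLoaderGGUF nodes from the ComfyUI-GGUF custom nodes, and that the field is called unet_name, so check those against your own export.

import json
import urllib.request

WAN_HIGH = "Wan2.2-I2V-A14B-HighNoise-Q4_K_M.gguf"
WAN_LOW = "Wan2.2-I2V-A14B-LowNoise-Q4_K_M.gguf"

# Workflow exported from ComfyUI via "Save (API Format)": a dict of node id -> node.
with open("WanWorkflow_api.json", encoding="utf-8") as f:
    workflow = json.load(f)

# Point each GGUF loader at the right file (node/field names are assumptions).
for node in workflow.values():
    if node.get("class_type") == "UnetLoaderGGUF":
        current = node["inputs"].get("unet_name", "")
        node["inputs"]["unet_name"] = WAN_HIGH if "high" in current.lower() else WAN_LOW

# Queue the job on the default local ComfyUI server.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))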

For the prompt, I keep it simple. I've read more than one person say that for image-to-video you don't have to describe in detail what is already in the image; it knows there is a person/woman there, so no need to describe her in detail, or the surroundings for that matter (for example, something as simple as "she smiles and waves at the camera" is usually enough). That level of detail is more essential for text-to-video.

At any rate, the video generation just finished. It was the first run of the night, so it included loading all the models: about 30 minutes in total, of which the generation itself was roughly 16 minutes. Subsequent runs are faster. I'll be genuine and post what came out; they're not all golden, and it can take a while to get what you want, or just to avoid the odd things and anomalies that can and will pop up. She's a bit all over the place because I put too many actions in the prompt, something I picked up to avoid slow-motion videos after reading a question from @kharo88 back in August. I'm sure if I remove something, she will behave haha.


(attached workflow screenshot: WanWorkflow.jpg)

EDIT: Wow, it actually took a bit longer to generate the new one (around 19 minutes) without her hop in the prompt. I did notice it hung a bit on the prompt node, but I'm not sure if that contributed much to the length of the run. Oh well. What I learned pretty quickly was not to watch this paint dry; I either listen to music on YouTube or watch something and check back.

 

Here are some others I queued last night, same prompt and workflow, just fed different images. You'll find that even though they all had the same prompt, they won't all do the same thing. Probably a seed thing; I have it set to Random after each generation, but even then, based on the image being used, the actions won't match exactly.
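
By the way, if anyone wants to script that "same prompt, different images, fresh seed each run" routine instead of re-queuing by hand, here is a rough Python sketch. Same caveat as above: the node class names (LoadImage) and the seed field names on the samplers are assumptions, so check them against your own API-format export.

import json
import random
import urllib.request
from pathlib import Path

with open("WanWorkflow_api.json", encoding="utf-8") as f:
    base = json.load(f)

def queue(workflow):
    # Queue one job on a locally running ComfyUI.
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# One job per source image in ComfyUI's input folder, each with a new random seed.
for img in sorted(Path("ComfyUI/input").glob("*.jpg")):
    wf = json.loads(json.dumps(base))  # cheap deep copy of the base workflow
    for node in wf.values():
        if node.get("class_type") == "LoadImage":
            node["inputs"]["image"] = img.name
        if "seed" in node.get("inputs", {}):        # KSampler calls it "seed"
            node["inputs"]["seed"] = random.getrandbits(48)
        if "noise_seed" in node.get("inputs", {}):  # KSamplerAdvanced calls it "noise_seed"
            node["inputs"]["noise_seed"] = random.getrandbits(48)
    queue(wf)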

(attached images: 00011-3799115686.jpg, 00014-3799115694.jpg, 00002-3799115646.jpg, 00003-3799115647.jpg)




 
Moved on to Christmas now that Halloween is over. :D Here is a video based on the images you can find in the thread below.


 
Wow! What a time to be alive.
Truly impressive!
 
If this is an off-topic question, I sincerely apologize.

I’ve recently become very interested in AI video generation, and I have a few questions:

1. From a technical perspective, is it possible to extend or modify existing movie scenes using AI?

2. I’ve noticed that AI videos can be generated either through APIs/web apps or locally on a personal machine.
Is there potential to earn money in this field, since hosted web-based solutions might be more affordable for users?

3. Aside from bypassing restrictions or censorship, what are the main advantages of generating videos locally?
 
Hey, pal, no such thing as off topic on any threads I've started! :cool:

1. From a technical perspective, is it possible to extend or modify existing movie scenes using AI? - Yes, and that seems to get better as things progress. Originally the best way was to extract the last frame of a generated video and use it as the source of the next video. But that led to image/video quality degradation, and of course the AI had no idea what was going on before that last frame, so there would invariably be a jolt or sudden change of action regardless of the prompt being used. These days there are workflows that can use the last x number of frames from a video to give the AI an idea of the motion and guide it into producing a more seamless continuation, or as close as possible. And there are now some ways to combat the degradation somewhat. (There's a rough Python sketch of the last-frame trick at the bottom of this post.)
2. I've noticed that AI videos can be generated either through APIs/web apps or locally on a personal machine. Is there potential to earn money in this field, since hosted web-based solutions might be more affordable for users? - Money to be made from hosting a site that produces videos, you mean, and charging for membership? I'd say yes, but I can only imagine (literally, because I don't know and have never looked into it) there would be a huge-ish upfront cost. You'd need to host the site and then rent or outright purchase a ton of GPUs so that your members don't complain about things taking forever to generate. And you would have to really lock things down so that no one can create illegal content in any way; that is a big liability.

3. Aside from bypassing restrictions or censorship, what are the main advantages of generating videos locally? - Buying an appropriate GPU would be at least one upfront cost of generating things locally; you can go big right away or step things up over time. I started with a much slower GPU and was happy as a clam until I found a deal on a 3060, and I've been happier still. It will pay for itself before long. Some people might start with one site they pay for, then move to another or keep multiple accounts; that was me for a while when I was first interested in this. The biggest advantage is being able to produce as much as you want without running out of credits/tokens or whatever a site's membership would cap you at before asking for more money or a higher-level membership.

Bottom line, there's far more freedom doing things locally. And you're not locked into what you can try or use; you can test and taste all kinds of models out there.
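
And since I mentioned the last-frame trick above, here is a minimal Python sketch of it using OpenCV (assuming opencv-python is installed; the filenames are just examples):

import cv2

cap = cv2.VideoCapture("clip_001.mp4")
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

cap.set(cv2.CAP_PROP_POS_FRAMES, total - 1)  # jump to the final frame
ok, frame = cap.read()
if not ok:
    # Some codecs misreport the frame count, so step back one and retry.
    cap.set(cv2.CAP_PROP_POS_FRAMES, max(total - 2, 0))
    ok, frame = cap.read()
cap.release()

if ok:
    cv2.imwrite("clip_001_last_frame.png", frame)  # feed this into the next image-to-video run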
 
Thank you for your reply.

I apologize for not expressing myself clearly earlier. By “earning money,” I meant through freelancing. I believe that, in niche areas, there are significant opportunities.

Now, I’m convinced. I’ll start saving up to get that high-end PC, begin learning, and hopefully be able to contribute in the near future.
 
There can be opportunities, but eventually, down the road, that bubble may pop because of the exact thing we're discussing: things will get cheaper, things will get easier, and most people will start doing it locally, with no need to pay someone else to do it. Just an opinion, though; there's always a market for something out there! :cool:

At any rate, this sure is a fun hobby.