AI Video Generation - General Discussion, Tips, Tricks, Frustrations and Showcases

Casshern2 · Jun 10, 2026

I didn't want to hijack our friend's thread linked above and send readers here during "production", so I think it was best to complete the project and thread where I did, but, this should be the home for the process, I suppose.

TL;DR - If you check out the linked post above you'll see I thought I'd try my hand at at least something JAV related when @nomoremrniceguy posted the reply "Give it a try and show us what you came up with, I'm interested!" Truth be told, a day or two into the project I thought he was probably talking to the OP and not me, but I went with it. Had to come up with a simple/standard JAV genre. Nurses! Long story short, I created source images, made some title cards, found some free music and created the trailer with Z-Image Turbo, Wan 2.2, some Loras, video utilities (not editing software) and ComfyUI.

The Shot List

What I did was remember and re-watch a number of JAV trailers on Fanza. I got a feel for how long the scenes were before a cut, and there are a lot of cuts! I didn't want to go that route so I figured 2-5 second clips but tried to stay more at 3-4 seconds. I first found the soundtrack I liked (free after searching for "free Japanese pop music") and then started listing out from start to finish what I wanted to see. I wanted it to seem like a JAV title as well as trailer since that was the original ask, so I started with a familiar opening to many titles...clouds and a city scene, then made my way through the introduction of the premise having the title in mind and using the characters interacting in the hallway, then straight to the action scenes.

I thought of simple actions in the interest of time, didn't really try to refine anything. I named them all to keep track (assets). The prompts were quite simple, really. Attached are the corresponding prompts for Z-Image Turbo and Wan 2.2 videos. You'll see the length of each clip I was proposing in the tracking list image. I had to include enough shots and varied their length to equal the total runtime of the soundtrack, so that drove the breakdown. I started every shot as 5 seconds then randomly pared things down across all of them to arrive at the original soundtrack's runtime. But then the whole length fiasco I talked about in the thread, so I found another actually better soundtrack that fit the fun mood nicely.
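The "start every shot at 5 seconds, then randomly pare down" step can be sketched roughly as below. This is only an illustration: the function name, the 30-shot count, the 110-second target and the whole-second trimming are assumptions, not the actual tracking list.

```python
import random

def pare_to_runtime(num_shots, target_seconds, start=5, minimum=2, seed=0):
    """Start every shot at `start` seconds, then trim random shots by one
    second at a time until the total matches `target_seconds`."""
    lowest_total = num_shots * minimum
    highest_total = num_shots * start
    if not lowest_total <= target_seconds <= highest_total:
        raise ValueError("target runtime is not reachable with these limits")

    rng = random.Random(seed)  # fixed seed so the plan is repeatable
    lengths = [start] * num_shots
    while sum(lengths) > target_seconds:
        # Only shots still above the minimum can be trimmed further.
        candidates = [i for i, sec in enumerate(lengths) if sec > minimum]
        lengths[rng.choice(candidates)] -= 1
    return lengths

# Hypothetical numbers: 30 shots fitted to a 110-second soundtrack.
plan = pare_to_runtime(num_shots=30, target_seconds=110)
print(plan, sum(plan))
```

The plan only holds if each rendered clip really comes out at its planned length, so it is worth re-adding the measured durations once the clips exist.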

The Image Generation

I generated approximate 190 images from start to finish with the attached prompts. There were plenty with mistakes, not what I thought would translate to video as good as others, and outfits that weren't the right color for at least some consistency if I wasn't shooting for character consistency. It makes a difference if a woman's face is closer or nearer to the man's penis. Too far away and the video generation will have her go for it well into the length selected. If she was closer the model would be able to get her to act quicker. Just compare Shot 17 to 15 and 16. Shot 17 her face is far away, so the model slowly brought her down to him. Sure, it ended up being a great slow shot, but 15 and 16 the starlet went right to it.

Z-Image Turbo likes to produce image from the left to right. Maybe they do, but for Shots 20 and 22 I decided to horizontally flip the original output to give some variance, otherwise it looked too artificial (ironically). Surely a real director wouldn't always shoot the exact same placement for a non-niche title, like a blowjob title that just had POV shots throughout.

Each image was generated at 1280x720 with the intent to have the video generate at 1024x576. I did upscale the image for Shot 07 so I could crop each actress centered in their own frame. I had to create another ComfyUI workflow to mask/remove hands and half faces, though. I did this based on the instructions on a YouTube video named:
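For the sizes mentioned here, a small sketch of the arithmetic: checking that 1280x720 and 1024x576 share an aspect ratio, and working out a crop box centered on one subject in an upscaled frame. The 2x upscale, the subject position and the function names are made up for illustration; the box uses the (left, top, right, bottom) order that Pillow's `Image.crop` also expects.

```python
from fractions import Fraction

def same_aspect(src, dst):
    """True when two (width, height) sizes have the same aspect ratio."""
    return Fraction(src[0], src[1]) == Fraction(dst[0], dst[1])

def centered_crop_box(image_size, center, crop_size):
    """Return (left, top, right, bottom) for a crop of `crop_size` centered
    on `center`, shifted as needed so it stays inside the image."""
    width, height = image_size
    crop_w, crop_h = crop_size
    if crop_w > width or crop_h > height:
        raise ValueError("crop is larger than the image")
    left = min(max(center[0] - crop_w // 2, 0), width - crop_w)
    top = min(max(center[1] - crop_h // 2, 0), height - crop_h)
    return (left, top, left + crop_w, top + crop_h)

print(same_aspect((1280, 720), (1024, 576)))  # both reduce to 16:9
# Hypothetical: a 2x upscale (2560x1440) with a subject centered near (600, 700).
print(centered_crop_box((2560, 1440), (600, 700), (1280, 720)))
```

Because the crop is itself 1280x720, it drops into the same 1024x576 video settings as the other shots without letterboxing.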

"2025 ComfyUI Remove any object you want。step by step Tutorial ， You Can Definitely Learn 。"

The Video Generation

I generated 90 or so videos based on the shot list with the attached prompts. I assumed when I set a video for 2 seconds in length that's what I'd get, but it ended up being 2.x seconds, so by the time all my shots were done it was no longer totaling the exact runtime I had calculated, so I had to find another soundtrack. What I had decided was to find a tune that was close to a 2 minute runtime and then piece together the shots that would equal that runtime.
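One possible contributor to the "2.x seconds" results is frame-count rounding. Here is a minimal sketch, assuming the workflow renders at 16 fps and only accepts frame counts of the form 4n + 1, rounded up. All three are assumptions about the setup rather than anything stated above, and the function names are made up.

```python
import math

def actual_clip_seconds(requested_seconds, fps=16, frame_step=4):
    """Return (frames, seconds) for a clip whose frame count is rounded up
    to the next value of the form frame_step * n + 1."""
    wanted_frames = requested_seconds * fps
    n = math.ceil((wanted_frames - 1) / frame_step)
    frames = frame_step * n + 1
    return frames, frames / fps

def total_drift(requested_lengths, fps=16, frame_step=4):
    """Seconds by which the rendered clips overshoot the planned runtime."""
    actual = sum(actual_clip_seconds(s, fps, frame_step)[1]
                 for s in requested_lengths)
    return actual - sum(requested_lengths)

print(actual_clip_seconds(2))      # frame count and real length of a "2 second" clip
print(total_drift([2, 3, 4, 5]))   # overshoot across four planned shots
```

Under these assumptions each clip runs only a fraction of a second long, so other steps in a workflow may add more. Measuring the rendered files and summing those numbers is safer than trusting the requested lengths.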

There were plenty of videos left on the cutting room floor. Some with bad anatomy once movement started and the model decided to bring in some interesting body parts, some of the actress not doing much or doing too little of what I wanted, OR acting like they didn't know what they were doing haha! (insert me reviewing clips thinking THAT's NOT how you suck a cock!

)

Since these generations take way longer than images, this is where the bulk of the time was spent. Set the prompts, let it generate 2-4 clips at a time, rinse and repeat. As I went along I would decide on the winning shot to add to the final shot list. If you noticed in the updates, I didn't work linearly; I would skip around to shots that were either short, to get them done quickly, or shots where I knew from experience that Wan would get things right relatively quickly.

Before long all the shots were done; I just had to merge and mux to create the final video.

Editing without Editing Software

My plan was simple but ended up not too simple. Get a soundtrack of x length. Generate video clips that when combined would equal that same length, then use MKVMerge to combine all the clips together and include the soundtrack as the audio for the output. What could be more simple?

It didn't work. I had placed a delay on the audio before muxing, the same length as the beginning title card shot, so that the soundtrack would start when the clouds started rolling by. But as I watched the video after muxing, the audio faded out TWELVE SECONDS before the video stopped. That's when it hit me about the 2.x, 3.x, 4.x second generations I hadn't anticipated. Increasing the delay was out of the question; then we'd have 12 seconds of silence at the beginning. So I had to find the other soundtrack.

After solving that, I had to demux the created MKV to video/audio streams and use Yamb to mux back to MP4. (I know...I know...but I'd much rather use MKVMerge GUI to combine things, just easier for me.) Then I had another one or two views since it was just fun, and uploaded to MEGA so I could present it. PROBLEM! When I watched it in the browser the delay was missing, so the soundtrack started as soon as the title card hit the screen. Good grief. After some research, it seemed it was just going to do that; the browser was not going to honor the delay easily or at all. So...

I found an mp3 of silence on the web, cut it to the length of the title card duration, then concatenated that to the front of the soundtrack mp3. I'm embarrassed to say I had to use Google to figure that one out. I did the whole MKVMerge to Yamb process with the corrected audio and at last it was done and I was able to share it.
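The same "bake the delay into the audio" idea can be sketched in Python. This is not the mp3 process described above: the standard library's `wave` module only handles WAV, so the sketch assumes the soundtrack has been converted to 16-bit PCM WAV first. The function name and the 2.5-second title card length are hypothetical.

```python
import io
import wave

def prepend_silence(src, dst, seconds):
    """Copy WAV audio from `src` to `dst` with `seconds` of silence in front.
    Assumes signed 16-bit PCM, where zero bytes are silence."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        audio = reader.readframes(reader.getnframes())
    silent_frames = round(seconds * params.framerate)
    silence = bytes(silent_frames * params.nchannels * params.sampwidth)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)          # same channels, width and rate
        writer.writeframes(silence + audio)
    return silent_frames

# Demo on an in-memory one-second mono clip instead of a real soundtrack.
source = io.BytesIO()
with wave.open(source, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x01\x00" * 8000)     # placeholder samples
source.seek(0)

padded = io.BytesIO()
added = prepend_silence(source, padded, seconds=2.5)  # hypothetical title card length
padded.seek(0)
with wave.open(padded, "rb") as r:
    total_frames = r.getnframes()
    print(total_frames / r.getframerate(), "seconds in total")
```

Because the silence is part of the audio stream itself, there is no container-level delay for a browser player to ignore.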

Conclusion

All in all the first images were created on 5/24, so it was 13 days from start to finish as I had time to work on it. It was definitely fun. Nowadays they have (arguably) better technology to use like LTX, lip sync, extended "unlimited" length videos, so it should be possible to make much longer shots/scenes. Theoretically one could create a 10-30 minute feature, if not longer, using better images and more refined and directional prompts if the model can adhere to what it is being asked to produce. At this point it is up to the imagination on the creation side and on the viewing side as to what constitutes a JAV feature and experience.

Tools used

Workflow:
ComfyUI

Image Model:
moodyProMix_zitV12DPO.safetensors (https://civitai.red/models/620406/moody-pro-mix)

Image Loras (on some images):
PornMaster_NSFW_ZIT_V1.safetensors (https://civitai.red/models/2608450/pornmaster-nsfw-z-image)

Video Model:
Wan2.2-I2V-A14B-HighNoise-Q4_K_M.gguf
Wan2.2-I2V-A14B-LowNoise-Q4_K_M.gguf
* replaced WanImageToVideo with PainterI2V in workflow

Video Loras:
Wan2_2_Lightning_high_noise_model.safetensors
Wan2_2_Lightning_low_noise_model.safetensors
DR34ML4Y_I2V_14B_HIGH_V2.safetensors (https://civitai.red/models/1811313/dr34ml4y-all-in-one-nsfw-wanltx2?modelVersionId=2553151)
DR34ML4Y_I2V_14B_LOW_V2.safetensors (https://civitai.red/models/1811313/dr34ml4y-all-in-one-nsfw-wanltx2?modelVersionId=2553271)

ding73ding · Jul 13, 2026

Amazing work you guys did, esp. @Casshern2

I was completely uninterested in AI-generated content until, suddenly, it became so good that it could give real content stiff competition. ... So ... forgive me for not having closely read two years of posts.

Eventually I want to make a full length (let's say 30 mins) uncensored porn with multiple scenes (let's say 3 scenes of 8-10 mins) plus character setup scenes.

ding73ding · Jul 13, 2026

Casshern2 said:
then use MKVMerge to combine all the clips together and include the soundtrack as the audio for the output. What could be more simple?

Why don't you use Kdenlive? It's very powerful and has features to solve all your editing problems.

AI also suggests Shotcut, which may be a friendlier alternative for amateurs.

Casshern2 · Jul 14, 2026

ding73ding said:
Amazing work you guys did, esp. @Casshern2

Thanks, it was definitely fun to do!

ding73ding said:
Eventually I want to make a full length (let's say 30 mins) uncensored porn with multiple scenes (let's say 3 scenes of 8-10 mins) plus character setup scenes.

This may still be a year away unless you have some very good gear (or can find a friendly online platform) but it's going to be possible, I'm sure.

ding73ding said:
Why don't you use Kdenlive? It's very powerful and has features to solve all your editing problems.

AI also suggests Shotcut, which may be a friendlier alternative for amateurs.

For my purpose that would have been a bit overkill. I was just trying to see if I could produce something passable, and ended up just doing a trailer-length piece with short-cut scenes. I agree, to get much better results it would be good to use some type of editing, but that would make sense (to me) only if you're starting with FHD (or higher) sources to then re-encode down to HD or FHD (respectively). By simply combining and re-muxing with sound I kept the original quality of the video clips.

ding73ding · Jul 16, 2026

Casshern2 said:
This may still be a year away unless you have some very good gear (or can find a friendly online platform) but it's going to be possible, I'm sure.

I am probably speaking out of turn, since I haven't even sat down to give it a try. But I feel it's possible to put together a full-length film using the short cuts (5-20 seconds) that current generators can put out. For script and dialog, I guess it's also doable now, especially the script, which I plan to write myself with AI assistance. The crazy thing is that with AI translation and voice acting it will be easy to make versions of the same film in many languages. The music, I feel, is less important and less difficult. Sound effects are probably the real challenge.
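To put rough numbers on that, here is a back-of-the-envelope sketch for the 30-minute target. The function names are made up, and the figure of four generation attempts per usable clip is only an assumption, borrowed loosely from the "4 gens" mentioned for one scene in this thread.

```python
import math

def clips_needed(film_minutes, clip_seconds):
    """Number of finished clips needed to fill the runtime."""
    return math.ceil(film_minutes * 60 / clip_seconds)

def generations_needed(film_minutes, clip_seconds, attempts_per_keeper):
    """Total generations if each kept clip takes several attempts."""
    return clips_needed(film_minutes, clip_seconds) * attempts_per_keeper

for seconds in (5, 8, 20):
    print(seconds, "s clips:",
          clips_needed(30, seconds), "keepers,",
          generations_needed(30, seconds, attempts_per_keeper=4), "generations")
```

Multiplying the generation count by the render time per clip on a given machine gives a first estimate of how long the whole film would take to produce.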

The challenge, I imagine, for doing a full film in 2026 is:
- how to create a virtual actor, let's say a virtual Maria Ozawa, and a virtual Taka Kato and have them looking consistent and recognizable as I generate hundreds or thousands of cuts to be assembled into a complete film.
- same question for background sets, costumes and props.
- is there a workflow I can iterate over and over, keeping the elements that are successful and refining the problem areas? From some YT videos it seems there's a large RNG factor: you run the gen many times and fiddle with the prompts until you get a random success (because the user has no very specific requirement) or the user gives up trying.

The question is: can I assemble some set of inputs (text prompt, images etc) that I can feed to the generator which would create a stable consistent output (a head: face, makeup, hair, even if accessories like glasses or decorative hair clips, a body: certain height/weight, skin tone, sexual organs, or a set: room with furniture)

It seems your CAI-001 (amazing work!) partly answers it: the girls are fairly consistent between cuts, and the sets also stay reasonably consistent. (Anyway, as you were making a trailer, it's reasonable that there are multiple sets and each one only needs to appear for less than a minute.)

About refining a cut, let me pick a specific example from your CAI-001, at 1:20 the white-dress nurse was fucked in the asshole. Is there a workflow that say, "make almost the same cut again, but the girl show a much strongly expression because her rectum is pumped by a huge cock!" Or "at the end of push-stroke, hold the cock still and the girl hips tremble slightly and she makes intense expression with her face and her hands"

...use some type of editing, but, that would make sense (to me) only if you're starting with FHD (or higher) sources to then re-encode down to HD or FHD (respectively). By simply combining and re-muxing with sound I kept the original quality of the video clips.

Yeah, I understand: you started with "I'll mess around for an afternoon..." and ended up with an out-of-control epic project.

But on MKVMerge vs Kdenlive, if I understand you correctly, you are avoiding re-encoding with a bit of an obsession. That concern is unnecessary; I have tons of experience re-encoding videos over the last few years. Yes, a few years ago re-encoding could cause a noticeable loss of quality even if you cranked up the bit rate. But recent codecs are so good that just using the default parameters will be fine.

The best thing about Kdenlive is that it's actually not a standard GUI app (the way MS Office is). When you work in Kdenlive, you are putting together a text script that specifies how to build the final output from all the input resources (clips and sound files). So if you wish to be absolutist about original quality, you can choose to render to uncompressed or high-bitrate video. That will be painful to upload/download, but where you set the compromise between quality and convenience is in your hands.

Anyway I feel editing/assembling all the pieces is not worth debating here, there are many solutions to it, everyone can choose his own workflow. The real hard part is generating the cuts.

Casshern2 · Jul 16, 2026

ding73ding said:
I am probably speaking out of turn, since I haven't even sat down to give it a try. But I feel it's possible to put together a full length film using short cuts (5-20 seconds) that current generators can put out.

Absolutely! My comment was more about realizing, after my research, that the average shot (cut) of a JAV or IV is at least 4x longer than the average 5-8 sec clips that are generated by standard models and their standard workflows (not counting the workflows that produce longer clips [at varying degrees of degradation]). But you can absolutely follow the Netflix cut scene cadence to create something. It may take longer to edit and finalize, but that's not the point.

For the challenges:

ding73ding said:
- how to create a virtual actor, let's say a virtual Maria Ozawa, and a virtual Taka Kato and have them looking consistent and recognizable as I generate hundreds or thousands of cuts to be assembled into a complete film.

That will be where Loras come in, but I know next to nothing about training Loras and I certainly don't have a box that can do it. But that would be key for this!

ding73ding said:
- same question for background sets, costumes and props.

For the costumes and props, I'd imagine Loras would be needed there as well, OR maybe look into Flux Klein workflows or something similar, where you can insert items (props) from one image into another and blend them into the new scene to look like they belong there. Works with people as well. As for backgrounds, I'm just now playing with this kind of stuff to have, like, a single room, bed or couch that I can set different ladies into.

ding73ding said:
- is there a workflow I can iterate over and over? Keeping the elements that is successful and keep refining the problem areas?

Yeah, this is where Seeds come in. Even if you keep the same seed and change the prompt slightly, you're likely to get something other than what you expect. Maybe that's where Flux Klein-type things can help more.
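One way to make the prompt/seed fiddling a bit more systematic is to write out every combination up front and keep notes on which ones worked. A minimal sketch follows; the file name, prompts, seeds and the job dictionary layout are all placeholders, and actually submitting a job to ComfyUI is left out.

```python
from itertools import product

def build_jobs(image_path, prompts, seeds):
    """One job per (prompt, seed) pair, all starting from the same image."""
    return [
        {"image": image_path, "prompt": prompt, "seed": seed}
        for prompt, seed in product(prompts, seeds)
    ]

# Placeholder inputs for illustration only.
prompts = [
    "A woman sits on a couch. She laughs.",
    "A woman sits on a couch. She laughs and waves at the viewer.",
]
seeds = [101, 202, 303]

jobs = build_jobs("shot_07.png", prompts, seeds)
for job in jobs:
    print(job["seed"], "|", job["prompt"])
```

Keeping the seed list fixed while changing only the prompt, or the other way around, makes it easier to tell which of the two caused a difference in the output.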

ding73ding said:
The question is: can I assemble some set of inputs (text prompt, images etc) that I can feed to the generator which would create a stable consistent output (a head: face, makeup, hair, even if accessories like glasses or decorative hair clips, a body: certain height/weight, skin tone, sexual organs, or a set: room with furniture)

If there is a way right off the bat, I've not come across one, but I've only scratched the surface of what is documented out there. I hope there is! But if there isn't yet, we will have to be like filmmaking mavericks and figure out how to achieve such things with what we already know or can learn to use. That's the fun of this, for me.

ding73ding said:
About refining a cut, let me pick a specific example from your CAI-001, at 1:20 the white-dress nurse

The great thing here (and what to remember) is that all of the scenes were made using a single Image to Video workflow with a prompt. For that specific example scene, I used a pretty simple and crude prompt: "Japanese nurse fucked by man, doggystyle, rough, insert cock into vagina repeatedly, quick, fast, she gasps". Now, many would say there is no need to point out she's Japanese or even a nurse, but that's just me. At any rate in the 4 gens for this scene she never once gasped and the action certainly wasn't quick or fast, but it came out okay. Point being, you can use the same image and keep changing the prompts until a good combo of prompt/seed gets what we're after.

You're right about editing, I should try those and see if the visual difference isn't as bad as it used to be.

ding73ding · Jul 16, 2026

Casshern2 said:
average shot (cut) of a JAV or IV is at least 4x longer than the average 5-8 sec clips that are generated by standard models

Right right... but:
- I believe the run length of output clips is a top target of intense R&D at the top AI companies. Longer clips will probably be available sooner than I can get my shit together to start my own film project

- it's probably not necessary to emulate the long cuts (up to 30 seconds according to your research) to create a good porn film, anyway I always thought typical porn scenes are too long, the average JAV sex scene is over 20 minutes long, consisting of, say... about 20 sex acts each lasting between 1 to 5 minutes, which is way-too-slow pace for my taste, so even if we are limited to 5-8 s per cut I think I could be ok.

- video editing trick can probably double and triple the length of a decent clip of blowjob or piston-pumping in vagina, see oldie XXX American porn from the 1980s

That will be where Loras come in, but know next to nothing about training Loras and I certainly don't have a box that can do it. But that would be key for this!
...
Yeah, this is where Seeds come in,

Great! I will have to research these.

About a month ago, we saw a flood of meme videos out of China and Iran, each a few minutes long, with recognizable characters appearing across multiple shots and multiple videos (episodes). So there ought to be a solution among the current technology offerings, but the question is how accessible those solutions are to us techno-peons.

"Japanese nurse fucked by man, doggystyle, rough, insert cock into vagina repeatedly, quick, fast, she gasps". ... At any rate in the 4 gens for this scene she never once gasped and it certainly wasn't quick or fast, but it came out okay.

Huh!? So yeah actually I wasn't sure it was anal penetration and right I see you specifically prompt for vaginal penetration, so that cut that looked to me like anal fuck was kind of a messed up output (most probably the AI was trained with lots of Western anal porn without tagging properly between vaginal and anal sex)

That last comment is quite a disappointment. It hints that these generative AI models are heavily tuned towards generic short-cut adult content (like the typical GIF ads for adult games and porn sites) and would not respond to a highly detailed prompt.

Specifically I would (wish to) write very long very detailed prompts to control (for example) how the tits hang and swing during (say) doggie style. I do realize it's a HUGE ask, both in terms of 2026 technology and of my own artistic ability to control all these details in a film project. But at least, let's keep the conversation and experimentation going.

I guess it makes economic sense: surely there are thousands (millions!?) of adult web pages needing an endless supply of adult clips and animations that need only 3-10 seconds of run time.

Casshern2 · Jul 17, 2026

To your point (!) you're right, I'm thinking in terms of creating something that emulates a full-length 2hr JAV title, but the point is to make something for yourself, primarily. So, who better to say what the content will be and how it is created than yourself! You're the producer and director, after all!

ding73ding said:
- video editing trick can probably double and triple the length of a decent clip of blowjob or piston-pumping in vagina, see oldie XXX American porn from the 1980s

Right! Like the same clip played again a little further down the timeline, or simply the same angle but zoomed in. That's actually what I was planning to do with shared backgrounds, to make it seem like a different scene in the same space.

ding73ding said:
Huh!? So yeah actually I wasn't sure it was anal penetration and right I see you specifically prompt for vaginal penetration, so that cut that looked to me like anal fuck was kind of a messed up output (most probably the AI was trained with lots of Western anal porn without tagging properly between vaginal and anal sex)

Sorry, I should have been more specific, but in a way you're right. That scene was created using an image I generated with the Z-Image Turbo model (along with the rest of them). Here is that prompt, but as you'll see, it does not mention "anal" so the model did likely hallucinate and create that as probably a second deformed vagina he's sticking it in. Like my disclaimer for the trailer said "any anatomical anomalies were noticed but ignored in the interest of time"

Image Prompt:
"View from behind, This image is a doggy-style selfie scene. A Japanese nurse with short, dark hair and fair skin, white nurse uniform, kneeling on a hospital bed, her plump buttocks raised high, being penetrated from behind by a man patient. They are alone. She leans forward, faceless looking away from the viewer. the man's penis is clearly visible penetrating her vagina. her legs are spread; the overall pose is extremely explicit. The background is still a hospital room, Moody JAV Photography, daytime, out of focus background, (depth of field), (bright ambience), (bright environment), light saturation"

And the Wan prompt with the gasping was just one reused from the other doggy style clips, which is why it mentions vagina.

ding73ding said:
Specifically I would (wish to) write very long very detailed prompts to control (for example) how the tits hang and swing during (say) doggie style. I do realize it's a HUGE ask, both in terms of 2026 technology and of my own artistic ability to control all these details in a film project. But at least, let's keep the conversation and experimentation going.

You may be in luck again! Along with image Loras there are video Loras! I've used them before and seem to work well. As you can imagine they have different ones for different purposes, from blowjob to deepthroat to cum Loras. With those or even just very good prompting you can get pretty good results.

Here is an example I just did today running a number of gens of the same prompt/different seeds to see what comes out.

Image Model: gonzalomoXLFluxPony_v40UnityXLDMD
Prompt:
Bright (indoor) white-walled livingroom, curved POV, wide angle lense, eye-level fisheye view, (bright ambience), (bright environment), light saturation, (depth of field), 8k photo, HDR, professional lighting, taken with Canon EOS R5, 75mm lens, casual photo, white living room, silver and gold decorations, very realistic (beautiful) stunning, elegant (aged:1.8) shy Japanese 50 year old woman with beautiful pale skin and (medium) (black) twin-tails hair, she wears a ((deep {red|white|blue|green|yellow|orange|black})) leather strap, sexy panties with garter belts, stockings, her body is motherly with ((enormous:1.2)) perky breasts, wide hips, (pale:0.8) areolae blend into her breasts, bushy pubic hair, ((thick thighs)), rural snow draped mountains, she is (sitting on a white couch:1.2), (bokeh:1.4), she is wet
Result:

Now the video, and this was seeing how many things I can have her do in just the 4 sec runtime. Even with that many things to do, it somehow still tends to slow motion things a bit

Prompt:
Amateur handheld video of a Japanese woman sitting on a couch in sunlit room. She sits back further into the couch. She adjusts her body up straight. She pulls her legs up and holds them around her knees, we can see her pointed black high heels. She laughs. She spreads her legs and puckers her lips to blow a kiss at the viewer.
Result: https://mega.nz/embed/rd1SFSAQ#2wy8QW0JV52DavevdSfjc39r6ogwuTaoJhKHShl8s5I

And another that came out decent I suppose:

Video Prompt:
Amateur (handheld video) of a Japanese woman sitting on a couch in sunlit room. She shakes her upper body causing her breasts to quickly shift left and right repeatedly. She sits back further into the couch. She adjusts her body up straight. She pulls her legs up spreads them apart, we can see her pointed black high heels. She lowers her legs and leans forward closer to the camera and sticks her tongue out at the viewer.
Result: https://mega.nz/embed/XccjjRCS#JdwfEV4cKBtsfJh32n0hkwr0QqjtngDP5g30-YoqNbw

So there may be hope for saggy swinging tits during doggy style sex.

Casshern2 · Jul 19, 2026

Another great tool that I'd like to get into is using Qwen to take either a person or a background and produce a different angle of them. That would be key to providing some consistency within and between scenes.

Casshern2 · Thursday at 10:34 PM

I'm experimenting with a Flux 2 Klein variant instead of Qwen. Interesting and promising results so far! So it's likely possible to create a virtual set, so to speak, by using a background and having the model produce different viewing angles from it.

https://www.akiba-online.com/threads/photorealistic-ai-generated-images.2123060/page-52#post-4991337
