meguIV: The Official Akiba-Online DVD Encoder (v1.0.1.1)

Rollyco

Team Tomoe
Oct 4, 2007
3,562
34
Interesting, how much % speedup do you see for +1 core? For +2?
 

Vitreous

Former Staff
Sep 13, 2009
2,033
591
Interesting, how much % speedup do you see for +1 core? For +2?
Hard to be precise as it depends on machine and current running temp, but roughly 10-12% for +1 and another 2-3% extra for +2

First tried this when I noticed (in Speedfan) that my CPUs were not quite at full load during the TGMC pass (they are in the x264 pass). I figured adding another thread might improve things. Odd thing is that I have a vague recollection that my CPUs used to run at full load during TGMC several months ago. However, using a vanilla MeguIV produces the same result, so that kinda rules out the recent software changes. Maybe I'm dreaming.

Do you get any speed-up? And do you have any dual-core machines to test on?
 

isityours

People don't dance no mo'
Sep 27, 2008
2,886
4,135
i think you mentioned Vit that you have an i7 too (i have the 860). 0.~ versions of meguIV used to run my cores at 100% for the full duration of TGMC and just 1 or 2 percent under 100 for 264 pass but from 1.0~ versions this no longer happens. it fluctuates between 95 and 98% during TGMC and between 90 and 95% during 264. (probably not a huge help but mentioning it cant hurt) (Win 7 Ult x64).
 

astrayred

Member
Mar 19, 2008
158
16
Vitreous: The author of NNEDI2/3 recently mentioned that the latest version of NNEDI3 is now closer in speed to NNEDI2. I did a quick hack of your QTGMC to support NNEDI3 and this seems to be so: it's probably about 5% slower. Any plans to officially support NNEDI3 in future versions?
 

Vitreous

Former Staff
Sep 13, 2009
2,033
591
probably not a huge help but mentioning it cant hurt
Actually, this helps a lot: I think this is about disk access. MeguIV 1.x introduced the 1st pass lossless encoding, outputting a huge intermediate file, whereas MeguIV 0.x was more CPU-bound. I think that is why we have seen a drop from 100% CPU. Because of my particular configuration, all my machines have somewhat slower than typical disk access. That explains why I get such a good speedup from more threads (it gives the CPU more to do while it waits for the disk buffers). However, from your numbers it seems that you could benefit somewhat from one extra thread.

So for me, threads=cores+2 is a good call. Whereas for most, I suspect threads=cores+1 is a better choice. Rollyco, have you done any testing?
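
For anyone wanting to try this by hand, here is a minimal sketch of setting the thread count in the script itself. It assumes an MT-enabled AviSynth build that provides SetMTMode; the core count, the source filter and the file name are placeholders, and this is not necessarily how MeguIV sets threads internally.

```avisynth
# Sketch only: assumes an MT build of AviSynth with SetMTMode available
cores   = 4            # placeholder: set to your machine's core count
threads = cores + 1    # cores+1 as discussed above; try cores+2 if disk access is slow

SetMTMode( 5, threads )        # the first call fixes the thread count; mode 5 for the source filter
MPEG2Source( "movie.d2v" )     # placeholder source (DGDecode); use your own
SetMTMode( 2 )                 # mode 2 for the filters that follow
QuickTGMC( Preset="Slow" )
```

Compare the FPS readings over a full-length rip rather than CPU load when judging whether the extra thread actually helps.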

EDIT: Actually, I wonder if we should still be doing two passes? I know for sure that one pass is much faster overall on the higher speed presets...
EDIT2: Well, for my atypical machines, one-pass is marginally faster at "Slow" (30 or 60fps). But I lose the benefit of having an intermediate output, which I use to make a quick 30fps rip.
BTW, has anyone else noticed that with the recent changes (QTGMC, new Masktools) 60fps rips are faster to make - closer to the time of a 30fps rip?

Any plans to officially support NNEDI3 in future versions?
Coincidence - I was thinking much the same thing. I tried NNEDI3 a while ago and it was a lot slower, so good to hear that it's getting better. In fact the current version of QTGMC already supports any interpolator using the EdiExt parameter:
Code:
QuickTGMC( Preset="Slow", EdiExt=NNEDI3( field=-2 ) )
 

Rollyco

Team Tomoe
Oct 4, 2007
3,562
34
Rollyco, have you done any testing?
None yet, I have run into a problem with dropped and out-of-order frames and I'm trying to solve that first. I have a dual-core, btw.
 

Vitreous

Former Staff
Sep 13, 2009
2,033
591
None yet, I have run into a problem with dropped and out-of-order frames and I'm trying to solve that first.
That sounds weird - is that from an ordinary deinterlacing script? Field parity might explain out-of-order, but how would that get wrong...
 

shadeofgray

Active Member
Sep 22, 2009
316
242
However, I don't have anything with less than 4 cores, so I'm not sure if these observations hold true in that case...

I have a Core 2 Duo and both cores are at full load with 2 threads during (Q)TGMC, so I don't expect any major speed boost. I did a few quick tests with additional thread(s) and the benefit is marginal, ~1.5%.
 

Rollyco

Team Tomoe
Oct 4, 2007
3,562
34
I have 2 cores as well with full CPU utilization @ 2 threads. Of course that doesn't mean it's encoding faster, you have to look at the FPS value for a true measure of that.
 

shadeofgray

Active Member
Sep 22, 2009
316
242
I understand that. The benefit of ~1.5% I mentioned is the decrease in time for the 1st pass in meguIV from adding a 3rd thread. I didn't notice any additional decrease from adding a 4th.

These were just with shorter clips. I'll try and do some proper tests over the weekend.
 

Vitreous

Former Staff
Sep 13, 2009
2,033
591
Got to watch readings on short clips. Run a lead-in rip of the same material to get the comp up to temperature and to prime the disk cache.

As long as threads = cores+1 doesn't slow anyone down then I don't see why not. The extra speed we're looking at now has been built with lots of 1.5%'s here and there. And I don't say that because my machine needs the extra threads more than most (I have to up it to cores+2 anyway) :cry:
 

shadeofgray

Active Member
Sep 22, 2009
316
242
As you point out, every little bit helps in the long run, but it doesn't bode well for us with somewhat slower machines.

I had one DVD in the works for the weekend and I ''played'' with it for the past 2 days. It's not particularly long, just 53min, and I used QTGMC on Slow preset. When I tried with 3 threads, it ended up being slower. It starts OK, being a little faster, but then the FPS slowly starts to drop - halfway through it was 11% slower, and by the end it was 14% slower. I tried one more time with 3 threads, and it followed the same trend.
 

Vitreous

Former Staff
Sep 13, 2009
2,033
591
I had one DVD in the works for the weekend and I ''played'' with it for the past 2 days. It's not particularly long, just 53min, and I used QTGMC on Slow preset. When I tried with 3 threads, it ended up being slower. It starts OK, being a little faster, but then the FPS slowly starts to drop - halfway through it was 11% slower, and by the end it was 14% slower. I tried one more time with 3 threads, and it followed the same trend.
I see - given that behaviour, I think the default should remain at 1 thread to 1 processor. One of the reasons I wrote QTGMC was to make things easier for low spec machines.

Actually, I've been playing with editing the meGUI interface - this kind of thing wants a slider...!
 

Vitreous

Former Staff
Sep 13, 2009
2,033
591
Updated QuickTGMC to version 2.4

Main addition is lossless mode, which I've taken much further than the original to produce a very interesting feature. When using TGMC the lines of pixels from the original material are not reproduced exactly in the result - they are smoothed and blended a little for the most stable output. The idea of lossless is to ensure that the original source material lines are retained unchanged in the output, only new "in-between" lines are inserted. This creates output much more faithful to your source material, at the expense of some minor artefacts. This approach will also faithfully reproduce any noise from the source, which can be an unpleasant surprise. I've added further "fake" lossless modes, with far fewer artefacts - they also bring the result closer to the source material, but they aren't exactly lossless. You can also use noise bypass in conjunction for another "fake" approach to remove the source noise problem.
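
The core idea, keeping the source lines and taking only the in-between lines from the deinterlaced result, can be sketched in a few lines of AviSynth. This is an illustration rather than the actual QuickTGMC code: the function name is made up, and it assumes both clips are flagged top-field-first and that the double-rate clip was produced from the same source.

```avisynth
# Hypothetical helper, not part of QuickTGMC as posted.
# Source : interlaced source clip (assumed TFF)
# Bobbed : double-rate deinterlaced clip made from Source
function KeepSourceLines( clip Source, clip Bobbed )
{
	# One field per frame: the untouched lines from the source
	srcFields = Source.AssumeTFF().SeparateFields()

	# Split each bobbed frame into its two fields and keep only the generated one:
	# the bottom field of even frames and the top field of odd frames
	newFields = Bobbed.AssumeTFF().SeparateFields().SelectEvery( 4, 1,2 )

	# Pair each source field with the matching generated field (top first), then weave
	return Interleave( srcFields, newFields ).SelectEvery( 4, 0,1,3,2 ).Weave()
}
```

Because the source fields pass through untouched, any noise or shimmer in them comes through as well, which is the trade-off described above.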

Lossless output is good for clean, stable, detailed material - I've used it with great success on one of my benchmarks - the first 20 seconds of Valensiya S 01 [Junior material]. This is a nightmare rip, with detail that vanilla TGMC just can't reproduce faithfully. QuickTGMC with lossless and noise bypass combined produces a result barely distinguishable from the original. I'll post it with some more detail in the JI forums sometime.

I've also added built-in support for NNEDI3 and EEDI3. They are not used as defaults yet as they're still too slow.

Finally, some tweaks to the preset quality/speed. In particular, Medium is rather nicer now.

Note: This script is pretty huge now! It should only affect the initial parse time, but I'm a bit worried about it. I've retested for speed, and I'm getting roughly the same timings as V2.1, which is what I'd hope for. If anyone gets any other results, tell me...
 

astrayred

Member
Mar 19, 2008
158
16
Hi Vit, it's me again!

Firstly, thanks for your continual work on this great script!

I'm currently using it. At Preset=Slow, I'm getting 10fps instead of 12 as I did with 2.3. Were the presets changed again? :p
 

Vitreous

Former Staff
Sep 13, 2009
2,033
591
At Preset=Slow, I'm getting 10fps instead of 12 as I did with 2.3. Were the presets changed again?
Hmm... I only made deliberate changes to Placebo, Medium, Fast and Very Fast. I deliberately tried to keep Slow and Slower the same as they're the common ones. But this update was fairly major and so I may have affected something. I'll double check. Thanks for telling me...
 

astrayred

Member
Mar 19, 2008
158
16
Hmm... I only made deliberate changes to Placebo, Medium, Fast and Very Fast. I deliberately tried to keep Slow and Slower the same as they're the common ones. But this update was fairly major and so I may have affected something. I'll double check. Thanks for telling me...

I did a recheck and the speeds appear to be the same now. My computer has been acting weird lately. Don't mind me :bow-pray:
 

Vitreous

Former Staff
Sep 13, 2009
2,033
591
I did a recheck and the speeds appear to be the same now. My computer has been acting weird lately. Don't mind me
I also did a barrage of tests and can detect no speed difference. So all is good again... :tea:
________

Returning to a point you raised earlier. I tried MSU's Frame Rate Converter for frame rate doubling and I was pretty disappointed. It's certainly no better than MVTools. So I wrote a small script (InterpolateFPSx2) to do what I mentioned earlier - detect the areas where the motion interpolation is poor and blur those areas instead. It does produce improved results.

Here are some samples (42Mb, pw=glass-like): http://hotfile.com/dl/59996187/96c9b87/IF2.rar.html
It's a short scene with a particularly difficult motion. There are four rips, MVTools, the MSU frame rate converter, and the two methods from my own script. MVTools and MSU both fail in different ways - MSU is blocky, MVTools is distorted.

My script uses MVTools as a starting point, then has two methods to deal with problem areas:
AltMethod=0 replaces the problem areas with a blend of the two neighboring frames - this gives the cleanest result, but tends to look more like single frame rate again.
AltMethod=1 (the default) blurs the problem areas - this hides the distorted areas much better so they are less noticeable during playback - but they are still noticeable when paused.
There is also a BlendThresh parameter which helps decide which areas are considered the problem areas. The default settings are usually OK. As usual the full details are in the script comments.

I used this script for the first 30mins of this rip (60fps version) [Junior material]

Script Code:
[HIDE]
Save as InterpolateFPSx2.avsi in your plugins folder
Requires MVTools v2 and MaskTools v2 plugins (the script calls mt_average and mt_merge)
Code:
#------------------#
# InterpolateFPSx2 #
#------------------#
#
# Double framerate using a combination of MVTools and simple frame blending / blurring
# Uses MVTools SAD mask to determine where to blend/blur instead of interpolate
#
# Parameters:
# 	BlendThresh (>=0)  : Threshold above which blending/blurring is used. Decrease to blend more, increase to interpolate more. Default = 36
# 	AltMethod   (0,1)  : Method to use instead of interpolation when motion prediction is poor: 0 = blend neighbor frames, 1 = gauss-blur the interpolated frame. Default = 1

function InterpolateFPSx2( clip Input, int "BlendThresh", int "AltMethod" )
{
	BlendThresh = default( BlendThresh, 36 )
	AltMethod   = default( AltMethod, 1 )
	
	_blocksize = 16
	_search = 3
	_searchParam = 4

	w = Input.Width()
	h = Input.Height() 
	epsilon = 0.0001
	
	filter = Input.GaussResize( w,h, 0,0, w+epsilon,h+epsilon, p=2 )
	superFilter = filter.MSuper( pel=2 )
	bVec = MAnalyse( superFilter, isb=true, blksize=_blocksize, overlap=_blocksize/2, search=_search, searchparam=_searchParam )
	fVec = MAnalyse( superFilter, isb=false, blksize=_blocksize, overlap=_blocksize/2, search=_search, searchparam=_searchParam )

	super = Input.MSuper( pel=2 )
	interpolated = Input.MFlowFps( super, bVec, fVec, num=0 ).SelectOdd()
	
	backupPlan = (AltMethod == 0) ? mt_average( Input, Input.trim( 1,0 ) + Input.trim( Input.Framecount()-1,0 ), U=3,V=3 ) : \
	                                interpolated.GaussResize( w,h, 0,0, w+epsilon,h+epsilon, p=5 )

	badMask = MMask( Input, bVec, kind=1, ml=BlendThresh, gamma=2.5 )
	newFrames = mt_merge( interpolated, backupPlan, badMask, luma=true ) 

	return Interleave( Input, newFrames )
}
[/HIDE]
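
A minimal usage sketch, assuming the function is saved as described above and the MVTools v2 and MaskTools v2 plugins are loaded. The file name is a placeholder, and the source is assumed to be progressive and converted to YV12 before the call.

```avisynth
AviSource( "clip_30fps.avi" )   # placeholder progressive source
ConvertToYV12()                 # assumption: the plugins are fed YV12 here

# Defaults spelled out; lower BlendThresh to blend or blur more of the frame
InterpolateFPSx2( BlendThresh=36, AltMethod=1 )
```

Switching to AltMethod=0 on the same scene gives a quick way to compare the blend and blur fallbacks side by side.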
 

astrayred

Member
Mar 19, 2008
158
16
That's great! I hope to test this out soon. My PC has been rebooting randomly recently. The cause has been very hard to narrow down.

As for the MSU plugin, I guess the problem is probably that I didn't test it on "difficult" sources. Just as well, since it doesn't really like being multithreaded. :/

BTW I see that you have posted QTGMC in its ah... Home forum. Should be interesting to see what the author says about it :D

EDIT: Just downloaded your samples. You certainly chose a difficult scene to test the plugins on! I like method 1 the most personally. Better than both MSU and MVTools.
 

shadeofgray

Active Member
Sep 22, 2009
316
242
Noise Bypass feature
[...]

I was testing something and thought of trying Noise Bypass - way out of my league, but wanted to see how it looks. Any reason why FFT3DFilter (and FFTW3) are not mentioned in the script or post as requirements?

I was ''informed'' by MeGUI. :thief: