And yeah, AVX2 and other SIMD can do a lot of GPU-style operations in a shared cache. Net compute is quite a bit lower, but having super low latency between complex logic and SIMD makes up the delta and more. And the bottleneck is more latency than bandwidth. HEVC has a lot more modes than H.264, so there are way more little branchy decisions to make per frame.

One strong possibility is that using built-in encoder output could provide a similar fidelity of data, but faster and much simpler. The OpenCL preview uses a very simplified x264, nothing like what's in a -preset veryslow.

Even though the GPU has more compute available, the round-trip latency between CPU and GPU means the GPU is often idle, waiting for the CPU to process GPU output and make new instructions for the GPU. And modern codecs have more and more ways to do things, which means more and more very branchy logic with lots of early exits and heuristics. Latency is generally a bigger limitation than bandwidth for this scenario.

This whole topic reminds me of an older thread where hardware acceleration of x264 and x265 was discussed. One of the developers of x265 jumped in and mentioned that with x265 it wasn't worth thinking about adding GPU offloading, because the transfer speeds were too much of a bottleneck. Also, AVX extensions could do a lot of what the OpenCL part was doing (I hope I remember correctly, I can't find the thread at the moment). Maybe, with faster PCIe connections, the situation could be re-evaluated?

Although the processing power isn't that great on internal GPUs, the often fast connection between CPU and GPU seemed to help.
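To make the latency-versus-bandwidth point a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from x264 or x265; the names (`Link`, `offload_time`, `simd_time`) and every number in it are made up for illustration. It assumes each branchy decision needs one full CPU/GPU round trip before the next one can be issued, with no overlap between transfers and compute.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical CPU<->GPU connection (illustrative values only)."""
    round_trip_s: float   # fixed round-trip latency per dependent step, in seconds
    bandwidth_bps: float  # sustained transfer rate, in bytes per second


def offload_time(steps, bytes_per_step, gpu_compute_s, link):
    """Total time for `steps` dependent GPU round trips, assuming no overlap."""
    per_step = (link.round_trip_s
                + bytes_per_step / link.bandwidth_bps
                + gpu_compute_s)
    return steps * per_step


def simd_time(steps, gpu_compute_s, slowdown):
    """Same work on CPU SIMD: slower per step, but no round trip at all."""
    return steps * gpu_compute_s * slowdown


if __name__ == "__main__":
    # All figures below are assumptions, not measurements.
    steps, payload, gpu_step = 1000, 65536, 5e-6
    base = Link(round_trip_s=20e-6, bandwidth_bps=8e9)
    faster_bus = Link(round_trip_s=20e-6, bandwidth_bps=16e9)
    lower_latency = Link(round_trip_s=10e-6, bandwidth_bps=8e9)

    print("GPU offload, baseline:      ", offload_time(steps, payload, gpu_step, base))
    print("GPU offload, 2x bandwidth:  ", offload_time(steps, payload, gpu_step, faster_bus))
    print("GPU offload, half latency:  ", offload_time(steps, payload, gpu_step, lower_latency))
    print("CPU SIMD, 4x slower compute:", simd_time(steps, gpu_step, 4.0))
```

With these made-up numbers, halving the round-trip latency saves more time than doubling the bandwidth, and the SIMD path still comes out ahead even though its raw compute is assumed to be four times slower. A real encoder overlaps work and batches transfers, so treat this only as a way to see why many small dependent steps favour low latency.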
In my tests, it seemed like fast transfers from CPU to GPU helped in this regard; I noticed significant drops in speed with slower PCIe connections. For smaller resolutions, iGPUs and APUs were a good combination. Especially with not-so-powerful CPUs in combination with small AMD GPUs (which are fast in OpenCL, GCN 2 and above), the combination of GPU/CPU offered a lot of performance for the invested money.
I used the OpenCL acceleration quite often in the past. Just like others stated above, it can offload some of the CPU work to the GPU and free up resources for filtering etc. Could it be an option to move motion estimation from OpenCL to DX12 (at least on Windows)?