Nvidia saves desktop frame to cache

1/31/2024

The more efficiently you can use the CPU's hardware cores to submit work in parallel, the more draw call submission performance you can expect to gain.

Work Submission – Command Lists & Bundles

Do’s

Accept that you are responsible for achieving and controlling GPU/CPU parallelism. Submitting work to command lists doesn’t start any work on the GPU; only calls to ExecuteCommandLists() do.

Submit work in parallel and evenly across several threads/cores to multiple command lists. Recording commands is a CPU-intensive operation, and no driver threads come to the rescue. Command lists are not free-threaded, so parallel work submission means submitting to multiple command lists.

Be aware that setting up and resetting a command list has a cost. You still need a reasonable number of command lists for efficient parallel work submission. Fences force the splitting of command lists for various reasons (multiple command queues, picking up the results of queries).

Aim for a reasonable number of command lists, in the range of 15-30 or below. Try to bundle those command lists into 5-10 ExecuteCommandLists() calls per frame.

Reuse fragments recorded in bundles if you can. Use bundle resource binding inheritance sparingly. This allows bundles to be reused with less overhead, as it facilitates more thoroughly cooked bundles.

Check carefully whether using a separate compute command queue really is advantageous. Even for compute tasks that can in theory run in parallel with graphics tasks, the actual scheduling details of the parallel work on the GPU may not generate the results you hope for. Be conscious of which asynchronous compute and graphics workloads can be scheduled together, and use fences to pair up the right workloads.
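To make the advice on parallel recording and batched submission more concrete, here is a minimal sketch in standard C++. It does not touch the Direct3D 12 API: MockCommandList, recordInParallel and planSubmits are hypothetical names invented for illustration, and spawning one thread per list is a simplification where a real engine would more likely use a fixed worker pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a D3D12 command list: it only stores the
// names of recorded commands so the sketch runs without a GPU.
struct MockCommandList {
    std::vector<std::string> commands;
};

// Records `drawCount` draws split evenly across `listCount` command lists,
// one recording thread per list. Each thread writes only to its own list,
// mirroring the rule that a command list is not free-threaded.
std::vector<MockCommandList> recordInParallel(std::size_t drawCount,
                                              std::size_t listCount) {
    assert(listCount > 0);
    std::vector<MockCommandList> lists(listCount);
    std::vector<std::thread> workers;
    workers.reserve(listCount);
    for (std::size_t i = 0; i < listCount; ++i) {
        workers.emplace_back([&lists, i, drawCount, listCount] {
            // Even split: list i takes draws [begin, end).
            const std::size_t begin = drawCount * i / listCount;
            const std::size_t end = drawCount * (i + 1) / listCount;
            for (std::size_t d = begin; d < end; ++d) {
                lists[i].commands.push_back("draw " + std::to_string(d));
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return lists;
}

// Groups command lists into at most `maxSubmits` batches, each batch standing
// for one ExecuteCommandLists() call. Returns the number of lists per batch.
std::vector<std::size_t> planSubmits(std::size_t listCount,
                                     std::size_t maxSubmits) {
    assert(maxSubmits > 0);
    const std::size_t submits = std::min(listCount, maxSubmits);
    std::vector<std::size_t> batchSizes;
    for (std::size_t b = 0; b < submits; ++b) {
        batchSizes.push_back(listCount * (b + 1) / submits -
                             listCount * b / submits);
    }
    return batchSizes;
}
```

For example, recordInParallel(1000, 24) followed by planSubmits(24, 8) spreads 1000 draws over 24 lists and groups them into eight batches of three, which sits inside the 15-30 lists and 5-10 submissions suggested above.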
Prefer a task graph architecture for parallel draw submission. This way you can achieve sufficient parallelism in draw submission while making sure that resource and command queue dependencies are respected.

Consider a ‘Master Render Thread’ for work submission with a couple of ‘Worker Threads’ for command list recording, resource creation, and Pipeline State Object (PSO) compilation. The idea is to have the worker threads generate command lists and the master thread pick them up and submit them.

Expect to maintain separate render paths for each IHV at minimum. The app has to replace driver reasoning about how to drive the underlying hardware most efficiently.

Don’t rely on the driver to parallelize any Direct3D 12 work in driver threads. On DX11 the driver farms off asynchronous tasks to driver worker threads where possible; this no longer happens under DX12. While the total cost of work submission in DX12 has been reduced, the amount of work measured on the application’s thread may be larger due to the loss of driver threading.
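The master/worker split and the task graph idea can be sketched together in standard C++. This is an illustration under stated assumptions, not NVIDIA's or Microsoft's implementation: RenderTask, recordCommandList and renderFrame are hypothetical names, recording is reduced to building a string, and "submission" just appends to a vector where a real renderer would call into the command queue.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <queue>
#include <string>
#include <vector>

// Hypothetical render task: a name plus the indices of the tasks whose
// output it consumes (e.g. a lighting pass depending on a shadow pass).
struct RenderTask {
    std::string name;
    std::vector<std::size_t> dependsOn;
};

// Stand-in for recording a command list on a worker thread.
std::string recordCommandList(const RenderTask& task) {
    return "cl:" + task.name;
}

// Workers record every task concurrently; the calling ("master") thread then
// submits the recorded lists in an order that respects the dependencies
// (Kahn's topological sort). Returns the submission order.
std::vector<std::string> renderFrame(const std::vector<RenderTask>& tasks) {
    // 1. Fan out recording; std::launch::async runs each job asynchronously.
    std::vector<std::future<std::string>> recorded;
    for (const RenderTask& t : tasks) {
        recorded.push_back(std::async(std::launch::async,
                                      [&t] { return recordCommandList(t); }));
    }

    // 2. Count unmet dependencies and build reverse edges.
    std::vector<std::size_t> pending(tasks.size(), 0);
    std::vector<std::vector<std::size_t>> dependents(tasks.size());
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        for (std::size_t dep : tasks[i].dependsOn) {
            assert(dep < tasks.size());
            ++pending[i];
            dependents[dep].push_back(i);
        }
    }

    // 3. Master thread: submit a task once all of its dependencies are in.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        if (pending[i] == 0) ready.push(i);
    }
    std::vector<std::string> submitted;
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop();
        submitted.push_back(recorded[i].get());  // waits for the worker
        for (std::size_t next : dependents[i]) {
            if (--pending[next] == 0) ready.push(next);
        }
    }
    return submitted;  // shorter than tasks.size() if the graph has a cycle
}
```

Recording is allowed to finish in any order, while submission order is decided by a single thread. That keeps the dependency bookkeeping in one place, which is the main appeal of the master render thread arrangement.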
