Hello.

I have questions and a proposal for the NVidia developers. There is very little information about how the GPU scheduler really works. Is the scheduler just simple round-robin? Is it programmable? Is it programmed from dom0 (e.g. by the vgpu/libnvidia-vgpu process in Dom0)?

More sophisticated schedulers have existed for more than a decade. If you look at network hardware, you can see many more advanced schedulers (https://en.wikipedia.org/wiki/Network_scheduler). Because NVidia's background is based on Sun Microsystems, there is a more sophisticated example of a processor scheduler in SunOS/Solaris. The SunOS/Solaris combination of the Fair Share Scheduler (FSS), which implements sharing including hierarchical shares (zones/projects), and dynamic pools, which implement capping and pinning/binding, is VERY powerful, simple to implement, and has been demonstrating its power for nearly 20 years.

Can the GPU scheduler be more sophisticated? If yes, there are several practical goals:
- If the share is programmable, then the restriction "all vGPUs on one physical GPU must be of one type (for example k120q)" should be removed!
- If the share is hierarchically programmable, then CUDA should be available in all vGPU types!
- If the scheduler has pinning/binding capability (to SMX), then performance should improve thanks to fewer instruction and data cache misses!
- If the scheduler (probably non-hierarchical) can be moved to domU for the Grid 2.0 "full" profiles M6-8Q and M60-8Q, which removes the dom0 overhead and enables CUDA in domU, then the same feature should be available for k180q and k280q (yes, I am still optimistic that NVidia HQ will allow this feature, and more, to be backported to K1/K2 GRID)!

Is there any observability API (performance monitor API) for the GPU scheduler, per vGPU (in Dom0) and per process inside a vGPU (in DomU)? (https://gridforums.nvidia.com/de ... utilization-per-vm/)

Thanks for answers, Martin
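To make the share and cap idea from the question concrete, here is a minimal Python sketch of proportional sharing with optional per-VM caps. It is only an illustration of the concept, not NVidia's scheduler and not the Solaris FSS implementation; the function name `fair_share` and the VM names and weights in the usage note are hypothetical, and shares are assumed to be positive.

```python
def fair_share(shares, caps=None, capacity=1.0):
    """Split `capacity` among active VMs in proportion to `shares`.

    `caps` optionally limits a VM to a maximum amount. Capacity that a
    capped VM cannot use is redistributed to the other VMs by their shares.
    """
    caps = caps or {}
    alloc = {}
    remaining = dict(shares)
    left = capacity
    while remaining:
        total = sum(remaining.values())
        # VMs whose proportional slice would exceed their cap
        capped = [vm for vm, s in remaining.items()
                  if vm in caps and left * s / total > caps[vm]]
        if not capped:
            # Nobody hits a cap: hand out the rest proportionally and stop
            for vm, s in remaining.items():
                alloc[vm] = left * s / total
            break
        for vm in capped:
            # Pin the VM at its cap and free the excess for the others
            alloc[vm] = caps[vm]
            left -= caps[vm]
            del remaining[vm]
    return alloc
```

For example, `fair_share({"a": 1, "b": 1, "c": 2}, caps={"c": 0.25})` pins "c" at 0.25 and splits the remaining 0.75 evenly between "a" and "b". Hierarchical shares could be modelled by calling the function once per level, first across groups and then inside each group with that group's allocation as `capacity`.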
9 replies
Hi Martin,

The restriction to homogeneous (all the same) vGPU types could, I guess, be lifted. However, in my head it's a bit like normal programmable arrays: a fixed size means many things can be done efficiently. I think the need to avoid memory fragmentation would also be a consideration, particularly as GPUs are reassigned (I'm thinking of the day when vMotion and similar is possible).

Some restrictions are imposed by the need to ensure continual and ongoing testing, QA and regression testing. Backporting always requires investment in extra QA and testing, not just for us but also for the OEMs' test labs. All sorts of things are possible, but we must maintain quality and reliability.

It is possible to pin and cap CPUs, but my own experiences have been extremely mixed, particularly with CAD/3D applications: reverse pinning PTC Creo actually improved performance and the intuitive pinning degraded it, because of some very strange semaphore behaviour, IIRC. Too many configuration options can often mean users get themselves in a real muddle.

I'm not an expert in this area; I'm hoping someone who is will pop along. With every feature request, though, we need to know what the user story/business case is: why you _need_ to mix vGPU types, and evidence that it's worth a substantial expansion in the test matrix, etc.

Best wishes, Rachel
There is a "breadth-first" allocation mechanism for vGPU startup that is optimal for performance, but the first allocation determines the vGPU profile for the whole GPU, and it is not movable. For example, start 4x new k120q on a K1: the next new k160q is unstartable and the old k120q are unmovable. Yes, there is also "depth-first", but it impacts performance for the 4x k120q sharing one GPU. This leads to lower UX (user experience, NVidia's buzzword for this year) in this five VM/VDI example.

Best regards, M.C>
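The five-VM example above can be replayed with a small Python sketch. It assumes a K1 board with four physical GPUs and per-GPU limits of 8x k120q or 2x k160q; treat those numbers, the function name `place` and the policy labels as illustrative assumptions. The real placement logic lives in the hypervisor tooling and may differ.

```python
# Illustrative per-GPU capacities; treat these numbers as assumptions.
MAX_PER_GPU = {"k120q": 8, "k160q": 2}


def place(requests, num_gpus=4, policy="breadth"):
    """Place vGPU requests on physical GPUs that must stay homogeneous.

    Returns (gpus, rejected); each gpu is {"type": str or None, "count": int}.
    """
    gpus = [{"type": None, "count": 0} for _ in range(num_gpus)]
    rejected = []
    for vtype in requests:
        # GPUs that are empty or already hold this type and have a free slot
        fits = [g for g in gpus
                if g["type"] in (None, vtype)
                and g["count"] < MAX_PER_GPU[vtype]]
        if not fits:
            rejected.append(vtype)
            continue
        if policy == "breadth":
            target = min(fits, key=lambda g: g["count"])  # least loaded GPU
        else:
            target = max(fits, key=lambda g: g["count"])  # most loaded GPU
        target["type"] = vtype  # the first vGPU fixes the GPU's profile
        target["count"] += 1
    return gpus, rejected
```

With `requests = ["k120q"] * 4 + ["k160q"]`, the breadth-first policy spreads the four k120q over all four GPUs, so the k160q is rejected. The depth-first policy packs them onto one GPU and leaves three GPUs free for the k160q, at the cost of four VMs sharing a single GPU.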
Hi Martin,

The breadth and depth allocations are functionality implemented by XenServer/XenCenter and by the equivalent in VMware. I'm wondering if what you really need is more control in the management tools. I'm still somewhat wary that this could expand the QA matrix substantially; a lot of customers have enough users or similar apps that they can pool them easily. I haven't heard a large number of people telling me that having homogeneous VMs per pGPU is a big issue.

Best wishes, Rachel
Hi Martin,

I had a word with the product management team at Citrix, and whilst they could possibly tweak the distribution, it would still only apply at start of day. As the real long-term goal, they feel VMotion/XenMotion is the way forward, since it would balance load as needed (this is something Citrix, VMware and NVIDIA are all keen to achieve long term).

Best wishes, Rachel
Grid 5.0

New "QoS scheduler" for Pascal chips:

I do not know whether this "QoS scheduler" for Pascal is just a marketing-branded, stupid "fixed/equal share scheduler".

"... Pascal has a new hardware feature called Preemption that allows Compute on vGPU profiles. Preemption is a feature that allows task Context switching. It gives the GPU the ability to essentially pause and resume a task ..."
- see https://gridforums.nvidia.com/default/topic/1604/nvidia-grid-vgpu/compute-mode-quot-prohibited-quot-grid-m60-/post/5161/#5161
- see http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/10
- search for "preemption" in http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf
- search for "preemption" in http://on-demand.gputechconf.com/gtc/2016/presentation/s6810-swapna-matwankar-optimizing-application-performance-cuda-tools.pdf
- cuDeviceGetAttribute() - CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED
- https://devtalk.nvidia.com/default/topic/1023524/system-management-and-monitoring-nvml-/-vgpu-management-qos-api-/
- docs https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#changing-vgpu-scheduling-policy
- BUT compute preemption isn't exposed as a programmer-visible control!

Now it is clear that NVidia has reinvented the wheel: "preemption" in the Pascal chip. Welcome to the year 1964! (see https://en.wikipedia.org/wiki/Computer_multitasking#Preemptive_multitasking). This disclosure explains all the pitfalls with vGPU and CUDA in previous chip generations: the paravirtualized vGPU driver was unable to force an SMX/SMM context switch and depended heavily on cooperative multitasking by the guest drivers (limited by FRL) and on the guest operating system. Unbelievable; shame, shame, shame on NVidia!

CUDA is now enabled in all "GRID P*-*Q" profiles.

New "observability": a per-process utilization API (usable for >= r375) with the finally disclosed functions nvmlDeviceGetProcessUtilization() and nvmlDeviceGetVgpuProcessUtilization() (see https://devtalk.nvidia.com/default/topic/934756/system-management-and-monitoring-nvml-/per-process-statistics-nvidia-smi-pmon-/).

Let's wait a few more years for pinning/binding on SMX/SMM/SMP to be cache effective, and for mixing vGPU profiles on a GPU ...
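The difference between cooperative and preemptive multitasking that the post complains about can be shown with a toy Python model. This is a sketch of the general idea only, not of how NVIDIA hardware or drivers behave; the function `run`, the task names and the burst lengths are hypothetical, and bursts are assumed to be positive.

```python
def run(tasks, preemptive, slice_len=2, horizon=12):
    """Simulate one execution unit shared round-robin by `tasks`.

    Each task is (name, burst): how long it runs before yielding by itself.
    With preemption the scheduler cuts every turn to at most `slice_len`;
    without it, a turn lasts as long as the task's own burst.
    Returns the time units each task received within `horizon`.
    """
    used = {name: 0 for name, _ in tasks}
    clock = 0
    while clock < horizon:
        for name, burst in tasks:
            turn = min(burst, slice_len) if preemptive else burst
            turn = min(turn, horizon - clock)  # do not run past the horizon
            used[name] += turn
            clock += turn
            if clock >= horizon:
                break
    return used
```

With `tasks = [("cuda_kernel", 10), ("desktop", 1)]`, the cooperative run gives the long kernel 11 of 12 time units, because nothing can interrupt it. The preemptive run forces a switch every 2 units, and the desktop task gets 4 of 12.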
Hi Martin

CUDA is enabled for all profiles on Pascal GPUs (A, B & Q) (App, vPC & vDWS).

As for mixing FB profiles on the same physical GPU, a few of us raised this with NVIDIA engineering a while back; however, there are reasons why it hasn't been offered. As you say, hopefully this will be added as a feature as the technology develops.

Regards

Ben
CUDA/OpenCL is only in P*-*Q (https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#features-grid-vgpu). The digitally signed /usr/share/nvidia/vgpu/vgpuConfig.xml takes precedence over /usr/share/nvidia/vgx/*.conf (check with "egrep -i 'cuda|vgpuType|signature' /usr/share/nvidia/vgpu/vgpuConfig.xml" and "grep cuda_enabled /usr/share/nvidia/vgx/*.conf") (https://gridforums.nvidia.com/default/topic/258/nvidia-grid-vgpu/documentation-for-vgpu-configs/post/2087/#2087) ... you should post your /usr/share/nvidia/vgpu/vgpuConfig.xml and /usr/bin/nvidia-vgpud
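If grep is not handy, the same check can be scripted. The following Python sketch mirrors the two commands above; the helper names `grep_lines` and `grep_files` are hypothetical, and nothing is assumed about the contents of the configuration files beyond the search patterns already quoted.

```python
import glob
import re


def grep_lines(text, pattern, ignore_case=False):
    """Return the lines of `text` that match the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [line for line in text.splitlines() if rx.search(line)]


def grep_files(path_glob, pattern, ignore_case=False):
    """Apply grep_lines to every file matching `path_glob`.

    Returns {path: matching lines}; the result is empty if no file matches.
    """
    hits = {}
    for path in sorted(glob.glob(path_glob)):
        with open(path, errors="replace") as fh:
            hits[path] = grep_lines(fh.read(), pattern, ignore_case)
    return hits
```

On the host, `grep_files("/usr/share/nvidia/vgpu/vgpuConfig.xml", "cuda|vgpuType|signature", ignore_case=True)` and `grep_files("/usr/share/nvidia/vgx/*.conf", "cuda_enabled")` would correspond to the egrep and grep calls in the post.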
My apologies, you're correct. I've just re-checked, and those were evaluation drivers, not production. Production drivers do not have this functionality. Please note, I've edited my post above to remove the incorrect driver information so as not to add confusion for anyone else reading this.
Nvidia updated the scheduler slides. As expected, the "QoS" title was removed (the new preemptive schedulers are far from true QoS). You can use the old "Shared/Best Effort/Time Sliced Scheduler" with cooperative multitasking, OR you can use the "Fixed/Equal Scheduler" with preemptive multitasking and with card performance lost to "empty/unused" slots. It is not possible to redistribute "unused" slots! The "slots" per VM should be programmable: for example, set a ratio/share (minimum guaranteed, with unused time redistributed) and set a maximum (capping)! (The scheduler is chosen by a driver parameter (https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#changing-vgpu-scheduling-policy).)

Updated summary (removed "QoS"):
Shared/Best Effort/Time Sliced Scheduler based on cooperative multitasking:
Fixed/Equal Schedulers based on preemptive multitasking with performance lost ("empty/unused slots"!):
Update from GTC-EU-2017:
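The "empty/unused slots" complaint can be put into numbers with an idealised Python model. The policy labels follow the terminology of the post; the function `gpu_time`, the demand figures and the max-min redistribution used for the best-effort case are assumptions for illustration, not a description of the actual driver.

```python
def gpu_time(policy, max_vgpus, demand):
    """Fraction of GPU time each running vGPU receives.

    `demand` maps vGPU name -> fraction of the GPU it could use (0..1).
    `max_vgpus` is how many vGPUs of this profile fit on the GPU.
    """
    if policy == "fixed":
        quota = 1.0 / max_vgpus    # slots of absent vGPUs stay empty
        return {vm: min(d, quota) for vm, d in demand.items()}
    if policy == "equal":
        quota = 1.0 / len(demand)  # one slot per running vGPU
        return {vm: min(d, quota) for vm, d in demand.items()}
    # best effort: idle time flows to the vGPUs that still want it
    got = {}
    left = 1.0
    pending = sorted(demand, key=demand.get)  # smallest demand first
    while pending:
        fair = left / len(pending)
        vm = pending.pop(0)
        got[vm] = min(demand[vm], fair)
        left -= got[vm]
    return got
```

Take a profile with four slots and two running VMs, `{"vm1": 1.0, "vm2": 0.1}`. Fixed share serves 0.25 + 0.1 = 0.35 of the GPU, equal share serves 0.5 + 0.1 = 0.6, and best effort serves 0.9 + 0.1 = 1.0. The gap is the performance lost to unused slots.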