
GPU scheduler for vGPU


Hello.

I have questions and a proposal for NVidia developers. There is little information about the true function of the GPU scheduler.



Is the scheduler only a simple round-robin?

Is it programmable?

Is it programmed from dom0 (e.g. the vgpu/libnvidia-vgpu process in Dom0)?
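For reference, a pure round-robin time-slicer, the behavior the question suspects, can be sketched as follows (a toy illustration, not NVIDIA's implementation; the vGPU names are made up):

```python
from collections import deque

def round_robin(vgpus, timeslices):
    """Cycle through vGPUs in fixed order, granting one equal
    time slice per turn regardless of load or priority."""
    queue = deque(vgpus)
    schedule = []
    for _ in range(timeslices):
        vgpu = queue.popleft()
        schedule.append(vgpu)   # vgpu runs for one fixed slice
        queue.append(vgpu)      # then goes to the back of the line
    return schedule

# Three vGPUs sharing one physical GPU, six slices:
print(round_robin(["vgpu0", "vgpu1", "vgpu2"], 6))
# → ['vgpu0', 'vgpu1', 'vgpu2', 'vgpu0', 'vgpu1', 'vgpu2']
```

Such a scheduler is trivially fair in slice counts but cannot express priorities, shares, or caps, which is what the questions below are probing.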

More sophisticated schedulers have existed for more than a decade.
If you look at network hardware you can see many more advanced schedulers (https://en.wikipedia.org/wiki/Network_scheduler).
Because part of NVidia's background comes from Sun Microsystems, there is a more sophisticated example of a processor scheduler in SunOS/Solaris. The SunOS/Solaris combination of the Fair Share Scheduler (FSS), which implements sharing including hierarchical shares (zones/projects), and dynamic pools, which implement capping and pinning/binding, is VERY powerful, simple to implement, and has been demonstrating its power for nearly 20 years.
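The fair-share idea referenced here can be illustrated with stride scheduling, a classic fair-share technique: each client advances a virtual time by the inverse of its share, and the client with the lowest virtual time runs next. This is a sketch with assumed share values, not the Solaris FSS implementation:

```python
import heapq

def fair_share(shares, timeslices):
    """Stride scheduling: each client's virtual time advances by
    1/share per slice, so granted time converges to the share ratio."""
    heap = [(0.0, name) for name in sorted(shares)]  # (virtual_time, client)
    heapq.heapify(heap)
    schedule = []
    for _ in range(timeslices):
        vtime, name = heapq.heappop(heap)   # lowest virtual time runs
        schedule.append(name)
        heapq.heappush(heap, (vtime + 1.0 / shares[name], name))
    return schedule

# Shares 3:1 — "a" receives three times the slices of "b":
sched = fair_share({"a": 3, "b": 1}, 8)
print(sched.count("a"), sched.count("b"))
# → 6 2
```

Hierarchy and caps layer on top of this core: run the same weighting at each level of a tree, and skip a client once it hits its cap.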

Can the GPU scheduler be more sophisticated?

If yes, there are several practical goals:

- If the share is programmable, then the restriction that all vGPUs in one physical GPU must be of one type (for example k120q) should be removed!

- If the share is hierarchically programmable, then CUDA should be available in all vGPU types!

- If the scheduler has pinning/binding capability (to SMX units), then performance should be boosted due to fewer instruction and data cache misses!

- If the scheduler (probably non-hierarchical) can be moved to domU for the Grid 2.0 "full" profiles M6-8Q and M60-8Q, which would remove the overhead of dom0 and enable CUDA in domU, then the same feature should be available for k180q and k280q (yes, I am still optimistic that NVidia HQ will allow backporting this feature and more to the K1/K2 grid)!
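Hierarchical shares of the zones/projects kind mentioned above can be sketched as a weight tree, where each level splits its parent's portion among its children. The zone and vGPU names here are hypothetical, purely to show the mechanism:

```python
def effective_shares(tree, total=1.0):
    """Split a resource by share weights at each level of a tree,
    mirroring hierarchical shares (zones/projects) in Solaris FSS.
    tree maps name -> (weight, subtree or None for a leaf)."""
    weight_sum = sum(weight for weight, _ in tree.values())
    leaves = {}
    for name, (weight, subtree) in tree.items():
        portion = total * weight / weight_sum
        if subtree is None:
            leaves[name] = portion                      # leaf: a vGPU's share
        else:
            leaves.update(effective_shares(subtree, portion))
    return leaves

# Two hypothetical "zones" with 2:1 shares; zone A splits its 2/3
# equally between two vGPUs, so all three vGPUs end up with 1/3 each:
tree = {"zoneA": (2, {"vgpu0": (1, None), "vgpu1": (1, None)}),
        "zoneB": (1, {"vgpu2": (1, None)})}
shares = effective_shares(tree)
```

Note how the hierarchy changes the answer: with flat equal weights the result would be the same here, but giving zoneB a second vGPU would halve vgpu2's share without touching zoneA's budget.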

Is there any observability API (performance-monitor API) for the GPU scheduler, per vGPU (in Dom0) and per process inside a vGPU (in DomU)?
( https://gridforums.nvidia.com/de ... utilization-per-vm/ )

Thanks in advance for your answers, Martin

Replies (9)

张倩

2018-9-11 16:44:41
Hi Martin,

The restriction on homogeneous (all the same) vGPU types could, I guess, be lifted; however, it's a bit like normal programmable arrays in my head: a fixed size means many things can be done efficiently. I think the need to avoid memory fragmentation, particularly as GPUs are reassigned (I'm thinking of the day when vMotion and similar become possible), would also be a consideration.

Some restrictions are imposed by the need to ensure continual and ongoing testing, QA and regression testing. Back-porting always requires investment in extra QA and testing, not just for us but also for the OEMs' test labs. All sorts of things are possible, but we must maintain quality and reliability.

It is possible to pin and cap CPUs, but my own experiences have been extremely mixed, particularly with CAD/3D applications - reverse pinning PTC Creo actually improved performance while the intuitive pinning degraded it, because of some very strange semaphore behaviour, IIRC. Too many configuration options can often mean users get themselves into a real muddle.

I'm not an expert in this area - I'm hoping someone who is will pop along. With every feature request, though, we need to know what the user story/business case is... why you _need_ to mix vGPU types, and evidence that it's worth a substantial expansion of the test matrix etc...

Best wishes,
Rachel

李星星

2018-9-11 16:57:26
There is a "breadth-first" allocation mechanism for vGPU startup that is optimal for performance, but the first allocation determines the vGPU profile for the whole GPU, and it is not movable. For example, start four new k120q on a K1, and then the next new k160q is unstartable while the old k120q are unmovable. Yes, there is also "depth-first", but it has a performance impact for four k120q sharing one GPU. This leads to lower UX (user experience, NVidia's buzzword for this year) for this five-VM/VDI example.
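The five-VM scenario described here can be sketched as a toy placement model (per-GPU capacities are assumed for illustration; this is not NVIDIA's allocator):

```python
def place(profile, gpus, policy="breadth"):
    """Place one vGPU on a physical GPU under the homogeneity
    restriction: a GPU hosts only one vGPU profile at a time."""
    capacity = {"k120q": 4, "k160q": 2}   # assumed slots per GPU
    candidates = [g for g in gpus
                  if g["profile"] in (None, profile)
                  and len(g["vms"]) < capacity[profile]]
    if not candidates:
        return None                        # unstartable
    if policy == "breadth":                # emptiest GPU first (performance)
        gpu = min(candidates, key=lambda g: len(g["vms"]))
    else:                                  # "depth": fullest GPU first (packing)
        gpu = max(candidates, key=lambda g: len(g["vms"]))
    gpu["profile"] = profile
    gpu["vms"].append(profile)
    return gpu

# A K1 board has four physical GPUs; breadth-first spreads four
# k120q VMs across all four, and a fifth VM with profile k160q
# then finds no compatible GPU:
gpus = [{"profile": None, "vms": []} for _ in range(4)]
for _ in range(4):
    place("k120q", gpus)
```

With `policy="depth"` the four k120q would pack onto one GPU, leaving three GPUs free for the k160q, at the cost of the four k120q contending for a single GPU's resources - exactly the trade-off described above.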

Best regards, M.C>

李月如

2018-9-11 17:14:30
Hi Martin,

The breadth and depth allocations are functionality implemented by XenServer/XenCenter and by the equivalent in VMware. I'm wondering if you really need more control in the management tools.

I'm still somewhat wary that this could expand the QA matrix substantially; a lot of customers have enough users or similar apps that they can pool easily. I haven't heard a large number of people telling me that having homogeneous VMs per pGPU is a big issue...

Best wishes,
Rachel

李进锋

2018-9-11 17:23:39
Hi Martin,

I had a word with the product management team at Citrix, and whilst they could possibly tweak the distribution, it would still only apply at start of day. As a really long-term goal, they feel VMotion/XenMotion is the way forward to balance load as needed (this is something both Citrix/VMware and NVIDIA are keen to achieve long term).

Best wishes,
Rachel

More replies
