
郭武莱

[Q&A]

Horizon View with vGPU: GPU resources not assigned


We have been running vSGA off of Nvidia K1s for a couple of years. We upgraded to Horizon View 6.2 and are testing vGPU profiles. The initial testing went very well, but once I scaled the pool out, several VMs failed customization. However, no error occurred in Horizon; they simply remained in a customizing status.

If I forced a power reset and then responded to the Windows recovery prompt to boot normally, it would usually continue customization and finish. The best part: because the VM never boots, I can't VNC into it, since the console is disabled while the vGPU K100 profile is attached to the VM. It is very odd behavior, and I have an open ticket with VMware.


As I dug through the logs, I found this interesting item in the vmware.log for the VM:

2016-04-27T01:44:09.329Z| mks| W110: GLWindow: Unable to reserve host GPU resources
2016-04-27T01:44:09.339Z| vmx| I120: [msg.mks.noGPUResourceFallback] Hardware GPU resources are not available. The virtual machine will use software rendering.

The workaround points at the cause: when I power reset the VM, it eventually works. It seems some VMs fail to get assigned a GPU core on the K1s during power-on. I haven't found ANYTHING online referring to this issue. I'll keep this post updated.
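As a hedged sketch of how the symptom above could be spotted across a whole pool, something like this could scan each VM's vmware.log on the ESXi host (the function name and the datastore layout it is pointed at are assumptions for illustration, not a VMware-provided tool):

```shell
# Sketch (not an official VMware tool): report VMs whose vmware.log
# contains the GPU reservation failure quoted above. The datastore
# path you pass in is an assumption; point it at your own volume,
# e.g. scan_gpu_failures /vmfs/volumes/datastore1
scan_gpu_failures() {
  vmdir="$1"
  for log in "$vmdir"/*/vmware.log; do
    [ -f "$log" ] || continue
    if grep -q "Unable to reserve host GPU resources" "$log"; then
      echo "GPU reservation failure: $log"
    fi
  done
}
```

Run after a pool provision, this would list exactly the VMs stuck in the customizing state described above.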


System Environment

  • Super Micro

  • Dual K1s

  • ESXi 6.0 U2

  • Nvidia VIB 361.40

  • Windows Nvidia 362.13

Replies (7)

王瑞

2018-10-8 14:14:24

Can you check the vBIOS of the K1 cards installed and ensure they're at the latest version?

You may need to request this update from SuperMicro.

Also, why use K100? K120Q is a better choice: more graphics memory and exactly the same density, since each GPU supports a maximum of 8 vGPU sessions (so that's 32 on a K1).

杨思

2018-10-8 14:27:37

Thanks Jason, I contacted Super Micro but they are not aware of any "authorized" BIOS updates for the Nvidia GRID cards. I also tried looking online but didn't find a BIOS version history for the GRID cards.

Running the nvidia-smi command, it reports that they are running:

    VBIOS Version                   : 80.07.BE.00.04
    MultiGPU Board                  : Yes
    Board ID                        : 0x8300
    GPU Part Number                 : 900-52401-0020-000
    Inforom Version
        Image Version               : 2401.0502.00.02



Do you know where I could find that info? Also, regarding the K100 choice: I agree. We just wanted to test the K100 and K120Q separately to understand the performance gains on the applications being used. I plan to go with K120Q for production since we get the same user density.


Thanks for the quick response!
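For comparing cards across hosts quickly, the VBIOS line in the report above can be pulled out with a small filter. This is a sketch that parses an `nvidia-smi -q` report read on stdin; the function name and piping it as `nvidia-smi -q | vbios_versions` are assumptions for illustration:

```shell
# Sketch: print just the VBIOS version value(s) from an
# `nvidia-smi -q` report read on stdin, one line per GPU.
vbios_versions() {
  awk -F': ' '/VBIOS Version/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)  # trim padding around the value
    print $2
  }'
}
```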

马龙

2018-10-8 14:38:54

You're on the latest VBIOS so no update required.

I would avoid the K100 profile; it's only there for legacy support, and I would recommend that all new projects/deployments not use it.

Out of interest, how many VMs do you have in the pool you're creating, and how many K1s are available in those hosts?

李小红

2018-10-8 14:51:36

Thanks for checking on the vBIOS.

I will test out the K120Q then and report back. Regarding the pool size, it was planned to be 55 VMs, with the target host having two K1s. A second host with two K1s would be on standby in case of host failure in the cluster (I know vMotion isn't supported).
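As a sanity check on that plan, the density math from earlier in the thread can be sketched out. The 8-sessions-per-GPU figure is from the K100/K120Q discussion above, and the 4-GPUs-per-card figure is the GRID K1's layout; treat this as back-of-the-envelope, not sizing guidance:

```shell
# Back-of-the-envelope: can one host's K1s carry the whole pool?
pool_size=55        # planned pool
k1_cards=2          # K1 cards in the target host
gpus_per_k1=4       # a GRID K1 carries four GPUs
sessions_per_gpu=8  # K100/K120Q max vGPU sessions per GPU
host_capacity=$((k1_cards * gpus_per_k1 * sessions_per_gpu))
echo "host capacity: $host_capacity vGPU sessions (pool needs $pool_size)"
```

At these numbers a single host covers the pool (64 sessions for 55 VMs), which fits keeping the second two-K1 host purely as a cold standby.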
