The encoding performance of GM107 is poor (Maxwell Gen1, with only one encoder engine per GPU), and it is unsuitable even for GPU-accelerated H.264 VDI with 64 sessions (the M10 variant, as marketing touts it). Some numbers to reconsider:
- 1x GRID M40 (== Tesla M10) is 4x GM107 (Maxwell Gen1), each with one encoder engine - encoding at ~ 235 FPS per engine (lowlat, highperf, 1920x1080, YUV 4:2:0, 8-bit) - 4*235 ~ 940 FPS / 225W / 2 slots / 4*5 = 20 SMM = 2560 CUDA cores
- 1x Quadro M5000 is GM204 (Maxwell Gen2) with two encoder engines - encoding at ~ 361 FPS per engine (lowlat, highperf, 1920x1080, YUV 4:2:0, 8-bit) - 2*361 ~ 722 FPS / 150W / 2 slots / 16 SMM = 2048 CUDA cores
- 1x Quadro M4000 is GM204 (Maxwell Gen2) with two encoder engines - encoding at ~ 361*80% FPS per engine (lowlat, highperf, 1920x1080, YUV 4:2:0, 8-bit) - 2*361*80% ~ 578 FPS (an estimated 20% performance drop due to lower clocks) / 120W / 1 slot / 13 SMM = 1664 CUDA cores
- 2x Quadro M4000 - 2*2*361*80% ~ 1155 FPS / 240W / 2 slots / 2*13 SMM = 3328 CUDA cores - This seems to be a better, supported solution than the GRID M40 (FPS about +23%, power +7%, price +30%: 2x Quadro M4000 (eBay/new) ~ 2x$650 = $1300, GRID M40 (eBay) ~ $1000)
- 1x Quadro P5000 is GP104 (Pascal) with two encoder engines - encoding at ~ 535 FPS per engine (lowlat, highperf, 1920x1080, YUV 4:2:0, 8-bit) - 2*535 ~ 1070 FPS / 180W / 2 slots (+ 8K HEVC and 10-bit encoding) / 20 SMs = 2560 CUDA cores
- 1x Tesla P4 is GP104 (Pascal) with two encoder engines - encoding at ~ 535*70% FPS per engine (lowlat, highperf, 1920x1080, YUV 4:2:0, 8-bit) - 2*535*70% ~ 749 FPS (an estimated 30% performance drop due to lower clocks) / 50-75W / 1 slot (+ 8K HEVC and 10-bit encoding) / 20 SMs = 2560 CUDA cores (vs. GRID M40: FPS -20%, power -66%, price +80%)
- 2x Tesla P4 - 2*2*535*70% ~ 1498 FPS / 100-150W / 2 slots / 2*20 SMs = 5120 CUDA cores (vs. GRID M40: FPS +60%, power -33%, price +260%)
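The arithmetic behind the list above can be reproduced with a short Python sketch. The per-engine FPS, engine counts, clock derating factors and board power are the figures quoted in the list. The names `Config`, `CONFIGS` and `summarize` are made up for illustration. Treating the GRID M40 as the comparison baseline, and using the upper end of the Tesla P4 power range, are assumptions inferred from the quoted percentages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    engines: int           # total encoder engines across all GPUs in the config
    fps_per_engine: float  # 1080p H.264 FPS per engine (lowlat, highperf)
    derate: float          # estimated clock derating (1.0 = none); the post's guess
    watts: float           # board power; upper bound where a range is quoted

    @property
    def fps(self) -> float:
        """Aggregate encode throughput of the whole configuration."""
        return self.engines * self.fps_per_engine * self.derate


# Figures copied from the list above; derating values are estimates, not measurements.
CONFIGS = [
    Config("1x GRID M40", 4, 235, 1.0, 225),
    Config("1x Quadro M5000", 2, 361, 1.0, 150),
    Config("1x Quadro M4000", 2, 361, 0.8, 120),
    Config("2x Quadro M4000", 4, 361, 0.8, 240),
    Config("1x Quadro P5000", 2, 535, 1.0, 180),
    Config("1x Tesla P4", 2, 535, 0.7, 75),
    Config("2x Tesla P4", 4, 535, 0.7, 150),
]


def relative_pct(value: float, baseline: float) -> float:
    """Percentage difference of value against baseline."""
    return (value / baseline - 1.0) * 100.0


def summarize(configs, baseline_name="1x GRID M40", sessions=64):
    """Derive throughput, efficiency and baseline-relative figures per config."""
    base = next(c for c in configs if c.name == baseline_name)
    rows = []
    for c in configs:
        rows.append({
            "name": c.name,
            "fps": c.fps,
            "fps_per_watt": c.fps / c.watts,
            # Frame budget per session if `sessions` 1080p streams share the board(s).
            "fps_per_session": c.fps / sessions,
            "fps_vs_base_pct": relative_pct(c.fps, base.fps),
            "watts_vs_base_pct": relative_pct(c.watts, base.watts),
        })
    return rows


if __name__ == "__main__":
    for r in summarize(CONFIGS):
        print(f'{r["name"]:<16} {r["fps"]:7.0f} FPS  '
              f'{r["fps_per_watt"]:5.2f} FPS/W  '
              f'{r["fps_per_session"]:5.1f} FPS/session  '
              f'FPS {r["fps_vs_base_pct"]:+5.0f}%  power {r["watts_vs_base_pct"]:+5.0f}%')
```

With these inputs, the GRID M40's ~940 FPS spread over 64 sessions leaves roughly 14.7 FPS per 1080p session. The 2x Quadro M4000 configuration comes out about 23% ahead in FPS for about 7% more power. Changing `sessions` or the `derate` values shows how sensitive the conclusions are to those estimates.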
NVIDIA should disclose more detailed encoder/decoder benchmarks (including their dependency on clock speed (and boost) and memory bandwidth).
Refs:
https://developer.nvidia.com/video-encode-decode-gpu-support-matrix
https://developer.nvidia.com/nvidia-video-codec-sdk#NVENCFeatures
https://developer.nvidia.com/nvenc-application-note