Hardware

NVIDIA DGX-1

Appliance designed for machine and deep learning use cases.
 

DGX-1 is a ready-to-use system for machine learning, deep learning and artificial intelligence use cases.

 

The server includes 8 V100 GPUs, connected to each other by links supporting up to 40Gb/s of bidirectional bandwidth, allowing efficient direct memory access and data sharing. The hardware comes with a software stack that lets you easily fetch and use prebuilt containers tailored for deep learning applications (TensorFlow, Caffe, Caffe2, CUDA 8, cuDNN...).
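To see how the GPUs are linked on a given machine, the interconnect matrix can be printed with `nvidia-smi topo -m`. The wrapper below is only a minimal sketch: the function name `show_gpu_topology` is hypothetical, and it assumes the NVIDIA driver tools are installed on the host.

```shell
#!/bin/sh
# show_gpu_topology: print the GPU interconnect matrix, or fail cleanly
# when the NVIDIA driver tools are not installed on this machine.
# (Hypothetical helper name, not part of the DGX software stack.)
show_gpu_topology() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 127
    fi
    # "topo -m" prints the connection matrix between GPUs;
    # NVLink connections appear as NV1, NV2, ... entries.
    nvidia-smi topo -m
}
```

Calling `show_gpu_topology` on the DGX-1 should list the 8 GPUs with NVLink entries between linked pairs.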

[Figure: hardware topology]

[Figure: software stack]

Save time on your training step!

[Figure: training]

image sources:
 

Full documentation available here

 

Available soon!

Support

Support is available through our mailing list: hardware.labs@ml.ovh.net.

 

Specs

Server characteristics

  • CPU: 2 x E5-2698 v4 (2 x 20 cores x 2 threads, 2.2GHz)
  • RAM: 512 GB DDR4 - 2400 MHz
  • Disk: 1x480GB SSD + 4x1920GB SSD
  • GPU: 8 x V100 + NVLink mesh
  • Price: 800€/48 hours (VAT excl.)

Features

  • Bandwidth: 250Mbps
  • IP: 1 IPv4

 

FAQ

The list of available containers can be fetched from the DGX here: http://nvcr.ovh/list.txt

Pull containers from our mirror like this:

docker pull nvcr.ovh/nvidia/$container:$release

Available $container:$release values:

  •   caffe:17.10
  •   caffe2:17.10
  •   cntk:17.10
  •   cuda:9.0-cudnn7-devel-ubuntu16.04
  •   digits:17.10
  •   mxnet:17.10
  •   pytorch:17.10
  •   tensorflow:17.10
  •   theano:17.10
  •   torch:17.10
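The pull command above can be wrapped in a small helper to fetch several images in one go. This is a minimal sketch under stated assumptions: the function names `image_ref` and `pull_all` and the `DRY_RUN` variable are hypothetical, and the input is assumed to hold one `container:release` pair per line, as in the list above.

```shell
#!/bin/sh
# Registry prefix taken from the pull command above.
MIRROR="nvcr.ovh/nvidia"

# image_ref CONTAINER RELEASE: print the full image reference.
image_ref() {
    printf '%s/%s:%s\n' "$MIRROR" "$1" "$2"
}

# pull_all: read "container:release" lines on stdin and pull each image.
# Set DRY_RUN=1 to print the docker commands instead of running them.
pull_all() {
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS=: read -r container release || [ -n "$container" ]; do
        # Skip blank or malformed lines.
        [ -n "$container" ] && [ -n "$release" ] || continue
        ref=$(image_ref "$container" "$release")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "docker pull $ref"
        else
            docker pull "$ref"
        fi
    done
}
```

For example, `printf 'tensorflow:17.10\npytorch:17.10\n' | pull_all` pulls both images. Whether list.txt uses this one-pair-per-line format is not confirmed here, so check its contents before piping it into `pull_all`.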

 

 

You may wonder why any customized hardware RAID or partitioning layout chosen in the OVH installation wizard is ignored for the DGX-1. This is deliberate: we want to stick to the DGX appliance's native disk layout.

The DGX has 5 SSDs:

  • The first one (480GB) hosts the operating system
  • The other four (2TB each) are combined in a hardware RAID 0, so that striping minimizes the I/O bottleneck when datasets are read during training or simulation

The RAID 0 volume is formatted as ext4 and mounted on /raid: this is where datasets are intended to be cached.
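Caching a dataset on the striped volume before training could look like the sketch below. The helper name `cache_dataset` is hypothetical. It defaults to the /raid mount point described above, and the optional second argument exists only so the function can be tried against another directory.

```shell
#!/bin/sh
# cache_dataset SRC [CACHE_ROOT]: copy a dataset directory under the cache
# root (default: /raid) and print the resulting path.
# (Hypothetical helper, shown for illustration only.)
cache_dataset() {
    src=${1%/}                 # drop a trailing slash, if any
    cache_root=${2:-/raid}
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ -d "$cache_root" ] || { echo "cache root missing: $cache_root" >&2; return 1; }
    dest="$cache_root/$(basename "$src")"
    # -R copies recursively, -p preserves timestamps and permissions.
    # An existing copy at the destination is merged and overwritten.
    cp -Rp "$src" "$cache_root/" || return 1
    printf '%s\n' "$dest"
}
```

A call such as `cache_dataset /path/to/dataset` prints the cached location, which can then be given to the training job. Running `df -h /raid` first shows whether the volume is mounted and how much space is left.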

Status

  • ALPHA
  • BETA
  • GAMMA