Dell PowerEdge C4130 4x V100 SXM2 NVIDIA GPU NVLink
Dell PowerEdge C4130
Default: 4x Tesla V100 16GB SXM2 NVLink GPUs (Upgradable to V100 32GB)
PCIe Slots: 2x half-length PCIe 3.0
2x hot-swap 1.8" SATA SSD slots (non-standard form factor, so we suggest purchasing drives from us)
2x 2000W 15A power supplies
IMPORTANT: This is an extremely loud server intended for datacenter environments only.
This server is about 5" deeper than standard servers (which are about 30"), so make sure your rack will fit it!
Dimensions:
H: 4.31 cm (1.7 in), W: 43.4 cm (17.09 in), D: 88.58 cm (34.87 in)
Tested private LLM setups with Open WebUI on this platform:
We also have tutorials and videos for running Dell SXM2 servers as local private LLM hosts.
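Each of the vllm serve commands below exposes an OpenAI-compatible API on the chosen port, so Open WebUI can point at it directly. As a rough sketch (the container name, volume, dummy key, and IP placeholder are ours, not part of the tested setup), Open WebUI can be launched with its OpenAI base URL aimed at the vLLM server:
docker run -d -p 3000:8080 -e OPENAI_API_BASE_URL=http://YOUR_SERVER_IP:8000/v1 -e OPENAI_API_KEY=unused -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
Open WebUI will then list whatever name you passed as --served-model-name.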
Working examples on the C4130 with 4x 16GB V100 SXM2 NVIDIA GPUs
======================================================
Reasoning:
vllm serve unsloth/Qwen3-14B-unsloth-bnb-4bit --port 8000 --served-model-name "qwen3-14b" --quantization bitsandbytes --gpu_memory_utilization 0.9 --pipeline_parallel_size 4
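Once the server is up, it answers on the standard OpenAI-compatible chat endpoint; a quick sanity check from the same box (the prompt is just an example) looks like this:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen3-14b", "messages": [{"role": "user", "content": "Explain NVLink in one paragraph."}], "max_tokens": 256}'
The same pattern works for every command on this page; only the model name and port change.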
Coding Model:
vllm serve QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --port 8000 --served-model-name Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --enable-expert-parallel --gpu_memory_utilization 0.8 --tensor_parallel_size 4 --tokenizer "Qwen/Qwen3-Coder-30B-A3B-Instruct" --trust-remote-code --max_model_len 64000 --max-num-seqs 512 --swap-space 16
VLM (Visual LLM):
vllm serve Qwen/Qwen3-VL-8B-Instruct --port 8000 --served-model-name Qwen3-VL-8B --gpu_memory_utilization 0.9 --tensor_parallel_size 4 --trust-remote-code --max_model_len 32000 --max-num-seqs 512
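Vision models accept the OpenAI-style multimodal message format, so an image can be passed by URL. A minimal request (the image URL below is a placeholder, swap in your own) looks like:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "Qwen3-VL-8B", "messages": [{"role": "user", "content": [{"type": "text", "text": "Describe this image."}, {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}]}], "max_tokens": 256}'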
Other Models:
vllm serve unsloth/Devstral-Small-2507-bnb-4bit --port 8000 --served-model-name "devstral" --quantization bitsandbytes --gpu_memory_utilization 0.9 --pipeline_parallel_size 4
Non-quantized!
vllm serve microsoft/Phi-4 --port 8000 --served-model-name "phi-14b" --gpu_memory_utilization 0.9 --pipeline_parallel_size 4
Working examples on the C4130 with 4x 32GB V100 SXM2 NVIDIA GPUs (able to run larger LLMs with full context, run multiple models on the same server simultaneously, or split the GPUs between VMs in Proxmox; see the multi-model sketch at the end of this section)
==========================================================
Coding Model:
vllm serve QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --port 8000 --served-model-name Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --enable-expert-parallel --gpu_memory_utilization 0.8 --tensor_parallel_size 4 --tokenizer "Qwen/Qwen3-Coder-30B-A3B-Instruct" --trust-remote-code --max-num-seqs 512 --swap-space 16
VLM (Visual LLM):
vllm serve Qwen/Qwen3-VL-8B-Instruct --port 8000 --served-model-name Qwen3-VL-8B --gpu_memory_utilization 0.9 --tensor_parallel_size 4 --trust-remote-code
Reasoning:
vllm serve Qwen/Qwen3-30B-A3B --port 8000 --served-model-name "qwen3-30b" --gpu_memory_utilization 0.9 --pipeline_parallel_size 4 --enable-expert-parallel
Huge Model:
vllm serve unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit --port 8000 --served-model-name DeepSeek-R1-Distill-Llama-70B-bnb-4bit --gpu_memory_utilization 0.9 --pipeline_parallel_size 4 --trust-remote-code --quantization bitsandbytes
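To illustrate the multi-model point from this section's heading: one way to run two models on the same server at once is to split the four GPUs with CUDA_VISIBLE_DEVICES and give each vLLM instance its own port. This is a sketch rather than a configuration we publish benchmarks for; reduce --gpu_memory_utilization or the context length if either instance runs out of memory.
CUDA_VISIBLE_DEVICES=0,1 vllm serve QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --port 8000 --served-model-name coder --tensor_parallel_size 2 --tokenizer "Qwen/Qwen3-Coder-30B-A3B-Instruct" --trust-remote-code --gpu_memory_utilization 0.9 &
CUDA_VISIBLE_DEVICES=2,3 vllm serve Qwen/Qwen3-VL-8B-Instruct --port 8001 --served-model-name Qwen3-VL-8B --tensor_parallel_size 2 --trust-remote-code --gpu_memory_utilization 0.9 &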