Dell PowerEdge C4130 4x V100 SXM2 Nvidia GPU NVLink

$1,750.00
In stock
SKU
c4130

Dell PowerEdge C4130 

Default: 4x Tesla V100 16GB SXM2 NVLink GPUs   (Upgradable to V100 32GB)

PCIe Slots:
2x half-length PCIe 3.0 slots

2x hot-swap 1.8" SATA SSD slots (non-standard form factor, so we suggest you purchase the drives from us)

2x 2000W 15A power supplies

IMPORTANT: This is an extremely loud server meant for a datacenter environment only.

This server is about 5" deeper than standard servers (which are roughly 30" deep), so make sure your rack will fit it!

Dimensions:
H: 4.31 cm (1.7 in), W: 43.4 cm (17.09 in), D: 88.58 cm (34.87 in)

Tested private LLMs with Open WebUI on this platform:
We also have tutorials and videos for running Dell SXM2 servers as local private LLM hosts; a minimal Open WebUI connection example is shown below.
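As a rough sketch only (assuming the standard Open WebUI Docker image and its OPENAI_API_BASE_URL / OPENAI_API_KEY environment variables; <server-ip> is a placeholder for the C4130's address), Open WebUI can be pointed at any of the vLLM endpoints started with the commands below:

# Open WebUI served on port 3000, talking to the vLLM OpenAI-compatible API on port 8000
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://<server-ip>:8000/v1 \
  -e OPENAI_API_KEY=none \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main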

 

Working examples on the C4130 with 4x 16GB V100 SXM2 NVIDIA GPUs
======================================================

Reasoning Model:
vllm serve unsloth/Qwen3-14B-unsloth-bnb-4bit --port 8000 --served-model-name "qwen3-14b" --quantization bitsandbytes --gpu_memory_utilization 0.9 --pipeline_parallel_size 4 
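Once the model is being served, any OpenAI-compatible client can talk to it on port 8000. A minimal curl check (the prompt is just an illustration):

# query the qwen3-14b model via vLLM's OpenAI-compatible chat completions endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-14b", "messages": [{"role": "user", "content": "Explain NVLink in one sentence."}]}'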

Coding Model:
vllm serve QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --port 8000 --served-model-name Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --enable-expert-parallel --gpu_memory_utilization 0.8 --tensor_parallel_size 4 --tokenizer "Qwen/Qwen3-Coder-30B-A3B-Instruct" --trust-remote-code --max_model_len 64000 --max-num-seqs 512 --swap-space 16

VLM Model (Visual LLM):
vllm serve Qwen/Qwen3-VL-8B-Instruct --port 8000 --served-model-name Qwen3-VL-8B --gpu_memory_utilization 0.9 --tensor_parallel_size 4 --trust-remote-code  --max_model_len 32000 --max-num-seqs 512
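For the visual model, the same chat completions endpoint accepts image inputs using the OpenAI-style image_url content part (the image URL below is only a placeholder):

# send an image plus a text question to the Qwen3-VL-8B server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-VL-8B", "messages": [{"role": "user", "content": [{"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}}, {"type": "text", "text": "Describe this image."}]}]}'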

Other Models:
vllm serve unsloth/Devstral-Small-2507-bnb-4bit --port 8000 --served-model-name "devstral" --quantization bitsandbytes --gpu_memory_utilization 0.9 --pipeline_parallel_size 4

Non-Quantized Model:
vllm serve microsoft/Phi-4 --port 8000 --served-model-name "phi-14b"  --gpu_memory_utilization 0.9 --pipeline_parallel_size 4
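A quick way to confirm that any of these servers came up correctly is to list the models it exposes:

# vLLM's OpenAI-compatible model listing endpoint
curl http://localhost:8000/v1/models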

 

Working examples on the C4130 with 4x 32GB V100 SXM2 NVIDIA GPUs (able to run larger LLMs with full context, run multiple models on the same server simultaneously, or split the GPUs between VMs in Proxmox)
==========================================================
Coding Model:
vllm serve QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --port 8000 --served-model-name Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --enable-expert-parallel --gpu_memory_utilization 0.8 --tensor_parallel_size 4 --tokenizer "Qwen/Qwen3-Coder-30B-A3B-Instruct" --trust-remote-code --max-num-seqs 512 --swap-space 16

VLM Model (Visual LLM):
vllm serve Qwen/Qwen3-VL-8B-Instruct --port 8000 --served-model-name Qwen3-VL-8B --gpu_memory_utilization 0.9 --tensor_parallel_size 4 --trust-remote-code  

Reasoning Model:
vllm serve Qwen/Qwen3-30B-A3B --port 8000 --served-model-name "qwen3-30b" --gpu_memory_utilization 0.9 --pipeline_parallel_size 4 --enable-expert-parallel

Huge Model:
vllm serve unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit --port 8000 --served-model-name DeepSeek-R1-Distill-Llama-70B-bnb-4bit --gpu_memory_utilization 0.9 --pipeline_parallel_size 4 --trust-remote-code --quantization bitsandbytes
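With pipeline parallelism the 70B 4-bit model is split across all four 32GB GPUs; while it loads you can watch the memory fill up on each card:

# refresh GPU utilization and memory once per second
watch -n 1 nvidia-smi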

 

Copyright © 2026 CanServers | All Prices are in CAD. All used custom and refurbished servers are shipped from Canada.