Dell PowerEdge C4130 4x V100 SXM2 NVIDIA GPU NVLink
Dell PowerEdge C4130
Default: 4x Tesla V100 16GB SXM2 NVLink GPUs (Upgradable to V100 32GB)
PCIe Slots: 2x half-length PCIe 3.0
2x hot-swap 1.8" SATA SSD slots (non-standard form factor, so we suggest purchasing drives from us)
2x 2000W 15A power supplies
IMPORTANT: This is an extremely loud server intended for datacenter environments only.
This server is about 5" deeper than standard servers (which are about 30"), so make sure your rack will fit it!
Dimensions:
H: 4.31 cm (1.7 in), W: 43.4 cm (17.09 in), D: 88.58 cm (34.87 in)
Tested private LLM setups with Open WebUI on this platform:
We also have tutorials and videos for running Dell SXM2 servers as local private LLM hosts.
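Each of the vllm serve commands below exposes an OpenAI-compatible API on the chosen port, so Open WebUI can point at it directly. As a rough sketch (the container name, volume, dummy key, and IP placeholder are ours, not part of the tested setup), Open WebUI can be launched with its OpenAI base URL aimed at the vLLM server:
docker run -d -p 3000:8080 -e OPENAI_API_BASE_URL=http://YOUR_SERVER_IP:8000/v1 -e OPENAI_API_KEY=unused -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
Open WebUI will then list whatever name you passed as --served-model-name.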
Working examples on the C4130 with 4x 16GB V100 SXM2 NVIDIA GPUs
======================================================
Reasoning:
vllm serve unsloth/Qwen3-14B-unsloth-bnb-4bit --port 8000 --served-model-name "qwen3-14b" --quantization bitsandbytes --gpu_memory_utilization 0.9 --pipeline_parallel_size 4
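Once the server is up, it answers on the standard OpenAI-compatible chat endpoint; a quick sanity check from the same box (the prompt is just an example) looks like this:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen3-14b", "messages": [{"role": "user", "content": "Explain NVLink in one paragraph."}], "max_tokens": 256}'
The same pattern works for every command on this page; only the model name and port change.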
Coding Model:
vllm serve QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --port 8000 --served-model-name Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --enable-expert-parallel --gpu_memory_utilization 0.8 --tensor_parallel_size 4 --tokenizer "Qwen/Qwen3-Coder-30B-A3B-Instruct" --trust-remote-code --max_model_len 64000 --max-num-seqs 512 --swap-space 16
VLM (Visual LLM):
vllm serve Qwen/Qwen3-VL-8B-Instruct --port 8000 --served-model-name Qwen3-VL-8B --gpu_memory_utilization 0.9 --tensor_parallel_size 4 --trust-remote-code --max_model_len 32000 --max-num-seqs 512
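Vision models accept the OpenAI-style multimodal message format, so an image can be passed by URL. A minimal request (the image URL below is a placeholder, swap in your own) looks like:
curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "Qwen3-VL-8B", "messages": [{"role": "user", "content": [{"type": "text", "text": "Describe this image."}, {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}]}], "max_tokens": 256}'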
Other Models:
vllm serve unsloth/Devstral-Small-2507-bnb-4bit --port 8000 --served-model-name "devstral" --quantization bitsandbytes --gpu_memory_utilization 0.9 --pipeline_parallel_size 4
Non-quantized!
vllm serve microsoft/Phi-4 --port 8000 --served-model-name "phi-14b" --gpu_memory_utilization 0.9 --pipeline_parallel_size 4
Working examples on the C4130 with 4x 32GB V100 SXM2 NVIDIA GPUs (able to run larger LLMs with full context, run multiple models on the same server simultaneously, or split the GPUs between VMs in Proxmox; see the multi-model sketch at the end of this section)
==========================================================
Coding Model:
vllm serve QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --port 8000 --served-model-name Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --enable-expert-parallel --gpu_memory_utilization 0.8 --tensor_parallel_size 4 --tokenizer "Qwen/Qwen3-Coder-30B-A3B-Instruct" --trust-remote-code --max-num-seqs 512 --swap-space 16
VLM (Visual LLM):
vllm serve Qwen/Qwen3-VL-8B-Instruct --port 8000 --served-model-name Qwen3-VL-8B --gpu_memory_utilization 0.9 --tensor_parallel_size 4 --trust-remote-code
Reasoning:
vllm serve Qwen/Qwen3-30B-A3B --port 8000 --served-model-name "qwen3-30b" --gpu_memory_utilization 0.9 --pipeline_parallel_size 4 --enable-expert-parallel
Huge Model:
vllm serve unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit --port 8000 --served-model-name DeepSeek-R1-Distill-Llama-70B-bnb-4bit --gpu_memory_utilization 0.9 --pipeline_parallel_size 4 --trust-remote-code --quantization bitsandbytes
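To illustrate the multi-model point from this section's heading: one way to run two models on the same server at once is to split the four GPUs with CUDA_VISIBLE_DEVICES and give each vLLM instance its own port. This is a sketch rather than a configuration we publish benchmarks for; reduce --gpu_memory_utilization or the context length if either instance runs out of memory.
CUDA_VISIBLE_DEVICES=0,1 vllm serve QuantTrio/Qwen3-Coder-30B-A3B-Instruct-GPTQ-Int8 --port 8000 --served-model-name coder --tensor_parallel_size 2 --tokenizer "Qwen/Qwen3-Coder-30B-A3B-Instruct" --trust-remote-code --gpu_memory_utilization 0.9 &
CUDA_VISIBLE_DEVICES=2,3 vllm serve Qwen/Qwen3-VL-8B-Instruct --port 8001 --served-model-name Qwen3-VL-8B --tensor_parallel_size 2 --trust-remote-code --gpu_memory_utilization 0.9 &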