DDLSim-Lab

Distributed simulation environment running on bare-metal nodes

Active research since 2025. Fully open source.

Project Lead: Kaitlyn Brishae Truby

Contributions from many American universities and research labs


Project Overview

DDLSim-Lab is an open-source research project designed to simulate, analyze, and optimize distributed deep learning systems using AI-driven control mechanisms.

The project supports experimentation with large-scale, heterogeneous, and failure-prone environments, including hybrid edge-cloud infrastructures. It gives researchers a reproducible, cost-free platform for testing novel algorithms, scheduling strategies, and fault-tolerance mechanisms without requiring access to expensive physical testbeds.


Why Bare Metal Is Required

To produce scientifically valid results, DDLSim-Lab must run on bare-metal infrastructure. Virtualization layers, such as hypervisor CPU scheduling and virtual switching, introduce non-deterministic noise that undermines the fidelity of network and performance measurements. Bare-metal deployment provides:

  • Elimination of virtualization overhead: essential for accurate scaling studies.
  • Realistic network behavior: direct access to NICs enables precise emulation of latency, jitter, and packet loss.
  • High-fidelity experiments: kernel-level networking, SR-IOV, and RDMA support allow experimentation with high-performance interconnects such as InfiniBand and RoCE.
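The page does not say how DDLSim-Lab emulates latency, jitter, and packet loss. One common approach on a Linux bare-metal node is the `netem` queueing discipline, configured through the `tc` tool. The Python sketch below is a minimal illustration under that assumption. `LinkProfile`, `netem_argv`, and the interface name `eth0` are hypothetical names that do not come from the project. The function only builds the command line and does not execute it.

```python
"""Illustrative sketch (assumption): build a Linux `tc ... netem` command that
emulates latency, jitter, and packet loss on one NIC. This is NOT DDLSim-Lab's
documented mechanism; all names here are hypothetical."""

from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LinkProfile:
    """Hypothetical per-link impairment profile."""
    delay_ms: float   # mean delay added to packets leaving the interface
    jitter_ms: float  # variation around the mean delay
    loss_pct: float   # random packet loss, in percent


def netem_argv(iface: str, profile: LinkProfile) -> List[str]:
    """Return an argv list for `tc qdisc replace ... netem`.

    'replace' installs the root qdisc whether or not one already exists.
    netem on the root qdisc only affects egress traffic on `iface`.
    The command is built here, not executed.
    """
    if profile.delay_ms < 0 or profile.jitter_ms < 0:
        raise ValueError("delay and jitter must be non-negative")
    if not 0 <= profile.loss_pct <= 100:
        raise ValueError("loss_pct must be in [0, 100]")
    return [
        "tc", "qdisc", "replace", "dev", iface, "root", "netem",
        "delay", f"{profile.delay_ms:g}ms", f"{profile.jitter_ms:g}ms",
        "loss", f"{profile.loss_pct:g}%",
    ]


if __name__ == "__main__":
    argv = netem_argv("eth0", LinkProfile(delay_ms=50, jitter_ms=10, loss_pct=0.5))
    print(" ".join(argv))
```

Running the script prints `tc qdisc replace dev eth0 root netem delay 50ms 10ms loss 0.5%`. To apply the profile, pass the list to `subprocess.run(argv, check=True)` on the node, which requires root privileges. Building an argv list instead of a shell string avoids quoting problems. It also makes per-link profiles easy to generate and log for reproducibility.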