
DistributedCC Experiments

Evan West edited this page Mar 27, 2022 · 16 revisions

Results on March 26th

Cluster Stats:

  • 4 c6i.4xlarge EC2 instances all in the same cluster placement group
  • 16 vCPUs (Intel Xeon Platinum) per instance
  • 32 GiB of RAM per instance

EBS (Disk) Stats:

  • 80 GiB General Purpose SSD (gp2) volume rated at 240 IOPS
  • Reads about 15 million graph updates/s from binary graph streams

Networking Performance

Latency from main to each worker

Measured with ping, round-trip latency is between 0.096 ms and 0.157 ms.
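The latency measurement above could be repeated across all workers with a small helper like the one below. This is only a sketch: it assumes the Linux iputils ping summary line ("rtt min/avg/max/mdev = ..."), and the worker hostnames in the usage line are hypothetical placeholders, not the addresses used in these experiments.

```shell
# Print "<min> <max>" round-trip time in ms from ping output read on stdin.
# Assumes the Linux iputils summary line: "rtt min/avg/max/mdev = a/b/c/d ms".
rtt_min_max() {
    awk -F'[ /=]+' '/^rtt / { print $6, $8 }'
}

# Ping every host given as an argument and print its min and max RTT.
measure_latency() {
    local host
    for host in "$@"; do
        printf '%s ' "$host"
        ping -c 10 -q "$host" | rtt_min_max
    done
}
```

Example use from the main node, with placeholder hostnames: measure_latency worker1 worker2 worker3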

Throughput

Measured with iperf

To install:

sudo amazon-linux-extras install -y epel
sudo yum install -y iperf

Results

Run iperf -s on each worker node, then iperf -c <worker_addr> on the main node. With default iperf TCP options, the measured bandwidth from main to worker was 9.03-9.09 Gbits/s.

DistributedStreamingCC: Kron16, cold file cache, WorkerCluster::num_batches=512

Used the command sync; echo 3 | sudo tee -a /proc/sys/vm/drop_caches to clear the file cache

machines, worker_proc | 1, 16    | 2, 16    | 4, 16    | 2, 32    | 4, 48
ingestion (million/s) | 1.867    | 2.014    | 2.570    | 3.818    | 5.411
CC algorithm time (s) | 0.43     | 0.14     | 0.14     | 0.14     | 0.14
memory usage (main)   | 7.70 GiB | 7.70 GiB | 7.70 GiB | 8.84 GiB | 9.8 GiB
memory usage (worker) | 112 MiB  | 148 MiB  | 148 MiB  | 148 MiB  | 138 MiB
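To make the scaling easier to read, the ingestion rates can be expressed relative to the single-machine, 16-worker run. The sketch below does that arithmetic with awk; the rates are copied from the cold-cache table above, and the helper name speedup is an illustrative choice.

```shell
# speedup BASE RATE: print RATE / BASE rounded to two decimals.
speedup() {
    awk -v base="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", rate / base }'
}

# Cold-cache ingestion rates (million updates/s) from the table above,
# written as "machines,worker_proc:rate".
base=1.867
for entry in "2,16:2.014" "4,16:2.570" "2,32:3.818" "4,48:5.411"; do
    config=${entry%%:*}   # text before the colon
    rate=${entry##*:}     # text after the colon
    printf '%s  %sx\n' "$config" "$(speedup "$base" "$rate")"
done
```

For instance, the 4-machine, 48-worker run comes out at roughly 2.90x the single-machine ingestion rate.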

DistributedStreamingCC: Kron16, pre-populated file cache, WorkerCluster::num_batches=512

Used command cat /mnt/ssd1/kron_16_stream_binary > /dev/null to prepopulate

machines, worker_proc | 1, 16 | 2, 16 | 4, 16 | 2, 32 | 4, 48
ingestion (million/s) | 1.869 | 2.019 | 2.584 | 3.829 | 5.463
CC algorithm time (s) | 0.45  | 0.14  | 0.14  | 0.14  | 0.14
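The two cache states used in these runs (cold and pre-populated) can be wrapped in one helper so each run starts from a known state. This is a sketch rather than the script used for these experiments: the function name prepare_cache and the DRY_RUN switch are assumptions added for illustration, while the commands themselves are the ones quoted above.

```shell
# prepare_cache MODE FILE
#   cold: flush dirty pages, then drop the page cache (requires sudo).
#   warm: read FILE once so its pages land in the page cache.
# Set DRY_RUN=1 to print the command instead of running it.
prepare_cache() {
    local mode=$1 file=$2
    case "$mode" in
        cold)
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "sync; echo 3 | sudo tee -a /proc/sys/vm/drop_caches"
            else
                sync
                echo 3 | sudo tee -a /proc/sys/vm/drop_caches > /dev/null
            fi
            ;;
        warm)
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "cat $file > /dev/null"
            else
                cat "$file" > /dev/null
            fi
            ;;
        *)
            echo "usage: prepare_cache cold|warm FILE" >&2
            return 1
            ;;
    esac
}
```

Example: prepare_cache cold /mnt/ssd1/kron_16_stream_binary before a cold run, or prepare_cache warm /mnt/ssd1/kron_16_stream_binary before a pre-populated run.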

Kron17 results

4 machines, 48 workers, num_batches=512

Kron17 dataset        | cold cache | pre-pop
ingestion (million/s) | 5.550      | 5.593
CC algorithm time (s) | 0.31       | 0.43
memory usage (main)   | 18.5 GiB   | N/A
memory usage (worker) | 167 MiB    | N/A

Pre-populating has less effect here than it otherwise might because the entire Kron17 file does not fit in RAM, let alone the file together with the sketches.
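Whether a stream file can be fully cached can be checked before a run by comparing its size with the machine's total memory. The sketch below assumes a Linux host (GNU stat and /proc/meminfo); the function names are hypothetical, and the check ignores memory needed by the sketches, so it is an optimistic upper bound.

```shell
# fits_in_ram FILE_BYTES RAM_BYTES: succeed if the file is smaller than RAM.
fits_in_ram() {
    [ "$1" -lt "$2" ]
}

# report_cache_fit FILE: compare FILE's size with MemTotal from /proc/meminfo.
report_cache_fit() {
    local file=$1 file_bytes ram_kib ram_bytes
    file_bytes=$(stat -c %s "$file") || return 1
    ram_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    ram_bytes=$(( ram_kib * 1024 ))   # MemTotal is reported in KiB
    if fits_in_ram "$file_bytes" "$ram_bytes"; then
        echo "$file fits in RAM"
    else
        echo "$file does not fit in RAM"
    fi
}
```

Example: report_cache_fit /mnt/ssd1/kron_16_stream_binary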
