Started by user Jeremy Enos
Running as SYSTEM
Building in workspace /var/lib/jenkins/jobs/pytorch_infer/workspace
[SSH] script:
TARGETNODE=""""
module load anaconda3_gpu/4.13.0
module load cuda/11.7.0
cd pytorch_infer
rm -f infer_results_jenkins.csv
# Slurm Arguments
sargs="--nodes=1 "
sargs+="--ntasks-per-node=1 "
sargs+="--mem=16g "
sargs+="--time=00:10:00 "
sargs+="--account=bbmb-hydro "
sargs+="--gpus-per-node=1 "
sargs+="--gpu-bind=closest "
# Add Target node if it exists
if [[ ! -z ${TARGETNODE} ]]
then
PARTITION=`sinfo --format="%R,%N" -n hydro61 | grep hydro61 | cut -d',' -f1 | tail -1`
sargs+="--partition=${PARTITION} "
sargs+="--nodelist=${TARGETNODE} "
else
sargs+="--partition=a100 "
fi
# Executable to run
scmd="python benchmark.py --model-list jenkins_list_short.txt --bench inference --channels-last --results-file infer_results_jenkins.csv"
# Run the command
start_time=`date +%s.%N`
echo $"Starting srun with command"
echo "srun $sargs $scmd"
srun $sargs $scmd
end_time=`date +%s.%N`
python transpose_results.py
runtime=$( echo "$end_time - $start_time" | bc -l )
echo "YVALUE=$runtime" > time.txt
printf "Pytorch test completed in %0.3f secs\n" $runtime
[SSH] executing...
Starting srun with command
srun --nodes=1 --ntasks-per-node=1 --mem=16g --time=00:10:00 --account=bbmb-hydro --gpus-per-node=1 --gpu-bind=closest --partition=a100 python benchmark.py --model-list jenkins_list_short.txt --bench inference --channels-last --results-file infer_results_jenkins.csv
srun: job 96855 queued and waiting for resources
srun: job 96855 has been allocated resources
Running benchmark on hydro04
Running bulk validation on these pretrained models: vgg19_bn, resnet18, resnet34, simplenetv1_5m_m1,
Benchmarking in float32 precision. NHWC layout. torchscript disabled
Model vgg19_bn created, param count: 143678248
Running inference benchmark on vgg19_bn for 40 steps w/ input size (3, 224, 224) and batch size 256.
Infer [8/40]. 1430.41 samples/sec. 178.969 ms/step.
Infer [16/40]. 1429.98 samples/sec. 179.023 ms/step.
Infer [24/40]. 1430.77 samples/sec. 178.925 ms/step.
Infer [32/40]. 1430.27 samples/sec. 178.987 ms/step.
Infer [40/40]. 1429.94 samples/sec. 179.028 ms/step.
Inference benchmark of vgg19_bn done. 1429.74 samples/sec, 179.03 ms/step
Benchmarking in float32 precision. NHWC layout. torchscript disabled
Model resnet18 created, param count: 11689512
Running inference benchmark on resnet18 for 40 steps w/ input size (3, 224, 224) and batch size 256.
Infer [8/40]. 10656.37 samples/sec. 24.023 ms/step.
Infer [16/40]. 10657.54 samples/sec. 24.021 ms/step.
Infer [24/40]. 10656.06 samples/sec. 24.024 ms/step.
Infer [32/40]. 10643.66 samples/sec. 24.052 ms/step.
Infer [40/40]. 10640.28 samples/sec. 24.060 ms/step.
Inference benchmark of resnet18 done. 10636.38 samples/sec, 24.06 ms/step
Benchmarking in float32 precision. NHWC layout. torchscript disabled
Model resnet34 created, param count: 21797672
Running inference benchmark on resnet34 for 40 steps w/ input size (3, 224, 224) and batch size 256.
Infer [8/40]. 6484.22 samples/sec. 39.480 ms/step.
Infer [16/40]. 6478.77 samples/sec. 39.514 ms/step.
Infer [24/40]. 6489.75 samples/sec. 39.447 ms/step.
Infer [32/40]. 6487.87 samples/sec. 39.458 ms/step.
Infer [40/40]. 6488.85 samples/sec. 39.452 ms/step.
Inference benchmark of resnet34 done. 6487.30 samples/sec, 39.45 ms/step
Benchmarking in float32 precision. NHWC layout. torchscript disabled
Model simplenetv1_5m_m1 created, param count: 5752808
Running inference benchmark on simplenetv1_5m_m1 for 40 steps w/ input size (3, 224, 224) and batch size 256.
Infer [8/40]. 12353.55 samples/sec. 20.723 ms/step.
Infer [16/40]. 12391.71 samples/sec. 20.659 ms/step.
Infer [24/40]. 12430.79 samples/sec. 20.594 ms/step.
Infer [32/40]. 12446.47 samples/sec. 20.568 ms/step.
Infer [40/40]. 12433.91 samples/sec. 20.589 ms/step.
Inference benchmark of simplenetv1_5m_m1 done. 12428.78 samples/sec, 20.59 ms/step
args: Namespace(model_list='jenkins_list_short.txt', bench='inference', detail=False, results_file='infer_results_jenkins.csv', num_warm_iter=10, num_bench_iter=40, model='vgg19_bn', batch_size=256, img_size=None, input_size=None, use_train_size=False, num_classes=None, gp=None, channels_last=True, grad_checkpointing=False, amp=False, precision='float32', torchscript=False, fuser='', opt='sgd', opt_eps=None, opt_betas=None, momentum=0.9, weight_decay=0.0001, clip_grad=None, clip_mode='norm', smoothing=0.1, drop=0.0, drop_path=None, drop_block=None)
args: Namespace(model_list='jenkins_list_short.txt', bench='inference', detail=False, results_file='infer_results_jenkins.csv', num_warm_iter=10, num_bench_iter=40, model='resnet18', batch_size=256, img_size=None, input_size=None, use_train_size=False, num_classes=None, gp=None, channels_last=True, grad_checkpointing=False, amp=False, precision='float32', torchscript=False, fuser='', opt='sgd', opt_eps=None, opt_betas=None, momentum=0.9, weight_decay=0.0001, clip_grad=None, clip_mode='norm', smoothing=0.1, drop=0.0, drop_path=None, drop_block=None)
args: Namespace(model_list='jenkins_list_short.txt', bench='inference', detail=False, results_file='infer_results_jenkins.csv', num_warm_iter=10, num_bench_iter=40, model='resnet34', batch_size=256, img_size=None, input_size=None, use_train_size=False, num_classes=None, gp=None, channels_last=True, grad_checkpointing=False, amp=False, precision='float32', torchscript=False, fuser='', opt='sgd', opt_eps=None, opt_betas=None, momentum=0.9, weight_decay=0.0001, clip_grad=None, clip_mode='norm', smoothing=0.1, drop=0.0, drop_path=None, drop_block=None)
args: Namespace(model_list='jenkins_list_short.txt', bench='inference', detail=False, results_file='infer_results_jenkins.csv', num_warm_iter=10, num_bench_iter=40, model='simplenetv1_5m_m1', batch_size=256, img_size=None, input_size=None, use_train_size=False, num_classes=None, gp=None, channels_last=True, grad_checkpointing=False, amp=False, precision='float32', torchscript=False, fuser='', opt='sgd', opt_eps=None, opt_betas=None, momentum=0.9, weight_decay=0.0001, clip_grad=None, clip_mode='norm', smoothing=0.1, drop=0.0, drop_path=None, drop_block=None)
--result
[
    {
        "model": "simplenetv1_5m_m1",
        "infer_samples_per_sec": 12428.78,
        "infer_step_time": 20.589,
        "infer_batch_size": 256,
        "infer_img_size": 224,
        "param_count": 5.75
    },
    {
        "model": "resnet18",
        "infer_samples_per_sec": 10636.38,
        "infer_step_time": 24.06,
        "infer_batch_size": 256,
        "infer_img_size": 224,
        "param_count": 11.69
    },
    {
        "model": "resnet34",
        "infer_samples_per_sec": 6487.3,
        "infer_step_time": 39.452,
        "infer_batch_size": 256,
        "infer_img_size": 224,
        "param_count": 21.8
    },
    {
        "model": "vgg19_bn",
        "infer_samples_per_sec": 1429.74,
        "infer_step_time": 179.028,
        "infer_batch_size": 256,
        "infer_img_size": 224,
        "param_count": 143.68
    }
]
Pytorch test completed in 73.106 secs
[SSH] completed
[SSH] exit-status: 0
[workspace] $ /bin/sh -xe /tmp/jenkins13334595695312051041.sh
+ scp 'HYDRO_REMOTE:~svchydrojenkins/pytorch_infer/time.txt' /var/lib/jenkins/jobs/pytorch_infer/workspace
+ scp 'HYDRO_REMOTE:~svchydrojenkins/pytorch_infer/infer_results_jenkins.csv' /var/lib/jenkins/jobs/pytorch_infer/workspace
Recording plot data
Saving plot series data from: /var/lib/jenkins/jobs/pytorch_infer/workspace/time.txt
Sending e-mails to: [email protected]
Finished: SUCCESS