Benchmarks

CPU benchmarks

L_MAX=8; BATCH_SIZE=1000; N_FEATURES=2000
preparing real life transformation rule
transformation rule is computed
*************
CPU BENCHMARKS
Running on 72 threads
*************

***forward***

python loops; active dim 0; forward; cpu:  0.287917349073622
torch index_add_; active dim 0; forward; cpu:  0.2882208559248183
cpp; active dim 0; forward; cpu:  0.06418042712741429

python loops; active dim 1; forward; cpu:  0.10099238819546169
torch index_add_; active dim 1; forward; cpu:  0.19646917449103463
cpp; active dim 1; forward; cpu:  0.015409390131632486

python loops; active dim 2; forward; cpu:  1.13313627243042
torch index_add_; active dim 2; forward; cpu:  0.9349785645802816
cpp; active dim 2; forward; cpu:  0.029056257671780057

***backward***

python loops; active dim 0; backward; cpu:  8.56085040834215
torch index_add_; active dim 0; backward; cpu:  0.8768206967247857
cpp; active dim 0; backward; cpu:  0.14745905664232042

python loops; active dim 1; backward; cpu:  12.528574811087715
torch index_add_; active dim 1; backward; cpu:  1.3579767015245225
cpp; active dim 1; backward; cpu:  0.11550368203057183

python loops; active dim 2; backward; cpu:  1.43605547481113
torch index_add_; active dim 2; backward; cpu:  1.3703345987531874
cpp; active dim 2; backward; cpu:  0.05493460761176215
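
The three CPU variants differ only in how the sparse accumulation is dispatched. Below is a minimal sketch of the two pure-PyTorch baselines, assuming the benchmarked operation has the form output[..., mu_k] += X1[..., m1_k] * X2[..., m2_k] * C_k, with the transformation rule given by index vectors m1, m2, mu and multipliers C (names and shapes are illustrative, not the repository's API); the cpp rows time a compiled extension performing the same contraction.

    import torch

    def forward_python_loops(X1, X2, m1, m2, mu, C, output_size):
        # One Python iteration per entry of the transformation rule
        # ("active dim 2": the sparse indices act on the last dimension).
        output = torch.zeros(X1.shape[0], X1.shape[1], output_size,
                             dtype=X1.dtype, device=X1.device)
        for k in range(m1.shape[0]):
            output[:, :, mu[k]] += X1[:, :, m1[k]] * X2[:, :, m2[k]] * C[k]
        return output

    def forward_index_add(X1, X2, m1, m2, mu, C, output_size):
        # Same contraction, vectorized: compute all products at once, then
        # scatter-add them into the output with a single index_add_ call.
        contributions = X1[:, :, m1] * X2[:, :, m2] * C  # [batch, features, n_rules]
        output = torch.zeros(X1.shape[0], X1.shape[1], output_size,
                             dtype=X1.dtype, device=X1.device)
        output.index_add_(2, mu, contributions)
        return output

For the two PyTorch baselines, the backward numbers presumably come from autograd differentiating through these graphs, which would explain why the Python-loop backward is so much slower than its forward.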

GPU benchmarks

L_MAX=8; BATCH_SIZE=1000; N_FEATURES=2000
preparing real life transformation rule
transformation rule is computed

*************
GPU benchmarks
*************

***forward***

python loops; active dim 0; forward; cuda:  0.06442029486762153
torch index_add_; active dim 0; forward; cuda:  0.04863210678100585

python loops; active dim 1; forward; cuda:  0.07500541941324869
torch index_add_; active dim 1; forward; cuda:  0.04478669526841905

python loops; active dim 2; forward; cuda:  0.3839471096462673
torch index_add_; active dim 2; forward; cuda:  0.04732361221313477
CUDA kernel; active dim 2; forward; cuda:  0.002660088883505928

***backward***

python loops; active dim 0; backward; cuda:  1.1864166802300347
torch index_add_; active dim 0; backward; cuda:  0.09936237504747178

python loops; active dim 1; backward; cuda:  1.5248256022135418
torch index_add_; active dim 1; backward; cuda:  0.09908599090576171

python loops; active dim 2; backward; cuda:  0.7663369886610244
torch index_add_; active dim 2; backward; cuda:  0.7663484971788194
CUDA kernel; active dim 2; backward; cuda:  0.0068883590698242195
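
For reference, a hypothetical harness like the one below could be used to reproduce timings of this kind; it assumes the reported numbers are wall-clock seconds per call and that GPU timings require explicit synchronization (the helper name and defaults are illustrative, not the benchmark script's actual API).

    import time
    import torch

    def benchmark(fn, *args, n_warmup=3, n_runs=10):
        # Warm-up calls trigger kernel compilation / caching before timing.
        for _ in range(n_warmup):
            fn(*args)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # finish warm-up kernels before starting the clock
        start = time.time()
        for _ in range(n_runs):
            fn(*args)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # CUDA launches are asynchronous; wait before stopping
        return (time.time() - start) / n_runs  # mean seconds per call

Without the synchronize() calls, asynchronous CUDA launches would return before the kernels finish, and the GPU timings would be meaningless.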