Heroku Performance-M dynos have only one core?
Nothing had changed as of December 17th, 2021. See below for the latest numbers.
While investigating some performance issues on Heroku for one of our clients, I noticed that `cat /proc/cpuinfo` returned 4 cores on Standard-2X dynos but only 1 core on Performance-M dynos. That seemed strange, given that the CPUs on those machines should have 8 to 10 cores and, in the case of Performance-M, those cores should be dedicated.
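For reference, the same core count can be read from Ruby. The sketch below counts the `processor` entries in /proc/cpuinfo (the equivalent of `grep -c ^processor /proc/cpuinfo`) and prints Ruby's own `Etc.nprocessors` next to it as a cross-check. The helper name `cpuinfo_processor_count` is my own. Neither number tells you how much CPU time the dyno is actually allowed to use, which is why a timing test is still needed.

```ruby
require "etc"

# Count the logical CPUs listed in /proc/cpuinfo text: each CPU gets its own
# block that starts with a "processor : N" line.
def cpuinfo_processor_count(cpuinfo_text)
  cpuinfo_text.each_line.count { |line| line.match?(/\Aprocessor\s*:/) }
end

# /proc/cpuinfo only exists on Linux, so skip it elsewhere.
if File.readable?("/proc/cpuinfo")
  count = cpuinfo_processor_count(File.read("/proc/cpuinfo"))
  puts "processor entries in /proc/cpuinfo: #{count}"
end

# Ruby's own view of the number of online processors, as a cross-check.
puts "Etc.nprocessors: #{Etc.nprocessors}"
```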
These findings matched the unexpectedly slow performance we were seeing on those bigger machines: running more than 1~2 processes on them did not improve performance.
To verify my hunch, I logged on to those machines through the Heroku CLI and wrote a small program that does nothing but burn CPU:
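```shell
# Write a one-line script named "test" that times a CPU-bound loop, then make it executable
echo "ruby -e 'require \"benchmark\"; puts Benchmark.measure { 60221411.times { 3.14159 * 6.626068 } }'" > test
chmod u+x test
```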
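The same experiment could also be driven entirely from Ruby, without writing a script to disk. The sketch below is my own and not what I ran on the dynos: it forks a given number of copies of the loop at once and collects each copy's wall-clock time through a pipe. `burn_cpu` and `parallel_real_times` are hypothetical names, and the use of `fork` means it only runs on Unix-like systems such as a dyno.

```ruby
require "benchmark"

# The same busy loop as the shell one-liner.
def burn_cpu(iterations = 60_221_411)
  iterations.times { 3.14159 * 6.626068 }
end

# Fork `copies` processes that all run burn_cpu at the same time and return the
# wall-clock ("real") seconds each one took. Uses fork, so Unix-like systems only.
def parallel_real_times(copies, iterations = 60_221_411)
  readers = Array.new(copies) do
    reader, writer = IO.pipe
    fork do
      reader.close
      elapsed = Benchmark.realtime { burn_cpu(iterations) }
      writer.puts(elapsed)
      writer.close
      exit!(0) # skip at_exit hooks inherited from the parent process
    end
    writer.close # the parent only reads
    reader
  end

  # Every child is already running at this point; read each result as it arrives.
  times = readers.map do |reader|
    elapsed = reader.read.to_f
    reader.close
    elapsed
  end
  Process.waitall # reap the finished children
  times
end
```

On a dyno, calling `parallel_real_times(n)` for n from 1 to 4 uses the original iteration count; if every copy had a dedicated core, the returned times should stay roughly flat as n grows.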
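Then I ran it a few times on both dynos, launching multiple copies in parallel. Each output line uses Ruby's Benchmark format: user, system and total CPU time in seconds, followed by the wall-clock (real) time in parentheses. These are the results on a Standard-2X dyno: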
```
~/tmp $ ./test &
4.280000 0.000000 4.280000 ( 4.311505)
~/tmp $ ./test & ./test &
4.950000 0.000000 4.950000 ( 5.078005)
5.260000 0.000000 5.260000 ( 5.400922)
~/tmp $ ./test & ./test & ./test &
5.290000 0.000000 5.290000 ( 5.322220)
5.460000 0.000000 5.460000 ( 5.544633)
6.100000 0.000000 6.100000 ( 6.267409)
~/tmp $ ./test & ./test & ./test & ./test &
5.780000 0.000000 5.780000 ( 6.257643)
5.970000 0.020000 5.990000 ( 6.598910)
6.060000 0.030000 6.090000 ( 6.937576)
6.420000 0.010000 6.430000 ( 6.971127)
```
And these are the results on a Performance-M dyno:
```
~/tmp $ ./test &
3.190000 0.000000 3.190000 ( 3.192567)
~/tmp $ ./test & ./test &
6.440000 0.000000 6.440000 ( 6.521593)
6.340000 0.000000 6.340000 ( 6.613333)
~/tmp $ ./test & ./test & ./test &
6.430000 0.000000 6.430000 ( 9.337328)
6.440000 0.000000 6.440000 ( 9.656173)
6.360000 0.000000 6.360000 ( 9.790865)
~/tmp $ ./test & ./test & ./test & ./test &
6.410000 0.000000 6.410000 ( 13.137289)
6.410000 0.000000 6.410000 ( 13.165291)
6.440000 0.000000 6.440000 ( 13.333027)
6.420000 0.010000 6.430000 ( 13.311972)
```
The results seem to confirm our hunch. The Standard-2X dyno behaves as one would expect of a virtualized environment with 4 cores: there is a small overhead as parallelism increases, but the timing stays fairly consistent up to the number of cores.
The Performance-M dyno, on the other hand, really behaves as if it had only one core: wall-clock time grows linearly with the number of processes, because they all have to share time on that single core.
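One way to put a number on this is an "effective parallelism" figure of my own (not something Heroku reports): n times the single-copy wall-clock time, divided by the slowest wall-clock time among the n parallel copies. Assuming the copies do not slow each other down for any other reason, it comes out near n when every copy has its own core and near 1 when they all share one. The sketch applies it to the real times from the two runs above; `effective_parallelism` and the constant names are hypothetical.

```ruby
# Wall-clock ("real") times copied from the two runs above, keyed by the
# number of copies started in parallel.
STANDARD_2X = {
  1 => [4.311505],
  2 => [5.078005, 5.400922],
  3 => [5.322220, 5.544633, 6.267409],
  4 => [6.257643, 6.598910, 6.937576, 6.971127]
}.freeze

PERFORMANCE_M = {
  1 => [3.192567],
  2 => [6.521593, 6.613333],
  3 => [9.337328, 9.656173, 9.790865],
  4 => [13.137289, 13.165291, 13.333027, 13.311972]
}.freeze

# Effective parallelism for n copies: the work of n single-copy runs divided by
# the time the slowest of the n copies took, rounded to two decimals.
def effective_parallelism(runs)
  single = runs.fetch(1).first
  runs.map { |n, times| [n, (n * single / times.max).round(2)] }.to_h
end

puts "Standard-2X:   #{effective_parallelism(STANDARD_2X)}"
puts "Performance-M: #{effective_parallelism(PERFORMANCE_M)}"
```

With these numbers the Performance-M values stay between roughly 0.96 and 1.0 for every n, while the Standard-2X value for 4 copies is about 2.47: short of 4, but clearly more than one core's worth.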
I have opened a ticket with Heroku about this, but so far I have not heard back. Last night I sent them this proof of concept, and I hope it will prompt them to investigate.
My guess is that the CPU core count the dyno reports is misleading the kernel scheduler.
Update on December 17th, 2021
As of today, nothing seems to have changed!
Here are the results when running it on a Standard-2X dyno:
```
/tmp $ ./test &
3.924000 0.016000 3.940000 ( 4.023044)
/tmp $ ./test & ./test &
3.836000 0.028000 3.864000 ( 4.055048)
4.012000 0.048000 4.060000 ( 4.233349)
/tmp $ ./test & ./test & ./test &
4.112000 0.040000 4.152000 ( 4.506793)
4.128000 0.016000 4.144000 ( 4.615282)
4.192000 0.044000 4.236000 ( 4.689541)
/tmp $ ./test & ./test & ./test & ./test &
4.444000 0.040000 4.484000 ( 4.999437)
4.400000 0.040000 4.440000 ( 5.067705)
4.688000 0.044000 4.732000 ( 5.349200)
4.684000 0.032000 4.716000 ( 5.490364)
```
And these are the results when running on a Performance-M dyno:
```
/tmp $ ./test &
2.312000 0.000000 2.312000 ( 2.310593)
/tmp $ ./test & ./test &
4.516000 0.004000 4.520000 ( 4.563026)
4.484000 0.004000 4.488000 ( 4.611348)
/tmp $ ./test & ./test & ./test &
4.528000 0.004000 4.532000 ( 6.218242)
4.540000 0.000000 4.540000 ( 6.412617)
4.124000 0.000000 4.124000 ( 6.840410)
/tmp $ ./test & ./test & ./test & ./test &
4.524000 0.000000 4.524000 ( 9.125465)
4.548000 0.004000 4.552000 ( 9.105754)
4.528000 0.000000 4.528000 ( 9.077439)
4.524000 0.000000 4.524000 ( 9.064622)
```
On Standard-2X the wall-clock time barely changes as processes are added, while on Performance-M it increases linearly with every extra process. This confirms the limit on the number of cores.
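The same point can be checked with a least-squares line through the slowest wall-clock time for each number of copies in the December 2021 runs (again, my own way of summarizing the numbers). The slope is the extra wall-clock time each additional copy costs: on one shared core it should be close to the single-copy time, and with enough free cores it should be close to zero. `slope` and the constant names are hypothetical.

```ruby
# Slowest wall-clock ("real") time for 1..4 parallel copies, copied from the
# December 17th, 2021 output above.
STANDARD_2X_2021   = { 1 => 4.023044, 2 => 4.233349, 3 => 4.689541, 4 => 5.490364 }.freeze
PERFORMANCE_M_2021 = { 1 => 2.310593, 2 => 4.611348, 3 => 6.840410, 4 => 9.125465 }.freeze

# Ordinary least-squares slope of wall-clock time against the number of copies:
# sum of cross-deviations divided by sum of squared x-deviations.
def slope(points)
  xs = points.keys.map(&:to_f)
  ys = points.values
  mean_x = xs.sum / xs.size
  mean_y = ys.sum / ys.size
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  sxy / sxx
end

puts format("Standard-2X:   %.2f s per extra copy", slope(STANDARD_2X_2021))
puts format("Performance-M: %.2f s per extra copy", slope(PERFORMANCE_M_2021))
```

With these numbers the Performance-M slope is about 2.27 s per extra copy, close to the 2.31 s a single copy takes on its own, while the Standard-2X slope is about 0.49 s.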














