AVR-LibC  2.3.0git
Standard C library for AVR-GCC
 

Benchmarks

The results below give only a rough estimate of the resources needed to use certain library functions. A number of factors can either increase or reduce the effort required:

  • The cost of preparing operands and their stack space is not included.
  • In the table, the size includes all dependent functions.
  • The execution time of some functions depends strongly on the call parameters. For example, qsort() is recursive, and sprintf() receives its parameters on the stack.
  • Different compiler versions can produce significantly different code size and execution time. For example, the dtostre() function compiled with avr-gcc 3.4.6 requires 930 bytes; after the transition to avr-gcc 4.2.3, the size became 1088 bytes.

A Few libc Functions

avr-gcc version is 15.1.1

The size given for a function includes all functions that it pulls in. By default, AVR-LibC is compiled with -mcall-prologues. The size without the code for the prologue and epilogue routines is shown in parentheses. The two sizes coincide when no prologue and epilogue routines are present, in which case only one code size is shown.

  • qsort sorts an array of char with 100 elements.
  • double has a size of 4 bytes, which plays a role in the float type promotions of arguments of varargs functions like sprintf.
  • For an overview of the different AVR architectures, see avr-gcc: Command Line Options.

Function                             Units        avr2         avr25        avr4         avr6
atoi ("12345")                       Flash bytes  82           78           74           76
                                     Stack bytes  4            4            4            6
                                     Cycles       155          149          149          167
atol ("12345")                       Flash bytes  126          118          106          110
                                     Stack bytes  5            5            4            6
                                     Cycles       221          209          205          223
ftostre (1.2345f, s, 6, 0)           Flash bytes  1128 (1016)  1058 (948)   1058 (948)   1078 (968)
                                     Stack bytes  20           20           20           22
                                     Cycles       1339         1178         1178         1191
ftostrf (1.2345f, 15, 6, s)          Flash bytes  1648 (1536)  1524 (1414)  1524 (1414)  1548 (1438)
                                     Stack bytes  40           40           40           43
                                     Cycles       1645         1469         1469         1486
ktoa (123.45k, s, 2)                 Flash bytes  316          306          298          306
                                     Stack bytes  8            8            8            10
                                     Cycles       479          466          466          488
itoa (12345, s, 10)                  Flash bytes  110          102          102          106
                                     Stack bytes  2            2            2            3
                                     Cycles       879          875          875          880
ltoa (12345678L, s, 10)              Flash bytes  138          130          130          136
                                     Stack bytes  2            2            2            3
                                     Cycles       2766         2762         2762         2767
lltoa (12345678LL, s, 10)            Flash bytes  206          194          194          202
                                     Stack bytes  3            3            3            4
                                     Cycles       2239         2209         2209         2214
ulltoa_base10 (12345678ULL, s)       Flash bytes  182          172          168          168
                                     Stack bytes  11           11           11           12
                                     Cycles       1908         1852         1458         1461
malloc (1)                           Flash bytes  664          598          598          602
                                     Stack bytes  6            6            6            7
                                     Cycles       103          98           98           101
realloc ((void*) 0, 1)               Flash bytes  1194 (1082)  1072 (962)   1072 (962)   1050
                                     Stack bytes  6            6            6            7
                                     Cycles       103          98           98           101
qsort (s, sizeof(s), 1, cmp)         Flash bytes  1206 (1094)  1004 (894)   978 (868)    1000
                                     Stack bytes  160          160          160          167
                                     Cycles       58011        51797        47899        49671
sprintf_min (s, "%d", 12345)         Flash bytes  1202 (1090)  1078 (968)   1074 (964)   1078
                                     Stack bytes  57           57           57           60
                                     Cycles       1838         1706         1701         1743
sprintf (s, "%d", 12345)             Flash bytes  1632 (1520)  1492 (1382)  1478 (1368)  1496
                                     Stack bytes  57           57           58           61
                                     Cycles       1649         1564         1567         1603
sprintf_flt (s, "%e", 1.2345)        Flash bytes  3218 (3106)  2970 (2860)  2946 (2836)  3048 (2938)
                                     Stack bytes  68           68           69           72
                                     Cycles       2513         2302         2309         2347
sscanf_min ("12345", "%d", &i)       Flash bytes  2826 (2714)  2502 (2392)  2498 (2388)  2746
                                     Stack bytes  56           56           56           60
                                     Cycles       1491         1366         1366         1369
sscanf ("12345", "%d", &i)           Flash bytes  1836 (1724)  1622 (1512)  1622 (1512)  1758
                                     Stack bytes  56           56           56           60
                                     Cycles       1491         1366         1366         1369
sscanf ("point,color", "%[a-z]", s)  Flash bytes  1836 (1724)  1622 (1512)  1622 (1512)  1758
                                     Stack bytes  90           90           90           94
                                     Cycles       2754         2607         2607         2508
sscanf_flt ("1.2345", "%e", &x)      Flash bytes  4872 (4760)  4416 (4306)  4392 (4282)  4716 (4606)
                                     Stack bytes  40           40           40           43
                                     Cycles       410          364          364          341
strtof ("1.2345", &end)              Flash bytes  1682 (1570)  1540 (1430)  1480 (1370)  1602 (1492)
                                     Stack bytes  24           24           24           30
                                     Cycles       1550         1446         1074         1205
strtol ("12345", &end, 0)            Flash bytes  384          362          344          358
                                     Stack bytes  14           14           12           15
                                     Cycles       606          583          351          373
strtoll ("12345", &end, 0)           Flash bytes  572 (460)    534 (424)    510 (400)    534 (424)
                                     Stack bytes  20           20           18           21
                                     Cycles       832          785          488          513

Math Functions from libm

The following tables contain benchmark values for some floating-point functions over the indicated range(s) of input values.

Notice that the values for the relative error and the worst-case execution time Cycles_max are only lower bounds. The best achievable accuracy for IEEE single with its 23 fractional bits in the mantissa is log10(2^-24 ≈ 6·10^-8) ≈ -7.22.
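
As a worked step for the bound quoted above, assuming round-to-nearest so that the relative error is at most half a unit in the last place of a 24-bit significand (23 stored fractional bits plus the implicit leading bit):

```latex
\log_{10}\left(2^{-24}\right) = -24 \cdot \log_{10} 2 \approx -24 \cdot 0.30103 \approx -7.22,
\qquad 2^{-24} \approx 5.96 \cdot 10^{-8} \approx 6 \cdot 10^{-8}
```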

The poor accuracy of sinf, cosf and tanf occurs for arguments that are close to the poles (if any) or close to the non-zero zeros, where the relative error becomes large.

libm benchmarks for ATmega128 (avr51, with MUL)
Function Size x0 x1 Cycles_avr Cycles_max log10(Err_max)
acosf 1102 -1 1 1957 2464 -6.66
asinf 1092 -1 1 1896 2454 -6.47
atanf 1058 -10 10 2879 3073 -6.92
cbrtf 514 -1e+06 1e+06 2573 2665 -6.92
ceilf 258 -1e+05 1e+05 108 177 -7.22
cosf 904 -1.57 1.58 1775 2126 -2.86
coshf 1366 -20 20 3053 3439 -5.78
expf 1320 -20 20 2588 3247 -5.78
floorf 258 -1e+05 1e+05 108 180 -7.22
frexpf 154 -1e+05 1e+05 40 40 -7.22
logf 1076 0 100 2392 2866 -6.69
log10f 1076 0 100 2397 2866 -6.62
log2f 1052 0 100 2252 2723 -6.83
modff 484 -1e+05 1e+05 365 456 -7.22
roundf 236 -1e+05 1e+05 111 156 -7.22
sinf 910 0 3.15 1744 2146 -3.67
sinhf 1466 -20 20 3043 3461 -5.78
sqrtf 256 0 1e+06 474 510 -7.22
tanf 1178 0 3.15 2178 2946 -2.86
tanhf 1494 -20 20 3148 3620 -6.37
truncf 234 -1000 1000 140 178 -7.22

 

libm benchmarks for ATmega128 (avr51, with MUL)
Function Size x0 x1 y0 y1 Cycles_avr Cycles_max log10(Err_max)
+ 380 -1e+10 1e+10 -1e+10 1e+10 102 256 -7.22
* 380 -1e+10 1e+10 -1e+10 1e+10 129 139 -7.22
/ 390 -1e+10 1e+10 -1e+10 1e+10 469 501 -7.22
atan2f 1206 -10 10 -10 10 2882 3455 -6.82
fdimf 446 -1e+10 1e+10 -1e+10 1e+10 75 218 -7.22
fmaxf 62 -1e+10 1e+10 -1e+10 1e+10 30 34 -∞
fminf 62 -1e+10 1e+10 -1e+10 1e+10 30 34 -∞
fmodf 312 -1e+10 1e+10 -1e+10 1e+10 88 324 -7.22
hypotf 1092 -1e+10 1e+10 -1e+10 1e+10 850 927 -6.92
ldexpf 238 -1e+10 1e+10 -10 10 40 40 -7.22
powf 1858 0 1e+04 -10 10 5182 5833 -4.65
__builtin_powif 732 0 1e+04 -10 10 648 1223 -6.39

For devices without a MUL instruction the following applies:

  • The execution times for multiplication and for the transcendental functions are roughly twice the time for devices that have MUL.
  • The execution times for the remaining functions are roughly the same.
  • The maximal relative errors are the same, i.e. independent of MUL.
libm benchmarks for AT90S8515 (avr2, no MUL)
Function Size x0 x1 Cycles_avr Cycles_max log10(Err_max)
acosf 1104 -1 1 3513 3888 -6.66
asinf 1096 -1 1 3452 3879 -6.47
atanf 1054 -10 10 5280 5541 -6.92
cbrtf 538 -1e+06 1e+06 2702 2795 -6.92
ceilf 250 -1e+05 1e+05 105 174 -7.22
cosf 906 -1.57 1.58 3441 3798 -2.86
coshf 1348 -20 20 4966 5346 -5.78
expf 1312 -20 20 4512 5140 -5.78
floorf 250 -1e+05 1e+05 105 177 -7.22
frexpf 150 -1e+05 1e+05 39 39 -7.22
logf 1076 0 100 4562 5023 -6.69
log10f 1076 0 100 4568 5035 -6.62
log2f 1060 0 100 4205 4632 -6.83
modff 490 -1e+05 1e+05 365 456 -7.22
roundf 230 -1e+05 1e+05 109 154 -7.22
sinf 912 0 3.15 3408 3818 -3.67
sinhf 1434 -20 20 4941 5358 -5.78
sqrtf 252 0 1e+06 474 510 -7.22
tanf 1164 0 3.15 4080 4785 -2.86
tanhf 1462 -20 20 5055 5544 -6.37
truncf 226 -1000 1000 137 175 -7.22

 

libm benchmarks for AT90S8515 (avr2, no MUL)
Function Size x0 x1 y0 y1 Cycles_avr Cycles_max log10(Err_max)
+ 376 -1e+10 1e+10 -1e+10 1e+10 102 253 -7.22
* 378 -1e+10 1e+10 -1e+10 1e+10 346 386 -7.22
/ 374 -1e+10 1e+10 -1e+10 1e+10 467 499 -7.22
atan2f 1192 -10 10 -10 10 5287 5844 -6.82
fdimf 436 -1e+10 1e+10 -1e+10 1e+10 74 219 -7.22
fmaxf 66 -1e+10 1e+10 -1e+10 1e+10 31 36 -∞
fminf 66 -1e+10 1e+10 -1e+10 1e+10 30 36 -∞
fmodf 302 -1e+10 1e+10 -1e+10 1e+10 86 322 -7.22
hypotf 1068 -1e+10 1e+10 -1e+10 1e+10 1297 1387 -6.92
ldexpf 232 -1e+10 1e+10 -10 10 39 39 -7.22
powf 1820 0 1e+04 -10 10 9490 10180 -4.65
__builtin_powif 734 0 1e+04 -10 10 1143 2140 -6.39

Math Functions for IEEE double from LibF7

The following tables contain benchmark values for some IEEE double floating-point functions over the indicated range(s) of input values. LibF7 is an IEEE double implementation hosted by libgcc since GCC v10.

The code sizes include all dependencies with the exception of potential prologue and epilogue routines (__prologue_saves__, __epilogue_restores__).

The sizes of individual functions do not add up, since functions share dependencies. For example, sinl, cosl, asinl, acosl and sqrtl together occupy only 4744 bytes of code, including the prologue and epilogue routines. With -mrelax the code size reduces further to around 4400 bytes.

Notice that the values for the relative error and the worst-case execution time Cycles_max are only lower bounds. The best achievable accuracy for IEEE double with its 52 fractional bits in the mantissa is log10(2^-53 ≈ 1.1·10^-16) ≈ -15.95.

LibF7 Benchmarks for ATmega128 (avr51, with MUL)
Function Size x0 x1 Cycles_avr Cycles_max log10(Err_max)
acosl 3330 -1 1 16214 17974 -15.65
asinl 3330 -1 1 16103 17985 -15.65
atanl 2810 -10 10 20337 21229 -15.65
cbrtl 4100 -1e+06 1e+06 32326 33372 -15.33
ceill 1760 -1e+05 1e+05 1429 1766 -15.95
cosl 3886 -1.57 1.58 11222 13623 -15.66
coshl 3642 -20 20 20006 21436 -15.65
expl 3550 -20 20 16937 18389 -15.65
floorl 1696 -1e+05 1e+05 1352 1687 -15.95
frexpl 1252 -1e+05 1e+05 748 765 -15.95
logl 2944 0 100 15085 15810 -15.65
log10l 2954 0 100 15732 16422 -15.59
log2l 2954 0 100 15737 16403 -15.65
roundl 1782 -1e+05 1e+05 1661 1757 -15.95
sinl 3884 0 3.15 11953 14000 -15.65
sinhl 3850 -20 20 19812 21506 -15.51
sqrtl 1406 0 1e+06 3009 3087 -15.65
tanl 3932 0 3.15 22413 24830 -15.65
tanhl 3662 -20 20 20611 22312 -14.70
truncl 1696 -1000 1000 1117 1172 -15.95

 

LibF7 Benchmarks for ATmega128 (avr51, with MUL)
Function Size x0 x1 y0 y1 Cycles_avr Cycles_max log10(Err_max)
+ 1460 -1e+10 1e+10 -1e+10 1e+10 1528 1664 -15.95
* 1596 -1e+10 1e+10 -1e+10 1e+10 1620 1661 -15.65
/ 1568 -1e+10 1e+10 -1e+10 1e+10 3563 3756 -15.65
atan2l 3198 -10 10 -10 10 21089 21984 -15.65
fdiml 1720 -1e+10 1e+10 -1e+10 1e+10 1370 1799 -15.95
fmaxl 1448 -1e+10 1e+10 -1e+10 1e+10 1249 1353 -∞
fminl 1448 -1e+10 1e+10 -1e+10 1e+10 1252 1353 -∞
fmodl 2794 -1e+10 1e+10 -1e+10 1e+10 5170 5940 -15.95
hypotl 2302 -1e+10 1e+10 -1e+10 1e+10 5004 5164 -15.66
ldexpl 1252 -1e+10 1e+10 -10 10 739 757 -15.95
powl 4078 0 1e+04 -10 10 32398 34023 -14.46
__builtin_powil 2220 0 1e+04 -10 10 3396 6173 -15.12

Fixed-Point Functions from <stdfix.h>

The following tables contain benchmark values for some fixed-point functions over the indicated range of input values.

  • V+ denotes the smallest value that is larger than V for the considered fixed-point type. Similarly, V- denotes the largest value that is smaller than V for the considered type.
  • The code sizes include all dependencies.

Notice that the values for the absolute error Err_max and the worst-case execution time Cycles_max are only lower bounds.

Fixed-Point Benchmarks for ATmega128 (avr51, with MUL)
Function Size x0 x1 Cycles_avr Cycles_max Err_max
log2uhk 78 0+ 10 52 75 1.25e-02
log21puhr 32 0 1- 22 22 4.28e-03
sinuhk_deg 270 0 256- 52 54 6.44e-05
cosuhk_deg 316 0 256- 72 76 6.44e-05
sqrthr 42 0 1- 100 100 7.78e-03
sqrtuhr 38 0 1- 98 98 3.90e-03
acosk 404 -1 1 416 572 5.39e-05
acosuk 328 0 1 385 526 4.51e-05
asink 386 -1 1 414 589 4.95e-05
asinuk 328 0 1 381 539 4.43e-05
atank 368 -1 1 242 264 4.04e-05
atank 368 1 10 888 913 4.65e-05
atanuk 298 0 1 203 206 2.52e-05
atanur 152 0 1- 188 191 2.46e-05
exp2k 216 -10 10 256 318 1.09e-02
exp2uk 164 0 10 245 293 1.06e-02
exp2m1ur 112 0 1- 177 180 2.13e-05
log2uk 184 0+ 10 257 305 6.03e-05
log21pur 114 0 1- 212 215 2.87e-05
cospi2k 182 -4 4 249 258 4.52e-05
sinpi2k 182 -4 4 247 256 4.54e-05
sinpi2ur 120 0 1- 215 219 2.80e-05
sqrtur 66 0 1- 274 297 1.53e-05

 

Fixed-Point Benchmarks for ATtiny88 (avr25, no MUL)
Function Size x0 x1 Cycles_avr Cycles_max Err_max
log2uhk 114 0+ 10 339 380 1.25e-02
log21puhr 70 0 1- 312 332 4.28e-03
sqrthr 54 0 1- 102 105 7.78e-03
sqrtuhr 50 0 1- 100 104 3.90e-03
acosk 446 -1 1 1097 1233 5.39e-05
acosuk 378 0 1 1070 1197 4.51e-05
asink 428 -1 1 1095 1249 4.95e-05
asinuk 378 0 1 1066 1210 4.43e-05
atank 412 -1 1 877 973 4.04e-05
atank 412 1 10 1538 1630 4.65e-05
atanuk 352 0 1 844 924 2.52e-05
atanur 212 0 1- 830 910 2.46e-05
exp2k 236 -10 10 994 1075 1.09e-02
exp2uk 184 0 10 982 1050 1.06e-02
exp2m1ur 134 0 1- 916 939 2.13e-05
log2uk 214 0+ 10 1222 1294 6.03e-05
log21pur 146 0 1- 1187 1228 2.87e-05
cospi2k 204 -4 4 1235 1285 4.52e-05
sinpi2k 204 -4 4 1233 1283 4.54e-05
sinpi2ur 146 0 1- 1203 1248 2.80e-05
sqrtur 64 0 1- 273 296 1.53e-05