AVR-LibC  2.2.0
Standard C library for AVR-GCC
 

Compiler optimization

Problems with reordering code

Author
Jan Waclawek

Programs contain sequences of statements, and a naive compiler would execute them exactly in the order in which they are written. But an optimizing compiler is free to reorder the statements, or even parts of them, if the resulting "net effect" is the same. The "measure" of the "net effect" is what the standard calls "side effects", and it is accomplished exclusively through accesses (reads and writes) to variables qualified as volatile. So, as long as all volatile reads and writes go to the same addresses, in the same order, and the writes store the same values, the program is correct, regardless of what other operations it performs. One important point to note here is that the time elapsing between consecutive volatile accesses is not considered at all.
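As a minimal sketch (not from the original text) of what this means in practice, consider two volatile writes with an unrelated, non-volatile store next to them; the compiler must emit the volatile stores in this order and with exactly these values, but it is free to schedule the non-volatile store anywhere around them:

volatile unsigned char port_a;   /* stand-ins for I/O registers       */
volatile unsigned char port_b;

unsigned char scratch;           /* ordinary, non-volatile variable   */

void demo (unsigned char x)
{
    scratch = x * 3;   /* may be scheduled before, between, or after ...      */
    port_a  = 1;       /* ... these two volatile stores, which must happen in */
    port_b  = 2;       /* this order and store exactly these values           */
}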

Unfortunately, there are also operations which are not covered by volatile accesses. An example of this in AVR-GCC/AVR-LibC are the cli() and sei() macros defined in <avr/interrupt.h>, which convert directly to the respective assembler mnemonics through the __asm__() statement. They constitute a memory access by means of their memory clobber, and they are (implicitly) volatile because they don't have an output operand. So the compiler may not reorder these inline asm statements with respect to other memory accesses or volatile actions. However, such asm statements may still be reordered with respect to other statements that neither are volatile nor access memory.
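For reference, current AVR-LibC versions define these macros essentially as follows; the empty operand lists and the "memory" clobber are what give them the properties described above:

#define sei()  __asm__ __volatile__ ("sei" ::: "memory")
#define cli()  __asm__ __volatile__ ("cli" ::: "memory")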

Note that even a volatile asm instruction can be moved relative to other code, including across (expensive) arithmetic and jump instructions [...]

See also
http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

However, not even a volatile memory barrier like

__asm__ __volatile__ ("" ::: "memory");

keeps GCC from moving operations that involve neither volatile nor memory accesses across such a barrier. Peter Dannegger provided a nice example of this effect:

#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )

unsigned int ivar;

void test2 (unsigned int val)
{
    val = 65535U / val;
    cli();
    ivar = val;
    sei();
}

avr-gcc v5.4 as well as v14 compiles this, with optimization switched on (-Os), to

00000112 <test2>:
 112:	bc 01       	movw	r22, r24
 114:	f8 94       	cli
 116:	8f ef       	ldi	r24, 0xFF	; 255
 118:	9f ef       	ldi	r25, 0xFF	; 255
 11a:	0e 94 96 00 	call	0x12c	; 0x12c <__udivmodhi4>
 11e:	70 93 01 02 	sts	0x0201, r23
 122:	60 93 00 02 	sts	0x0200, r22
 126:	78 94       	sei
 128:	08 95       	ret

where the potentially slow division is moved across cli(), resulting in interrupts being disabled longer than intended. Note that the volatile access occurs in order with respect to cli() and sei(); so the "net effect" required by the standard is achieved as intended, and it is "only" the timing which is off. However, for most embedded applications, timing is an important, sometimes critical factor.

See also
https://www.mikrocontroller.net/topic/65923

Unfortunately, at the moment, neither avr-gcc nor the C standard provides a mechanism to enforce that the executed code ordering completely matches the written one, except perhaps switching optimization off completely (-O0), or writing all the critical code in assembly.
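As an illustration of the latter option, the whole time-critical sequence can be placed into a single asm statement, so that the compiler cannot schedule anything between cli and sei. The following is only a sketch, reusing the names from the example above and assuming a device where ivar is reachable with sts:

unsigned int ivar;

void test2 (unsigned int val)
{
    val = 65535U / val;           /* must finish before the asm: val is an input   */
    __asm__ __volatile__ (
        "cli"             "\n\t"  /* interrupts are off only around the two stores */
        "sts ivar,   %A0" "\n\t"  /* low byte of val  */
        "sts ivar+1, %B0" "\n\t"  /* high byte of val */
        "sei"
        :: "r" (val)
        : "memory");
}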

Note
The artifact with the __udivmodhi4 function is specific to avr-gcc and to how the compiler represents the division internally. On other target platforms that use a library function for division, or for some other expensive operation, this effect will not occur. The reason is that avr-gcc does not represent the library call as a function call but rather as an ordinary instruction. The outcome is that the GCC middle-end concludes that the division is cheap (because the backend has an instruction for it), when in fact it is not.

A workaround for the code above is to enforce that the division happens prior to the cli():

val = 65535U / val;
__asm__ __volatile__ ("" : "+r" (val));
cli();
  • The volatile forces the asm statement to stay prior to the cli().
  • The asm has val as an operand, hence the division must be carried out prior to the asm because val is set by the division.
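Putting this together, a patched version of the example, here using the cli()/sei() macros from <avr/interrupt.h>, could look like this (a sketch):

#include <avr/interrupt.h>

unsigned int ivar;

void test2 (unsigned int val)
{
    val = 65535U / val;                      /* expensive division                    */
    __asm__ __volatile__ ("" : "+r" (val));  /* barrier: val must be computed here    */
    cli();                                   /* interrupts are off only for the store */
    ivar = val;
    sei();
}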

Notice that this workaround does not work in general, for a variety of reasons:

  • The division might be located in an inlined function.
  • The variable might be read-only or may not be appropriate as an asm operand.
  • There may be more such instructions, and it is not practical to treat all of them like this.

To sum it up:

  • volatile memory barriers do not prevent statements without volatile or memory accesses from being reordered across the barrier