AVR-LibC
2.2.0
Standard C library for AVR-GCC
Inline Assembler Cookbook
The GNU C/C++ compiler for AVR RISC processors allows you to embed assembly language code into C/C++ programs. This cool feature may be used for manually optimizing time-critical parts of the software, or to use specific processor instructions which are not available in the C language.
It's assumed that you are familiar with writing AVR assembler programs, because this is not an AVR assembler programming tutorial. It's not a C/C++ tutorial either.
Note that this document does not cover files written completely in assembly language; refer to AVR-LibC and Assembler Programs for this.
Copyright (C) 2001-2002 by egnite Software GmbH
Permission is granted to copy and distribute verbatim copies of this manual provided that the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
This document describes version 4.7 of the compiler or newer.
Herne, 17th of May 2002 Harald Kipp harald.kipp-at-egnite.de
A GCC inline assembly statement starts with the keyword asm, __asm or __asm__, where the first one is not available in strict ANSI mode.
In its simplest form, the inline assembly statement has no operands and injects just one instruction into the code stream, like in __asm ("nop");
In its generic form, an asm statement can have one of the following three forms:
1. Simple asm without operands: asm volatile (code-string);
code-string is a string literal that will be added as is into the generated assembly code. This even applies to the % character. The only replacement is that \n and \t are interpreted as newline and TAB character, respectively.
This type of asm statement may occur at top level, outside any function, as global asm. When its placement relative to functions is important, consider -fno-toplevel-reorder.
2. Asm with operands: asm volatile (code-string : output-operands : input-operands : clobbers);
This is the most widely used form of an asm statement. It must be located in a function. output-operands, input-operands and clobbers are comma-separated lists of operands and clobber specifications, respectively. Any of them may be empty, for example when the asm has no outputs. At least one : (colon) must be present, otherwise it will be a simple asm without operands and without % replacements.
3. Asm goto: asm goto (code-string : output-operands : input-operands : clobbers : labels);
labels is a comma-separated list of C/C++ code labels which would be valid in a goto statement. The output-operands must be empty, because it is impossible to generate output reloads after the code has transferred control to one of the labels. When volatile is specified explicitly, the goto keyword may be placed after or before the volatile.
Notes on the various parts:
volatile
The keyword volatile is optional and means that the asm statement has side effects that are not expressed in terms of the operands or clobbers. The asm statement must not be optimized away or reordered with respect to other volatile statements like volatile memory accesses or other volatile asm.
Any asm statement without output-operands is implicitly volatile.
A non-volatile asm statement may be optimized away when all of its output operands are unused.
Instead of volatile, __volatile or __volatile__ can be used.
code-string
A string literal that contains the code that is to be injected into the assembly code generated by the compiler. %-expressions are replaced by the string representations of the operands, and the number of lines is determined to estimate the code size of the asm.
Apart from that, the compiler does not analyze the code provided in the code template.
This means that the code appears to the compiler as if it was executed in one parallel chunk, all at once. It is important to keep that in mind, in particular for cases where input and output operands may overlap.
output-operands
input-operands
A comma-separated list of operands, which may take the following forms. In any case, the first operand can be referred to as %0 in code-string, the second one as %1, etc. Numbering starts with the output operands and continues with the input operands.
"constraints" (expr)
expr is a C expression that's an input or output (or both) to the asm statement. An output expression must be an lvalue, i.e. it must be valid to assign a value to it. "constraints" is a string literal with constraints and constraint modifiers. For example, constraint "r" stands for a general-purpose register. A simple input operand would be "r" (value + 1), which supplies value + 1 in some general-purpose register R2...R31. In many cases, an upper d-register R16...R31 is required for instructions like LDI or ANDI. A respective output operand specification is "=d" (result). An output operand that is written before all input operands have been consumed needs the constraint modifier & so that it won't overlap with any input operand: "=&d" (result). An operand that is both input and output is specified with the + constraint modifier: "+d" (value).
constraint modifier: [name] "constraints" (expr)
%[name]
in code-string
. This is useful in long asm statements with many operands. clobbers
A comma-separated list of string literals like "16"
, "r16" or "memory".
The first two clobbers mean that the asm destroys register R16. Only the lower-case form is allowed, and register names like Z are not recognized.
"memory"
means that the asm touches memory in some way. When the asm writes to some RAM location for example, the compiler must not optimize RAM accesses across the asm because the memory may change.
Clobbering __tmp_reg__
by means of "r0"
has no effect, but such a clobber may be added to indicate to the reader that the asm clobbers R0.
Clobbering __zero_reg__
by means of "r1"
has no effect. When the asm destroys the zero register, for example by means of a MUL
instruction, then the code must restore the register at the end by means of "clr __zero_reg__"
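The following sketch puts several of these parts together: a named operand with the "+d" constraint for ANDI, and a MUL sequence that lists "r0" as an (informational) clobber and restores __zero_reg__ at the end. It assumes a device with a MUL instruction, such as an ATmega; the function names low_nibble and mul_hi are hypothetical, and the #else branches are plain C fallbacks that only exist so the snippet can be compiled and tested on a host.

```c
#include <stdint.h>

/* Keep only the low nibble of x.  ANDI needs an upper register, hence "d";
   the operand is named [v] and is both input and output ("+"). */
static inline uint8_t low_nibble (uint8_t x)
{
#ifdef __AVR__
    __asm__ ("andi %[v], 0x0f" : [v] "+d" (x));
#else
    x &= 0x0f;                          /* host fallback for testing */
#endif
    return x;
}

/* High byte of the 8 x 8 product a * b.  MUL writes R1:R0, so R0 is listed
   as clobber (informational only) and R1 is cleared again at the end. */
static inline uint8_t mul_hi (uint8_t a, uint8_t b)
{
    uint8_t hi;
#if defined (__AVR__) && defined (__AVR_HAVE_MUL__)
    __asm__ ("mul %[a], %[b]"    "\n\t"
             "mov %[hi], r1"     "\n\t"
             "clr __zero_reg__"
             : [hi] "=r" (hi)
             : [a] "r" (a), [b] "r" (b)
             : "r0");
#else
    hi = (uint8_t) (((uint16_t) a * b) >> 8);   /* C fallback */
#endif
    return hi;
}
```

No early-clobber is needed for [hi] in mul_hi: both inputs are consumed by MUL before MOV writes the output.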
The compiler estimates the code size of an asm statement as the number of lines multiplied by 4 bytes, the maximal possible AVR instruction length. The length is needed when (conditional) jumps cross the asm statement in order to compute (upper bounds for) jump offsets of PC-relative jumps.
The number of lines is one plus the number of line breaks in code-string. These may be physical line breaks from \n characters and logical line breaks from $ characters.
Before we start with the first examples, we list all the bells and whistles that can be used to compose an inline assembly statement: special sequences, constraints, constraint modifiers, print modifiers and operand modifiers.
There are special sequences that can be used in the assembly template.
Sequence | Description |
---|---|
__SREG__ | The I/O address of the status register SREG at 0x3F |
__tmp_reg__ | The temporary register R0 (R16 on reduced Tiny) |
__zero_reg__ | The zero register R1, always zero (R17 on reduced Tiny) |
$ | A logical line separator, used to separate multiple instructions in one physical line |
\n | A physical newline, used to separate multiple instructions |
\t | A TAB character, can be used for better legibility of the generated asm |
\" | A " character (double quote) |
\\ | A \ character (backslash) |
%% | A % character (percent) |
%~ | "r" or "" , used to construct call or rcall by means of "%~call" , depending on the architecture |
%! | "" or "e" , used to construct indirect calls like icall or eicall by means of "%!icall" , depending on the architecture |
%= | A number that's unique for the compilation unit and the respective inline asm code, used to construct unique labels |
Comment | Description |
---|---|
; text | A single-line assembly comment that extends to the end of the physical line |
/* text */ | A multi-line C comment |
Similar to __SREG__, the sequences __SP_L__, __SP_H__, __CCP__, __RAMPX__, __RAMPY__, __RAMPZ__ and __RAMPD__ stand for the I/O addresses of the respective SFRs, provided the device has them.
__tmp_reg__ may be freely used by inline assembly code and need not be restored at the end of the code.
__zero_reg__ contains a value of zero. When that value is destroyed, for example by a MUL instruction, its value has to be restored at the end of the code by means of clr __zero_reg__.
In an asm with operands, %% will always insert a single %. In a simple asm without operands, no %-codes are available.
Sequences like __SREG__ are not evaluated as part of the inline asm; they are just copied to the asm code as they are. At the top of each assembly file, the compiler prints definitions like __SREG__ = 0x3f so that they can also be used in inline assembly.
The most up-to-date and detailed information on constraints for the AVR can be found in the avr-gcc Wiki.
Constraint | Registers | Range |
---|---|---|
a | Simple upper registers that support FMUL | R16 ... R23 |
b | Base pointer registers that support LDD , STD | Y, Z (R28 ... R31) |
d | Upper registers | R16 ... R31 |
e | Pointer registers that support LD , ST | X, Y, Z (R26 ... R31) |
l | Lower registers | R2 ... R15 |
r | Any register | R2 ... R31 |
w | Upper registers that support ADIW | R24 ... R31 |
x | X pointer registers | R26, R27 |
y | Y pointer registers | R28, R29 |
z | Z pointer registers | R30, R31 |
Constraint | Constant | Range |
---|---|---|
I | 6-bit unsigned integer constant | 0 to 63 |
J | 6-bit negative integer constant | −63 to 0 |
M | 8-bit unsigned integer constant | 0 to 255 |
n | Integer constant | |
i | Immediate value known at link-time, like the address of a variable in static storage | |
EF | Floating-point constant | |
Ynn | Fixed-point or integer constant | |
Constraint | Explanation | Notes |
---|---|---|
m | A memory location | |
X | Any valid operand | |
0 ... 9 | Matches the respective operand number | |
Constraints can be combined, like in "rn", which specifies the union of the specified constraints; "r" and "n" in this case.
Constraints may consist of more than one letter, like Ynn, which is a 3-letter constraint.
Constraint modifiers are:
Modifier | Meaning |
---|---|
= | Output-only operand. Without & it may overlap with input operands |
+ | Output operand that's also an input |
=& | "Early-clobber". Register should be used for output only and won't overlap with any input operand(s) |
The selection of the proper constraint depends on the range of the constants or registers, which must be acceptable to the AVR instruction they are used with. The C compiler doesn't check any line of your assembler code, but it is able to check the constraints against your C expressions. However, if you specify the wrong constraints, the compiler may silently pass wrong code to the assembler. The assembler will then fail with some cryptic output or internal errors, or, in the worst case, wrong code may be the result.
For example, if you specify the constraint "r" and you are using this register with an ORI instruction, then the compiler may select any register. This will fail if the compiler chooses R2 to R15. (It will never choose R0 or R1, because these are used for special purposes.) That's why the correct constraint in that case is "d". On the other hand, if you use the constraint "M", the compiler will make sure that you don't pass anything but an 8-bit unsigned integer value known at compile-time.
The following table shows all AVR assembler mnemonics which require operands, and the related constraints.
Mnemonic | Constraints | Mnemonic | Constraints |
---|---|---|---|
adc | r,r | add | r,r |
adiw | w,I | and | r,r |
andi | d,M | asr | r |
bclr | I | bld | r,I |
brbc | I,label | brbs | I,label |
bset | I | bst | r,I |
call | i | cbi | I,I |
cbr | d,M | clr | r |
com | r | cp | r,r |
cpc | r,r | cpi | d,M |
cpse | r,r | dec | r |
elpm | r,z | eor | r,r |
fmul | a,a | fmuls | a,a |
fmulsu | a,a | in | r,I |
inc | r | jmp | i |
lac | z,r | las | z,r |
lat | z,r | ld | r,e |
ldd | r,b | ldi | d,M |
lds | r,i | lpm | r,z |
lsl | r | lsr | r |
mov | r,r | movw | r,r |
mul | r,r | muls | d,d |
mulsu | a,a | neg | r |
or | r,r | ori | d,M |
out | I,r | pop | r |
push | r | rcall | i |
rjmp | i | rol | r |
ror | r | sbc | r,r |
sbci | d,M | sbi | I,I |
sbic | I,I | sbiw | w,I |
sbr | d,M | sbrc | r,I |
sbrs | r,I | ser | d |
st | e,r | std | b,r |
sts | i,r | sub | r,r |
subi | d,M | swap | r |
tst | r | xch | z,r |
The %-operands in the inline assembly template can be adjusted by special print modifiers. The one-letter modifier follows the % and precedes the operand number like in "%a0", or precedes the name in named operands like in "%a[address]".
Modifier | Number of Arguments | Explanation | Suitable Constraints |
---|---|---|---|
%a0 | 1 | Print pointer register as address X , Y or Z , like in "LD r0, %a0+" | x , y , z , b , e |
%i0 | 1 | Print compile-time RAM address as I/O address, like in "OUT %i0, r0" with argument "n"(&SREG) | n |
%n0 | 1 | Print the negative of a compile-time integer constant | n |
%r0 | 1 | Print the register number of a register, like in "CLR %r0+7" for the MSB of a 64-bit register | reg |
%x0 | 1 | Print a function name without gs() modifier, like in "%~CALL %x0" with argument "s"(main) | s |
%A0 | 1 | Add 0 to the register number (no effect) | reg |
%B0 | 1 | Add 1 to the register number | reg |
%C0 | 1 | Add 2 to the register number | reg |
%D0 | 1 | Add 3 to the register number | reg |
%T0%t1 | 2 | Print the register that holds bit number %1 of register %0 | reg + n |
%T0%T1 | 2 | Print operands suitable for BLD /BST , like in "BST %T0%T1" , including the required , | reg + n |
In the table above, reg stands for any of the register constraints r, d, w, x, y, z, b, e, a or l.
Operand modifiers are:
Modifier | Explanation | Purpose |
---|---|---|
lo8() | 1st Byte of a link-time constant, bits 0...7 | Getting parts of a byte-address |
hi8() | 2nd Byte of a link-time constant, bits 8...15 | |
hlo8() | 3rd Byte of a link-time constant, bits 16...23 | |
hhi8() | 4th Byte of a link-time constant, bits 24...31 | |
hh8() | Same as hlo8() | |
pm_lo8() | 1st Byte of a link-time constant divided by 2, bits 1...8 | Getting parts of a word-address |
pm_hi8() | 2nd Byte of a link-time constant divided by 2, bits 9...16 | |
pm_hh8() | 3rd Byte of a link-time constant divided by 2, bits 17...24 | |
pm() | Link-time constant divided by 2 in order to get a program memory (word) address, like in lo8(pm(main)) | Word-address |
gs() | Function address divided by 2 in order to get a (word) address, like in lo8(gs(main)). Generates a stub (trampoline) as needed. This is required to calculate the address of a code label on devices with more than 128 KiB of program memory that's supposed to be used in EICALL. For rationale, see the GCC documentation. On devices with less program memory, gs() behaves like pm(). | Function address for [E]ICALL |
When the argument of a modifier is not computable at assembler-time, then the assembler has to encode the expression in an abstract form using RELOCs. Consequently, only a very limited set of argument expressions is supported when they are not computable at assembler-time.
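A minimal sketch of lo8() and hi8() applied to a link-time constant: loading the address of an object in static storage into a register pair. The object buffer and the function buffer_address are hypothetical; the #else branch is a host fallback.

```c
#include <stdint.h>

static uint8_t buffer[4];          /* hypothetical object in static storage */

/* Load the address of buffer into a register pair with two LDI.  The
   address is only known at link-time, so lo8() and hi8() are emitted as
   relocations that the linker resolves. */
static inline uint8_t *buffer_address (void)
{
    uint8_t *p;
#ifdef __AVR__
    __asm__ ("ldi %A0, lo8(%1)"  "\n\t"
             "ldi %B0, hi8(%1)"
             : "=d" (p)            /* LDI needs R16...R31 */
             : "i" (buffer));
#else
    p = buffer;                    /* host fallback for testing */
#endif
    return p;
}
```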
Some examples show the assembly code as generated by the compiler. It's the code from the .s files as generated with option -save-temps. Since GCC v8, the high-level source can be added to the generated assembly with -fverbose-asm.
The first example uses the swap instruction to swap the nibbles of a byte. Input and output of swap are located in the same general-purpose register. This means that the input operand, operand 1, must be located in the same register(s) as operand 0, so that the right constraint for operand 1 is "0".
All side effects of the code are described by the constraints and the clobbers, so that there is no need for this asm to be volatile. In particular, this asm may be optimized out when the output value is unused.
A shorter way to state that value is both input and output is the constraint modifier +.
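Both ways of writing the nibble swap can be sketched as follows, once with the matching constraint "0" and once with the constraint modifier +. The function names are hypothetical, and the #else branches are host fallbacks for testing.

```c
#include <stdint.h>

/* Operand 1 must live in the same register as operand 0: constraint "0". */
static inline uint8_t swap_nibbles (uint8_t value)
{
    uint8_t result;
#ifdef __AVR__
    __asm__ ("swap %0" : "=r" (result) : "0" (value));
#else
    result = (uint8_t) ((value << 4) | (value >> 4));
#endif
    return result;
}

/* The same with modifier "+": value is both input and output. */
static inline uint8_t swap_nibbles_inplace (uint8_t value)
{
#ifdef __AVR__
    __asm__ ("swap %0" : "+r" (value));
#else
    value = (uint8_t) ((value << 4) | (value >> 4));
#endif
    return value;
}
```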
Swapping nibbles was a piece of cake, so let's swap the bytes of a 16-bit value. In order to access the constituent bytes of the 16-bit input and output values, we use the print modifiers %A and %B.
The asm is placed in a small C test case so that we can inspect the resulting assembly code as generated by the compiler with -save-temps.
The "\n\t" sequence adds a line feed that is required between the two instructions, and a TAB to align the two instructions in the generated assembly. There is no "\n\t" after the last instruction because that would just increase the estimated size of the asm.
The generated assembly works as expected. The compiler wraps the code from the inline asm in #APP / #NOAPP annotations.
Wrong! While the generated code above is correct, the inline asm itself is not!
We see this with a slightly adjusted test case where the arguments of callee have been swapped, but which uses the same inline asm.
The resulting assembly is obviously wrong, because after the code from the inline asm, the low byte and the high byte of swapped will always have the same value, namely the value of r25.
The reason is that the output operand overlaps the input, and the output is changed before all of the input operands are consumed. This is a so-called early-clobber situation. There are two possible solutions to this predicament:
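A sketch of two common remedies, assuming the byte swap is done with two MOV instructions as described above: mark the output as early-clobber with "=&r" so it cannot share registers with the input, or swap in place with "+r" and __tmp_reg__ as scratch register. Function names are hypothetical; the #else branches are host fallbacks.

```c
#include <stdint.h>

/* Remedy 1: early-clobber output "=&r", so that the output register pair
   never overlaps the input pair. */
static inline uint16_t swap_bytes_ec (uint16_t value)
{
    uint16_t swapped;
#ifdef __AVR__
    __asm__ ("mov %A0, %B1"  "\n\t"
             "mov %B0, %A1"
             : "=&r" (swapped)
             : "r" (value));
#else
    swapped = (uint16_t) ((value << 8) | (value >> 8));
#endif
    return swapped;
}

/* Remedy 2: work in place ("+r") and park one byte in __tmp_reg__, so no
   byte is overwritten before it has been read. */
static inline uint16_t swap_bytes_inplace (uint16_t value)
{
#ifdef __AVR__
    __asm__ ("mov __tmp_reg__, %A0"  "\n\t"
             "mov %A0, %B0"          "\n\t"
             "mov %B0, __tmp_reg__"
             : "+r" (value));
#else
    value = (uint16_t) ((value << 8) | (value >> 8));
#endif
    return value;
}
```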
Accessing memory requires that the AVR instructions that perform the memory access are provided with the appropriate memory address. There are two ways to do this:
1. The address is hard-coded in the asm template, for example as __SREG__, as 0x3f, as a symbol, or as a symbol plus a constant offset.
2. The address is supplied by means of an asm operand.
Approach 1 is simpler as it does not require an asm operand, while approach 2 is in many cases more powerful because macros defined by, say, #include <avr/io.h> can be used as operands, whereas such headers are not included in the assembly code as generated by the compiler.
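Reading an SFR like PORTB can be performed by an IN instruction that gets the I/O address of PORTB as an operand, computed by means of _SFR_IO_ADDR (PORTB).
Macro _SFR_IO_ADDR is provided by avr/sfr_defs.h, which is included by avr/io.h.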
Since GCC v4.7, print modifier %i is supported, which prints RAM addresses like &PORTB as an I/O address.
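When the address is not an I/O address, then LDS or LD (respectively STS or ST for writing) must be used, depending on whether the address is known at link-time or only at run-time. For example, a macro that clears an SFR can discriminate between the possibilities that the address is known at compile-time and is an I/O address (OUT), that the address is known at compile-time but is not an I/O address (STS), or that the address is only known at run-time (ST with a pointer register).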
The last case with constraint "e" works because &sfr is a 16-bit value, and 16-bit values (and larger) start in even registers. Therefore, the address will be located in R27:R26, R29:R28 or R31:R30, which print modifier %a will print as X, Y or Z, respectively. The address will never end up in, say, R30:R29.
The test case
compiles for ATmega328 and with optimization turned on to
As __builtin_constant_p is used to infer whether the address of the SFR is known at compile-time, extra care must be taken when the functionality is implemented as an inline function. Casting the address psfr to an integer type in the inline function is required so that the compiler will recognize constant addresses.
Also notice that we have to pass the address of the SFR to the inline function. Passing the SFR directly like in the macro approach won't work, because a function parameter only receives a copy of the SFR's value, not the SFR itself.
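A sketch of such an inline function, assuming optimization is turned on so that __builtin_constant_p can see constant addresses after inlining. The name clear_sfr is hypothetical, the I/O range test relies on __SFR_OFFSET from avr/sfr_defs.h, and the #else branch is a host fallback that merely stores zero.

```c
#include <stdint.h>
#ifdef __AVR__
#include <avr/io.h>
#endif

/* Clear the 8-bit SFR or RAM location at psfr. */
static inline __attribute__((__always_inline__))
void clear_sfr (volatile uint8_t *psfr)
{
#ifdef __AVR__
    uintptr_t addr = (uintptr_t) psfr;   /* integer, so constants are recognized */

    if (__builtin_constant_p (addr)
        && addr - __SFR_OFFSET < 0x40)   /* I/O address: OUT */
        __asm__ volatile ("out %i0, __zero_reg__" :: "n" (addr) : "memory");
    else if (__builtin_constant_p (addr)) /* other constant address: STS */
        __asm__ volatile ("sts %0, __zero_reg__" :: "n" (addr) : "memory");
    else                                  /* run-time address: ST via X, Y or Z */
        __asm__ volatile ("st %a0, __zero_reg__" :: "e" (psfr) : "memory");
#else
    *psfr = 0;                            /* host fallback for testing */
#endif
}
```

With optimization, clear_sfr (&PORTB) should end up as a single OUT on a device like ATmega328, while a pointer that is only known at run-time takes the ST path.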
Finally, an example that atomically increments a 16-bit integer. The code is wrapped in IN SREG / CLI / OUT SREG to make it atomic. It reads the 16-bit value data from its absolute address, increments it and then writes it back.
Notice there are three different ways required to access the different bytes of the involved 16-bit entities:
1. For the 16-bit register operand %[temp], print modifiers %A and %B are used.
2. For the 16-bit variable data in static storage, %[addr]+1 is used to access the high byte. The resulting expression data+1 is computable at link-time and evaluated by the linker.
3. The bytes of the 16-bit constant −1 are accessed with the operand modifiers lo8 and hi8, which are evaluated by the assembler because −1 is known at assembler-time.
data is located in static storage, hence its address is known to the linker and fits constraint "i".
The sole purpose of operand "+m" (data) is to describe the effect of the asm on data memory: it changes data. Notice that there is no "memory" clobber, because that operand already describes all memory side effects, and it does this in a less intrusive way than a catch-all "memory" clobber. The operand is not used in the asm template, but in principle it would be possible to use it as operand with LDS and STS instead of operand [addr] "i" (&data). However, there are many situations where a memory operand constrained by "m" takes a form that cannot be used with AVR instructions, because there are no matching print modifiers, or because it is not known a priori what specific form the memory operand takes. In such cases, one would take the address of the operand and supply it as address in a pointer register to the inline asm. The compiler generates the required instructions for address computation, and the inline asm knows that it can use LD and ST.
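A sketch of such an asm, following the description above: the operand names [temp] and [addr] and the "+m" (data) operand are taken from the text, whereas the function name increment_data is hypothetical. Adding 1 is done by subtracting −1 with SUBI / SBCI, which need "d" registers. The #else branch is a non-atomic host fallback for testing.

```c
#include <stdint.h>

uint16_t data;                     /* 16-bit value in static storage */

void increment_data (void)
{
#ifdef __AVR__
    uint16_t temp;
    __asm__ volatile ("in __tmp_reg__, __SREG__"  "\n\t" /* save I flag */
                      "cli"                       "\n\t"
                      "lds %A[temp], %[addr]"     "\n\t"
                      "lds %B[temp], %[addr]+1"   "\n\t" /* data+1: linker */
                      "subi %A[temp], lo8(-1)"    "\n\t" /* lo8/hi8: assembler */
                      "sbci %B[temp], hi8(-1)"    "\n\t"
                      "sts %[addr]+1, %B[temp]"   "\n\t"
                      "sts %[addr], %A[temp]"     "\n\t"
                      "out __SREG__, __tmp_reg__"        /* restore I flag */
                      : [temp] "=&d" (temp), "+m" (data)
                      : [addr] "i" (&data));
#else
    data++;                        /* host fallback for testing, not atomic */
#endif
}
```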
When an inline asm contains jumps, then it also requires labels. When the label is inside the asm, then care must be taken that the label is unique in the compilation unit even when the inline asm is used multiple times, e.g. when the code is located in an unrolled loop or a function has multiple incarnations due to cloning, or simply because a macro or inline function that contains an asm statement is used more than once.
There are two kinds of labels that can be used:
1. Numeric labels of the form n:, where n is some (small, non-negative) number. They can be targeted by means of nb or nf, depending on whether the jump direction is backwards or forwards. Such numeric labels may be present more than once. The taken label is the first one with the specified number in the respective direction.
2. Labels that contain %=, which yields some number that's unique amongst all asm incarnations in the respective compilation unit.
Which form is used is a matter of taste. In practice, the first variant is often preferred in short sequences, whereas the second form is usually seen in longer algorithms.
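As a sketch of both label kinds, here is a hypothetical bit-counting loop, once with a numeric label and once with a label made unique by %=. Each iteration shifts bit 0 of x into the carry flag and adds the carry to n, until x is zero. On the host, the C loop stands in for both variants.

```c
#include <stdint.h>

#ifdef __AVR__
static inline uint8_t popcount8 (uint8_t x)
{
    uint8_t n = 0;
    __asm__ ("0:"                      "\n\t"
             "lsr %[x]"                "\n\t"   /* carry = bit 0 of x */
             "adc %[n], __zero_reg__"  "\n\t"   /* n += carry */
             "tst %[x]"                "\n\t"   /* bits left? */
             "brne 0b"                          /* numeric label, backwards */
             : [n] "+r" (n), [x] "+r" (x));
    return n;
}

static inline uint8_t popcount8_unique (uint8_t x)
{
    uint8_t n = 0;
    __asm__ (".Lpopcnt%=:"             "\n\t"
             "lsr %[x]"                "\n\t"
             "adc %[n], __zero_reg__"  "\n\t"
             "tst %[x]"                "\n\t"
             "brne .Lpopcnt%="                  /* label unique per asm */
             : [n] "+r" (n), [x] "+r" (x));
    return n;
}
#else
/* Host fallback for testing: the same loop in plain C. */
static inline uint8_t popcount8 (uint8_t x)
{
    uint8_t n = 0;
    for (; x; x >>= 1)
        n += x & 1;
    return n;
}
#define popcount8_unique popcount8
#endif
```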
For labels that are defined in the surrounding C/C++ code, asm goto has to be used. The print modifier %x0 prints panic as a raw label, not as gs(panic) like it would be the case with %0.
This assumes that the jump offset can be encoded in the brne instruction in all situations. When static analysis cannot prove that the jump offset fits, then a jumpity jump has to be used: a conditional branch with the inverted condition that skips over an unconditional jump to the target.
Sequence "%~jmp" yields "rjmp" or "jmp", depending on the architecture. Notice that a jmp can be relaxed to an rjmp with option -mrelax, provided the jump offset fits.
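A sketch of asm goto with a jumpity jump, assuming a C label named found; the function is_42 is hypothetical. It uses the generic GCC syntax %l[found] to print the C label. The inverted branch brne only has to skip the following instruction, while %~jmp reaches the label wherever it ends up. The #else branch is a host fallback.

```c
#include <stdint.h>

/* Return 1 if x equals 42, else 0. */
static int is_42 (uint8_t x)
{
#ifdef __AVR__
    __asm__ goto ("cpi %0, 42"        "\n\t"   /* CPI needs a "d" register */
                  "brne 0f"           "\n\t"   /* inverted condition */
                  "%~jmp %l[found]"   "\n"     /* rjmp or jmp to C label */
                  "0:"
                  : /* no outputs */
                  : "d" (x)
                  : /* no clobbers */
                  : found);
    return 0;
found:
    return 1;
#else
    return x == 42;                            /* host fallback for testing */
#endif
}
```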
One use of GCC's asm keyword is to bind local register variables to hardware registers. Such bindings of local variables to registers are only guaranteed during an inline asm which has these variables as operands.
Suppose we want to interface a non-ABI assembly function mul_8_16 that multiplies R24 with R27:R26, clobbers R0, R1 and R25, and returns the 24-bit result in R20:R19:R18. One way to implement such an interface would be to provide an assembly function that performs the required copying and call to mul_8_16. Such a function would destroy some of the performance gain obtained by using assembly for mul_8_16: additional copying back and forth, and extra CALL and RET instructions.
The compiler comes to the rescue. We can bind local variables to the required registers.
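A sketch of such a binding, assuming mul_8_16 is provided by a separate assembly module; the wrapper name mul_8x16 is hypothetical. val8 is bound to R24, and the 24-bit result is bound to R18 so that it occupies R20:R19:R18, while val16 only needs the register constraint "x". The #else branch is a host fallback for testing.

```c
#include <stdint.h>

static inline uint32_t mul_8x16 (uint8_t val8, uint16_t val16)
{
#ifdef __AVR__
    register uint8_t r24 __asm ("24") = val8;   /* 1st factor in R24 */
    register __uint24 r18 __asm ("18");         /* result in R20:R19:R18 */
    __asm ("%~call mul_8_16"  "\n\t"            /* assumed asm routine */
           "clr __zero_reg__"                   /* it destroyed R1 */
           : "=r" (r18)
           : "r" (r24), "x" (val16)             /* val16 in X = R27:R26 */
           : "r0", "r25");                      /* "r0" is informational */
    return r18;
#else
    return (uint32_t) val8 * val16;             /* host fallback for testing */
#endif
}
```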
The register keyword is mandatory.
The register must be specified as a string literal like "18" or "r18". Specifications like "R18", 18 or "Z" are not supported.
The 16-bit argument of mul_8_16 happens to be required in R27:R26, which is the X register for which there is register constraint "x". Therefore, no register binding is required for val16.
Since mul_8_16 clobbers the zero register R1, it has to be restored by means of clr __zero_reg__ after the call. (… sei() or cli() instructions.)
Let's have a look at how this performs in a test case:
For ATmega8 we get the following assembly:
No superfluous register moves. Great!
Sometimes, it is desirable to use a different name for an object or function rather than the (mangled) name from the C/C++ implementation. Just add an asm specifier with the desired name as a string literal at the end of the declaration.
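A minimal sketch of such an asm specifier, with hypothetical names: the function is declared with the assembler name add_one_impl. The specifier goes on a declaration that precedes the definition, because GCC does not accept it on a function definition itself.

```c
/* Both the definition below and every call use the assembler symbol
   "add_one_impl" instead of "add_one". */
int add_one (int x) __asm__ ("add_one_impl");

int add_one (int x)
{
    return x + 1;
}
```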
For example, avr/eeprom.h implements the eeprom_read_double() function this way: when double is a 32-bit type, eeprom_read_double is declared with the assembler name eeprom_read_dword, and for 64-bit doubles with the assembler name eeprom_read_qword.
GCC inline asm has some limitations.
Sequences like the following are not supposed to work:
Correct code would be
or
It is not possible to bind a value to a local register variable that crosses the boundaries of the Y register. For example, trying to bind a 32-bit value to R31:R28 by means of
will result in an error message like
Similarly, an operand described by a constraint will be located either completely below the Y register, as part of the Y register, or above it.
Suppose we want an inline asm that returns the low byte of a 16-bit value val16:
The diagnostic will be: