AVR-LibC  2.2.0
Standard C library for AVR-GCC
 

Inline Assembler Cookbook


About this Document

The GNU C/C++ compiler for AVR RISC processors allows embedding assembly language code into C/C++ programs. This cool feature may be used to manually optimize time-critical parts of the software, or to use specific processor instructions that are not available in the C language.

It's assumed that you are familiar with writing AVR assembler programs, because this is not an AVR assembler programming tutorial. It's not a C/C++ tutorial either.

Note that this document does not cover files written completely in assembly language; refer to AVR-LibC and Assembler Programs for this.

Copyright (C) 2001-2002 by egnite Software GmbH

Permission is granted to copy and distribute verbatim copies of this manual provided that the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

This document describes version 4.7 of the compiler or newer.

Herne, 17th of May 2002 Harald Kipp harald.kipp-at-egnite.de

The Anatomy of a GCC asm Statement

A GCC inline assembly statement starts with the keyword asm, __asm or __asm__, where the first one is not available in strict ANSI mode.

In its simplest form, the inline assembly statement has no operands and injects just one instruction into the code stream, like in

__asm ("nop");

In its generic form, an asm statement can take one of the following three forms:

A simple asm without operands
__asm (code-string);

code-string is a string literal that will be added as is into the generated assembly code. This even applies to the % character. The only replacements are that \n and \t are interpreted as newline and TAB characters, respectively.

This type of asm statement may occur at top level, outside any function as global asm. When its placement relative to functions is important, consider -fno-toplevel-reorder.

An asm with operands
__asm volatile (code-string : output-operands : input-operands : clobbers);

This is the most widely used form of an asm statement. It must be located in a function.

output-operands and input-operands are comma-separated lists of operands, and clobbers is a comma-separated list of clobber specifications. Any of them may be empty, for example when the asm has no outputs. At least one : (colon) must be present, otherwise it will be a simple asm without operands and without % replacements.

An asm goto statement
__asm goto (code-string : : input-operands : clobbers : labels);
Like the asm above, but labels is a comma-separated list of C/C++ code labels which would be valid in a goto statement. And output-operands must be empty, because it is impossible to generate output reloads after the code has transferred control to one of the labels.
As there are no output operands, asm goto is implicitly volatile. When volatile is specified explicitly, the goto keyword may be placed after or before the volatile.

Notes on the various parts:

Volatility

Keyword volatile is optional and means that the asm statement has side effects that are not expressed in terms of the operands or clobbers. The asm statement must not be optimized away or reordered with respect to other volatile statements like volatile memory accesses or other volatile asm.

Any asm statement without output-operands is implicitly volatile.

A non-volatile asm statement may be optimized away when all of its output operands are unused.

Instead of volatile, __volatile or __volatile__ can be used.

code-string

A string literal that contains the code that is to be injected in the assembly code generated by the compiler. %-expressions are replaced by the string representations of the operands, and the number of lines is determined to estimate the code size of the asm.
Apart from that, the compiler does not analyze the code provided in the code template.
This means that the code appears to the compiler as if it was executed in one parallel chunk, all at once. It is important to keep that in mind, in particular for cases where input and output operands may overlap.

output-operands
input-operands

A comma-separated list of operands, which may take the following forms. In any case, the first operand can be referred to as "%0" in code-string, the second one as "%1" etc.

"constraints" (expr)
expr is a C expression that's an input or output (or both) to the asm statement. An output expression must be an lvalue, i.e. it must be valid to assign a value to it.
"constraints" is a string literal with constraints and constraint modifiers. For example, constraint "r" stands for general-purpose register. A simple input operand would be
"r" (value + 1)
The compiler computes value + 1 and supplies it in some general-purpose register R2...R31. In many cases, an upper d-register R16...R31 is required for instructions like LDI or ANDI. A respective output operand specification is
"=d" (result)
Notice that this operand may overlap with input operands!
When an output operand is written before all input operands are consumed, then in almost all cases the output operand requires an early-clobber modifier & so that it won't overlap with any input operand:
"=&d" (result)
An operand that's both an output and an input can be expressed with the + constraint modifier:
"+d" (result)
Such an operand is both output and input, and hence it won't overlap with other operands.
[name] "constraints" (expr)
Like above. In addition, a named operand can be referred to as %[name] in code-string. This is useful in long asm statements with many operands.
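
As a minimal sketch of named operands, the following helper masks a byte with ANDI. The name low_nibble is made up for illustration and is not part of AVR-LibC. The "+d" operand is both input and output and is restricted to R16...R31 as ANDI requires, while "M" supplies the 8-bit constant. The plain C branch is only there so that the snippet also builds on a non-AVR host.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low nibble of a byte. */
static inline uint8_t low_nibble (uint8_t value)
{
#ifdef __AVR__
    /* ANDI needs an upper register ("d") and an 8-bit constant ("M"). */
    __asm ("andi %[val], %[mask]"
           : [val] "+d" (value)
           : [mask] "M" (0x0F));
#else
    /* Portable fallback with the same effect. */
    value &= 0x0F;
#endif
    return value;
}
```

Because all effects are described by the operands, the asm does not need to be volatile.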

clobbers

A comma-separated list of string literals like "16", "r16" or "memory".

The first two clobbers mean that the asm destroys register R16. Only the lower-case form is allowed, and register names like Z are not recognized.

"memory" means that the asm touches memory in some way. When the asm writes to some RAM location for example, the compiler must not optimize RAM accesses across the asm because the memory may change.

Clobbering __tmp_reg__ by means of "r0" has no effect, but such a clobber may be added to indicate to the reader that the asm clobbers R0.

Clobbering __zero_reg__ by means of "r1" has no effect. When the asm destroys the zero register, for example by means of a MUL instruction, then the code must restore the register at the end by means of "clr __zero_reg__".
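
The following sketch shows one way to apply these rules to a MUL instruction. The helper name mul_u8 is hypothetical. It assumes a device with hardware multiplier, which is tested with the compiler's built-in macro __AVR_HAVE_MUL__; on all other targets, and on a non-AVR host, plain C is used instead.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 8 x 8 = 16 bit multiplication. */
static inline uint16_t mul_u8 (uint8_t a, uint8_t b)
{
    uint16_t product;
#if defined (__AVR__) && defined (__AVR_HAVE_MUL__)
    __asm ("mul  %[a], %[b]"  "\n\t"  /* Result in R1:R0, destroys the zero reg. */
           "movw %[prod], r0" "\n\t"  /* Copy R1:R0 to the output operand.       */
           "clr  __zero_reg__"        /* Restore the zero register.              */
           : [prod] "=r" (product)
           : [a] "r" (a), [b] "r" (b)
           : "r0");                   /* No effect, but documents the clobber.   */
#else
    /* Portable fallback. */
    product = (uint16_t) ((uint16_t) a * b);
#endif
    return product;
}
```

The output operand is written only after MUL has consumed both inputs, so no early-clobber modifier is needed here.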

The size of an asm

The code size of an asm statement is the number of lines multiplied by 4 bytes, the maximal possible AVR instruction length. The length is needed when (conditional) jumps cross the asm statement in order to compute (upper bounds for) jump offsets of PC-relative jumps.

The number of lines is one plus the number of line breaks in code-string. These may be physical line breaks from \n characters and logical line breaks from $ characters.
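
The rule can be written down as a small host-side function. This is only a model of the rule as stated above, not the compiler's implementation, and the name asm_size_estimate is made up for illustration.

```c
#include <stddef.h>

/* Model of the size estimate: 4 bytes per line, where the number of
   lines is one plus the number of '\n' and '$' characters.  */
static size_t asm_size_estimate (const char *code_string)
{
    size_t lines = 1;

    for (; *code_string != '\0'; ++code_string)
        if (*code_string == '\n' || *code_string == '$')
            ++lines;

    return 4 * lines;
}
```

For example, a template with two instructions separated by "\n\t" is estimated at 8 bytes, and a trailing "\n\t" after the last instruction would raise the estimate to 12 bytes without adding any code.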

Before we start with the first examples, we list all the bells and whistles that can be used to compose an inline assembly statement: special sequences, constraints, constraint modifiers, print modifiers and operand modifiers.

Special Sequences

There are special sequences that can be used in the assembly template.

Inline asm Special Sequences
Sequence Description
__SREG__ The I/O address of the status register SREG at 0x3F
__tmp_reg__ The temporary register R0 (R16 on reduced Tiny)
__zero_reg__ The zero register R1, always zero (R17 on reduced Tiny)
$ A logical line separator, used to separate multiple instructions on one physical line
\n A physical newline, used to separate multiple instructions
\t A TAB character, can be used for better legibility of the generated asm
\" A " character (double quote)
\\ A \ character (backslash)
%% A % character (percent)
%~ "r" or "", used to construct call or rcall by means of "%~call", depending on the architecture
%! "" or "e", used to construct indirect calls like icall or eicall by means of "%!icall", depending on the architecture
%= A number that's unique for the compilation unit and the respective inline asm code, used to construct unique labels
Comment Description
; text A single-line assembly comment that extends to the end of the physical line
/* text */ A multi-line C comment
  • Moreover, the following I/O addresses are defined provided the device supports the respective SFR: __SP_L__, __SP_H__, __CCP__, __RAMPX__, __RAMPY__, __RAMPZ__, __RAMPD__.
  • Register __tmp_reg__ may be freely used by inline assembly code and need not be restored at the end of the code.
  • Register __zero_reg__ contains a value of zero. When that value is destroyed, for example by a MUL instruction, its value has to be restored at the end of the code by means of
    clr __zero_reg__
  • In inline asm without operands (i.e. without a single colon), a % will always insert a single %. No %-codes are available.

Sequences like __SREG__ are not evaluated as part of the inline asm, they are just copied to the asm code as they are. At the top of each assembly file, the compiler prints definitions like

__SREG__ = 0x3f

so that they can also be used in inline assembly.

Constraints

The most up-to-date and detailed information on constraints for the AVR can be found in the avr-gcc Wiki.

Inline asm Operand Constraints
Constraint  Registers                                      Range
a           Simple upper registers that support FMUL       R16 ... R23
b           Base pointer registers that support LDD, STD   Y, Z (R28 ... R31)
d           Upper registers                                R16 ... R31
e           Pointer registers that support LD, ST          X, Y, Z (R26 ... R31)
l           Lower registers                                R2 ... R15
r           Any register                                   R2 ... R31
w           Upper registers that support ADIW              R24 ... R31
x           X pointer registers                            R26, R27
y           Y pointer registers                            R28, R29
z           Z pointer registers                            R30, R31
Constraint  Constant                                       Range
I           6-bit unsigned integer constant                0 to 63
J           6-bit negative integer constant                −63 to 0
M           8-bit unsigned integer constant                0 to 255
n           Integer constant
i           Immediate value known at link-time, like the address of a variable in static storage
E, F        Floating-point constant
Ynn         Fixed-point or integer constant
Constraint  Explanation                                    Notes
m           A memory location
X           Any valid operand
0 ... 9     Matches the respective operand number

  • Constraints without a modifier specify input operands.
  • Constraints with a modifier specify output operands.
  • More than one constraint like in "rn" specifies the union of the specified constraints; "r" and "n" in this case.
  • All constraints listed above are single-letter constraints, except Ynn which is a 3-letter constraint.

Constraint modifiers are:

Constraint Modifiers
Modifier Meaning
= Output-only operand. Without & it may overlap with input operands
+ Output operand that's also an input
=& "Early-clobber". Register should be used for output only and won't overlap with any input operand(s)

The selection of the proper constraint depends on the range of the constants or registers, which must be acceptable to the AVR instruction they are used with. The C compiler doesn't check any line of your assembler code. But it is able to check the constraint against your C expression. However, if you specify the wrong constraints, then the compiler may silently pass wrong code to the assembler. And, of course, the assembler will fail with some cryptic output or internal errors, or in the worst case wrong code may be the result.

For example, if you specify the constraint "r" and you are using this register with an ORI instruction, then the compiler may select any register. This will fail if the compiler chooses R2 to R15. (It will never choose R0 or R1, because these are used for special purposes.) That's why the correct constraint in that case is "d". On the other hand, if you use the constraint "M", the compiler will make sure that you don't pass anything else but an 8-bit unsigned integer value known at compile-time.
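
A minimal sketch of the ORI case, using a made-up helper named set_msb. The plain C branch only serves to make the snippet build on a non-AVR host.

```c
#include <stdint.h>

/* Hypothetical helper: set the most significant bit of a byte. */
static inline uint8_t set_msb (uint8_t value)
{
#ifdef __AVR__
    /* "d" restricts the operand to R16...R31 as required by ORI.
       With "r", the compiler might pick one of R2...R15, and the
       assembler would reject the instruction.  */
    __asm ("ori %0, %1" : "+d" (value) : "M" (0x80));
#else
    /* Portable fallback with the same effect. */
    value |= 0x80;
#endif
    return value;
}
```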

The following table shows all AVR assembler mnemonics which require operands, and the related constraints.

AVR Instructions and Constraints
Mnemonic Constraints Mnemonic Constraints
adc r,r add r,r
adiw w,I and r,r
andi d,M asr r
bclr I bld r,I
brbc I,label brbs I,label
bset I bst r,I
call i cbi I,I
cbr d,M clr r
com r cp r,r
cpc r,r cpi d,M
cpse r,r dec r
elpm r,z eor r,r
fmul a,a fmuls a,a
fmulsu a,a in r,I
inc r jmp i
lac z,r las z,r
lat z,r ld r,e
ldd r,b ldi d,M
lds r,i lpm r,z
lsl r lsr r
mov r,r movw r,r
mul r,r muls d,d
mulsu a,a neg r
or r,r ori d,M
out I,r pop r
push r rcall i
rjmp i rol r
ror r sbc r,r
sbci d,M sbi I,I
sbic I,I sbiw w,I
sbr d,M sbrc r,I
sbrs r,I ser d
st e,r std b,r
sts i,r sub r,r
subi d,M swap r
tst r xch z,r

Print Modifiers

The %-operands in the inline assembly template can be adjusted by special print-modify characters. The one-letter modifier follows the % and precedes the operand number like in "%a0", or precedes the name in named operands like in "%a[address]".

Inline asm Print Modifiers
Modifier  Number of Arguments  Explanation  Suitable Constraints
%a0 1 Print pointer register as address X, Y or Z, like in "LD r0, %a0+" x, y, z, b, e
%i0 1 Print compile-time RAM address as I/O address, like in "OUT %i0, r0" with argument "n"(&SREG) n
%n0 1 Print the negative of a compile-time integer constant n
%r0 1 Print the register number of a register, like in "CLR %r0+7" for the MSB of a 64-bit register reg
%x0 1 Print a function name without gs() modifier, like in "%~CALL %x0" with argument "s"(main) s
%A0 1 Add 0 to the register number (no effect) reg
%B0 1 Add 1 to the register number reg
%C0 1 Add 2 to the register number reg
%D0 1 Add 3 to the register number reg
%T0%t1 2 Print the register that holds bit number %1 of register %0 reg + n
%T0%T1 2 Print operands suitable for BLD/BST, like in "BST %T0%T1", including the required , reg + n
  • Register constraints are: r, d, w, x, y, z, b, e, a, l.
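
As a small illustration of %A ... %D, the following sketch complements all four bytes of a 32-bit value, one register at a time. The helper name complement32 is hypothetical, and the plain C branch is only there for non-AVR hosts.

```c
#include <stdint.h>

/* Hypothetical helper: one's complement of a 32-bit value. */
static inline uint32_t complement32 (uint32_t value)
{
#ifdef __AVR__
    /* %A0 is the least significant byte of the 4-register operand,
       %D0 the most significant one.  */
    __asm ("com %A0" "\n\t"
           "com %B0" "\n\t"
           "com %C0" "\n\t"
           "com %D0"
           : "+r" (value));
#else
    /* Portable fallback with the same effect. */
    value = ~value;
#endif
    return value;
}
```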

Operand Modifiers

Assembly Code Operand Modifiers
Modifier  Explanation  Purpose
lo8()  1st byte of a link-time constant, bits 0...7  Getting parts of a byte-address
hi8()  2nd byte of a link-time constant, bits 8...15
hlo8()  3rd byte of a link-time constant, bits 16...23
hhi8()  4th byte of a link-time constant, bits 24...31
hh8()  Same as hlo8()
pm_lo8()  1st byte of a link-time constant divided by 2, bits 1...8  Getting parts of a word-address
pm_hi8()  2nd byte of a link-time constant divided by 2, bits 9...16
pm_hh8()  3rd byte of a link-time constant divided by 2, bits 17...24
pm()  Link-time constant divided by 2 in order to get a program memory (word) address, like in lo8(pm(main))  Word-address
gs()  Function address divided by 2 in order to get a (word) address, like in lo8(gs(main)). Generate stub (trampoline) as needed. This is required to calculate the address of a code label on devices with more than 128 KiB of program memory that's supposed to be used in EICALL. For rationale, see the GCC documentation. On devices with less program memory, gs() behaves like pm()  Function address for [E]ICALL

When the argument of a modifier is not computable at assembler-time, then the assembler has to encode the expression in an abstract form using RELOCs. The consequence is that only a very limited set of argument expressions is supported when they are not computable at assembler-time.
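
The bit ranges in the table can be cross-checked with plain C on a host. The functions below are only models with made-up names (model_lo8 etc.); the real modifiers are evaluated by the assembler or the linker, not by C code.

```c
#include <stdint.h>

/* Byte-address parts. */
static inline uint8_t model_lo8  (uint32_t x) { return (uint8_t) (x >> 0);  }
static inline uint8_t model_hi8  (uint32_t x) { return (uint8_t) (x >> 8);  }
static inline uint8_t model_hlo8 (uint32_t x) { return (uint8_t) (x >> 16); }
static inline uint8_t model_hhi8 (uint32_t x) { return (uint8_t) (x >> 24); }

/* Word-address parts: the value is divided by 2 first. */
static inline uint8_t model_pm_lo8 (uint32_t x) { return (uint8_t) (x >> 1);  }
static inline uint8_t model_pm_hi8 (uint32_t x) { return (uint8_t) (x >> 9);  }
static inline uint8_t model_pm_hh8 (uint32_t x) { return (uint8_t) (x >> 17); }
```

For instance, the byte address 0x1F000 corresponds to the word address 0xF800, so the model yields 0x00 for pm_lo8 and 0xF8 for pm_hi8.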

Examples

Some examples show the assembly code as generated by the compiler. It's the code from the .s files as generated with option -save-temps. Adding the high-level source to the generated assembly can be turned on with -fverbose-asm since GCC v8.

Swapping Nibbles

The first example uses the swap instruction to swap the nibbles of a byte. Input and output of swap are located in the same general purpose register. This means the input operand, operand 1 below, must be located in the same register(s) as operand 0, so that the right constraint for operand 1 is "0":

asm ("swap %0" : "=r" (value) : "0" (value));

All side effects of the code are described by the constraints and the clobbers, so that there is no need for this asm to be volatile. In particular, this asm may be optimized out when the output value is unused.
A shorter pattern to state that value is both input and output is by means of constraint modifier +:

asm ("swap %0" : "+r" (value));
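
Wrapped into a function, the nibble swap might look like the following sketch. The name swap_nibbles is made up for illustration, and the plain C branch only serves to make the snippet build on a non-AVR host.

```c
#include <stdint.h>

/* Hypothetical helper: exchange the high and the low nibble of a byte. */
static inline uint8_t swap_nibbles (uint8_t value)
{
#ifdef __AVR__
    /* "+r": value is both input and output, in any register. */
    __asm ("swap %0" : "+r" (value));
#else
    /* Portable fallback with the same effect. */
    value = (uint8_t) ((value << 4) | (value >> 4));
#endif
    return value;
}
```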

Swapping Bytes

Swapping nibbles was a piece of cake, so let's swap the bytes of a 16-bit value. In order to access the constituent bytes of the 16-bit input and output values, we use the print modifiers %A and %B.

The asm is placed in a small C test case so that we can inspect the resulting assembly code as generated by the compiler with -save-temps.

void callee (int, int);
void func (int param)
{
    int swapped;
    asm ("mov %A0, %B1" "\n\t"
         "mov %B0, %A1"
         : "=r" (swapped) : "r" (param));
    callee (param, swapped);
}

The "\n\t" sequence adds a line feed that is required between the two instructions, and a TAB to align the two instructions in the generated assembly. There is no "\n\t" after the last instruction because that would just increase the size of the asm.
The generated assembly works as expected. The compiler wraps it in #APP / #NOAPP annotations:

func:
/* #APP */
mov r22, r25 ; swapped, param
mov r23, r24 ; swapped, param
/* #NOAPP */
jmp callee

Wrong! While the generated code above is correct, the inline asm itself is not!
We see this with a slightly adjusted test case where the arguments of callee have been swapped, but that uses the same inline asm:

void func (int param)
{
    int swapped;
    asm ("mov %A0, %B1" "\n\t"
         "mov %B0, %A1"
         : "=r" (swapped) : "r" (param));
    callee (swapped, param);
}

The result is the following assembly:

func:
movw r22,r24
/* #APP */
mov r24, r25 ; swapped, param
mov r25, r24 ; swapped, param
/* #NOAPP */
jmp callee

which is obviously wrong: after the code from the inline asm, the low byte and the high byte of swapped always hold the same value, namely the original content of r25.

The reason is that the output operand overlaps the input, and the output is changed before all of the input operands are consumed. This is a so-called early-clobber situation. There are two possible solutions to this predicament:

  • Mark the output operand with the early-clobber constraint modifier:
    asm ("mov %A0, %B1" "\n\t"
    "mov %B0, %A1"
    : "=&r" (swapped) : "r" (param));
  • Use constraints and a code sequence that expect input and output in the same registers:
    asm ("eor %A0, %B0" "\n\t"
    "eor %B0, %A0" "\n\t"
    "eor %A0, %B0"
    : "=r" (swapped) : "0" (param));
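
The overlap problem and the EOR variant can be replayed on a host with plain C. The sketch below models the two registers R24 and R25 as bytes that hold both the input and the output; the function names are made up for illustration.

```c
#include <stdint.h>

/* Model of the broken variant: output and input share R25:R24. */
static uint16_t swap_mov_overlap (uint16_t param)
{
    uint8_t r24 = (uint8_t) param;         /* low byte  */
    uint8_t r25 = (uint8_t) (param >> 8);  /* high byte */

    r24 = r25;  /* mov r24, r25 */
    r25 = r24;  /* mov r25, r24: reads the already overwritten r24 */

    return (uint16_t) ((r25 << 8) | r24);
}

/* Model of the EOR variant, which is designed for shared registers. */
static uint16_t swap_eor (uint16_t param)
{
    uint8_t r24 = (uint8_t) param;
    uint8_t r25 = (uint8_t) (param >> 8);

    r24 ^= r25;  /* eor %A0, %B0 */
    r25 ^= r24;  /* eor %B0, %A0 */
    r24 ^= r25;  /* eor %A0, %B0 */

    return (uint16_t) ((r25 << 8) | r24);
}
```

With the input 0x1234, the first model returns 0x1212, whereas the second one returns the expected 0x3412.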

Accessing Memory

Accessing memory requires that the AVR instructions that perform the memory access are provided with the appropriate memory address.

  1. The address can be provided directly, like __SREG__, 0x3f, as a symbol, or as a symbol plus a constant offset.
  2. Provide the address by means of an inline asm operand.

Approach 1 is simpler as it does not require an asm operand, while approach 2 is in many cases more powerful because macros defined by, say, #include <avr/io.h> can be used as operands, whereas such headers are not included in the assembly code as generated by the compiler.

Reading an SFR like PORTB can be performed by

asm volatile ("in %0, %1" : "=r" (result) : "I" _SFR_IO_ADDR (PORTB));

Macro _SFR_IO_ADDR is provided by avr/sfr_defs.h which is included by avr/io.h.

Since GCC v4.7, print modifier %i is supported, which prints RAM addresses like & PORTB as an I/O address:

asm volatile ("in %0, %i1" : "=r" (result) : "I" (& PORTB));

When the address is not an I/O address, then LDS or LD must be used, depending on whether the address is known at link-time or only at run-time. For example, the following macro provides the functionality to clear an SFR. The code discriminates between the possibilities that

  • The SFR address is known at compile-time and is an I/O address.
  • The SFR address is known at compile-time but is not in the I/O range.
  • The SFR address is not known at compile-time.

#include <avr/io.h>
#define CLEAR_REG(sfr)                                  \
    do {                                                \
        if (__builtin_constant_p (& (sfr))              \
            && _SFR_IO_REG_P (sfr))                     \
            asm volatile ("out %i0, __zero_reg__"       \
                          :: "I" (& (sfr)) : "memory"); \
        else if (__builtin_constant_p (& (sfr)))        \
            asm volatile ("sts %0, __zero_reg__"        \
                          :: "n" (& (sfr)) : "memory"); \
        else                                            \
            asm volatile ("st %a0, __zero_reg__"        \
                          :: "e" (& (sfr)) : "memory"); \
    } while (0)

The last case with constraint "e" works because &sfr is a 16-bit value, and 16-bit values (and larger) start in even registers. Therefore, the address will be located in R27:R26, R29:R28 or in R31:R30, which print modifier %a will print as X, Y or Z, respectively. The address will never end up in, say, R30:R29.

The test case

void clear_3_regs (uint8_t volatile *psfr)
{
    CLEAR_REG (PORTB);
    CLEAR_REG (UDR0);
    CLEAR_REG (*psfr);
}

compiles for ATmega328 and with optimization turned on to

clear_3_regs:
movw r30,r24
/* #APP */
out 0x5, __zero_reg__
sts 198, __zero_reg__
st Z, __zero_reg__ ; psfr
/* #NOAPP */
ret

As __builtin_constant_p is used to infer whether the address of the SFR is known at compile-time, extra care must be taken when the functionality is implemented as an inline function:

static inline __attribute__((__always_inline__))
void clear_reg (uint8_t volatile *psfr)
{
    // !!! The following cast is required to make __builtin_constant_p
    // !!! work as expected in the inline function.
    uintptr_t addr = (uintptr_t) psfr;
    if (__builtin_constant_p (addr)
        && _SFR_IO_REG_P (* psfr))
        asm volatile ("out %i0, __zero_reg__"
                      :: "I" (addr) : "memory");
    else if (__builtin_constant_p (addr))
        asm volatile ("sts %0, __zero_reg__"
                      :: "n" (addr) : "memory");
    else
        asm volatile ("st %a0, __zero_reg__"
                      :: "e" (addr) : "memory");
}
void clear_3_pregs (uint8_t volatile *psfr)
{
    clear_reg (& PORTB);
    clear_reg (& UDR0);
    clear_reg (psfr);
}

Casting the address psfr to an integer type in the inline function is required so that the compiler will recognize constant addresses.
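Also notice that we have to pass the address of the SFR to the inline function. Passing the SFR directly like in the macro approach won't work, because function arguments are passed by value and the function would receive the content of the SFR rather than its location.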

Accessing Bytes of wider Expressions

Finally, an example that atomically increments a 16-bit integer. The code is wrapped in IN SREG / CLI / OUT SREG to make it atomic. It reads the 16-bit value data from its absolute address, increments it and then writes it back:

uint16_t volatile data;
void inc_data (void)
{
    uint16_t tmp;
    asm volatile ("in __tmp_reg__, __SREG__" "\n\t"
                  "cli" "\n\t"
                  "lds %A[temp], %[addr]" "\n\t"
                  "lds %B[temp], %[addr]+1" "\n\t"
#ifdef __AVR_TINY__
                  // Reduced Tiny does not have ADIW.
                  "subi %A[temp], lo8(-1)" "\n\t"
                  "sbci %B[temp], hi8(-1)" "\n\t"
#else
                  "adiw %[temp], 1" "\n\t"
#endif
                  "sts %[addr]+1, %B[temp]" "\n\t"
                  "sts %[addr], %A[temp]" "\n\t"
                  "out __SREG__, __tmp_reg__"
#ifdef __AVR_TINY__
                  // No need to restrict tmp to a "w" register. And on
                  // avr-gcc v13.2 and older, "w" contains no regs.
                  : [temp] "=d" (tmp), "+m" (data)
#else
                  : [temp] "=w" (tmp), "+m" (data)
#endif
                  : [addr] "i" (& data));
}

Notice that three different ways are required to access the different bytes of the involved 16-bit entities:

  • For the 16-bit general purpose register %[temp], print modifiers %A and %B are used.
  • For the 16-bit value data in static storage, %[addr]+1 is used to access the high byte. The resulting expression data+1 is computable at link-time and evaluated by the linker.
  • In the compilation variant for Reduced Tiny, the bytes of the 16-bit subtrahend −1 are accessed with the operand modifiers lo8 and hi8 that are evaluated by the assembler because −1 is known at assembler-time.

data is located in static storage, hence its address is known to the linker and fits constraint "i".

The sole purpose of operand "+m" (data) is to describe the effect of the asm on data memory: It changes data. Notice that there is no "memory" clobber, because that operand already describes all memory side effects, and it does this in a less intrusive way than a catch-all "memory". The operand is not used in the asm template; but in principle it would be possible to use it as operand with LDS and STS instead of operand [addr] "i" (& data). However, there are many situations where a memory operand constrained by "m" takes a form that cannot be used with AVR instructions because there are no matching print modifiers, or because it is not known a priori what specific form the memory operand takes. In such cases, one would take the address of the operand and supply it as address in a pointer register to the inline asm. The compiler generates the required instructions for address computation, and the inline asm knows that it can use LD and ST.
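
The pointer-register approach mentioned at the end of the previous paragraph might look like the following sketch. The helper name atomic_inc16 is made up for illustration. The sketch assumes a device that supports LDD and STD, which is why Reduced Tiny is excluded, and for simplicity it uses a catch-all "memory" clobber rather than an "m" operand. The plain C branch is not atomic and only serves to make the snippet build on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: atomically increment the 16-bit value at *p. */
static inline void atomic_inc16 (uint16_t volatile *p)
{
#if defined (__AVR__) && !defined (__AVR_TINY__)
    uint16_t tmp;
    /* "b": address in Y or Z so that LDD / STD can be used.
       "=&d": early-clobber, tmp is written while ptr is still needed,
       and SUBI / SBCI require an upper register.  */
    __asm volatile ("in   __tmp_reg__, __SREG__" "\n\t"
                    "cli"                        "\n\t"
                    "ld   %A[tmp], %a[ptr]"      "\n\t"
                    "ldd  %B[tmp], %a[ptr]+1"    "\n\t"
                    "subi %A[tmp], lo8(-1)"      "\n\t"
                    "sbci %B[tmp], hi8(-1)"      "\n\t"
                    "std  %a[ptr]+1, %B[tmp]"    "\n\t"
                    "st   %a[ptr], %A[tmp]"      "\n\t"
                    "out  __SREG__, __tmp_reg__"
                    : [tmp] "=&d" (tmp)
                    : [ptr] "b" (p)
                    : "memory");
#else
    /* Non-atomic fallback for other targets. */
    ++*p;
#endif
}
```

Print modifier %a turns the pointer operand into Y or Z, so that %a[ptr]+1 yields a displacement operand like Z+1.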

Jumping and Branching

When an inline asm contains jumps, then it also requires labels. When the label is inside the asm, then care must be taken that the label is unique in the compilation unit even when the inline asm is used multiple times, e.g. when the code is located in an unrolled loop or a function has multiple incarnations due to cloning, or simply because a macro or inline function that contains an asm statement is used more than once.
There are two kinds of labels that can be used:

  • Local labels of the form n: where n is some (small, non-negative) number. They can be targeted by means of nb or nf, depending on whether the jump direction is backwards or forwards. Such numeric labels may be present more than once. The label taken is the first one with the specified number in the respective direction:
    // Loop until bit PORTB.7 is set.
    asm volatile ("1: sbis %i[sfr], %[bitno]" "\n\t"
    "rjmp 1b"
    :: [sfr] "I" (& PORTB), [bitno] "n" (PB7));
  • Local labels that contain the sequence %= which yields some number that's unique amongst all asm incarnations in the respective compilation unit:
    // Loop until bit PORTB.7 is set.
    asm volatile (".Loop.%=: sbis %i[sfr], %[bitno]" "\n\t"
    "rjmp .Loop.%="
    :: [sfr] "I" (& PORTB), [bitno] "n" (PB7));

Which form is used is a matter of taste. In practice, the first variant is often preferred in short sequences, whereas the second form is usually seen in longer algorithms.
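
Both label styles can be tried with a loop that does not depend on any SFR. The following sketch counts a byte down to zero; the helper names are made up, and the plain C branches only serve to make the snippet build on a non-AVR host.

```c
#include <stdint.h>

/* Hypothetical helper: count down to zero using a numeric local label. */
static inline uint8_t spin_numeric (uint8_t count)
{
#ifdef __AVR__
    __asm volatile ("1: dec %0" "\n\t"
                    "brne 1b"
                    : "+r" (count));
#else
    do { --count; } while (count != 0);
#endif
    return count;
}

/* Same loop, but with a label made unique by means of %=. */
static inline uint8_t spin_unique (uint8_t count)
{
#ifdef __AVR__
    __asm volatile (".Lspin.%=: dec %0" "\n\t"
                    "brne .Lspin.%="
                    : "+r" (count));
#else
    do { --count; } while (count != 0);
#endif
    return count;
}
```

Either function may be inlined any number of times without label clashes: the numeric label may be repeated, and %= expands to a different number for each asm incarnation.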

For labels that are defined in the surrounding C/C++ code, asm goto has to be used. The print modifier %x0 prints panic as a raw label, not as gs(panic) as would be the case with %0.

int main (void)
{
asm goto ("tst __zero_reg__" "\n\t"
"brne %x0"
:::: panic);
/* ...Application code here... */
return 0;
panic:
// __zero_reg__ is supposed to contain 0, but doesn't.
return 1;
}

This assumes that the jump offset can be encoded in the brne instruction in all situations. When static analysis cannot prove that the jump offset fits, then a jumpity jump has to be used, i.e. a short conditional branch that skips over an unconditional jump to the label:

asm goto ("tst __zero_reg__" "\n\t"
"breq 1f" "\n\t"
"%~jmp %x0" "\n"
"1: ;; all fine"
:::: panic);

Sequence "%~jmp" yields "rjmp" or "jmp" depending on the architecture. Notice that a jmp can be relaxed to an rjmp with option -mrelax provided the jump offset fits.

Binding local Variables to Registers

One use of GCC's asm keyword is to bind local register variables to hardware registers.
Such bindings of local variables to registers are only guaranteed during inline asm which has these variables as operands.

Interfacing non-ABI Functions

Suppose we want to interface a non-ABI assembly function mul_8_16 that multiplies R24 with R27:R26, clobbers R0, R1 and R25, and returns the 24-bit result in R20:R19:R18. One way to implement such an interface would be to provide an assembly function that performs the required copying and call to mul_8_16. Such a function would destroy some of the performance gain obtained by using assembly for mul_8_16: Additional copying back and forth and extra CALL and RET instructions.

The compiler comes to the rescue. We can bind local variables to the required registers:

extern void mul_8_16 (void); // Non-ABI function. Don't call in C/C++!
static inline __attribute__((__always_inline__))
__uint24 mul_8_16_gccabi (uint8_t val8, uint16_t val16)
{
    register uint8_t r24 __asm("r24") = val8;
    register __uint24 r18 __asm("r18");
    asm ("%~call %x[func]" "\n\t"
         "clr __zero_reg__"
         : "=r" (r18)
         : "r" (r24), "x" (val16), [func] "i" (mul_8_16)
         : "r25", "r0");
    return r18;
}
  • The 8-bit parameter is bound to R24, and the 24-bit return value is bound to R18...R20.
  • The register keyword is mandatory.
  • The hard register is specified as a string literal for the lower case register name or register number, like "18" or "r18". Specifications like "R18", 18 or "Z" are not supported.
  • The 16-bit parameter of mul_8_16 happens to be required in R27:R26, which is the X register for which there is register constraint "x". Therefore, no register binding is required for val16.
  • As mul_8_16 clobbers the zero register R1, it has to be restored by means of
    clr __zero_reg__
  • The asm is pure arithmetic and hence not volatile. (It might be advisable to make it volatile anyway, so that it won't be reordered across sei() or cli() instructions.)

Let's have a look at how this performs in a test case:

void use_mul_8_16_gccabi (uint8_t val, uint8_t a, uint8_t b)
{
if (mul_8_16_gccabi (val, a * b) >= 0x2010)
__builtin_abort();
}

For ATmega8 we get the following assembly:

use_mul_8_16_gccabi:
mul r22,r20
movw r26,r0
clr __zero_reg__
/* #APP */
rcall mul_8_16
clr __zero_reg__
/* #NOAPP */
cpi r18,16
sbci r19,32
cpc r20,__zero_reg__
brlo .L1
rcall abort
.L1:
ret

No superfluous register moves. Great!

Specifying the Assembly Name of Static Objects

Sometimes, it is desirable to use a different name for an object or function rather than the (mangled) name from the C/C++ implementation. Just add an asm specifier with the desired name as a string literal at the end of the declaration.

For example, this is how avr/eeprom.h implements the eeprom_read_double() function:

#if __SIZEOF_DOUBLE__ == 4
double eeprom_read_double (const double*) __asm("eeprom_read_dword");
#elif __SIZEOF_DOUBLE__ == 8
double eeprom_read_double (const double*) __asm("eeprom_read_qword");
#endif
  • It uses the implementation of eeprom_read_dword for eeprom_read_double, provided double is a 32-bit type.
  • It uses the implementation of eeprom_read_qword for 64-bit doubles.

What won't work

GCC inline asm has some limitations.

Setting a Register on one asm and using it in a different one

Sequences like the following are not supposed to work:

char var;
void set_var (char c)
{
__asm ("inc r24");
__asm ("sts var, r24");
}
  • There is no guarantee whatsoever that the value in R24 will survive from one asm to the next. Such code might work in many situations, but it is still wrong: the compiler may very well emit instructions that change R24 before the first asm as well as between the two asm statements.
  • R24 is changed without notifying the compiler. When R24 contains other data, then that data will be trashed.

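Correct code would be the following. Since INC changes the register that holds c, c has to be an input-output operand ("+r") rather than an input-only operand, and the asm is volatile so that it won't be optimized away when c is unused afterwards:

__asm volatile ("inc %0" "\n\t"
                "sts var, %0"
                : "+r" (c) :: "memory");

or

__asm ("inc %1" "\n\t"
       "sts %0, %1"
       : "=m" (var), "+r" (c));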

Letting an Operand cross the Boundaries of the Y Register

It is not possible to bind a value to a local register variable that crosses the boundaries of the Y register. For example, trying to bind a 32-bit value to R31:R28 by means of

register uint32_t r28 __asm ("28");

will result in an error message like

error: register specified for 'r28' isn't suitable for data type

Similarly, an operand described by a constraint will be located either completely below the Y register, completely inside the Y register, or completely above it.

Using Matching Constraints "=0"..."=9" with Output Operands

Suppose we want an inline asm that returns the low byte of a 16-bit value val16:

asm ("" : "=1" (lo8) : "r" (val16));

The diagnostic will be:

error: matching constraint not valid in output operand