Introduction

So you have some constant data and you're running out of room to store it? Many AVRs have a limited amount of RAM in which to store data, but may have more Flash space available. The AVR is a Harvard architecture processor, where Flash is used for the program, RAM is used for data, and they each have separate address spaces. It is a challenge to get constant data to be stored in the Program Space, and to retrieve that data to use it in the AVR application.

The problem is exacerbated by the fact that the C language was not designed for Harvard architectures; it was designed for Von Neumann architectures, where code and data exist in the same address space. This means that any compiler for a Harvard architecture processor, like the AVR, has to use other means to operate with separate address spaces.

GCC has a special keyword, __attribute__, that is used to attach different attributes to things such as function declarations, variables, and types. This keyword is followed by an attribute specification in double parentheses. In AVR GCC, there is a special attribute called progmem. This attribute is used on data definitions, and tells the compiler to place the data in the Program Memory (Flash).

AVR-LibC provides a simple macro PROGMEM that is defined as the attribute syntax of GCC with the __progmem__ attribute. The PROGMEM macro is defined in the <avr/pgmspace.h> system header which also provides macros and inline functions to access such data.
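As a minimal sketch of how this looks in practice, the following defines a small table with PROGMEM and reads it back with pgm_read_byte from <avr/pgmspace.h>. The names lookup and lookup_get are made up for illustration, and the #else branch is only an assumption so that the same file also builds with a host compiler, where no separate program space exists.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>   /* PROGMEM, pgm_read_byte */
#else
/* Host fallback (assumption for illustration): a single address space. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *) (addr))
#endif

/* Hypothetical table; on AVR it is placed in flash instead of RAM. */
static const uint8_t lookup[4] PROGMEM = { 10, 20, 30, 40 };

/* Read one entry; the caller must keep i below 4. */
uint8_t lookup_get (uint8_t i)
{
    return pgm_read_byte (&lookup[i]);
}
```

Note that the table is never read with a plain lookup[i]; every access goes through the address and a pgm_read function.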

An alternative approach is taken by named address-spaces like __flash and __flashx as proposed by the N1275 draft to the ISO/IEC DTR 18037 "Embedded C" specification. Named address-spaces are supported in avr-gcc since v4.7 (__flashx since avr-gcc v15) as part of the GNU-C99 language dialect (-std=gnu99 and up), see the avr-gcc documentation.

AVRs with a linear address space

There are AVR devices that provide a linear address space where the program memory is seen in the RAM address space and can be accessed with LD instructions. The respective device families are:

Devices from the AVRrc core in avrtiny.
AVR16 and AVR32 devices, and devices from the 0-series, 1-series, 2-series in avrxmega3.
AVR64 devices in avrxmega2 and AVR128 devices in avrxmega4. For these devices, only a 32 KiB portion of the program flash is visible in the RAM address space. And only without -mrodata-in-ram will .rodata be located in flash.

In all of these cases, complications like PROGMEM or __flash are not needed, though they still work as usual.

Why is GCC putting const Data into RAM to begin with?

From a technical point of view, GCC is putting constant data in static storage¹ into the .rodata input section, as opposed to non-const data which is put into .data or .bss. But the question is then: Why is the linker (script) putting the .rodata sections into RAM? In order to better understand this, take the following code:

#include <stdbool.h>
 
extern const char c_one;
extern char c_two;
 
bool is_one (const char *pc)
{
    return *pc == '1';
}
 
int test1 (void)
{
    if (is_one (&c_one))
        return 1;
    else if (is_one (&c_two))
        return 2;
    else
        return 0;
}

This is a completely valid C99 compilation unit.

Function is_one takes a const char* pointer argument because it only reads through pc and does not modify the pointed-to object. Without the const qualifier for the pointed-to object, it would not be possible to use the function with pointers to const objects like &c_one, because the code would no longer be const-correct.

Moreover, it is completely fine to pass the address of a non-const object like &c_two to a function that won't change the pointed-to object, and hence takes a pointer-to-const.

The big question is now: What assembly / machine code should a compiler generate for is_one()?

AVR GCC is using LD(pc) in order to read *pc:

is_one:
    movw r30, r24  ; move pc from r25:r24 to Z
    ldi  r24, 1    ; return value := true
    ld   r25, Z    ; r25 := *Z using LD
    cpi  r25, '1'  ; is r25 == '1' ?
    breq .L2       ; yes: then goto return
    ldi  r24, 0    ; no:  then return value := false
.L2:
    ret            ; return value (r24)

This works when c_one and c_two are located in RAM², so that the LD instruction can be used.

A different approach would be to use LD(pc) when *pc is located in RAM, and LPM(pc) when *pc is located in flash; something like:
```
if is_ram_pointer(pc)
    r25 = LD(pc)
else
    r25 = LPM(pc)
```
The drawbacks are obvious: Such code is expensive, because it has to discriminate at run-time whether pc points to RAM or to flash. Plus, there must be some means to tell which kind of pointer pc actually is. For example, the high bit of the address could be used to encode the information.

This approach is taken by avr-gcc's named address-space __memx, which uses 24-bit pointers and encodes the information in the high byte.
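To make the cost of such a run-time check more tangible, here is a small host-side model of a tagged address. It is not how you would use __memx (the compiler handles that for you); the two arrays merely stand in for flash and RAM, and the choice of bit 23 as the RAM flag is an assumption made for this sketch. All sim_ names are hypothetical.

```c
#include <stdint.h>

/* Simulated memories (purely illustrative, not real AVR memory). */
static const uint8_t sim_flash[4] = { 'a', 'b', 'c', '1' };
static uint8_t       sim_ram[4]   = { '1', 'x', 'y', 'z' };

/* Assumed tag for this model: bit 23 of the 24-bit address marks RAM. */
#define SIM_RAM_FLAG 0x800000ul

uint32_t sim_ram_addr (uint16_t offset)
{
    return (uint32_t) (SIM_RAM_FLAG | offset);
}

uint32_t sim_flash_addr (uint16_t offset)
{
    return offset;
}

/* Run-time dispatch: this test is the extra cost paid on every access. */
uint8_t sim_read (uint32_t addr)
{
    uint16_t offset = (uint16_t) (addr & 0xffff);

    return (addr & SIM_RAM_FLAG)
        ? sim_ram[offset]      /* would be LD on a real device  */
        : sim_flash[offset];   /* would be LPM on a real device */
}
```

One function can now serve both kinds of addresses, at the price of a wider pointer and a branch per read.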

So when the compiler takes the first approach of always using LD, what will happen when we put c_one in PROGMEM?

The code will just not work!³

To use is_one() for pointers to RAM as well as pointers to progmem (or __flash for that matter), is_one() needs a second argument that tells which kind of pointer is being passed, and it has to adjust the code accordingly; something like:

#include <stdbool.h>
#include <avr/pgmspace.h>
 
bool is_one (const char *pc, bool is_ram_addr)
{
    char c = is_ram_addr
        ? *pc                  // RAM: plain load (LD)
        : pgm_read_char (pc);  // Program Space: read with LPM
 
    return c == '1';
}

Notes

¹ In C++, const static storage data might be written to. For example, in
volatile int vi;

const int i2 = vi;

the variable i2 is read-only for the C++ program, but i2 must not be put into .rodata because it cannot be initialized at load-time. Due to its initializer that is not computable at load-time, i2 has to be put into RAM and will be initialized (written to) at run-time by the startup code. avr-g++ will diagnose when an attempt is made to put i2 in PROGMEM.
² More precisely, these variables have to be located in the RAM address space for the code to work. For example, some AVR devices see (a part of) the program memory in the RAM address space, and hence can use the LD instruction to access program memory.
For example, an ATmega3208 sees the program memory range of 0x0...0x7fff in the RAM address space at addresses 0x4000...0xbfff. So all the linker script has to do is to provide an appropriate VMA of 0x4000+LMA for .rodata objects. This is the case for devices from the avrxmega3 and avrtiny families. Since GCC v14 / Binutils v2.42 it is also the case for AVR64 and AVR128 devices when they use the default avrxmega2_flmap or avrxmega4_flmap emulation, i.e. without -mrodata-in-ram.
³ The code does actually work for Reduced Tiny devices because the compiler is implementing attribute progmem in a different way for the reduced core (AVRrc). See the GCC documentation on progmem.
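The address arithmetic from the ATmega3208 example in the notes can be written down as a tiny function. On a real device this mapping is set up by the linker script, not by program code, so the sketch below only illustrates the numbers; flash_to_data_addr and the two macros are hypothetical names.

```c
#include <stdint.h>

/* Values taken from the ATmega3208 example in the notes. */
#define FLASH_WINDOW_BASE 0x4000u   /* flash address 0 as seen in the RAM address space */
#define FLASH_WINDOW_SIZE 0x8000u   /* flash range 0x0...0x7fff is visible */

/* Returns the RAM-address-space address (VMA) for a flash address (LMA),
   or 0 if the flash address lies outside the visible range. */
uint16_t flash_to_data_addr (uint16_t lma)
{
    if (lma >= FLASH_WINDOW_SIZE)
        return 0;

    return (uint16_t) (FLASH_WINDOW_BASE + lma);
}
```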

Storing and Retrieving Data in the Program Space

With Attribute PROGMEM and pgm_read() Functions

Let's say you have some global data:

char mydata[2][8] =
{
    { 2, 3, 5,  7, 11, 13, 17, 19 },
    { 1, 4, 9, 16, 25, 36, 49, 64 }
};

and later in your code you access this data in a function and store a single byte into a variable value like so:

char value = mydata[i][j];

Now you want to store your data in Program Memory. Use the PROGMEM macro found in <avr/pgmspace.h> and put it after the declaration of the variable, but before the initializer, like so:

#include <avr/pgmspace.h>
 
const char mydata[2][8] PROGMEM =
{
    { 2, 3, 5,  7, 11, 13, 17, 19 },
    { 1, 4, 9, 16, 25, 36, 49, 64 }
};

That's it! Now your data is in the Program Space. You can compile, link, and check the map file to verify that mydata is placed in the correct section.

Now that your data resides in the Program Space, your code to access (read) the data will no longer work. The code that gets generated will retrieve the data that is located at the address of the mydata array, plus offsets indexed by the i and j variables. However, the final address that is calculated points to the Data Space, not to the Program Space where the data is actually located. It is likely that you will be retrieving some garbage. The problem is that avr-gcc does not intrinsically know that the data resides in the Program Space.

The solution is fairly simple. The "rule of thumb" for accessing data stored in the Program Space is to access the data as you normally would (as if the variable is stored in Data Space), like so:

char value = mydata[i][j];

then take the address of the data:

... &(mydata[i][j]);

then use the appropriate pgm_read_* function, and the address of your data becomes the parameter to that function:

char value = pgm_read_char (&(mydata[i][j]));

The pgm_read_* functions take an address that points to the Program Space, and retrieve the data that is stored at that address. This is why you take the address of the offset into the array. This address becomes the parameter to the function so it can generate the correct code to retrieve the data from the Program Space. There are different pgm_read_* functions to read different types of data at the address given.
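As a rough sketch of picking the pgm_read_* function by element type, the following reads a byte table with pgm_read_byte and a 16-bit table with pgm_read_word. The table and function names are invented for this example, and the #else branch is an assumption that lets the file build on a host compiler with a single address space.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>   /* PROGMEM, pgm_read_byte, pgm_read_word */
#else
/* Host fallback (assumption): plain reads from the single address space. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *) (addr))
#define pgm_read_word(addr) (*(const uint16_t *) (addr))
#endif

static const uint8_t  small_primes[4] PROGMEM = { 2, 3, 5, 7 };
static const uint16_t big_squares[4]  PROGMEM = { 256, 289, 324, 361 };

/* 8-bit elements are read with pgm_read_byte. */
uint8_t get_prime (uint8_t i)
{
    return pgm_read_byte (&small_primes[i]);
}

/* 16-bit elements are read with pgm_read_word. */
uint16_t get_square (uint8_t i)
{
    return pgm_read_word (&big_squares[i]);
}
```

Using a read function that is narrower than the element would silently return only part of the value, so it pays to keep the two in sync.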

With Named Address-Space __flash

The same code in terms of address-space __flash is:

const __flash char mydata[2][8] =
{
    { 2, 3, 5,  7, 11, 13, 17, 19 },
    { 1, 4, 9, 16, 25, 36, 49, 64 }
};

In order to read from mydata, no special code is required:

char value = mydata[i][j];

You can also pass qualified addresses around, like in

char get_first (const __flash char *array)
{
    return array[0];
}
 
char get_mydata_nth_first (uint8_t n)
{
    return get_first (mydata[n]);
}
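Here is a slightly larger sketch along the same lines, where a function walks an array through a __flash-qualified pointer. The names weights, sum_flash and sum_weights are hypothetical. The empty fallback definition of __flash, keyed on the builtin macro __FLASH, is only there so that the code also compiles where named address-spaces are not available.

```c
#include <stdint.h>

#ifndef __FLASH
#define __flash   /* no named address-spaces: fall back to ordinary const data */
#endif

static const __flash uint8_t weights[4] = { 1, 2, 4, 8 };

/* The qualifier is part of the pointer type, so p[i] reads from flash. */
uint8_t sum_flash (const __flash uint8_t *p, uint8_t n)
{
    uint8_t sum = 0;

    for (uint8_t i = 0; i < n; i++)
        sum += p[i];

    return sum;
}

uint8_t sum_weights (void)
{
    return sum_flash (weights, 4);
}
```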

PROGMEM and __flash: The Differences

So what are the ups and downs of using PROGMEM or __flash?

Named address-spaces are only available in GNU-C99 and up, and with avr-gcc v4.7 or newer. To date, GCC does not support named address-spaces in C++, whereas the pgm_read functions work in C++ just as well as in C.
Qualifiers like __flash are easier to port. For example, avr-gcc does not support named address-spaces for the Reduced Tiny devices like ATtiny10. This can be handled with the builtin macro __FLASH:
#ifndef __FLASH
#define __flash // empty
#endif

// Code that uses __flash

(Notice that on Reduced Tiny, section .rodata is located in program memory as opposed to many other AVR cores that have .rodata in RAM. Hence not using __flash does not cause a loss of performance.)
__flash is transparent to the compiler, for example an access like value = mydata[1][1] can be optimized to value = 4, whereas accesses through pgm_read cannot be optimized.
Qualifiers like __flash can be used in pointer targets, like in
char read_c (const __flash char *c)
{
    return *c; // Compiles to LPM
}

whereas this is not possible for PROGMEM, which is an attribute in GCC and not a qualifier.
The analog of PSTR for address-spaces is FSTR or FXSTR. As of v14, avr-gcc still refuses to put local static compound literals into an address-space (GCC PR84163). However, constructs like
const __flash char *ptext = FSTR ("Text");

will work and are qualifier-correct, i.e. avr-gcc will not raise a diagnostic with -Waddr-space-convert.
PSTR cannot be used at global scope, whereas a similar construct with address-spaces is possible: Take for example the code discussed in the next section that declares an array of string literals. With address-spaces we can write:
#include <avr/flash.h>

const __flash char* const __flash string_table[] =
{
    FLIT("String 1"),
    FLIT("String 2"),
    FLIT("String 3")
};


Notice that the __flash left of the * refers to strings pointed-to by string_table, whereas the __flash right of the * refers to string_table itself, i.e. the string literals as well as the table are in __flash.
For functions that take a pointer to program space like strcpy_P, there are address-space correct variants like strcpy_F and strcpy_FX that work nicely with -Waddr-space-convert.

Storing and Retrieving Strings in the Program Space

With __flash

For a solution with the __flash address-space, see the section above, the FLIT examples, or have a look at the FAQ: How to put an array of strings completely in ROM?
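If you prefer not to rely on helper macros, one possible sketch is to name each string explicitly and qualify both the strings and the table with __flash. The identifiers str_1, flash_strings and copy_flash_string are made up for this example, and the empty fallback for __flash is an assumption so that the code also builds without named address-spaces.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef __FLASH
#define __flash   /* no named address-spaces: ordinary const data */
#endif

static const __flash char str_1[] = "String 1";
static const __flash char str_2[] = "String 2";
static const __flash char str_3[] = "String 3";

/* Both the strings and the table of pointers are in __flash. */
static const __flash char *const __flash flash_strings[] =
{
    str_1, str_2, str_3
};

/* Copy entry idx into dst (size bytes, always terminated); returns dst. */
char *copy_flash_string (char *dst, size_t size, uint8_t idx)
{
    const __flash char *src = flash_strings[idx];
    size_t i = 0;

    if (size == 0)
        return dst;

    while (i + 1 < size && src[i] != '\0')
    {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';

    return dst;
}
```

No pgm_read call appears anywhere: the compiler derives the right access from the pointer types.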

With PROGMEM

Now that you can successfully store and retrieve simple data from Program Space, you want to store and retrieve strings from Program Space. And specifically, you want to store an array of strings in Program Space. So you start off with your array, like so:

const char* const string_table[] =
{
    "String 1",
    "String 2",
    "String 3"
};

and then you add your PROGMEM macro to the end of the declaration:

const char* const string_table[] PROGMEM =
{
    "String 1",
    "String 2",
    "String 3"
};

Right? WRONG!

Unfortunately, a GCC attribute affects only the declaration it is attached to. So in this case, we successfully put the string_table variable, the array itself, in the Program Space. This DOES NOT put the actual strings themselves into Program Space. At this point, the strings are still in the Data Space, which is probably not what you want.

In order to put the strings in Program Space, you have to have explicit declarations for each string, and put each string in Program Space:

static const char string_1[] PROGMEM = "String 1";
static const char string_2[] PROGMEM = "String 2";
static const char string_3[] PROGMEM = "String 3";

Then use the new symbols in your table, like so:

const char* const string_table[] PROGMEM =
{
    string_1,
    string_2,
    string_3
};

Now this has the effect of putting string_table in Program Space, where string_table is an array of pointers to characters (strings), where each pointer is a pointer to the Program Space, where each string is also stored.

Retrieving the strings is a different matter. You probably don't want to pull the string out of Program Space, byte by byte, using the pgm_read_byte() macro or pgm_read_char() function. There are other functions declared in the <avr/pgmspace.h> header file that work with strings that are stored in the Program Space.

For example, if you want to copy the string from Program Space to a buffer in RAM (like an automatic variable inside a function, that is allocated on the stack), you can do this:

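#include <stdint.h>
#include <avr/pgmspace.h>

void foo (void)
{
    char buffer[10];

    for (uint8_t i = 0; i < 3; i++)
    {
        // Read the i-th pointer from the table, then copy the string it points to.
        strcpy_P (buffer, (const char*) pgm_read_ptr (& string_table[i]));

        // Display buffer on LCD.
    }
}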

Here, the string_table array is stored in Program Space, so we access it normally, as if it was stored in Data Space, then take the address of the location we want to access, and use the address as a parameter to pgm_read_ptr. We use the pgm_read_ptr macro to read the string pointer out of the string_table array. Remember that a pointer is 16-bits, or word size. The pgm_read_ptr macro will return a void*. This pointer is an address in Program Space pointing to the string that we want to copy. This pointer is then used as a parameter to the function strcpy_P. The function strcpy_P is just like the regular strcpy function, except that it copies a string from Program Space (the second parameter) to a buffer in the Data Space (the first parameter).
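When the destination buffer might be shorter than the string, a bounded copy is safer. The sketch below combines pgm_read_ptr with strncpy_P; the names msg_0, messages and get_message are hypothetical, and the #else branch is an assumption that maps the AVR-specific names onto plain C so the file also builds on a host.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>   /* PROGMEM, pgm_read_ptr, strncpy_P */
#else
/* Host fallback (assumption): one address space, so plain C suffices. */
#include <string.h>
#define PROGMEM
#define pgm_read_ptr(addr) ((const void *) *(addr))
#define strncpy_P strncpy
#endif

static const char msg_0[] PROGMEM = "Ready";
static const char msg_1[] PROGMEM = "Busy";

static const char *const messages[] PROGMEM = { msg_0, msg_1 };

/* Copy message idx into dst, truncating to size - 1 characters. */
char *get_message (char *dst, size_t size, uint8_t idx)
{
    /* First read the pointer out of the table in Program Space... */
    const char *src = (const char *) pgm_read_ptr (&messages[idx]);

    if (size == 0)
        return dst;

    /* ...then copy the string it points to, and always terminate. */
    strncpy_P (dst, src, size - 1);
    dst[size - 1] = '\0';

    return dst;
}
```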

There are many string functions available that work with strings located in Program Space. All of these special string functions have a suffix of _P in the function name, and are declared in the <avr/pgmspace.h> header file.
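As a small illustration of two more of these functions, the following sketch compares a RAM string against strings in Program Space with strcmp_P and measures one with strlen_P. The command strings and function names are invented for the example; the #else branch is again only an assumption for building on a host.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>   /* PROGMEM, strcmp_P, strlen_P */
#else
/* Host fallback (assumption): map the _P variants to the plain ones. */
#include <string.h>
#define PROGMEM
#define strcmp_P strcmp
#define strlen_P strlen
#endif

static const char cmd_on[]  PROGMEM = "on";
static const char cmd_off[] PROGMEM = "off";

/* Returns 1 for "on", 0 for "off", -1 for anything else.
   The first argument of strcmp_P is in RAM, the second in Program Space. */
int parse_command (const char *input)
{
    if (strcmp_P (input, cmd_on) == 0)
        return 1;
    if (strcmp_P (input, cmd_off) == 0)
        return 0;

    return -1;
}

/* Length of the "off" command, read from Program Space. */
size_t command_off_length (void)
{
    return strlen_P (cmd_off);
}
```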
