## Data structure alignment

## Foreword

Knowing how data is stored in memory and how the CPU accesses it helps us improve system performance and reduce memory usage, especially when developing for an embedded system with limited resources.

Now, let’s get started.

## What is data structure alignment?

Data structure alignment is the way data is arranged in memory so that the CPU can access it efficiently. It involves 2 separate concepts:

• Data alignment: place each variable at an address that is a multiple of the word size (or of the variable's own size, if that is smaller).
• Data structure padding: to keep each address on such a boundary, some meaningless bytes are sometimes inserted between 2 variables. These bytes are the “padding”.
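To make the two concepts concrete, here is a minimal sketch that inspects a struct with `offsetof`. The struct `sample` and the helper names are hypothetical and are not taken from the figures in this post; the exact numbers depend on your compiler and target.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical struct used only for illustration. */
struct sample {
    char  tag;    /* 1 byte */
    int   value;  /* usually 4 bytes */
    short count;  /* usually 2 bytes */
};

/* Data alignment: value must start at a multiple of its alignment. */
static_assert(offsetof(struct sample, value) % _Alignof(int) == 0,
              "value is placed on an aligned offset");

/* Data structure padding: bytes skipped between tag and value. */
size_t padding_after_tag(void)
{
    return offsetof(struct sample, value) - sizeof(char);
}

/* Total padding in the struct: its size minus the sizes of its members. */
size_t total_padding(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(int) + sizeof(short));
}
```

On a common target where `int` is 4 bytes and `short` is 2 bytes, `padding_after_tag()` typically returns 3 and `sizeof(struct sample)` is typically 12, but treat those numbers as something to measure rather than assume.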

The next sections go into more detail about how this rearrangement happens and how it can improve our system performance.

Before getting into the details, we need to go through some fundamentals about our system.

### Word size and address size

A processor does not access memory one byte at a time but in blocks of 2, 4, 8, 16, or 32 bytes (depending on the system). The reason is that accessing an address on a multi-byte boundary is a lot faster than accessing one on a single-byte boundary.

Word size: the number of bits that a CPU can process at one time. In modern CPUs, the word size can be 8, 16, 24, 32 or 64 bits, depending on the system. We usually name a system after its word size. For example:

• 16-bit system: 1 Word = 16 bits = 2 bytes
• 32-bit system: 1 Word = 32 bits = 4 bytes
• 64-bit system: 1 Word = 64 bits = 8 bytes

Figure 01: Word size

Address size: the number of bits used to store an address, which determines the size of the address space. For example, if we use 4 bytes (32 bits) to store an address, we have 2^32 = 4 294 967 296 different addresses. In modern CPUs, the word size is usually (but not always) the same as the address size. This allows one memory address to be stored efficiently in one word. For example:

• 32-bit system: size of 1 word = size of 1 address = 32 bits.
• 64-bit system: size of 1 word = size of 1 address = 64 bits.

Figure 03: Address size and Word size
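The arithmetic above can be checked with a short sketch. C has no portable way to ask for the CPU word size, so `pointer_bits()` uses the width of a data pointer as a stand-in for the address size; that choice, and both function names, are assumptions made for this example.

```c
#include <assert.h>
#include <limits.h>  /* CHAR_BIT */

/* Number of distinct addresses that fit in an address of the given width. */
unsigned long long address_count(unsigned bits)
{
    assert(bits < 64);  /* 2^64 does not fit in a 64-bit result */
    return 1ULL << bits;
}

/* Width in bits of a data pointer on the target this is compiled for. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

For example, `address_count(32)` gives the 4 294 967 296 addresses mentioned above, and `pointer_bits()` tells you which kind of system your own build targets.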

### Why do we need it?

Let’s take a look at the example below to see how accessing misaligned data can slow down the system.

Figure 04: A 4-byte variable on a misaligned system and on an aligned system

As can be seen in figure 04, the misaligned system needs 5 steps to get the 4-byte variable, compared with only 2 steps in the aligned system.
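In C code, we can check whether an address is aligned, and we can read a value from a possibly misaligned address without relying on the hardware to tolerate it. The sketch below is one common way to do this with `memcpy`; the function names are hypothetical and this is not the only approach.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>  /* memcpy */

/* Nonzero when addr is a multiple of align (align must be a power of two). */
int is_aligned(const void *addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)addr & (align - 1)) == 0;
}

/* Read a 4-byte value from any address, aligned or not. */
uint32_t load_u32(const unsigned char *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);  /* the compiler picks safe instructions */
    return value;
}
```

Keeping data aligned in the first place is still the better option, because it lets the compiler use the fast path shown in the figure.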

### How does it work?

Here are 3 steps I found useful when dealing with structure padding:

1. Place each variable in the struct at an address that is evenly divisible by the size of that variable (if the system word size is bigger than the variable size). If the system word size is smaller than the variable size, place the variable at an address that is evenly divisible by the word size.

Figure 05: Different data types and their aligned addresses.

2. Add padding to the unused bytes left between the variables.
3. Calculate the final size of the struct.
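The steps above can be sketched as a small layout calculator. This is only a model of the rule described in this post, not how any particular compiler is implemented. It assumes every member is a scalar whose size is a power of two, and it assumes the final size is rounded up to the largest member alignment; `struct_layout` and `align_up` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0);
    return (offset + align - 1) / align * align;
}

/*
 * sizes[i] is the size of member i in bytes, word_size the word size in bytes.
 * If offsets is not NULL, offsets[i] receives the offset of member i.
 * Returns the final size of the struct.
 */
size_t struct_layout(const size_t *sizes, size_t count, size_t word_size,
                     size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < count; i++) {
        /* Step 1: align to the member size, capped at the word size. */
        size_t align = sizes[i] < word_size ? sizes[i] : word_size;

        /* Step 2: the bytes skipped here are the padding. */
        offset = align_up(offset, align);
        if (offsets != NULL)
            offsets[i] = offset;
        offset += sizes[i];

        if (align > max_align)
            max_align = align;
    }

    /* Step 3: pad the tail up to the largest alignment (assumption). */
    return align_up(offset, max_align);
}
```

For members of 1, 4 and 2 bytes on a 4-byte word, this model places them at offsets 0, 4 and 8 and reports a final size of 12 bytes.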

Let’s take a look at some examples below to see how it works.

Example 01: Calculate the size of struct_01 on a 32-bit system and a 64-bit system

Step 1 and 2: Put the variables in the appropriate places and add padding to the unused bytes.

Figure 06: Example 01

Step 3: Calculate the final size

• For 32-bit system: sizeof(struct_01) = 12 bytes
• For 64-bit system: sizeof(struct_01) = 16 bytes

Example 02: Calculate the size of struct_02 on a 32-bit system and a 64-bit system

Step 1 and 2: Put the variables in the appropriate places and add padding to the unused bytes.

Figure 07: Example 02

Step 3: Calculate the final size

• For 32-bit system: sizeof(struct_02) = 20 bytes
• For 64-bit system: sizeof(struct_02) = 24 bytes

Example 03: Calculate the size of struct_03 on a 32-bit system and a 64-bit system

Step 1 and 2: Put the variables in the appropriate places and add padding to the unused bytes.

Figure 08: Example 03

Step 3: Calculate the final size

• For 32-bit system: sizeof(struct_03) = 6 bytes
• For 64-bit system: sizeof(struct_03) = 6 bytes

### Alignment and packing attributes in C

In some compilers, such as IAR or KeilC, you can use struct attributes to get more control over data alignment.

PACK ATTRIBUTE

With a structure that has the pack attribute, no padding is inserted between the struct elements.

Example 04: Calculate the size of struct_04 on a 32-bit system and a 64-bit system

Step 1: Put the variables in the appropriate places, with no padding.

Figure 09: Example 04

Step 2: Calculate the final size

• For 32-bit system: sizeof(struct_04) = 15 bytes
• For 64-bit system: sizeof(struct_04) = 15 bytes
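Here is a minimal sketch of a packed struct. It uses the GCC/Clang spelling `__attribute__((packed))`, which is an assumption on my side: IAR and KeilC have their own syntax, so check your compiler manual. The struct `packed_sample` is hypothetical and is not the struct_04 from the figure.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical struct; packed removes the padding between members. */
struct __attribute__((packed)) packed_sample {
    uint8_t  tag;    /* offset 0 */
    uint32_t value;  /* offset 1, no longer on a 4-byte boundary */
    uint16_t count;  /* offset 5 */
};

/* 1 + 4 + 2 bytes, with nothing in between. */
static_assert(sizeof(struct packed_sample) == 7, "no padding expected");

/* Offset of value inside the packed struct. */
size_t packed_value_offset(void)
{
    return offsetof(struct packed_sample, value);
}
```

The price of saving those bytes is that `value` is now misaligned, so accessing it can be slower on some CPUs.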

ALIGNED ATTRIBUTE

With a struct that has the aligned attribute, the value of the aligned attribute overrides the word size. Therefore, the 3 steps now become:

1. Place each variable in the struct at an address that is evenly divisible by the size of that variable (if the aligned value is bigger than the variable size). If the aligned value is smaller than the variable size, place the variable at an address that is evenly divisible by the aligned value.
2. Add padding to the unused bytes left between the variables.
3. Calculate the final size of the struct.

Remember that the final size is always evenly divisible by the aligned value.

Example 05: Calculate the size of struct_05 on a 32-bit system and a 64-bit system

Step 1 and 2: Put the variables in the appropriate places.

Figure 10: Example 05

Step 3: Calculate the final size

• For 32-bit system: sizeof(struct_05) = 24 bytes
• For 64-bit system: sizeof(struct_05) = 24 bytes
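A minimal sketch of the aligned attribute, again using the GCC/Clang spelling as an assumption. Note that with GCC and Clang the attribute raises the alignment of the struct as a whole, so your compiler may treat the members slightly differently from the steps above. The struct `aligned_sample` is hypothetical and is not the struct_05 from the figure.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical struct whose alignment is raised to 8 bytes. */
struct __attribute__((aligned(8))) aligned_sample {
    uint8_t  tag;    /* offset 0 */
    uint16_t count;  /* offset 2 */
};

/* The final size is a multiple of the aligned value. */
static_assert(sizeof(struct aligned_sample) % 8 == 0,
              "size must be divisible by the aligned value");

/* Bytes of padding added to reach the aligned size. */
size_t aligned_padding(void)
{
    return sizeof(struct aligned_sample) - (sizeof(uint8_t) + sizeof(uint16_t));
}
```

With GCC or Clang, the members only need 4 bytes including the gap after `tag`, but `sizeof(struct aligned_sample)` is 8.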

USE “PACK” AND “ALIGNED” ATTRIBUTE TOGETHER

When using the “pack” and “aligned” attributes together, we will:

1. “Pack” the struct first.
2. Add padding bytes at the end to make sure the size of our structure is evenly divisible by the aligned value.

Example 06: Calculate the size of struct_06 on a 32-bit system and a 64-bit system

Pack the struct first, then add padding bytes at the end to make sure its size is evenly divisible by the aligned value (8 in this example).

Figure 11: Example 06
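The two steps can be observed with a small sketch, written with the GCC/Clang attribute syntax as an assumption. The struct `packed_aligned_sample` is hypothetical and is not the struct_06 from the figure.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical struct: packed members, overall size padded to 8. */
struct __attribute__((packed, aligned(8))) packed_aligned_sample {
    uint8_t  tag;    /* offset 0 */
    uint32_t value;  /* offset 1: packed, so no gap after tag */
    uint16_t count;  /* offset 5 */
};

/* 7 bytes of members, rounded up to the aligned value. */
static_assert(sizeof(struct packed_aligned_sample) == 8,
              "one trailing padding byte expected");

/* Padding bytes added at the end of the struct. */
size_t tail_padding(void)
{
    return sizeof(struct packed_aligned_sample)
         - (offsetof(struct packed_aligned_sample, count) + sizeof(uint16_t));
}
```

All the padding ends up at the tail: the members stay packed and only the last byte is filler.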

### More thoughts

Here is another interesting article about how data structure alignment affects system performance: https://www.ibm.com/developerworks/library/pa-dalign/

WRITTEN BY

Trung Do

Firmware engineer, blogger and a makerholic

## Compiling process in C programming

## Foreword

We usually write embedded software in C. However, C is still a high-level language, and we need a compiler to translate it into executable code that can run on our system. Today we will see how the compiler does this.

Now, let’s get started.

## The full compilation process

Here are the full steps of the compilation process:

### Step 1: Pre-processing

The pre-processing step takes the source file (.c file) and generates a .i file.

In the pre-processing step, the compiler will do 3 things:

• Expand the included files (#include).
• Expand macros. Inline functions are not touched here; they are handled later by the compiler itself.
• Remove comments.

Let’s take a look at how the .i file looks after the pre-processing step in figure 2.

Figure 2: Pre-processing step
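Macro expansion is a plain text substitution, which the sketch below tries to show. The macro and function names are hypothetical. The two `static_assert` lines are evaluated at compile time, after the macros have been expanded.

```c
#include <assert.h>  /* static_assert */

#define BUFFER_SIZE 16
#define MUL(a, b)        ((a) * (b))  /* arguments are parenthesised */
#define MUL_UNSAFE(a, b) (a * b)      /* arguments are pasted in as-is */

/* MUL(1 + 1, 3) expands to ((1 + 1) * (3)). */
static_assert(MUL(1 + 1, 3) == 6, "grouped before multiplying");

/* MUL_UNSAFE(1 + 1, 3) expands to (1 + 1 * 3). */
static_assert(MUL_UNSAFE(1 + 1, 3) == 4, "text substitution changes the result");

/* After pre-processing, the body reads: return ((x + 1) * (16)); */
int scaled(int x)
{
    return MUL(x + 1, BUFFER_SIZE);
}
```

In the .i file you will no longer see `MUL` or `BUFFER_SIZE`, only the expanded text.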

### Step 2: Compiling

The compiling step takes the .i file and generates assembly code (.s file), which is a human-readable intermediate language.

The .s file will look something like figure 3.

Figure 3: Compiling step (.s file)

### Step 3: Assembly

The assembly step takes the assembly code (.s file) and generates object code (.o or .obj file).

The .o file will look something like figure 4 if you open it with a hex editor.

Figure 4: Assembly step (.o file)

### Step 4: Linking

In a project with several modules, we will have several object files after step 3. To make an executable program, all of these files have to be combined, and every reference to code that lives elsewhere (for example in the libraries you are using) must be resolved. That’s why this process is called linking.

After linking, we have a single executable file (named a.out if we compile without any options) which can be run on our target controller.

### Compile yourself a simple program and run

On Linux or Mac OS, you can use the commands below to generate those files yourself. Suppose that you have a source file like this:
```c
// Program to multiply 2 numbers

#include <stdio.h>

// Parenthesize the arguments and the result so the macro expands safely
#define MUL(a, b) ((a) * (b))

int main(void)
{
    int a = 5;
    int b = 10;

    printf("Result: %d\n", MUL(a, b));

    return 0;
}
```
Run this command in a terminal. It will create the .i, .s and .o files and the a.out executable from your source file (depending on your GCC version, the intermediate files may carry a prefix, for example a-main.i):
```
gcc -Wall -save-temps main.c
```
Now run your executable with this command:
```
./a.out
```
The terminal should display `Result: 50`.


## Different microcontroller GPIO settings

## Foreword

Anyone who works with embedded systems has to interact with GPIO pins. Besides the most basic configurations, such as input or output high/low, today's microcontrollers support many more settings, and this blog will walk you through them.

Now, let’s get started.

## Some definitions

### What are Tri-state (3-state), High-impedance, High-Z, Floating?

Tri-state

• This term specifies that the pin can be driven low, driven high, or put in high-z mode.
• Don’t confuse this term with high-z or high-impedance, because tri-state is not a mode. You can say “configure a pin as high-z”, not “configure a pin as tri-state”.

Figure 01: An example of a tri-state circuit

Figure 02: 3 different states of a tri-state circuit

High-Z/High-Impedance:

• This is one state of the tri-state. You can check “State 3” in figure 02.
• Whenever you configure a pin as High-Z (High-Impedance), that pin behaves as if it were completely disconnected from the circuit.

Floating:

• This is just the result of configuring a pin as High-Z (High-Impedance). In other words, we can say “after configuring a pin as High-Z, that pin will be floating”.
• The logic state of that pin is unknown. It will “float” to match the residual voltage, depending on the external circuit connected to that pin.

Current source and Current sink

Figure 03: Current source vs current sink

Current source:

• A device is called a “current source” when it is connected to a load and supplies current to that load.
• The load can be an LED, a motor…

Current sink:

• A device is called a “current sink” when it is connected to a load and the current flows from the power supply, through the load, and into the device.

### Settings for INPUT pin

An input pin can be configured as:

• Input pull-up
• Input pull-down
• Input high-z (high-impedance)

Input high-z

• If a pin is configured as input high-z, the default state of the input is indeterminate unless it is driven high or low by an external source.

Figure 04: Input high-z

Input pull-up and Input pull-down

Figure 05: Input pull-up and pull-down

• Sometimes we want to set a default state for the input while it is not driven by an external source. Pull-up/pull-down is used in these cases.
• With a pull-up resistor, the default state is HIGH and can be overridden by an external source.
• With a pull-down resistor, the default state is LOW and can be overridden by an external source.
• Let’s take a look at a case where a pull-up resistor helps.

Figure 06: Button with input pull-up
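Reading such a button in firmware can be sketched as below. I am assuming that the button connects the pin to GND when pressed, so the input reads LOW for "pressed" and the pull-up gives HIGH otherwise. `port_value` stands for a snapshot of a hypothetical 32-bit input data register; the real register name depends on your microcontroller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the button on the given pin is pressed (active-low input). */
bool button_pressed(uint32_t port_value, unsigned pin)
{
    assert(pin < 32);
    /* The pull-up keeps the bit at 1; pressing the button pulls it to 0. */
    return ((port_value >> pin) & 1u) == 0;
}
```

Without the pull-up, the released button would leave the pin floating and this function could return either value.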

### Settings for OUTPUT pin

An output pin can be configured as:

• Output push-pull
• Output open-drain (if using FET) or open-collector (if using BJT)

Output push-pull

• 2 transistors connect the GPIO pin to VCC and to GND.
• When the output goes LOW, it is actively “pulled” to GND.
• When the output goes HIGH, it is actively “pushed” to VCC.

Figure 07: Output push-pull

Output open-drain (or open-collector)

• Only 1 transistor connects the GPIO pin to GND. The drain (or collector) is left open, so the pin can pull the line LOW but cannot drive it HIGH.
• This is useful when we need to release that pin so that an external circuit can control the line (such as an I2C bus in multi-master mode).

Figure 08: Output open-collector
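The difference between the two output modes can be modelled in a few lines. This is a logic-level simulation only, with hypothetical type and function names, and it assumes the shared line has an external pull-up resistor, as an I2C bus does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LEVEL_LOW, LEVEL_HIGH, LEVEL_HIGH_Z } level_t;
typedef enum { MODE_PUSH_PULL, MODE_OPEN_DRAIN } out_mode_t;

/* What one output pin does to the line for a given logic value. */
level_t pin_output(out_mode_t mode, bool value)
{
    if (!value)
        return LEVEL_LOW;       /* both modes actively pull to GND */
    return mode == MODE_PUSH_PULL
               ? LEVEL_HIGH     /* actively pushed to VCC */
               : LEVEL_HIGH_Z;  /* open-drain releases the line */
}

/* Level of a shared, pulled-up line driven only by open-drain pins. */
level_t shared_line_level(const level_t *pins, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(pins[i] != LEVEL_HIGH);  /* open-drain pins never drive HIGH */
        if (pins[i] == LEVEL_LOW)
            return LEVEL_LOW;           /* any single pin can pull the line LOW */
    }
    return LEVEL_HIGH;                  /* all pins released: the pull-up wins */
}
```

Because no open-drain pin ever drives HIGH, two devices can share the line without fighting each other.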

### Drive-strength

• Drive-strength determines the output impedance.
• On some microcontrollers, drive-strength can be set to 2mA/4mA/8mA/12mA… The default value is around 4mA.
• If drive-strength is too weak, the rise and fall times of a signal are affected and you may not meet the timing specification.
• If drive-strength is too strong, noise, overshoot and ringing can happen on the bus.

Here is the difference in rise time between drive-strength = 4mA and 12mA.

Figure 09: Drive-strength

### High-drive

Some GPIO pins can provide more current than typical pins. They are used to directly drive loads that require high current, such as an LED or a motor.

Using high-drive for such a pin avoids the need for a separate amplifying circuit, which reduces cost and effort.

### Slew-rate

Slew-rate is the maximum rate of change of the output voltage per unit of time. The setting is typically SLOW (default) or FAST.

Let’s take a look at figure 10 for the slew-rate definition.

Figure 10: Slew-rate

A slow slew-rate limits the high-frequency content of the signal. Therefore, we should use the slowest slew-rate that still satisfies the GPIO signal timing specification, to minimize any possible signal integrity issues.
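Since slew-rate is a voltage change per unit of time, a rough estimate of how long an edge takes is the voltage swing divided by the slew-rate. The sketch below does only that arithmetic; it assumes a constant slew-rate over the whole edge, and the function name and units are my own choice.

```c
#include <assert.h>

/* Time in ns for the output to swing delta_v volts at slew_v_per_ns V/ns. */
double transition_time_ns(double delta_v, double slew_v_per_ns)
{
    assert(slew_v_per_ns > 0.0);
    return delta_v / slew_v_per_ns;
}
```

You can compare the result against the timing budget of your signal to see whether the slow setting is still fast enough.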

Update 14th, Jan 2019

Here is the simplified circuit of a physical GPIO pin on the Raspberry Pi.

Figure 11: Simplified circuit of Raspberry Pi physical GPIO pin
