Data structure alignment



Foreword

Knowing how data is stored in memory and how it is accessed helps us improve system performance and optimize memory usage, especially when developing for an embedded system with limited resources.

Now, let’s get started.

What is data structure alignment?

Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding.

Wikipedia

Data structure alignment: when data is placed in memory, the compiler arranges it so that the CPU can access it more efficiently. There are 2 separate concepts in “data structure alignment”:

  • Data alignment: place each variable at an address that is a multiple of its alignment requirement (usually its own size, limited by the word size).
  • Data structure padding: to keep each address properly aligned, some meaningless bytes are sometimes inserted between 2 variables or at the end of the struct. This is “padding”.
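As a minimal sketch of the two concepts above, the C snippet below measures the padding that a compiler inserts into a small struct. The struct `sample` and both helper functions are hypothetical names used only for illustration, and the exact numbers depend on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct used only for illustration. */
struct sample {
    uint8_t  flag;    /* 1 byte, usually followed by padding */
    uint32_t counter; /* 4 bytes, usually placed on a 4-byte boundary */
    uint8_t  tag;     /* 1 byte, usually followed by tail padding */
};

/* Number of padding bytes inserted between flag and counter. */
static size_t padding_before_counter(void)
{
    /* The first member of a struct always sits at offset 0. */
    assert(offsetof(struct sample, flag) == 0);
    return offsetof(struct sample, counter) - sizeof(uint8_t);
}

/* Total padding = struct size minus the sum of the member sizes. */
static size_t total_padding(void)
{
    return sizeof(struct sample)
         - (sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint8_t));
}
```

Printing both values on your own target is a quick way to see how much memory a given member order costs.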

The next sections explain in more detail how this rearrangement happens and how it can improve system performance.

About the system

Before getting into details, we will need to go through some fundamental knowledge about our system.

Word size and address size

A processor does not access memory one byte at a time but in blocks of 2, 4, 8, 16, or 32 bytes (depending on the system). The reason is that accessing an address on a multiple-byte boundary is a lot faster than accessing one on an arbitrary single-byte boundary.

Word size: this is the number of bits that a CPU can process at one time. In modern CPUs, the word size can be 8, 16, 24, 32 or 64 bits depending on the system. We usually name a system after its word size. For example:

  • 16-bit system: 1 Word = 16 bits = 2 bytes
  • 32-bit system: 1 Word = 32 bits = 4 bytes
  • 64-bit system: 1 Word = 64 bits = 8 bytes

Figure 01: Word size

Address size: this is the size of the address space. For example, if we use 4 bytes (32 bits) to store the address, we will have 2^32 = 4 294 967 296 different addresses.

Figure 02: Address size

In modern CPUs, the word size is usually (but not always) also the size of an address. This allows one memory address to be stored efficiently in one word. For example:

  • 32-bit system: size of 1 word = size of 1 address = 32 bits.
  • 64-bit system: size of 1 word = size of 1 address = 64 bits.

Figure 03: Address size and Word size

Why do we need it?

Let’s take a look at the example below to see how accessing misaligned data can slow down the system.

Figure 04: 4-byte variable on a misaligned system and on an aligned system

As can be seen in figure 04, reading the 4-byte variable takes 5 steps on the misaligned system compared with only 2 steps on the aligned system.

How does it work?

Here are 3 steps I found useful when dealing with structure padding:

  1. Place each variable in the struct at an address that is evenly divisible by the size of that variable (if the system word size is bigger than the variable size). If the system word size is smaller than the variable size, place the variable at an address that is evenly divisible by the word size.


                                              Figure 05: Different data type and their aligned address.

  2. Add padding to the unused bytes.
  3. Calculate the final size of the struct.
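The three steps above can be sketched as a small C function. This is a simplified model, not what any particular compiler does internally: it assumes every member is a scalar whose alignment is min(size, word size), and it assumes the final size is rounded up to the largest member alignment. The names `round_up` and `struct_size` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of align (align must be > 0). */
static size_t round_up(size_t value, size_t align)
{
    assert(align > 0);
    return (value + align - 1) / align * align;
}

/*
 * Size of a struct whose scalar members have the given sizes.
 * Assumption: each member's alignment is min(size, word).
 */
static size_t struct_size(const size_t *sizes, size_t count, size_t word)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < count; i++) {
        size_t align = sizes[i] < word ? sizes[i] : word;
        /* Steps 1 and 2: pad up to an aligned address, then place the member. */
        offset = round_up(offset, align) + sizes[i];
        if (align > max_align) {
            max_align = align;
        }
    }
    /* Step 3: pad the tail so the total is a multiple of the largest alignment. */
    return round_up(offset, max_align);
}
```

For example, members of 1, 4 and 1 bytes with a 4-byte word give 12 bytes in this model, while 1, 8 and 1 bytes give 16 bytes with a 4-byte word and 24 bytes with an 8-byte word.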

Let’s take a look at some examples below to see how it works.

Example 01: Calculate size of struct_01 in 32-bit system and 64-bit system

Step 1 and 2: Put the variables to the appropriate place and add padding to unused bytes.

Figure 06: Example 01

Step 3: Calculate the final size

  • For 32-bit system: sizeof(struct_01) = 12 bytes
  • For 64-bit system: sizeof(struct_01) = 16 bytes

Example 02: Calculate size of struct_02 in 32-bit system and 64-bit system

Step 1 and 2: Put the variables to the appropriate place and add padding to unused bytes.

Figure 07: Example 02

Step 3: Calculate the final size

  • For 32-bit system: sizeof(struct_02) = 20 bytes
  • For 64-bit system: sizeof(struct_02) = 24 bytes

Example 03: Calculate the size of struct_03 in 32-bit system and 64-bit system

Step 1 and 2: Put the variables to the appropriate place and add padding to unused bytes.

Figure 08: Example 3

Step 3: Calculate the final size

  • For 32-bit system: sizeof(struct_03) = 6 bytes
  • For 64-bit system: sizeof(struct_03) = 6 bytes

Align and Padding macro in C language

In some compilers such as IAR or Keil C, you can use struct attributes to have more control over data alignment.

PACK ATTRIBUTE

In a structure with the pack attribute, no padding is inserted between the struct elements.
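A minimal sketch of a packed struct, written with the GCC/Clang spelling `__attribute__((packed))` as an assumption; IAR and Keil have their own keywords or pragmas for the same idea. The struct `packed_frame` and the function `check_packed_frame` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang syntax (assumption); other compilers spell this differently. */
struct __attribute__((packed)) packed_frame {
    uint8_t  id;     /* offset 0 */
    uint32_t value;  /* offset 1, no padding in front of it */
    uint16_t crc;    /* offset 5 */
};

/* Self-check: with packing, the size is just the sum of the member sizes. */
static void check_packed_frame(void)
{
    assert(offsetof(struct packed_frame, value) == 1);
    assert(offsetof(struct packed_frame, crc) == 5);
    assert(sizeof(struct packed_frame) == 7);
}
```

Keep in mind that members such as `value` are now misaligned, so accessing them can be slower or, on some cores, require special handling by the compiler.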

Example 04: Calculate the size of struct_04 in the 32-bit system and 64-bit system

Step 1: Put the variables to the appropriate place, no padding.

Figure 09: Example 04

Step 2: Calculate the final size

  • For 32-bit system: sizeof(struct_04) = 15 bytes
  • For 64-bit system: sizeof(struct_04) = 15 bytes

 

ALIGNED ATTRIBUTE

With the struct having the aligned attribute, the value of the aligned attribute overrides the word size. Therefore, the 3 steps now become:

  1. Place each variable in the struct at an address that is evenly divisible by the size of that variable (if the aligned value is bigger than the variable size). If the aligned value is smaller than the variable size, place the variable at an address that is evenly divisible by the aligned value.
  2. Add padding to the unused bytes.
  3. Calculate the final size of the struct.
Remember that the final size is always evenly divisible by the aligned value.

Example 05: Calculate the size of struct_05 in the 32-bit system and 64-bit system

Step 1 and 2: Put the variables to the appropriate place.

Figure 10: Example 05

Step 3: Calculate the final size

  • For 32-bit system: sizeof(struct_05) = 24 bytes
  • For 64-bit system: sizeof(struct_05) = 24 bytes

USE “PACK” AND “ALIGNED” ATTRIBUTE TOGETHER

When using “pack” and “aligned” attribute together, we will:

  1. “Pack” the struct first.
  2. Add padding bytes at the end to make sure the size of our structure is evenly divisible to the aligned value.
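The two steps above can be checked with a short sketch. It again assumes the GCC/Clang attribute syntax; the struct `padded_frame` and the function `check_padded_frame` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed first (7 bytes of members), then aligned to 8 bytes. */
struct __attribute__((packed, aligned(8))) padded_frame {
    uint8_t  id;     /* offset 0 */
    uint32_t value;  /* offset 1, still no padding between members */
    uint16_t crc;    /* offset 5 */
};

/* Self-check: one tail padding byte makes the size a multiple of 8. */
static void check_padded_frame(void)
{
    assert(offsetof(struct padded_frame, value) == 1);
    assert(offsetof(struct padded_frame, crc) == 5);
    assert(sizeof(struct padded_frame) == 8);
}
```

The members stay packed, and only the end of the struct is padded so that the size is divisible by the aligned value.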

Example 06: Calculate the size of struct_06 in the 32-bit system and 64-bit system

Pack the struct first and add padding bytes at the end to make sure it is evenly divisible to the aligned value (8 in this example)

Figure 11: Example 06

More thoughts

Another interesting article about how data structure alignment affects system performance: https://www.ibm.com/developerworks/library/pa-dalign/

WRITTEN BY

Trung Do

Firmware engineer, blogger and a makerholic 

Related Articles

Compiling process in C programming



Foreword

We usually write embedded software in C. However, C is still a high-level language, so we need a compiler to translate it into executable code that can run on our system. Today we will see how the compiler does this.

Now, let’s get started.

The whole compilation process

Here are all the steps of the compilation process:

Step 1: Pre-processing

The pre-processing step takes the source file (.c file) and generates a .i file.

In the pre-processing step, the preprocessor does 3 things:

  • Expand header files (#include).
  • Expand macros. Inline functions are not touched here; they are handled later by the compiler.
  • Remove all comments.
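Macro expansion is plain text substitution, so an argument that is itself an expression can interact with operator precedence after it is pasted in. The sketch below shows this with two hypothetical macros, `MUL_UNSAFE` and `MUL_SAFE`, which are illustration names and not part of any library.

```c
#include <assert.h>

#define MUL_UNSAFE(a, b) (a * b)       /* arguments pasted as-is */
#define MUL_SAFE(a, b)   ((a) * (b))   /* each argument parenthesized */

/* Expands to (1 + 2 * 3), which is 7. */
static int mul_unsafe_demo(void)
{
    return MUL_UNSAFE(1 + 2, 3);
}

/* Expands to ((1 + 2) * (3)), which is 9. */
static int mul_safe_demo(void)
{
    return MUL_SAFE(1 + 2, 3);
}

/* Self-check of both expansions. */
static void check_mul_macros(void)
{
    assert(mul_unsafe_demo() == 7);
    assert(mul_safe_demo() == 9);
}
```

Looking at the generated .i file is a handy way to confirm what a macro really expanded to.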

Figure 2 shows what the .i file looks like after the pre-processing step.

Figure 2: Pre-processing step

 

Step 2: Compiling

The compiling step takes the .i file and generates assembly code (.s file), which is an intermediate, human-readable language.

The .s file will look something like figure 3.

Figure 3: Compiling step (.s file)

Step 3: Assembly

The assembly step takes the assembly code (.s file) and generates object code (.o or .obj file).

Opened with a hex editor, the .o file will look something like figure 4.

Figure 4: Assembly step (.o file)

Step 4: Linking

In a project with several modules, we will have several object files after step 3. To make an executable program, all of these files have to be combined, and every reference to code that lives elsewhere (for example, in a library you are using) must be resolved. That’s why this process is called linking.

 

Step 5: Loading

After linking, we have a single executable file (named a.out if we compile without any options) which can be loaded and run on our target.

Compile yourself a simple program and run

On Linux or Mac OS, you can follow these commands below to generate those files for yourself. Suppose that you have a source file like this
// Program to multiply 2 numbers

#include <stdio.h>

#define MUL(a,b) ((a)*(b))

int main(void)
{
  int a = 5;
  int b = 10;
  
  printf("Result: %d\n", MUL(a,b));
  
  return 0;
}
Run this command in a terminal. It will create the .i, .s and .o files and the a.out executable from your source file:
 gcc -Wall -save-temps main.c
Then run your executable with this command:
 ./a.out
The terminal should display: Result: 50


Related Articles

Different microcontroller GPIO settings



Foreword

Anyone who works with an embedded system has to interact with GPIO pins. Besides the most basic configurations such as input and output high or low, today’s microcontrollers support many more settings, and this blog will walk you through them.

Now, let’s get started.

Some definitions

What are Tri-state (3-state), High-impedance, High-Z, Floating?

Tri-state

  • This term specifies that the pin can be driven low, driven high or put in high-z mode.
  • Don’t confuse this term with high-z or high-impedance, because tri-state is not a mode. You can say “config a pin as high-z”, not “config a pin as tri-state”.

Figure 01: An example of a tri-state circuit

Figure 02: 3 different states of a tri-state circuit

High-Z/High-Impedance:

  • This is one state of the tri-state. See “State 3” in figure 02.
  • Whenever you configure a pin as High-Z (High-Impedance), that pin is effectively disconnected from the circuit: it neither drives the line high nor pulls it low.

Floating:

  • This is just the result of configuring a pin as High-Z (High-Impedance). In other words, we can say “after configuring a pin as High-Z, that pin will be floating”.
  • The logic state of that pin is unknown. It will “float” to match the residual voltage and depends on the external circuit connected to that pin.

Current source and Current sink

Figure 03: Current source vs current sink

Current source:

  • A device is called a “current source” when it is connected to a load and supplies current to that load.
  • The load can be an LED, a motor…

Current sink:

  • A device is called a “current sink” when it is connected to a load and the current flows from the power supply, through the load and into the device.

Settings for INPUT pin

An input pin can be configured as:

  • Input pull-up
  • Input pull-down
  • Input high-z (high-impedance)

Input high-z

  • If a pin is configured as input high-z, the input default state will be indeterminate unless it is driven high or low by an external source.

Figure 04: Input high-z

Input pull-up and Input pull-down

Figure 05: Input pull-up and pull-down

  • Sometimes we want to set a default state for the input while it is not driven by an external source. Pull-up/pull-down is used in these cases.
  • With a pull-up resistor, the default state will be HIGH and can be overridden by an external source.
  • With a pull-down resistor, the default state will be LOW and can be overridden by an external source.
  • Let’s take a look at a case where a pull-up resistor helps.

Figure 06: Button with input pull-up
 
 

Settings for OUTPUT pin

An output pin can be configured as:

  • Output push-pull
  • Output open-drain (if using FET) or open-collector (if using BJT)

Output push-pull

  • There are 2 transistors that connect the GPIO pin to VCC and to GND.
  • When the output goes LOW, it is actively “pulled” to GND.
  • When the output goes HIGH, it is actively “pushed” to VCC.

Figure 07: Output push-pull

Output open-drain (or open-collector)

  • There is 1 transistor that connects the GPIO pin to GND; the drain (or collector) is left open.
  • This is useful when the pin must be released so that an external circuit can control the line (such as the I2C bus in multi-master mode).

Figure 08: Output open-collector

Drive-strength

Drive-strength determines the output impedance.
On some microcontrollers the drive-strength can be set to 2mA/4mA/8mA/12mA… The default value is around 4mA.
If the drive-strength is too weak, the rise and fall times of the signal get longer and you may not meet the timing specification.
If the drive-strength is too strong, noise, overshoot and ringing can appear on the bus.

Here is the difference in rise time between drive-strength = 4mA and 12mA

Figure 09: Drive-strength

High-drive

Some GPIO pins can provide more current than typical pins. They are used to directly drive loads that require high current, such as an LED or a motor.

Using a high-drive pin avoids building a separate amplifying circuit, which reduces cost and effort.

Slew-rate

Slew-rate is the maximum rate of change of output voltage per unit of time. There might be SLOW (default) or FAST slew-rate.

Let’s take a look at figure 10 for the slew-rate definition.

Figure 10: Slew-rate

A slow slew-rate limits the high-frequency content that the signal edges produce. Therefore, we should use the slowest slew-rate that still satisfies the GPIO signal timing specification to minimize possible signal integrity issues.


Update 14th, Jan 2019

Here is the simplified circuit of a physical GPIO pin on Raspberry Pi

Figure 11: Simplified circuit of Raspberry Pi physical GPIO pin


Related Articles

Embedded knowledge: Basic communications protocol – I2C



Foreword

Let’s continue with the “Embedded knowledge: Basic communication” series. Today, we will look into another popular protocol: Inter-Integrated Circuit or I2C.

Here are 3 blogs in “Embedded knowledge: Basic communications” series in case you’ve missed any of them:

1. Embedded knowledge: Basic communication protocol – UART.

2. Embedded knowledge: Basic communication protocol – SPI.

3. Embedded knowledge: Basic communication protocol – I2C.

In this series, I also share my working experience with these protocols.

Now, let’s get started!

Introduction

Let’s get back to Ted and Marshall who are in a conversation.

Figure 01: Ted and Marshall are in a conversation

With I2C, we still have the same principle as shown in figure 02.

Figure 02: The most basic I2C connection

Only 2 wires are required on an I2C bus, the same number as UART. However, I2C works in a totally different way from UART. Let’s find out what the I2C protocol can do and why more than 1000 ICs have used I2C so far.

Connection

Features

Let’s again take a look at the most basic cases

Figure 03: The most basic I2C connection

There are a few things we should notice in this figure:

Only 2 wires are used: SDA (Serial Data Line) and SCL (Serial Clock Line)

SCL – Marked as “1” in figure 03:
The clock only goes from Master to Slave -> the device that generates the clock and initiates the communication is the Master.

SDA – Marked as “2” in figure 03:
Data is transferred between master and slave on this line.
This is a bi-directional line. As a result, only one of them, Master or Slave, transfers data at any one time. For this reason, I2C is a half-duplex protocol.

Pull-up resistor to VDD – Marked as “3” in figure 03:
How to choose the correct value for this pull-up resistor is discussed later in this post.

Last but not least, each device connected to the I2C bus is required to have a unique address, which is normally addressed by software.

Operation

Basic operation

We will go through some basic cases between master and slave before deep diving into details:

Figure 04-1: Read 1 byte from slave

Figure 04-2: Read several bytes from slave

Figure 04-3: Write 1 byte to slave

Figure 04-4: Write multiple bytes to slave

Figure 04-5: Read right after write to slave

 

SDA and SCL signal and logic levels

Let’s consider a more complex case, as in the picture below

Figure 05: Multi masters and multi slaves 

As you can see in the figure, multiple devices drive the SDA and SCL lines. If the master and slaves used the normal (push-pull) configuration for their output pins, a voltage conflict would happen in some cases (the master drives SDA to 5V while slave 1 drives SDA to 0V).

To avoid this issue, both SDA and SCL must be open-drain (for CMOS) or open-collector (for BJT) outputs. I will explain why, but first take a look at an example of an SDA open-drain output in figure 06:

Figure 06: SDA open-drain output

There are only 2 cases, shown in figure 07-1:

Figure 07-1: Different output to SDA open-drain

Whenever we apply “1” to G (Gate), D (Drain) and S (Source) are shorted. The SDA line is now connected to GND.
Whenever we apply “0” to G, D and S are disconnected. The SDA line is disconnected from GND.

For example, if the master releases SDA so that it can go to 5V, there are 2 cases:
1. Slave outputs “0” to its Gate -> its SDA pin passively goes HIGH through the pull-up resistor Rd (SDA = 5V) -> No voltage conflict.
2. Slave outputs “1” to its Gate -> its SDA pin is connected to GND (SDA = 0V) -> No voltage conflict.
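The behaviour of a shared open-drain line can be modelled in a few lines of C. This is only a logic-level sketch under the assumption of ideal transistors and a working pull-up; `open_drain_level` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Level of an open-drain line shared by `count` devices.
 * gate[i] == true means device i turns its transistor on (pulls the line to GND).
 * Returns true (HIGH through the pull-up) only when no device pulls the line low.
 */
static bool open_drain_level(const bool *gate, size_t count)
{
    assert(gate != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (gate[i]) {
            return false;  /* one device pulling low is enough */
        }
    }
    return true;
}
```

No combination of gate values ever connects VDD directly to GND through two outputs, which is why the voltage conflict cannot occur.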

LOGIC LEVELS

Figure 07-2: Logic levels

Data validity

Data on the SDA line is sampled while the clock is HIGH. Therefore, the data on the SDA line must be kept unchanged during the high period of the clock.

On the other hand, the data on the SDA line is only allowed to change during the low period of the clock.

Figure 08: Data validity

Start and Stop condition

Both start and stop conditions are created by the master.

After the Start condition, the master will generate either a Stop (if it wants to end the transfer session) or a repeated Start (if it wants to continue the transfer session).

We use a repeated Start instead of a Stop followed by a new Start when the master wants to start a new communication without letting the bus go idle in between, because an idle bus could be taken over by another master (in the multi-master case).

Figure 09 shows the start and stop condition in a frame:

Figure 09: Start, start repeat and stop condition

In short:

Figure 10: Start, start repeat and stop condition definition

Byte format

One data bit is transferred on each clock pulse, and I2C requires every byte to be 8 bits long.

Figure 11: Bits and Byte

 

Acknowledge (ACK) and Not Acknowledge (NACK)

The slave sends an ACK if the data is received correctly.

The slave sends a NACK when:
• It is not ready to start communicating with the master because it is busy doing other tasks.
• It does not understand the data from the master during the transmission.
• It cannot receive any more bytes from the master during the transmission.

When the master receives data from the slave, it sends an ACK if it is still waiting for another byte.
Also in this case, the master sends a NACK to the slave to terminate the transmission. Remember that a NACK from master to slave does not mean that the last byte from the slave had an error.

The transmitter must release the SDA line before it can receive any response from the receiver.

Figure 12: ACK and NACK

Clock stretching

When the slave is able to receive all the data but needs time to store it and prepare for the next bytes, it can hold the SCL line LOW (extending the clock LOW period) after receiving all 8 bits to put the master into a wait state. After finishing all its tasks, the slave responds to the master with an ACK. This procedure is called clock stretching.

The I2C slave is allowed to hold the clock down if it needs to reduce the bus speed.

After releasing SCL, the master is required to read it back and wait until it has actually reached the high state.

Figure 13: Clock stretching

  1. Before clock stretching: after sending 8 bits, the master releases SDA and SCL. The master is required to keep reading back SCL until it goes HIGH.
  2. During clock stretching: the slave drives SDA LOW (for the ACK) but keeps holding SCL LOW.
  3. Finish clock stretching: the slave releases SCL. SCL now passively goes HIGH.
  4. After clock stretching: the master detects the 9th clock pulse while SDA is LOW. This is the ACK signal.
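On the master side, the read-back described above is often written as a polling loop with a timeout. The sketch below is a hypothetical illustration: the callback type `read_scl_fn`, the function `wait_scl_released` and the fake slave are all invented names, and a real driver would read the pin through its own HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pin-read callback: returns true when SCL reads HIGH. */
typedef bool (*read_scl_fn)(void *ctx);

/*
 * After releasing SCL, poll it until it really is HIGH or max_polls is used up.
 * Returns true if SCL went HIGH (the slave stopped stretching), false on timeout.
 */
static bool wait_scl_released(read_scl_fn read_scl, void *ctx, unsigned max_polls)
{
    assert(read_scl != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_scl(ctx)) {
            return true;
        }
    }
    return false;
}

/* Fake slave for experiments: holds SCL LOW for *ctx more polls. */
static bool fake_stretching_slave(void *ctx)
{
    unsigned *remaining = (unsigned *)ctx;
    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

The timeout matters in practice: a slave that never releases SCL would otherwise block the master forever.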

 

Slave address (7 bits) and Read/Write bit

After the START, the master first needs to send a byte that tells the slaves which operation (READ or WRITE) it is going to perform and at which address. The format of this frame is shown in figure 14:

Figure 14: Slave address (7 bits) and Read/Write bit
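As a small sketch of this frame, the function below builds the first byte from a 7-bit address and the Read/Write bit, with the address in the upper 7 bits and R/W in bit 0 (1 = read, 0 = write). The name `i2c_address_byte` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First byte after START: 7-bit address in bits 7..1, R/W in bit 0. */
static uint8_t i2c_address_byte(uint8_t addr7, bool read)
{
    assert(addr7 <= 0x7F);
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}
```

For example, a slave at address 0x50 is addressed with the byte 0xA0 for a write and 0xA1 for a read. This is also why some datasheets quote an "8-bit address" that is twice the 7-bit one.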

10-bit slave addressing

10-bit addressing was introduced to extend the number of addresses on the I2C bus. However, it is not widely used nowadays.

The 10-bit address is assembled from the first 2 bytes after a Start or repeated Start. Note that the first 5 bits of the first byte are fixed at 1-1-1-1-0.

Figure 15: 10-bit address format

Devices with 7-bit addresses and devices with 10-bit addresses can share the same I2C bus. Because a first byte of the form 1-1-1-1-0-x-x is reserved for 10-bit devices only, we can distinguish between the 2 types of address.
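The two address bytes can be sketched as follows. The function names are hypothetical, and the R/W bit is simply passed through; the exact sequence for a 10-bit read (which involves a repeated Start) is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Split a 10-bit address into the two bytes sent after START. */
static void i2c_addr10_bytes(uint16_t addr10, bool read, uint8_t out[2])
{
    assert(addr10 <= 0x3FF);
    /* 1 1 1 1 0 A9 A8 R/W */
    out[0] = (uint8_t)(0xF0u | ((unsigned)(addr10 >> 8) << 1) | (read ? 1u : 0u));
    /* A7..A0 */
    out[1] = (uint8_t)(addr10 & 0xFFu);
}

/* True when a first byte carries the reserved 1-1-1-1-0 prefix. */
static bool i2c_is_addr10_prefix(uint8_t first_byte)
{
    return (first_byte & 0xF8u) == 0xF0u;
}
```

With the address 0x2A5 and a write, the two bytes are 0xF4 and 0xA5, and `i2c_is_addr10_prefix` tells them apart from an ordinary 7-bit address byte such as 0xA0.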

Let’s take a look at “Reserved address” section below for more details.

 

Reserved address

Figure 16: Reserved address

General call address

The general call address is used when the master would like to execute something on several slave devices at the same time. The slave devices must have been configured beforehand to respond to the general call address.

If a device does not require any data from a general call, it does not respond. Otherwise, it ACKs this address and becomes a slave-receiver.

The master cannot detect how many devices are using the message.

The general command is sent in the second byte with the following format:

Figure 17: General call address bytes format

In the second byte, if B = 0:
0000 0110 (0x06): Reset and write programmable part of slave address by hardware.
0000 0100 (0x04): Write programmable part of slave address by hardware.

If B = 1: this is a “hardware general call”

 

Device ID

This is an optional 3-byte read-only word with the following format:

Figure 18: Device ID

Multi-master bus: clock synchronization and arbitration

Let’s consider the case where we have multiple masters on the same bus and two of them start transferring on the free bus at the same time.
For this reason, there must be a way to determine which master takes control of the bus and finishes its transmission. This problem is solved by clock synchronization and arbitration.

CLOCK SYNCHRONIZATION

Because the SCL pin is configured as an open-drain output, SCL will be 0 if at least 1 master drives it to 0. This is called the “wired-AND” connection of I2C interfaces.

Figure 19: Multi-master bus

Here is how clock synchronization works

Figure 20: Clock synchronization

When Master-1 releases SCL1, the shared SCL is still driven low by SCL2. Therefore, Master-1 enters a wait state until Master-2 completes its low period and releases SCL2. Master-1 detects that the shared SCL has gone high and starts counting its high period.

By doing this, the clock synchronization is done in every clock period.

ARBITRATION

We use arbitration to determine which master will complete its transmission while the others must stop (on a multi-master bus where several START conditions are sent out at almost the same time).

Figure 21: Arbitration

Master-1 and Master-2 start the transmission at almost the same time.
Both Master-1 data and Master-2 data are sent out.
Just like the shared SCL, the shared SDA is also a wired-AND, so the shared SDA is equal to (SDA1 AND SDA2).
After sending each bit, each master checks the shared SDA to see whether it matches what it sent.
The first time a mismatch happens, that master knows it has lost the arbitration to the other master and therefore turns off its SDA output.
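The arbitration rule can be simulated at the logic level. The sketch below assumes both masters send one byte MSB first in perfect lockstep; `arbitrate_byte` is a hypothetical name and the function ignores clock synchronization entirely.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two masters send one byte each, MSB first, on a wired-AND SDA line.
 * Returns 0 if master 0 wins, 1 if master 1 wins, and -1 if both bytes are
 * identical (nobody loses arbitration during this byte).
 */
static int arbitrate_byte(uint8_t byte0, uint8_t byte1)
{
    for (int bit = 7; bit >= 0; bit--) {
        unsigned sda0 = (byte0 >> bit) & 1u;
        unsigned sda1 = (byte1 >> bit) & 1u;
        unsigned shared = sda0 & sda1;  /* wired-AND of both outputs */

        /* At least one master always reads back the bit it sent. */
        assert(sda0 == shared || sda1 == shared);

        if (sda0 != shared) {
            return 1;  /* master 0 sent 1 but the bus is 0: it backs off */
        }
        if (sda1 != shared) {
            return 0;  /* master 1 sent 1 but the bus is 0: it backs off */
        }
    }
    return -1;
}
```

The master that sends a 0 where the other sends a 1 wins, so the winner's data reaches the bus undisturbed. For instance, 0xA0 beats 0xA2.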

Bus speed

The I2C speed depends on the pull-up resistor value as well as the bus capacitance.

The maximum speed is the highest clock speed at which our MCU can still sample the correct logic level.

The value we choose for the pull-up resistor and the number of devices connected to the I2C bus (bus capacitance) directly determine the rise time of the clock. As a result, the sampling process is affected.

According to the I2C-bus specification and user manual UM10204 (page 35), there are 5 operating speed categories:

Figure 22: Bus speed

Electrical specification

Logic level voltage, rising time and bus capacitance

Pull-up resistor

How does the pull-up resistor affect the clock pulse?
R is high → long rise time.
R is low  → short rise time but the power consumption is high.
⇒ We need to find Rp_max and Rp_min so that Rp meets the signal timing specification.
If your system is sensitive to power, choose Rp close to Rp_max to minimize the current consumption.

Calculate Rp_max and Rp_min

Rp_max = t_r/(0.8473 x C_b) 

Rp_min = (VDD – VOLmax)/IOL

t_r: SDA and SCL maximum rising time allowed
C_b: estimated bus capacitance
VOLmax: maximum LOW-level output voltage
IOL: sink current (3mA for standard mode and fast mode, 20mA for fast mode plus) 

For more information, please refer to UM10204.

After having Rp_max and Rp_min, we can choose our R_p within this range.
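The two formulas translate directly into C. In the sketch below the function names are hypothetical and all quantities are in SI units (seconds, farads, volts, amperes, ohms). The example numbers in the note (1000 ns rise time, 400 pF, 3.3 V supply, 0.4 V VOLmax, 3 mA) are assumed standard-mode values chosen for illustration; check them against UM10204 and your own design.

```c
#include <assert.h>

/* Rp_max = t_r / (0.8473 x C_b), with t_r in seconds and c_b in farads. */
static double rp_max_ohm(double t_r, double c_b)
{
    assert(c_b > 0.0);
    return t_r / (0.8473 * c_b);
}

/* Rp_min = (VDD - VOLmax) / IOL, with voltages in volts and i_ol in amperes. */
static double rp_min_ohm(double vdd, double vol_max, double i_ol)
{
    assert(i_ol > 0.0);
    return (vdd - vol_max) / i_ol;
}
```

With the assumed values, `rp_max_ohm(1000e-9, 400e-12)` is roughly 2.95 kΩ and `rp_min_ohm(3.3, 0.4, 0.003)` is roughly 967 Ω, so any standard resistor value between the two would be a candidate.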

What is bus capacitance?
The bus capacitance is the total capacitance of the wires, connections and pins.
⇒ That’s why probing the bus with a logic analyzer can affect the I2C signal.
⇒ Choose a probe with as low a capacitance as possible to make sure we still meet the I2C signal timing specification.

 

Operating above the maximum allowable bus capacitance

Decrease the clock speed: since the bus capacitance lengthens the rise time of the clock, decreasing the clock speed so that the signal has time to reach the correct logic level might help.

Increase the output drive-strength: a higher drive-strength on SCL can shorten the rise time dramatically.

Other options, such as bus buffers or a switched pull-up circuit, can also help in this situation.

 

Coming up next!

We have just finished the last article in the Embedded knowledge: Basic communication series. I hope you find it helpful, and don’t forget to subscribe: more exciting articles are coming out real soon.


Related Articles

Embedded knowledge: Basic communications protocol – SPI



Foreword

Let’s continue with the “Embedded knowledge: Basic communication” series. Today, we are going to discuss another common protocol: Serial Peripheral Interface or SPI.

Here are 3 blogs in “Embedded knowledge: Basic communications” series in case you’ve missed any of them:

1. Embedded knowledge: Basic communication protocol – UART.

2. Embedded knowledge: Basic communication protocol – SPI.

3. Embedded knowledge: Basic communication protocol – I2C.

In this series, I also share my working experience with these protocols.

Now, let’s get started!

Introduction

Let’s get back to Ted and Marshall who are in a conversation.

Figure 01: Ted and Marshall are in a conversation

With SPI, we still have the same principle as shown in figure 02.

Figure 02: SPI

As can be seen in the figure above, SPI tends to use more wires than UART. I will explain why we need these lines in SPI, along with the pros and cons, in the next section. However, at the end of the day, no matter how many wires a protocol uses, the main purpose is still to make two or more devices talk to each other.

 

Connection

In SPI, we will have 4 signal lines:

  1. SCLK (Serial Clock)
  2. MOSI (Master Out Slave In)
  3. MISO (Master In Slave Out)
  4. CS (Chip Select)

Figure 03: 4 signal lines in SPI

SCLK – Serial Clock:

  • Unlike UART, SPI is a synchronous protocol, so it needs a clock signal to synchronize Master and Slave while transferring data.
  • The clock signal is only generated by the Master.
  • Thanks to the dedicated clock line, the transfer speed is pretty high in comparison with other protocols. The SPI protocol itself does not define any limit for the transfer speed, and implementations can reach over 10Mbps. However, in production there are still a couple of things to take into account. I will talk about this in section 2.5.

MISO – Master In Slave Out

  • The data output from Slave will go to Master on this line.

MOSI – Master Out Slave In

  • The data output from Master will go to Slave on this line.

CS – Chip Select (or SS – Slave Select)

  • The Master chooses which slave to communicate with by pulling that slave’s CS pin low.

In its simplest form, SPI has both a MISO and a MOSI line, which makes SPI a full-duplex protocol. In other words, it can send and receive data at the same time.

SPI is a Master-Slave protocol or, more precisely, a Single-Master protocol. That’s why we need the CS pin: the Master pulls low the CS pin of the slave it would like to communicate with. We will talk more about single master – multiple slaves in section 2.4.

The clock signal plays an important role in the SPI protocol. When configuring SPI, 2 things about the clock should be considered: polarity and phase.

 

 

Clock polarity and clock phase

Clock polarity – CPOL: defines the idle level of the clock.
Clock phase – CPHA: defines on which clock edge the data is sampled.

Figure 04: Different clock polarity and clock phase

We have 4 SPI modes in total, all shown in the table below

Figure 05: Different clock polarity and clock phase in SPI modes.

Some devices and sensors only support some of the SPI modes above so we have to choose the correct mode before using. The supported modes can be found in the datasheet. Overall, the SPI mode only affects the compatibility between master and slave, not the efficiency.
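In the commonly used numbering, the SPI mode is simply the two configuration bits read as a binary number, with CPOL as the upper bit and CPHA as the lower bit. The helper below is a hypothetical sketch of that convention; when in doubt, the datasheet of your device takes precedence.

```c
#include <assert.h>

/* SPI mode number in the common convention: mode = CPOL * 2 + CPHA. */
static unsigned spi_mode(unsigned cpol, unsigned cpha)
{
    assert(cpol <= 1 && cpha <= 1);
    return (cpol << 1) | cpha;
}
```

So CPOL = 0, CPHA = 0 is mode 0, and CPOL = 1, CPHA = 1 is mode 3.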

In addition, we also need to care about the data order (least significant bit first or most significant bit first).

 

Operation

Let’s take a look at how SPI works. In this example, I will use the SPI mode 0 (CPOL = 0, CPHA = 0) in full-duplex.

Figure 06: SPI in full duplex with CPOL = 0 and CPHA = 0

Purpose:
The Master wants to transfer the 8 bits “1 0 1 1 0 0 1 0” to the Slave and expects the 8 bits “0 1 1 0 1 0 1 0” from the Slave

Pre-condition:
1. Master-CS connects with Slave-CS
2. Master-MISO connects with Slave-MISO
3. Master-MOSI connects with Slave-MOSI
4. Master-SCLK connects with Slave-SCLK
5. Slave supports SPI Mode 0 (CPOL=0, CPHA=0)

Process flow:
Step 1: Master pulls CS low to start communicating with the slave.
Step 2: The Slave samples data on the rising edge of the clock and the Master shifts data out on the falling edge of the clock. (CPOL = 0, CPHA = 0)
Step 3: After sampling all 8 bits, the slave has received “1 0 1 1 0 0 1 0” from the master.
Step 4: If the master expects a response after sending a command through MOSI, it continues to generate the clock until the slave has responded.
Step 5: Master receives all 8 bits from the slave.
Step 6: Master pulls CS line high to stop the transfer session.
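A full-duplex transfer can be pictured as two 8-bit shift registers connected in a ring. The sketch below models that at the logic level, MSB first, with both directions shifting during the same 8 clocks. It is a simplified illustration rather than a driver: `spi_exchange_byte` is a hypothetical name, and clock edges, CS handling and the command/response delay of step 4 are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exchange one byte between a master and a slave shift register (MSB first). */
static void spi_exchange_byte(uint8_t *master_reg, uint8_t *slave_reg)
{
    assert(master_reg != NULL && slave_reg != NULL);
    for (int clk = 0; clk < 8; clk++) {
        unsigned mosi = (*master_reg >> 7) & 1u;  /* master shifts out its MSB */
        unsigned miso = (*slave_reg >> 7) & 1u;   /* slave shifts out its MSB */
        *master_reg = (uint8_t)((*master_reg << 1) | miso);  /* master samples MISO */
        *slave_reg  = (uint8_t)((*slave_reg << 1) | mosi);   /* slave samples MOSI */
    }
}
```

Starting with 0xB2 (“1 0 1 1 0 0 1 0”) in the master and 0x6A (“0 1 1 0 1 0 1 0”) in the slave, the two registers have swapped contents after 8 clocks.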

Notes
In some cases, there will be a delay between sending and receiving bytes (right before step 4). It is caused by the code execution time (interrupts, …) right after the master finishes sending its first command. You might also encounter this while sending or reading multiple bytes through SPI; it is called the “inter-byte gap”.

In full-duplex mode as in figure 06, the master can send and receive data at the same time. In this case, the master shifts data onto the MOSI line on the clock’s falling edge and samples data on the MISO line on the clock’s rising edge.

 

Different modes

Quad-SPI:

Overall:

Figure 07: Quad-SPI connection

  • Instead of separate MISO and MOSI lines, Quad-SPI adds 2 more wires and uses all four as a 4-bit bi-directional data bus.
  • Data transfer speed can be up to 4 times higher than with the standard 4-wire SPI interface.
  • Often used in flash memories.

Quad-SPI operation

Figure 08: Quad-SPI operation

Here is the link to the Quad-SPI operation I’ve referenced: link.

Each command in Quad-SPI can include 5 phases: instruction, address, alternate byte, dummy, and data. Any phase can be skipped, but at least 1 of them must be present.

Figure 09: 5 phases in Quad-SPI operation

  1. Instruction phase:
    An 8-bit instruction is sent, specifying the type of operation to be performed.
    Most flash memories can only receive the instruction 1 bit at a time through DQ0.
    You can still send 2 bits at a time (through DQ0, DQ1) or 4 bits at a time (DQ0->DQ3) if the slave supports it.

  2. Address phase:
    An address of 1 to 4 bytes is sent, indicating the address of the operation.

  3. Alternate-bytes phase:
    From 1 to 4 bytes are sent, generally to control the mode of operation.

  4. Dummy-cycles phase:
    From 1 to 31 cycles are given without any data being sent or received, to allow the slave time to prepare for the data phase.
    The number of clock cycles in this phase can be configured in a register.
    At least 1 dummy cycle is required when using dual or quad mode.

  5. Data phase: 
    Any number of bytes can be sent to or received from the slave.
    Depending on the mode we are using, data can be transferred either 1 bit at a time (DQ0, 3-wire mode), 2 bits at a time (DQ0-DQ1, dual SPI mode) or 4 bits at a time (DQ0-DQ3, quad SPI mode).
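To get a feeling for how the phases add up, here is a rough clock-cycle estimate for one command. It is only a sketch: it assumes one bit per data line per clock cycle, skips the alternate-bytes phase, and the read command used in the example (3-byte address, 8 dummy cycles, 16 data bytes) is hypothetical rather than taken from a specific flash datasheet.

```python
from math import ceil


def phase_cycles(num_bits: int, num_lines: int) -> int:
    """Clock cycles needed to move num_bits over num_lines, one bit per line per cycle."""
    return ceil(num_bits / num_lines)


def command_cycles(address_bytes: int, dummy_cycles: int, data_bytes: int,
                   instr_lines: int = 1, addr_lines: int = 1,
                   data_lines: int = 4) -> int:
    """Total clock cycles for the instruction, address, dummy and data phases."""
    cycles = phase_cycles(8, instr_lines)                   # 8-bit instruction
    cycles += phase_cycles(8 * address_bytes, addr_lines)   # address phase
    cycles += dummy_cycles                                  # dummy phase is given in cycles
    cycles += phase_cycles(8 * data_bytes, data_lines)      # data phase
    return cycles


# Hypothetical read command: 3-byte address, 8 dummy cycles, 16 data bytes.
for lines in (1, 2, 4):
    print(f"{lines} data line(s): {command_cycles(3, 8, 16, data_lines=lines)} cycles")
```

Because the instruction, address and dummy phases are a fixed overhead here, going from 1 to 4 data lines speeds up the whole command by less than 4 times; the benefit grows with the length of the data phase.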

Starting from Quad-SPI, by changing the number of data lines (DQ0->DQ3) and the way we sample data, we get 5 different SPI protocol modes:

  1. 3-wire SPI mode
  2. Dual SPI mode
  3. Quad SPI mode
  4. Single data rate (SDR) mode
  5. Double data rate (DDR) mode

3-wire SPI:

One thing to notice from section 2.3 is that the MOSI and MISO signals are not always used at the same time. As a result, we can combine them to free 1 pin for other purposes.
In general, if your application does not need to transmit and receive data at the same time, you can switch to 3-wire SPI mode.

Figure 10: 3-wire SPI

  • The MISO and MOSI lines are merged into a single bi-directional data line.
  • The operation is the same as in standard 4-wire mode, except that it needs time to switch the line direction between MOSI and MISO.
  • SPI now becomes half duplex.
  • Can be applied when we don’t need to transmit and receive data at the same time.

3-wire SPI operation:

Figure 11: 3-wire SPI operation

  • 3-wire SPI operates mostly the same as the standard one (as described in section 2.3), except that it has a bus direction swap phase.
  • The number of clock cycles in the swap phase can be set in the MCU’s SPI-related register and must be at least 1.

Dual-SPI:

Figure 12: Dual SPI

  • Just like Quad-SPI, but only DQ0 and DQ1 are used, as a 2-bit bi-directional data bus.

Dual-SPI operation:

Figure 13: Dual SPI operation

Single data rate (SDR):

In SDR mode, the data is shifted on only one clock edge (either falling or rising). This is the default mode.
When the slave receives data in SDR mode, it has to respond using the same edge as the master.

Double data rate (DDR):

In DDR mode, a bit is sent on both the falling and the rising edge, except in the instruction phase. The instruction phase is always sent using one clock edge.

Figure 14: Double data rate in quad mode

 

Other types of SPI connection

Multiple slaves

Figure 15: Multiple slaves 

Each slave’s chip select line should have a pull-up resistor to reduce cross-talk between devices.

The SPI master pulls the selected slave’s CS pin low to start communication. Meanwhile, all other slaves’ CS pins stay in their normal state (high).

In systems with many slave devices, the MCU needs as many active-low SS outputs as there are slaves. This architecture increases hardware and layout complexity. Therefore, a daisy chain configuration can be used instead.

Daisy chain configuration

Figure 16: Daisy chain wiring

Figure 17: Daisy chain operation

 

Coming up next!

So we’ve just finished the second article in the Basic Communication Protocol series. In the next post, we will discuss the I2C protocol.

Have fun!

 

WRITTEN BY

Trung Do

Firmware engineer, blogger and a makerholic 


Embedded knowledge:
Basic communication protocol - UART


Foreword

Anyone who first enters the embedded world is either taught or has at least heard about the 3 most basic communication protocols: UART, SPI and I2C. These protocols are as important as languages are in the human world.

Basically, different embedded hardware (microcontrollers, sensors, etc.) supports different ways to interact with it. Some devices support only one protocol, like people who can only speak their mother tongue, while others support several. Therefore, having a deep knowledge of these protocols will definitely help you a lot in building your own projects, especially if you are an embedded engineer.

Here are all 3 posts in my “Embedded knowledge: Basic communications” series in case you’ve missed any of them:

1. Embedded knowledge: Basic communication protocol – UART.

2. Embedded knowledge: Basic communication protocol – SPI.

3. Embedded knowledge: Basic communication protocol – I2C.

In this series, I also share my working experience with these protocols.

Each section is arranged in order and explains why we need each part.

Now, let’s get started!

Introduction

Let’s begin with a simple case: Ted and Marshall are in a conversation.

Figure 01: Ted and Marshall are in a conversation

In the simplest case, there are 3 pre-conditions for a successful communication:

a. A or B or both are able to speak and hear.

b. A and B are speaking the same language (aka. protocol)

c. An environment for the conversation. This is air in this case.

The exact same things happen when we deal with the microcontroller. In order to transfer data between 2 devices:

a. One or both are able to transfer data (through a logic pin, antenna, …)

b. Both are using the same protocol (UART, I2C, SPI…)

c. Environment:

  • Wire: UART, I2C, SPI… . The number of wires depends on each protocol.
  • Wireless: bluetooth, wifi, NFC, …

Therefore, with UART, we will have:

Figure 02: UART

Connection

There are many ways to connect 2 microcontrollers using the UART protocol. Let’s take a look at some of the most popular ones.

Connection - Simplex

Figure 03: Simplex connection

  • Data transmission goes only in 1 direction.
  • Uses 1 wire for data transfer (2 wires with flow control).

Connection - Half-duplex

Figure 04: Half-duplex connection

  • Data transmission goes in both directions but not simultaneously.
  • Uses 1 wire (2 wires with flow control).
  • The single data pin acts as either the TX (transmit data) pin or the RX (receive data) pin.
  • If flow control is enabled, the second pin acts as CTS when its side wants to transmit data, or as RTS when it is receiving data. (We will talk about this in section 2.5)

Connection - Full-duplex

Figure 05: Full-duplex connection

  • Data transmission goes in both directions and simultaneously.
  • Uses 2 wires (4 wires with flow control).

Notes

  • About flow control, I will explain what this is and why we need it in section 2.5
  • The most common way to set up UART is using the full-duplex without flow control.

Figure 06: Full-duplex, no flow control

Protocol

To make sure the data is received correctly, A and B must use the same 5 settings, as shown below:

Figure 07: UART configurations

Why do we need Start bit and Stop bit?

UART stands for Universal Asynchronous Receiver/Transmitter. Asynchronous means that we are not using a clock signal to synchronize the transmission. 
→ We need another way to know when to start reading the data and when the transfer ends.
→ Start bit and stop bit: these bits mark the beginning and end of the data packet, so the receiving UART knows when to start reading the bits. 

Why do we need Baudrate?

When the receiver detects the start bit, it starts reading the incoming bits at a specified frequency called the BAUD RATE (unit: bits per second). In other words, it samples the data on the line at the baud rate frequency.
→ Both UARTs have to run at about the same baud rate. If not, the receiver samples at the wrong moments and reads wrong data.

Moreover, there is always some timing error when transferring data through UART. The baud rates of the transmitting and receiving UARTs can only differ by a few percent before the timing of bits gets too far off and causes an unacceptable error rate. A limit of about 10% is often quoted, but for a typical 10-bit frame sampled in the middle of each bit, the last sample already lands on the edge of its bit at roughly 5% mismatch.
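As a rough sanity check on how much baud rate mismatch one frame can tolerate, here is a simplified estimate. It assumes the receiver re-synchronizes on the falling edge of the start bit and takes a single sample in the middle of each bit; real UARTs oversample and have other error sources, so the practical limit is tighter than this idealized bound. The function name `max_baud_mismatch` is made up for this sketch.

```python
def max_baud_mismatch(data_bits: int = 8, parity_bits: int = 0,
                      stop_bits: int = 1) -> float:
    """Idealized upper bound on the relative baud rate mismatch for one frame."""
    frame_bits = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    # The last sample is taken (frame_bits - 0.5) bit times after the start edge.
    # It must drift by less than half a bit to stay inside the last bit.
    return 0.5 / (frame_bits - 0.5)


print(f"8 data bits, no parity, 1 stop bit: {max_baud_mismatch() * 100:.1f} %")
print(f"8 data bits, parity, 1 stop bit:    {max_baud_mismatch(parity_bits=1) * 100:.1f} %")
```

The longer the frame, the more the timing error accumulates before the next start bit re-synchronizes the receiver, so adding a parity bit or a second stop bit slightly reduces the tolerable mismatch.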

Here is an example of UART error rate corresponding to each Baudrate (picture is taken from MSP430FR userguide, pg.579)

Figure 07.1: UART Baudrate and error rate

As can be seen in the table, as the baud rate increases, the error rate also increases. This is the trade-off between speed and reliability you will have to accept.

Why do we need parity bit?

What if you want a high transfer rate but still want the received data to be correct? 

We need an error detection method. With it, we know when the data is wrong and can therefore request the TX side to re-send that data.

The parity bit is the simplest form of error-detecting code. It can only detect an odd number of flipped bits (a single-bit error, for example) and cannot tell which bit is wrong. In production code, we usually implement stronger methods like CRC, MD5, … to make sure our data is intact.
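Here is a small sketch of how a parity bit is computed and where it stops helping. It assumes 8 data bits and even parity by default; the helper names `parity_bit` and `parity_ok` are invented for this example.

```python
def parity_bit(data: int, even: bool = True) -> int:
    """Return the parity bit for an 8-bit value."""
    ones = bin(data & 0xFF).count("1")
    bit = ones % 2               # 1 if the data holds an odd number of 1s
    return bit if even else bit ^ 1


def parity_ok(data: int, received_parity: int, even: bool = True) -> bool:
    """Check received data against the parity bit that came with it."""
    return parity_bit(data, even) == received_parity


sent = 0b10110010
p = parity_bit(sent)             # four 1s, so the even parity bit is 0
one_flip = sent ^ 0b00000100     # one bit corrupted on the line
two_flips = sent ^ 0b00010100    # two bits corrupted on the line
print(parity_ok(sent, p), parity_ok(one_flip, p), parity_ok(two_flips, p))
```

The single-bit error is caught, but the two-bit error passes the check unnoticed, which is why stronger checks are added on top when data integrity matters.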

I have another article here about hashing mechanisms (CRC, MD5, …) in case you would like to know more.


Operation

And here is the whole communication process between A and B

Figure 08: Full UART transmission process

Purpose:
MCU1 wants to transfer 8 bits “1 0 1 1 0 0 1 0” to MCU2.

Pre-condition:
1. The transmit pin of MCU1 has to be connected to the receive pin of MCU2.
2. Both sides have to be configured with the same start bit, baud rate, parity and stop bit settings.

Process flow:
Step 1: MCU1 has 8 bits that need to be sent out.

Step 2: MCU1 creates a full data frame with Start bit – Data – Parity – Stop bit.

Step 3: MCU1 starts generating the signal on its TX pin according to the data frame. Bit width is 1/BAUDRATE.

Step 4: As soon as MCU2 detects the falling edge of the start bit on the line, it starts its sampling clock. The clock period on MCU2 = bit width on MCU1 = 1/BAUDRATE. That’s why both sides must use the same baud rate; otherwise MCU2 cannot sample at the correct frequency and fails to read the data.

There are several ways to configure sampling: at the rising edge of the clock (as shown in the picture), at the falling edge, or in the middle of the pulse.

Step 5: MCU2 stops the sampling clock after receiving the stop bit. The data will be processed after this step.

In the microcontroller, the UART peripheral will do all for us from step 2 to step 5. All we have to do is the “precondition” and step 1 (which is deciding what we want to send out). 
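Steps 2 to 5 can be sketched as a small simulation. It assumes the usual UART conventions (idle line high, start bit low, data sent LSB first, stop bit high) and a receiver that takes one sample in the middle of each bit; the function names and the 16 samples per bit are choices made for this illustration, not something a particular UART peripheral exposes.

```python
def build_frame(data: int, parity: str = "even") -> list[int]:
    """Line levels, one per bit time: start, 8 data bits (LSB first), parity, stop."""
    bits = [(data >> i) & 1 for i in range(8)]
    frame = [0] + bits                    # start bit pulls the idle-high line low
    if parity != "none":
        p = sum(bits) % 2
        frame.append(p if parity == "even" else p ^ 1)
    frame.append(1)                       # stop bit returns the line to idle
    return frame


def receive_frame(line: list[int], samples_per_bit: int, parity: str = "even") -> int:
    """Decode one frame from an oversampled line, sampling the middle of each bit."""
    start = line.index(0)                 # falling edge marks the start bit

    def sample(bit_index: int) -> int:
        return line[start + bit_index * samples_per_bit + samples_per_bit // 2]

    if sample(0) != 0:
        raise ValueError("false start bit")
    bits = [sample(1 + i) for i in range(8)]
    pos = 9
    if parity != "none":
        expected = sum(bits) % 2 if parity == "even" else (sum(bits) % 2) ^ 1
        if sample(pos) != expected:
            raise ValueError("parity error")
        pos += 1
    if sample(pos) != 1:
        raise ValueError("framing error: stop bit is not high")
    return sum(bit << i for i, bit in enumerate(bits))


frame = build_frame(0b10110010)
# Idle line, then every frame bit held for 16 samples (bit width = 1/BAUDRATE).
line = [1] * 5 + [level for level in frame for _ in range(16)] + [1] * 5
print(frame)
print(f"{receive_frame(line, 16):08b}")
```

Sampling in the middle of each bit gives the most margin against baud rate mismatch, and a bit-banged UART would have to reproduce the same steps with GPIO writes and timed delays.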

 

UART Bit-bang

What is UART Bit-bang?

So what if the MCU does not have enough UART modules, or we want to use a pin for UART that is not supported by the UART peripherals?
We have to create our own UART module in software, in other words, a bit-banged UART.

In UART bit-bang, we have to do all the steps from 1 to 5 (section 2.3) by ourselves, on one or both of the 2 MCUs.

Advantages:

  • Can use other pins supporting GPIO.
  • Full control: we can make modifications to make the UART protocol more reliable. For example, we can extend the duration of the start bit from 1 bit to more than 1 bit. Sometimes the data line is accidentally pulled low by a disturbance, long wiring, etc., and a longer start bit makes the link more robust against such glitches.

Disadvantages:

  • Complicated and can have potential bugs.
 

Flow control

What is flow control ?

Let’s consider another case:

MCU1 transfers data to MCU2. However, MCU2 processes data more slowly than MCU1 sends it.

At some point, MCU2 cannot keep up anymore. It has to either process the data or otherwise empty its receive buffer before it can continue receiving. If not, data will be lost, because MCU1 keeps sending messages while MCU2 is still busy with the old data.
→ Using flow control to handle this issue.

Flow control is the method to handle the UART communication between a fast and slow device without the risk of losing data.
MCU2 will use flow-control to create a signal back to MCU1 to stop/pause or resume the transmission.

There are several ways to add flow control to our UART protocol, using either hardware flow control or software flow control.

Hardware flow control

Let’s take a look at how hardware flow control is configured in figures 09 and 10.

Figure 09: UART flow control config

Figure 10: UART flow control

A wants to transfer data to B. Before sending, A checks A_CTS (which is wired to B_RTS). If that line is low → B has asserted it → B is ready to receive more data → A continues to send.

Whenever B wants to stop receiving data to process something else, it de-asserts B_RTS (A_CTS) → A checks that line and stops sending.

Software flow control

Instead of using additional wires for flow control, software flow control starts and stops the transmission by sending special flow control characters on the TX/RX lines.

The special flow control characters are usually the ASCII codes XON and XOFF (0x11 and 0x13).

When A sends XOFF to B, B pauses its transmission until it receives XON from A.
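The idea can be illustrated with a toy simulation of a fast sender and a receiver that is half as fast. Everything here is an assumption made for the sketch: the buffer capacity, the high and low thresholds, the tick-based timing, and the class and function names. XOFF and XON are also assumed to reach the sender instantly, which is not true on a real line.

```python
XON, XOFF = 0x11, 0x13


class Receiver:
    """Slow receiver with a small buffer and hypothetical high/low thresholds."""

    def __init__(self, capacity: int = 8, high: int = 6, low: int = 2):
        self.capacity, self.high, self.low = capacity, high, low
        self.buffer, self.processed, self.paused = [], [], False

    def on_byte(self, byte: int):
        """Store a byte; return XOFF when the sender should pause, else None."""
        if len(self.buffer) >= self.capacity:
            raise OverflowError("receive buffer overflow, data lost")
        self.buffer.append(byte)
        if not self.paused and len(self.buffer) >= self.high:
            self.paused = True
            return XOFF
        return None

    def process_one(self):
        """Consume one byte; return XON once the buffer has drained enough."""
        if self.buffer:
            self.processed.append(self.buffer.pop(0))
        if self.paused and len(self.buffer) <= self.low:
            self.paused = False
            return XON
        return None


def simulate(data: list[int], receiver: Receiver) -> list[str]:
    """Sender offers one byte per tick; receiver processes one byte every 2 ticks."""
    log, sent, sending, tick = [], 0, True, 0
    while sent < len(data) or receiver.buffer:
        if sending and sent < len(data):
            if receiver.on_byte(data[sent]) == XOFF:   # XOFF travels back to the sender
                sending = False
                log.append("XOFF")
            sent += 1
        if tick % 2 == 1 and receiver.process_one() == XON:
            sending = True
            log.append("XON")
        tick += 1
    return log


rx = Receiver()
print(simulate(list(range(20)), rx))
print(rx.processed == list(range(20)))
```

Without the XOFF/XON handshake, the same sender would overflow the 8-byte buffer; with it, every byte arrives in order. Note that XON and XOFF then cannot appear as ordinary payload bytes unless they are escaped.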

Coming up next!

So we’ve just finished the first article in the Basic Communication Protocol series. We will talk about the SPI protocol in the next article.

Have fun!
