Optimizations around atomic load stores in C++

By Ritesh Sahu - July 12, 2022

I have read about std::memory_order in C++ and understood partially. But I still had some doubts around it.

Explanation on std::memory_order_acquire says that, no reads or writes in the current thread can be reordered before this load. Does that mean compiler and cpu is not allowed to move any instruction present below the acquire statement, above it?

auto y = x.load(std::memory_order_acquire);
z = a;  // is it leagal to execute loading of shared `b` above acquire? (I feel no)
b = 2;  // is it leagal to execute storing of shared `a` above acquire? (I feel yes)

I can reason out why it is illegal for executing loads before acquire. But why it is illegal for stores?

Is it illegal to skip a useless load or store from atomic objects? Since they are not volatile, and as I know only volatile has this requirement.

auto y = x.load(std::memory_order_acquire);  // `y` is never used
return;

This optimization is not happening even with relaxed memory order.

Is compiler allowed to move instructions present above acquire statement, below it?

z = a;  // is it leagal to execute loading of shared `b` below acquire? (I feel yes)
b = 2;  // is it leagal to execute storing of shared `a` below acquire? (I feel yes)
auto y = x.load(std::memory_order_acquire);

Can two loads or stores be reordered without crossing acquire boundary?

auto y = x.load(std::memory_order_acquire);
a = p;  // can this move below the below line?
b = q;  // shared `a` and `b`

Similar and corresponding 4 doubts with release semantics also.

Related to 2nd and 3rd question, why no compiler is optimizing f(), as aggressive as g() in below code?

#include <atomic>

int a, b;

void dummy(int*);

void f(std::atomic<int> &x) {
    int z;
    z = a;  // loading shared `a` before acquire
    b = 2;  // storing shared `b` before acquire
    auto y = x.load(std::memory_order_acquire);
    z = a;  // loading shared `a` after acquire
    b = 2;  // storing shared `b` after acquire
    dummy(&z);
}

void g(int &x) {
    int z;
    z = a;
    b = 2;
    auto y = x;
    z = a;
    b = 2;
    dummy(&z);
}

f(std::atomic<int>&):
        sub     rsp, 24
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR b[rip], 2
        mov     DWORD PTR [rsp+12], eax
        mov     eax, DWORD PTR [rdi]
        lea     rdi, [rsp+12]
        mov     DWORD PTR b[rip], 2
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR [rsp+12], eax
        call    dummy(int*)
        add     rsp, 24
        ret
g(int&):
        sub     rsp, 24
        mov     eax, DWORD PTR a[rip]
        mov     DWORD PTR b[rip], 2
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], eax
        call    dummy(int*)
        add     rsp, 24
        ret
b:
        .zero   4
a:
        .zero   4

Search This Blog

Theprogrammersfirst | A technical portal.

Optimizations around atomic load stores in C++

Comments

Post a Comment

Popular posts from this blog

Today Walkin 14th-Sept

Network Error and Timeout on Authorize.net JS

Spring Elasticsearch Operations