Author: Ross Bencina (rossb@audiomulch.com)
Subject: Re: What does Memory Barriers mean ??
Newsgroup: comp.programming.threads
Date: 2004-06-10 08:57:30 PST
[See below for the revised Memory Barrier definition, which I intend to post
to Wikipedia.]
I wasn’t sure where to put this post, but I’ve revised the definition
considerably to include everyone’s comments including those of the parent
(thanks Joe).
I’ve avoided confusing “memory visibility semantics” and “memory barriers
and hardware memory models” and tried to make it clear that these are
different levels of abstraction. This also allows me to discuss the hazards
of depending on a particular mapping between them.
I’ve made it clear that out-of-order execution is the main reason memory
barriers are needed. I haven’t mentioned cache coherency at all, in spite of
Joe’s mention of the Alpha processor.
There is now a small mention of C’s “volatile” keyword.
I would have liked to include a discussion of “portable” memory barrier
instructions such as the Linux kernel functions. Perhaps this can be covered
in a later revision.
—
Memory Barrier
a.k.a. membar or memory fence
Modern CPUs employ mechanisms which can result in operations being executed
out-of-order, including memory loads and stores. A Memory Barrier is a
general term used to refer to instructions which cause the CPU to enforce an
ordering constraint on memory operations issued before and after the barrier
instruction. The exact nature of the ordering constraint is hardware
dependent, and is defined by the architecture’s memory model. Some
architectures provide multiple barriers for enforcing different ordering
constraints.
Memory barriers are typically used when implementing low-level code which
operates on memory shared by multiple devices. Such code includes
synchronisation primitives and lock-free data structures on multiprocessor
systems, and drivers which communicate with hardware.
An Illustrative Example
-----------------------
When a program runs on a single CPU, the hardware performs the necessary
book-keeping to ensure that programs execute as if all memory operations
were performed in program order, hence memory barriers are not necessary.
However, when the memory is shared with multiple devices, such as other CPUs
in a multiprocessor system, or memory mapped peripherals, out-of-order
access may affect program behavior. For example a second CPU may see memory
changes made by the first CPU in a sequence which differs from program
order.
The following two-processor program gives a concrete example of how such
out-of-order execution can affect program behavior:
>>>
Initially, memory locations x and f both hold the value 0. The program
running on processor #1 loops until the value of f is non-zero, then it
prints the value of x. The program running on processor #2 stores the value
42 into x and then stores the value 1 into f. Pseudo code for the two
program fragments is shown below. The steps of the program correspond to
individual processor instructions.
Processor #1:
    loop:
        load the value in location f, if it is 0 goto loop
    print the value in location x
Processor #2:
    store the value 42 into location x
    store the value 1 into location f
<<<
You might expect the print statement to always print the number "42",
however if processor #2's store operations are executed out-of-order it is
possible that f would be updated _before_ x, and the print statement might
print "0". For some programs this situation is not acceptable. A memory
barrier can be inserted before processor #2's assignment to f to ensure that
the new value of x is visible to other processors at or prior to the change
in the value of f. On architectures which can also reorder loads, processor
#1 may need a corresponding barrier between its load of f and its load of x.
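For readers who want something runnable, the two-processor program above can be
sketched in Java, the language used in the replies further down this thread.
This is only an illustration, assuming the revised Java memory model (Java 5
and later), and is not part of the definition itself. The class name
FlagHandoff and the method run are made up for the example. Declaring f as
volatile stands in for the explicit barrier: the JVM is then responsible for
emitting whatever barrier instructions the target hardware needs.

```java
public class FlagHandoff {
    static int x = 0;           // payload, an ordinary field
    static volatile int f = 0;  // flag; volatile supplies the ordering guarantee

    // Runs the two "processors" as threads and returns the value the reader saw in x.
    static int run() {
        x = 0;
        f = 0;
        final int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (f == 0) {
                // loop until f is non-zero
            }
            seen[0] = x;        // ordered after the volatile read of f
        });
        Thread writer = new Thread(() -> {
            x = 42;             // store the payload first
            f = 1;              // volatile store: x = 42 becomes visible no later than f = 1
        });

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If f were an ordinary (non-volatile) field, the Java memory model would allow
the reader to observe 0 in x, or to spin forever, which is the same hazard the
pseudo code describes at the hardware level.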
Low Level Architecture-Specific Primitives
------------------------------------------
Memory barriers are low level primitives which are part of the definition of
an architecture's memory model. Like instruction sets, memory models vary
considerably between architectures, so it is not appropriate to generalise
about memory barrier behavior. The received wisdom is that to use memory
barriers correctly you should study the architecture manuals for the
hardware which you are programming. That said, the following paragraph
offers a glimpse of some memory barriers which exist in the wild.
Some architectures provide only a single memory barrier instruction,
sometimes called a "full fence". A full fence ensures that all load and store
operations prior to the fence will have been committed prior to any loads and
stores issued following the fence. Other architectures provide separate
"acquire" and "release" memory barriers which address the visibility of
read-after-write operations from the point of view of a reader (sink) or
writer (source) respectively. Some architectures provide separate memory
barriers to control ordering between different combinations of system memory
and i/o memory. When more than one memory barrier instruction is available
it is important to consider that the cost of different instructions may vary
considerably.
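The acquire/release pairing described above can be tried out from a high-level
language. The following is a rough sketch, assuming Java 9 or later, where
java.lang.invoke.VarHandle offers setRelease and getAcquire access modes (and
static fence methods such as VarHandle.fullFence()). It is a language-level
analogue, not a description of any particular architecture's instructions, and
the class name AcquireReleaseSketch and its fields are invented for the
example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireReleaseSketch {
    int x = 0;  // payload, an ordinary field
    int f = 0;  // flag, only touched through the VarHandle F below

    private static final VarHandle F;
    static {
        try {
            F = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseSketch.class, "f", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer publishes x with a release store; reader picks it up with acquire loads.
    static int run() {
        AcquireReleaseSketch s = new AcquireReleaseSketch();
        final int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while ((int) F.getAcquire(s) == 0) {
                // spin: later loads may not move above this acquire load
            }
            seen[0] = s.x;
        });
        Thread writer = new Thread(() -> {
            s.x = 42;             // ordinary store
            F.setRelease(s, 1);   // release store: earlier stores may not move below it
        });

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The release store plays the writer (source) role and the acquire load the
reader (sink) role. How each maps to real barrier instructions, and at what
cost, is left to the JVM and differs between architectures.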
"Threaded" Programming and Memory Visibility
--------------------------------------------
Threaded programs usually use synchronisation primitives provided by a
high-level programming environment such as Java, or a C API such as POSIX
pthreads or Win32. Primitives such as mutexes and semaphores are provided to
synchronise access to resources from parallel threads of execution. These
primitives are usually implemented with the memory barriers required to
provide the expected memory visibility semantics. When using such
environments explicit use of memory barriers is not generally necessary.
Each API or programming environment has its own high-level memory model
which defines its memory visibility semantics. Although you don't usually
need to use memory barriers in such high level environments, it's important
to understand their memory visibility semantics. Such understanding is not
necessarily easy to achieve because memory visibility semantics are not
always consistently specified or documented.
Just as programming language semantics are defined at a different level of
abstraction to machine language opcodes, a programming environment's memory
model is defined at a different level of abstraction to that of a hardware
memory model. It's important to understand this distinction and realise that
there is not always a simple mapping between low-level hardware memory
barrier semantics and the high-level memory visibility semantics of a
particular programming environment. As a result, a particular platform's
implementation of (say) pthreads may employ stronger barriers than required
by the specification. Programs which take advantage of memory visibility
as-implemented rather than as-specified may not be portable.
Out-of-order Execution vs. Compiler Reordering Optimisations
------------------------------------------------------------
Memory barrier instructions only address reordering effects at the hardware
level. Compilers may also reorder instructions as part of the program
optimization process. Although the effects on parallel program behavior can
be similar in both cases, in general it is necessary to take separate
measures to inhibit compiler reordering optimisations for data that may be
shared by multiple threads of execution. Note that such measures are usually
only necessary for data which is not protected by synchronisation primitives
such as those discussed in the previous section.
In C, the “volatile” keyword is provided to inhibit optimisations which
remove or reorder memory operations on a variable marked as volatile. This
will provide a kind of barrier for interruptions which occur on a single
CPU, such as signal handlers or concurrent threads on a uniprocessor system.
However, the use of “volatile” is insufficient to guarantee correct ordering
for multiprocessor systems because it only impacts reorderings performed by
the compiler, not those which may be performed by the CPU during execution.
Some languages and compilers may provide sufficient facilities to implement
functions which address both the compiler reordering and machine reordering
issues, however it is usually advisable to be very careful about this, for
example by carefully inspecting compiler generated code. Some developers
advocate coding in assembly language to avoid compiler reordering issues.
Comments anyone?
Ross.
–
<snip>
> In C, the “volatile” keyword is provided to inhibit optimisations which
> remove or reorder memory operations on a variable marked as volatile. This
> will provide a kind of barrier for interruptions which occur on a single
> CPU, such as signal handlers or concurrent threads on a uniprocessor system.
> However, the use of “volatile” is insufficient to guarantee correct ordering
> for multiprocessor systems because it only impacts reorderings performed by
> the compiler, not those which may be performed by the CPU during execution.
>
> Some languages and compilers may provide sufficient facilities to implement
> functions which address both the compiler reordering and machine reordering
> issues, however it is usually advisable to be very careful about this, for
> example by carefully inspecting compiler generated code. Some developers
> advocate coding in assembly language to avoid compiler reordering issues.
</snip>
How about Java and .NET? They also have a volatile keyword and, AFAIK, the
volatile keyword has the same effect as synchronized/lock. (Though
JDK 1.4 and lower do not properly implement volatile, let’s
concentrate on the intended meaning of volatile.)
For example,
    int a;
    Object lock;
    …
    void someMethod() {
        synchronized(lock) {
            a++;
        }
    }
has the same effect as
    volatile int a;
    void someMethod() {
        a++;
    }
In other words, volatile in JAVA and .NET is also a construct of
*locking* as well as preventing reordering. Am I missing something
here?
Regards,
Minkoo Seo
–
Message 15 in thread
Author: Alexander Terekhov (terekhov@web.de)
Subject: Re: What does Memory Barriers mean ??
Newsgroup: comp.programming.threads
Date: 2004-06-12 09:02:54 PST
Min-Koo Seo wrote:
[…]
> void someMethod() {
> synchronized(lock) {
> a++;
> }
> }
>
> has the same effect as
>
> volatile int a;
>
> void someMethod() {
> a++;
> }
>
> In other words, volatile in JAVA [… snip …] is also a construct
> of *locking* as well as preventing reordering.
No. In {revised} Java, volatile reads and writes are atomic (and
they also seem to prevent reordering in a somewhat “stronger” way
than locks — StoreLoad barrier), but there’s no guarantee that
volatile read-modify-write is atomic. Volatiles are braindead.
regards,
alexander.
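The difference between the two fragments quoted above can be observed with a
small experiment. This is a hedged sketch rather than anything posted in the
thread: the class name VolatileCounterSketch, the thread count and the
iteration count are all arbitrary choices. Several threads increment a volatile
counter and a lock-protected counter side by side.

```java
public class VolatileCounterSketch {
    static volatile int volatileCount = 0;   // a++ on this is NOT atomic
    static int lockedCount = 0;              // only touched while holding LOCK
    static final Object LOCK = new Object();

    static final int THREADS = 4;
    static final int INCREMENTS = 100_000;

    static void run() {
        volatileCount = 0;
        lockedCount = 0;
        Thread[] workers = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < INCREMENTS; n++) {
                    volatileCount++;         // separate read, add and write: updates can be lost
                    synchronized (LOCK) {
                        lockedCount++;       // the whole read-modify-write is mutually exclusive
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        run();
        System.out.println("expected: " + (THREADS * INCREMENTS));
        System.out.println("volatile: " + volatileCount);
        System.out.println("locked:   " + lockedCount);
    }
}
```

The locked counter always reaches THREADS * INCREMENTS. The volatile counter
can never exceed that figure and on a multiprocessor will often fall short of
it, because two threads can read the same value and both write back the same
incremented result.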
–
Message 20 in thread
Author: SenderX (xxx@xxx.com)
Subject: Re: What does Memory Barriers mean ??
Newsgroup: comp.programming.threads
Date: 2004-06-12 14:00:49 PST
> but there’s no guarantee that
> volatile read-modify-write is atomic.
That’s what java’s atomic ops are for…
;)
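Presumably this refers to the java.util.concurrent.atomic classes that arrived
with Java 5. As a minimal sketch of that reading (the class name
AtomicCounterSketch and the method run are invented for the example),
AtomicInteger.incrementAndGet performs the read-modify-write as a single atomic
step without an explicit lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterSketch {
    // Increments one shared counter from several threads and returns the final value.
    static int run(int threads, int increments) {
        AtomicInteger a = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < increments; n++) {
                    a.incrementAndGet();   // atomic read-modify-write, no lost updates
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return a.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000));
    }
}
```

Unlike the volatile a++ fragment quoted earlier in the thread, the final value
here always equals threads * increments.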
–
Author: SenderX (xxx@xxx.com)
Subject: Re: What does Memory Barriers mean ??
Newsgroup: comp.programming.threads
Date: 2004-06-12 15:48:20 PST
> > > Volatiles are braindead.
The C/C++ std should rip its dead brain out, and replace it with a new one
that can actually comprehend threads and memory visibility…
;)