2020-05-15

Java - Foreign-Memory Access API

The Foreign-Memory Access API was proposed by JEP 370 and targeted to Java 14 in late 2019 as an incubating API. Its successor, JEP 383, proposes to incorporate refinements based on feedback and to re-incubate the API in Java 15.

The following changes will be considered for inclusion:

A rich VarHandle combinator API, to customize memory access var handles;
Targeted support for parallel processing of a memory segment via the Spliterator interface;
Enhanced support for mapped memory segments (e.g., MappedMemorySegment::force);
Safe API points to support serial confinement (e.g., to transfer thread ownership between two threads); and
Unsafe API points to manipulate and dereference addresses coming from, e.g., native calls, or to wrap such addresses into synthetic memory segments.

Goals

Generality: A single API should be able to operate on various kinds of foreign memory (e.g., native memory, persistent memory, managed heap memory, etc.).
Safety: It should not be possible for the API to undermine the safety of the JVM, regardless of the kind of memory being operated upon.
Determinism: Deallocation operations on foreign memory should be explicit in source code.
Usability: For programs that need to access foreign memory, the API should be a compelling alternative to legacy Java APIs such as sun.misc.Unsafe.

The Foreign-Memory Access API introduces three main abstractions: MemorySegment, MemoryAddress, and MemoryLayout.

A MemorySegment models a contiguous memory region with given spatial and temporal bounds.
A MemoryAddress models an address. There are generally two kinds of addresses: A checked address is an offset within a given memory segment, while an unchecked address is an address whose spatial and temporal bounds are unknown, as in the case of a memory address obtained -- unsafely -- from native code.
A MemoryLayout is a programmatic description of a memory segment's contents.

Memory segments can be created from a variety of sources, such as native memory buffers, Java arrays, and byte buffers (either direct or heap-based). For instance, a native memory segment can be created as follows:

try (MemorySegment segment = MemorySegment.allocateNative(100)) {
   ...
}

This will create a memory segment that is associated with a native memory buffer whose size is 100 bytes.

Memory segments are spatially bounded, which means they have lower and upper bounds. Any attempt to use the segment to access memory outside of these bounds will result in an exception. As evidenced by the use of the try-with-resources statement, memory segments are also temporally bounded, which means they must be created, used, and then closed when no longer in use. Closing a segment is always an explicit operation and can result in additional side effects, such as deallocation of the memory associated with the segment. Any attempt to access an already-closed memory segment will result in an exception. Together, spatial and temporal bounding guarantee the safety of the Foreign-Memory Access API and thus guarantee that its use cannot crash the JVM.
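
The two kinds of bounds can be illustrated with a toy model. The sketch below is not the incubator API: BoundedRegion is a hypothetical class, backed by a plain direct ByteBuffer, that only mimics the checks described above (an offset check on every access and a liveness check after close).

```java
import java.nio.ByteBuffer;

// Hypothetical toy model of a bounded region; NOT the jdk.incubator.foreign implementation.
final class BoundedRegion implements AutoCloseable {
    private final ByteBuffer storage;
    private final long size;
    private boolean open = true;

    BoundedRegion(int sizeInBytes) {
        this.storage = ByteBuffer.allocateDirect(sizeInBytes);
        this.size = sizeInBytes;
    }

    // Temporal check first (is the region still open?), then spatial check (is the access in range?).
    private void check(long offset, long length) {
        if (!open) {
            throw new IllegalStateException("region is closed");
        }
        if (offset < 0 || offset > size - length) {
            throw new IndexOutOfBoundsException("offset " + offset + " is outside [0, " + size + ")");
        }
    }

    int getInt(long offset) {
        check(offset, Integer.BYTES);
        return storage.getInt((int) offset);
    }

    void putInt(long offset, int value) {
        check(offset, Integer.BYTES);
        storage.putInt((int) offset, value);
    }

    // Only marks the region as closed; this toy model does not free the buffer eagerly.
    @Override
    public void close() {
        open = false;
    }
}
```

With a 100-byte region, putInt(96, 7) succeeds, getInt(97) is rejected because the four bytes would run past the upper bound, and any access after close() is rejected regardless of the offset.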

Dereferencing the memory associated with a segment is achieved by obtaining a var handle, which is an abstraction for data access introduced in Java 9. In particular, a segment is dereferenced with a memory-access var handle. This kind of var handle has an access coordinate of type MemoryAddress that serves as the address at which the dereference occurs.
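
For readers who have not used var handles before, the following sketch shows the general idea with a var handle that has been part of the standard library since Java 9: the byte-buffer view var handle, whose coordinates are a ByteBuffer and a byte offset. It is a stand-in chosen because it runs without the incubator module, not a memory-access var handle; the class name VarHandleSketch and its methods are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration using the standard byte-buffer view var handle (Java 9+).
final class VarHandleSketch {
    // Views the bytes of a ByteBuffer as ints; access coordinates are (ByteBuffer, int byteOffset).
    static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    private VarHandleSketch() {}

    // Allocates room for 'count' ints and stores i at the i-th int slot.
    static ByteBuffer fill(int count) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(count * Integer.BYTES);
        for (int i = 0; i < count; i++) {
            INT_VIEW.set(buffer, i * Integer.BYTES, i);
        }
        return buffer;
    }

    // Reads the int stored at the given element index.
    static int read(ByteBuffer buffer, int index) {
        return (int) INT_VIEW.get(buffer, index * Integer.BYTES);
    }
}
```

Calling VarHandleSketch.fill(25) produces a 100-byte buffer, and VarHandleSketch.read(buffer, 24) returns the last value written. A memory-access var handle follows the same pattern but takes a MemoryAddress as its coordinate instead of a buffer and an offset.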

Memory-access var handles are obtained using factory methods in the MemoryHandles class. For instance, to set the elements of a native memory segment, we could use a memory-access var handle as follows:

VarHandle intHandle = MemoryHandles.varHandle(int.class,
        ByteOrder.nativeOrder());

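try (MemorySegment segment = MemorySegment.allocateNative(100)) {
    MemoryAddress base = segment.baseAddress();
    // 100 bytes hold 25 ints of 4 bytes each
    for (int i = 0; i < 25; i++) {
        // write i at byte offset i * 4 from the segment's base address
        intHandle.set(base.addOffset(i * 4), i);
    }
}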

Memory-access var handles can acquire extra access coordinates, of type long, to support more complex addressing schemes, such as multi-dimensional addressing of an otherwise flat memory segment. Such memory-access var handles are typically obtained by invoking combinator methods defined in the MemoryHandles class. For instance, a more direct way to set the elements of a native memory segment is through an indexed memory-access var handle, constructed as follows:

VarHandle intHandle = MemoryHandles.varHandle(int.class,
        ByteOrder.nativeOrder());
VarHandle indexedElementHandle = MemoryHandles.withStride(intHandle, 4);

try (MemorySegment segment = MemorySegment.allocateNative(100)) {
    MemoryAddress base = segment.baseAddress();
    for (int i = 0; i < 25; i++) {
        indexedElementHandle.set(base, (long) i, i);
    }
}
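
The arithmetic behind strided, multi-dimensional addressing can be sketched on its own. The helper below is hypothetical and not part of jdk.incubator.foreign; it assumes each extra long coordinate contributes index * stride bytes to the final offset.

```java
// Hypothetical helper for illustration; not part of jdk.incubator.foreign.
final class StridedOffsets {
    private StridedOffsets() {}

    // Byte offset = sum over all dimensions of indices[k] * stridesInBytes[k].
    static long offset(long[] stridesInBytes, long... indices) {
        if (stridesInBytes.length != indices.length) {
            throw new IllegalArgumentException("one index is required per stride");
        }
        long offset = 0;
        for (int k = 0; k < indices.length; k++) {
            // The exact-arithmetic methods throw on overflow instead of wrapping silently.
            offset = Math.addExact(offset, Math.multiplyExact(indices[k], stridesInBytes[k]));
        }
        return offset;
    }
}
```

Viewing the 100-byte segment as a 5 x 5 matrix of ints gives a row stride of 20 bytes and a column stride of 4 bytes, so StridedOffsets.offset(new long[] {20, 4}, 2, 3) evaluates to 2 * 20 + 3 * 4 = 52.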

To enhance the expressiveness of the API, and to reduce the need for explicit numeric computations such as those in the above examples, a MemoryLayout can be used to programmatically describe the content of a MemorySegment. For instance, the layout of the native memory segment used in the above examples can be described in the following way:

SequenceLayout intArrayLayout
    = MemoryLayout.ofSequence(25,
        MemoryLayout.ofValueBits(32,
            ByteOrder.nativeOrder()));

This creates a sequence memory layout in which a given element layout (a 32-bit value) is repeated 25 times. Once we have a memory layout, we can get rid of all the manual numeric computation in our code and also simplify the creation of the required memory access var handles, as shown in the following example:

SequenceLayout intArrayLayout
    = MemoryLayout.ofSequence(25,
        MemoryLayout.ofValueBits(32,
            ByteOrder.nativeOrder()));

VarHandle indexedElementHandle
    = intArrayLayout.varHandle(int.class,
        PathElement.sequenceElement());

try (MemorySegment segment = MemorySegment.allocateNative(intArrayLayout)) {
    MemoryAddress base = segment.baseAddress();
    for (int i = 0; i < intArrayLayout.elementCount().getAsLong(); i++) {
        indexedElementHandle.set(base, (long) i, i);
    }
}

In this example, the layout object drives the creation of the memory-access var handle through the creation of a layout path, which is used to select a nested layout from a complex layout expression. The layout object also drives the allocation of the native memory segment, which is based upon size and alignment information derived from the layout. The loop constant in the previous examples (25) has been replaced with the sequence layout's element count.
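
To make the idea of a description driving both allocation and iteration concrete, here is a minimal sketch that runs without the incubator module. SequenceSketch is a hypothetical stand-in for a sequence layout, and a direct ByteBuffer stands in for the native segment; neither reflects how the real classes are implemented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for a sequence layout; not the incubator's SequenceLayout.
final class SequenceSketch {
    final int elementCount;
    final int elementBits;

    SequenceSketch(int elementCount, int elementBits) {
        if (elementCount < 0 || elementBits <= 0 || elementBits % 8 != 0) {
            throw new IllegalArgumentException("need a non-negative count and a whole number of bytes");
        }
        this.elementCount = elementCount;
        this.elementBits = elementBits;
    }

    // Total size is derived from the description rather than written as a literal.
    int byteSize() {
        return Math.multiplyExact(elementCount, elementBits / 8);
    }

    // The description, not a hard-coded 100, decides how much memory is allocated.
    ByteBuffer allocate() {
        return ByteBuffer.allocateDirect(byteSize()).order(ByteOrder.nativeOrder());
    }

    // Fills a freshly allocated buffer with 0, 1, 2, ... using the description's element count.
    static ByteBuffer fillInts(SequenceSketch ints) {
        ByteBuffer buffer = ints.allocate();
        for (int i = 0; i < ints.elementCount; i++) {
            buffer.putInt(i * Integer.BYTES, i);
        }
        return buffer;
    }
}
```

For new SequenceSketch(25, 32), byteSize() is 100, so neither the allocation size nor the loop bound appears as a literal in the calling code.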

Dereference operations are only possible on checked memory addresses. Checked addresses are typical in the API, such as the address obtained from a memory segment in the above code (segment.baseAddress()). However, if a memory address is unchecked and does not have any associated segment, then it cannot be dereferenced safely, since the runtime has no way to know the spatial and temporal bounds associated with the address. Some helper functions will be provided to, e.g., attach spatial bounds to an otherwise unchecked address, so as to allow dereference operations. Such operations are, however, unsafe by their very nature, and must be used with care. The API might require such unsafe operations to be enabled by a command-line option at startup.

The Foreign-Memory Access API will be provided as an incubator module named jdk.incubator.foreign, in a package of the same name.
