Project Nayuki


Java Native Interface compared to Python/C API

Introduction

Java and Python are both high-level object-oriented programming languages. Sometimes they need to invoke code implemented in a lower-level language such as C, C++, or assembly for the sake of faster calculations, finer control over memory layout, or access to machine-specific features.

I have worked with the standard foreign function interface (FFI) in both languages – Java’s JNI and CPython’s C API – and found interesting differences in the architecture of their functions and data values. By comparing and contrasting the two languages, I hope to illustrate that JNI and CPython take diametrically opposite approaches in many aspects. Both approaches are appropriate for their respective language environment, but would fare poorly in the other language – neither way is right or wrong in an absolute sense.

Code examples

Here are two complete application examples in both Java and Python. Download all the files in java-python-native-examples.zip, or view individual files below:

Program Java Python
Sum array
Create map

The first program, “Sum array”, demonstrates how native C code can read numbers coming from Java/Python code. The comparison is not fair because Java has primitive integer types whereas Python only has bigint objects. The Java version shows how the Java primitive types easily correspond to C integer types, and how arrays of numbers are accessed in JNI. The Python version shows how to convert between Python’s int objects and native C integer types.

The second program, “Create map”, demonstrates how native C code can create a complex graph of Java or Python objects. The Java version looks up existing Java classes and methods, builds strings in C, then invokes the Java methods to create the necessary objects. The Python version looks up one custom function, but otherwise invokes statically known global C functions to create and manipulate Python objects; note that we need to pay attention to reference counts and handle reference stealing and decrementing properly. In this demonstration, there is a fairly good correspondence between the native code in the JNI and CPython versions, because they both take an object-oriented approach to structuring the data.

The C code for Python only works with the CPython 3 interpreter, not Python 2 or a different interpreter. The C code for Java should work with all compliant JDK and JVM implementations.

Feature comparison

Feature Java Python
Native library loading
Loading mechanism
  Java:
  • First a Java .class file is loaded. Then the class calls System.loadLibrary(String) to load and activate the native methods that the given native library implements.
  • System.loadLibrary() only needs to be called before the native method is first invoked. It can be done early or late.
  • If the native library defines a JNI_OnLoad() function (optional), then it is invoked. Otherwise no native code is run when the library is loaded.
  Python:
  • The native library defines the entirety of a Python module. There is no pure Python code in the module at all.
  • The act of loading a native library causes native code to be executed, namely the module’s initializer function (named PyInit_<module> in CPython 3).
  • The native library is loaded when another module imports it. The import statement has the same syntax as when importing a pure Python module.
Module mapping
  Java:
  • Has a many-to-many relationship between native libraries (.so/.dll file) and Java classes.
  • A native library can supply method implementations needed by many classes. Also, a class can load multiple native libraries to cover all its native methods.
  • Consequently, the file name of the native library does not need to match the name of any class.
  Python:
  • One native library corresponds to one Python module, generally speaking. Its file name must be the name of the module (followed by a platform-specific extension suffix), so that the interpreter can find the file when the module gets imported.
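To make the Python file-name rule concrete, the sketch below lists the file names that the running CPython interpreter would recognize for a native module. The module name spam and the helper candidate_filenames() are hypothetical and used only for illustration. The suffix list comes from the standard library's importlib.machinery.EXTENSION_SUFFIXES, and its contents vary by platform and interpreter version.

```python
import importlib.machinery


def candidate_filenames(module_name):
    """Return the file names CPython would recognize for a native module.

    `module_name` is a hypothetical example. The suffixes are taken from the
    running interpreter and differ between platforms and versions.
    """
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


if __name__ == "__main__":
    for name in candidate_filenames("spam"):
        print(name)  # The output depends on the platform
```

On a typical system the list contains a version-tagged name as well as a plain name, all of which begin with the module name.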
Member mapping
  Java:
  • The JVM examines the native library’s exported function symbols and matches them up with Java method names.
  • Thus the C-Java mapping is implicit by default. Use the javah tool (replaced by javac -h in newer JDKs) to generate C function prototypes conveniently. An explicit mapping is also possible through RegisterNatives().
  Python:
  • The native library defines a list of tuples, each consisting of {member name, pointer to function, extra metadata}. Then it calls a CPython interpreter function (such as PyModule_Create()) with this list to register this new module and its members.
  • Thus the C-Python mapping is explicit. The member functions can be named arbitrarily and don’t even have to be exported symbols (i.e. they can be declared as static in C).
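The implicit JNI mapping relies on a fixed naming rule for exported C symbols. The sketch below computes the short form of that symbol name following the escape rules in the JNI specification. The helper names mangle() and jni_short_name() and the example class names are hypothetical. Overloaded native methods additionally get a signature suffix, which this sketch leaves out, and it assumes all characters lie in the Basic Multilingual Plane.

```python
def mangle(name):
    """Apply the JNI escape rules to one component of a native symbol name."""
    out = []
    for ch in name:
        if ch in "./":
            out.append("_")        # Package separator
        elif ch == "_":
            out.append("_1")       # Literal underscore
        elif ch == ";":
            out.append("_2")       # Only occurs in signatures
        elif ch == "[":
            out.append("_3")       # Only occurs in signatures
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_0%04x" % ord(ch))  # Any other character, as 4 hex digits
    return "".join(out)


def jni_short_name(class_name, method_name):
    """Return the C symbol the JVM looks for, ignoring overload suffixes."""
    return "Java_" + mangle(class_name) + "_" + mangle(method_name)


if __name__ == "__main__":
    print(jni_short_name("com.example.NativeMath", "sum_array"))
```

A native method sum_array in the hypothetical class com.example.NativeMath would therefore be looked up under the symbol Java_com_example_NativeMath_sum_1array.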
Working with functions and data
Unique data types
  Java:
  • Primitive types: jboolean, jbyte, jshort, jchar, jint, jsize, jlong, jfloat, jdouble.
  • Reference types: jobject, jclass, jstring, jthrowable, jarray, jobjectArray, jbooleanArray, …, jdoubleArray.
  Python:
  • Pointer to Python object: PyObject*, PyTypeObject*, PyLongObject*, PyListObject*, etc.
  • Integers: Py_ssize_t, PY_LONG_LONG, Py_UNICODE, Py_UCS1, Py_UCS2, Py_UCS4.
Integer types used
  Java:
  • char is used, but other idiomatic C types like int, long, size_t, uint32_t are never mentioned.
  • Array offsets and lengths are given as jsize (alias of jint, which means int32_t).
  Python:
  • C types used: char, int, long, unsigned int, unsigned long, size_t.
  • Fixed-bitwidth types like uint32_t are not used.
  • Offsets and lengths are usually given in Py_ssize_t.
Arithmetic operations
  Java:
  • All Java primitive numeric types map directly onto C/C++ primitive types. Thus doing arithmetic on them (e.g. jbyte + jint) uses C language semantics and does not involve the JVM or JNI.
  • Boxed numbers (such as Integer) are cumbersome to use in JNI. Because they don’t have any arithmetic methods, they need to be converted to primitive numbers to do anything useful.
  • However, BigInteger and BigDecimal do have arithmetic methods, and should only be used in their object form. (They should not be converted to fixed-width primitive numbers due to potential loss of precision.)
  Python:
  • Operate on Python numeric objects (e.g. int, float, fractions.Fraction) using the abstract number protocol: PyNumber_Add(), PyNumber_Multiply(), etc.
  • Can convert Python’s native bigint objects to fixed-width C numbers (e.g. with PyLong_AsLong()), but need to handle overflow conditions.
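The two Python points above can be observed from pure Python. In the sketch below, operator.add() performs the same operation as the expression a + b, which is what PyNumber_Add() is documented to be equivalent to. The helper to_int32() is a hypothetical stand-in for the range check that a native conversion to a fixed-width C integer must perform. The 32-bit width is an assumption chosen for illustration, since the width of a C long depends on the platform.

```python
import operator
from fractions import Fraction

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def add_generic(a, b):
    """Add two numeric objects of any type, like the expression a + b."""
    return operator.add(a, b)


def to_int32(value):
    """Return value if it fits a signed 32-bit integer, else raise OverflowError."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError("Python int does not fit in a 32-bit C integer")
    return value


if __name__ == "__main__":
    print(add_generic(2**100, 1))                       # Bigint arithmetic
    print(add_generic(Fraction(1, 2), Fraction(1, 3)))  # Custom numeric type
    print(to_int32(123))
```

The generic addition works on any numeric type without the caller knowing the concrete class, whereas the conversion to a fixed width has to reject values such as 2**31.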
Accessing native API functionality
  Java:
  • The JNIEnv structure pointer is always passed as the first argument to every JNI user function. It contains pointers to all the JNI API functions, and this is the only way to access them. This indirect design avoids polluting the C global namespace and enables multithreaded JVM implementations.
  • In C, a JNI API call is always of the form (*env)->FooBar(env, ...); the first argument is the env pointer, which appears redundant and clunky. In C++, a JNI API call can also be expressed as env->FooBar(...), which is de-sugared into the C form.
  • There are fewer than 70 JNI functions, making it a compact and easy-to-learn API. (We assume that functions parameterized by type or style are merged.)
  Python:
  • All Python/C API functions are global and their names begin with Py. The naming scheme is in the format Py<Module>_<Function>().
  • There are over 700 of these API global functions. These functions have some patterns to reduce the learning effort, but many have subtle behaviors that require individual attention.
  • All the popular built-in types and their methods are directly accessible as global functions. For example: PyDict_New(), PyList_Size(), PyUnicode_Concat().
  • Built-in functions like abs(), open(), range(), etc. are not directly available, and need to be retrieved through PyEval_GetBuiltins().
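As a rough Python-side picture of the last point, the built-in functions live in a namespace dictionary and are fetched from it by name. The sketch below uses the dictionary of the standard builtins module. The helper lookup_builtin() is hypothetical, and the sketch assumes that this dictionary holds the same entries that native code would find through PyEval_GetBuiltins().

```python
import builtins


def lookup_builtin(name):
    """Fetch a built-in function by name from the builtins namespace."""
    return vars(builtins)[name]


if __name__ == "__main__":
    absolute = lookup_builtin("abs")
    print(absolute(-5))
```

Native code follows the same two steps: look the function up by its name, then call the resulting object.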
Native function/method arguments
  Java:
  • Native methods in C/C++ have a static type signature that corresponds to the Java type signature. Numeric types are mapped to j-numeric types (e.g. Java int → C jint), and all reference types are mapped to jobject or one of its subtypes (e.g. jstring, jintArray).
  • There is only one way to declare the function parameters of a native function.
  • Method arguments are directly accessible in native code using C semantics, without calling JNI APIs to unpack them. Only the contents of reference arguments require JNI calls.
  Python:
  • Every value in Python is an object (PyObject*). Hence a native function will receive Python objects as arguments.
  • When a native module is initialized, each member function’s call style is explicitly indicated. For example:
    • METH_NOARGS:
      f(PyObject *self, PyObject *ignored)  (the second argument is always NULL)
    • METH_O:
      f(PyObject *self, PyObject *one)
    • METH_VARARGS:
      f(PyObject *self, PyObject *args)  (args is a tuple)
    • METH_VARARGS | METH_KEYWORDS:
      f(PyObject *self, PyObject *args, PyObject *kwargs)  (a tuple, and a dict or NULL)
  • When a function has multiple arguments, you need to call a function like PyArg_ParseTuple() or PyTuple_GetItem() to retrieve individual items.
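The Python call styles have close analogues in pure Python function definitions, which may make them easier to picture. In the sketch below, all function names are hypothetical. parse_two_ints() is only a rough imitation of what a call like PyArg_ParseTuple(args, "ii", &a, &b) checks; it does not reproduce the real function's range checks or its conversion rules.

```python
def varargs_style(*args):
    """Like METH_VARARGS: positional arguments arrive packed in one tuple."""
    return type(args).__name__, len(args)


def keywords_style(*args, **kwargs):
    """Like METH_VARARGS | METH_KEYWORDS: a tuple plus a dictionary."""
    return type(args).__name__, type(kwargs).__name__


def parse_two_ints(args):
    """Unpack exactly two integers from an argument tuple, or raise TypeError."""
    if len(args) != 2:
        raise TypeError("expected exactly 2 arguments, got %d" % len(args))
    a, b = args
    if not (isinstance(a, int) and isinstance(b, int)):
        raise TypeError("an integer is required")
    return a, b


if __name__ == "__main__":
    print(varargs_style(1, 2, 3))
    print(keywords_style(1, x=2))
    print(parse_two_ints((3, 4)))
```

The native function sees only the packed container, so counting and type-checking the individual arguments is its own responsibility.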
Native function/method return value
  Java:
  • Type: jboolean, jint, jfloat, etc., jobject.
  • Value: A number, C’s NULL (maps to Java’s null), or a pointer to a valid Java object.
  • When returning a primitive numeric value, there is no need to call the JNI API.
  • No effort needed to track references upon return.
  Python:
  • Type: Always PyObject*. Never a C numeric type.
  • Value: A pointer to a valid Python object (possibly Py_None) on success. C’s NULL is returned only to signal that an exception has been raised.
  • Return values must come from existing Python objects or be newly constructed using the Python API functions.
  • The caller receives ownership of one reference to the returned object. A newly constructed object can be returned as is, but an existing object held through a borrowed reference needs its reference count incremented first. Any other references owned by the function need to be decremented before it returns.
Calling managed methods/functions
  Java:
  • Constructing objects and calling methods is done through a multi-step, rather cumbersome, but very uniform mechanism.
  • The constructors/methods that can be called through JNI are exactly the ones that are declared in the Java class and hence can be called in Java code.
  • The procedure in JNI is to look up the jclass object, look up the constructor/method based on the type signature to get a jmethodID handle, and then call the method with an instance and appropriate arguments.
  Python:
  • Constructing objects and calling methods is done differently for each built-in type (e.g. int, str, list, dict, etc.), and again differently for custom types.
  • For built-in types, simply find the appropriate global function in the documentation (e.g. PyTuple_New(), PyDict_DelItem()) and call it with the appropriate values. Note that these C API functions usually take different arguments compared to their pure Python analogs; this is a separate API compared to the pure Python stuff.
  • For custom types, the procedure is similar to Java, which involves looking up the module, looking up the class (if applicable), looking up the method/member by name, and then calling the method handle with appropriate arguments.
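The JNI method lookup takes the method name plus a type signature string in the JVM's descriptor notation. The sketch below builds such a string from Java type names. The helpers type_descriptor() and method_descriptor() are hypothetical. Class names must be given in fully qualified form, and nested classes must be written with a $ separator (e.g. java.util.Map$Entry).

```python
PRIMITIVE_CODES = {
    "boolean": "Z", "byte": "B", "char": "C", "short": "S",
    "int": "I", "long": "J", "float": "F", "double": "D", "void": "V",
}


def type_descriptor(java_type):
    """Translate a Java type name such as "int[]" or "java.lang.String"."""
    dims = 0
    while java_type.endswith("[]"):   # Strip one array dimension at a time
        java_type = java_type[:-2]
        dims += 1
    if java_type in PRIMITIVE_CODES:
        base = PRIMITIVE_CODES[java_type]
    else:
        base = "L" + java_type.replace(".", "/") + ";"
    return "[" * dims + base


def method_descriptor(return_type, param_types):
    """Build the signature string for a method with the given types."""
    params = "".join(type_descriptor(t) for t in param_types)
    return "(" + params + ")" + type_descriptor(return_type)


if __name__ == "__main__":
    # Signature of java.util.Map.put(Object, Object)
    print(method_descriptor("java.lang.Object",
                            ["java.lang.Object", "java.lang.Object"]))
```

Constructors are looked up in the same way, using the special method name <init> and a void return type.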
Preprocessor macros
  Java:
  • Several macros that define values (e.g. JNI_FALSE).
  • No macros compute expressions or call functions.
  Python:
  • Py_INCREF(), Py_DECREF(), Py_RETURN_NONE, others.
  • Some macros behave like functions, while some are like statements (e.g. return).
Object reference management
  Java:
  • Has an elaborate system of global, local, and weak references. Each reference is an independent item, and objects are not reference-counted.
  • Local references are implicitly created by many JNI API functions, and are automatically freed when the JNI user function returns.
  • Global references are valid even when no JNI function is active.
  • Weak global references don’t prevent the garbage collector from disposing of an object, if there are no other strong references.
  • Using references correctly is not much extra burden compared to managing references in pure Java code.
  Python:
  • Uses explicit reference counting. Provides functions to increment or decrement the number of references of a given object.
  • The system of reference counting is conceptually simple but difficult to use in practice.
  • Need to understand which API calls return a reference that the caller owns (“new reference”), which return one without changing the count (“borrowed reference”), and which take over ownership of a reference passed in as an argument (“stolen reference”).
  • Needs great care to avoid over-incrementing/under-decrementing the count (memory leak) or under-incrementing/over-decrementing the count (double-free or use-after-free). This is a consequence of using reference counts rather than reference handles.
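CPython's reference counts can be observed from pure Python, which gives a feel for what native code has to track by hand. The sketch below measures how much an action changes the count of a fresh object. The helper refcount_delta() is hypothetical. The absolute value reported by sys.getrefcount() is an implementation detail of CPython, so only differences between two readings are compared.

```python
import sys


def refcount_delta(action):
    """Return how much action(obj) changes the reference count of a new object."""
    obj = object()
    before = sys.getrefcount(obj)
    keep_alive = action(obj)   # Hold the result so its references stay alive
    after = sys.getrefcount(obj)
    del keep_alive
    return after - before


if __name__ == "__main__":
    print(refcount_delta(lambda o: [o]))        # Stored once in a list
    print(refcount_delta(lambda o: [o, o, o]))  # Stored three times
    print(refcount_delta(lambda o: None))       # Not stored anywhere
```

Each slot in a container that holds the object accounts for one reference. In native code, every such increment needs a matching decrement somewhere, or the object is never freed.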

Condensed list of JNI API functions

The entire JNI API has just under 70 unique functions as of Java 8. (This is the number once you merge the raw functions that are simply variations on return type or argument style.) This API is surprisingly compact, and you can reasonably memorize the 10 to 20 most commonly used functions after a bit of practice, and rarely need to refer to the documentation thereafter.

-- Getting handles --
FindClass()
DefineClass()
GetSuperclass()
GetStaticFieldID()
GetStaticMethodID()
GetFieldID()
GetMethodID()
FromReflectedField()
FromReflectedMethod()
ToReflectedField()
ToReflectedMethod()

-- Methods, fields, objects --
GetStatic<Type>Field()
SetStatic<Type>Field()
CallStatic<Type>Method<Style>()
IsSameObject()
IsInstanceOf()
IsAssignableFrom()
GetObjectClass()
AllocObject()
NewObject<Style>()
CallNonvirtual<Type>Method<Style>()
Call<Type>Method<Style>()
Get<Type>Field()
Set<Type>Field()
Note: <Style> is blank, A, or V.
Note: <Type> is Boolean, Byte, Short, Char,
      Int, Long, Float, Double, or Object;
      the Call...Method families also accept Void.

-- Object references --
GetObjectRefType()
NewGlobalRef()
NewWeakGlobalRef()
DeleteGlobalRef()
DeleteWeakGlobalRef()
EnsureLocalCapacity()
PushLocalFrame()
PopLocalFrame()
NewLocalRef()
DeleteLocalRef()

-- Arrays --
New<Type>Array()
GetArrayLength()
GetObjectArrayElement()
SetObjectArrayElement()
Get<Primitive>ArrayElements()
Release<Primitive>ArrayElements()
Get<Primitive>ArrayRegion()
Set<Primitive>ArrayRegion()
GetPrimitiveArrayCritical()
ReleasePrimitiveArrayCritical()
Note: <Primitive> is Boolean, Byte, Short, Char,
      Int, Long, Float, or Double (but not Object).

-- Strings --
GetString<Style>Length()
GetString<Style>Chars()
NewString<Style>()
ReleaseString<Style>Chars()
GetString<Style>Region()
GetStringCritical()
ReleaseStringCritical()
Note: <Style> is blank or UTF.

-- Exception mechanism --
Throw()
ThrowNew()
ExceptionCheck()
ExceptionOccurred()
ExceptionDescribe()
ExceptionClear()
FatalError()

-- Miscellaneous --
GetVersion()
GetJavaVM()
RegisterNatives()
UnregisterNatives()
MonitorEnter()
MonitorExit()
NewDirectByteBuffer()
GetDirectBufferAddress()
GetDirectBufferCapacity()

Notes:

  • In the list above, the function names come from the JNI API, but the subheading names and the way the functions are grouped are made by Nayuki (not a part of any official specification).

  • Working with static methods and fields uses a different API than working with instance methods/fields. This differs from how the java.lang.reflect API treats both cases the same.

  • All of the JNI string functions are for convenience and not strictly necessary. Their functionality can still be achieved, albeit tediously, using numerous reflection function calls and temporary values.
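The placeholder notation in the list can be expanded mechanically to recover the raw function names. The sketch below does this for the placeholder values given in the notes of the list. The helper expand() is hypothetical. Because <Style> means something different for the string functions, the values can be overridden per call.

```python
import itertools
import re

# Placeholder values as given in the notes of the condensed list
PLACEHOLDERS = {
    "Type": ["Boolean", "Byte", "Short", "Char",
             "Int", "Long", "Float", "Double", "Object"],
    "Primitive": ["Boolean", "Byte", "Short", "Char",
                  "Int", "Long", "Float", "Double"],
    "Style": ["", "A", "V"],
}


def expand(template, overrides=None):
    """Expand a name like "Call<Type>Method<Style>" into concrete names."""
    table = dict(PLACEHOLDERS)
    if overrides:
        table.update(overrides)
    keys = re.findall(r"<(\w+)>", template)
    names = []
    for combo in itertools.product(*(table[k] for k in keys)):
        name = template
        for key, value in zip(keys, combo):
            name = name.replace("<" + key + ">", value, 1)
        names.append(name)
    return names


if __name__ == "__main__":
    print(expand("Get<Primitive>ArrayRegion"))
    print(expand("GetString<Style>Chars", {"Style": ["", "UTF"]}))
```

A single condensed entry with both a type and a style placeholder stands for 27 raw names under these notes, which is why the merged count is so much smaller than the raw one.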

Partial list of Python/C API functions

The list of Python/C API functions/macros is enormous, and has over 700 entries as of CPython 3.5. Some very low-level or esoteric functions were arbitrarily excluded by me. An explicit list of entries can be found in python-c-api-function-list.txt. A summary of function counts per topic is given below:

  5: Reference counting
 79: Exception handling
 22: Operating system utilities
 25: Importing modules
  8: Data marshalling support
  9: Parsing arguments and building values
  6: String conversion and formatting
  7: Reflection
 18: Codec registry and support functions

 41: Object protocol
 38: Number protocol
 23: Sequence protocol
 12: Mapping protocol
  2: Iterator protocol
  7: Buffer protocol

 14: Type objects
 25: Integer objects
  2: Boolean objects
 10: Floating point objects
 13: Complex number objects
 15: Bytes objects
 10: Byte array objects
138: Unicode objects and codecs
 21: Tuple objects
 17: List objects
 24: Dictionary objects
 15: Set objects
 12: Function objects
 11: Instance method objects
  5: File objects
 27: Module objects
  4: Iterator objects
  4: Slice objects
  7: Weak reference objects
  4: Generator objects
  2: Coroutine objects
 30: DateTime objects

 23: Common object structures