How are method references implemented in Java 8 inside the JVM?


I was wondering how method references (::) are implemented in Java 8. For comparison, here is how they are implemented in Object Pascal, in the simplest and most efficient way. There, a pointer to a method is a record of two pointers:

TMethod = record
  Code, Data: Pointer;
end;

One stores the address of the procedure; the other stores the address of the owning object.

Button1.OnClick := MyClick1;

This translates to simply copying two pointers.
Calling such a method means calling the procedure at that address and passing the owner object as the implicit Self parameter (the counterpart of this in Java).

But I was told here that method references in Java 8 are syntactic sugar and that something terrible is hidden behind them. Is this really true? Has anyone looked at what code method references are translated into on the JVM?

Author: Anton Sorokin, 2019-04-14

1 answer

In Java, method references are a shorthand for lambda expressions.

There are three method reference constructs for ordinary methods (a fourth form, Class::new, refers to a constructor):

  1. object::instanceMethod - refers to an instance method of a particular object
  2. Class::staticMethod - refers to a static method of the class
  3. Class::instanceMethod - refers to an instance method of an arbitrary object of the class. It works like point 1, except that the receiver is not fixed in advance: the first argument of the call becomes the object on which the method is invoked

Examples of these constructs and their lambda equivalents:

  1. System.out::println is equivalent to x -> System.out.println(x)
  2. Math::max is equivalent to (x, y) -> Math.max(x, y)
  3. String::length is equivalent to x -> x.length()
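
The three forms can be put side by side with their lambda equivalents in a small runnable sketch. The Greeter class and all field names here are hypothetical, introduced only for illustration:

```java
import java.util.function.BiFunction;
import java.util.function.Function;

class MethodRefForms {
    // Hypothetical helper class, used only to have an instance method to bind to.
    static class Greeter {
        private final String prefix;
        Greeter(String prefix) { this.prefix = prefix; }
        String greet(String name) { return prefix + name; }
    }

    static final Greeter GREETER = new Greeter("Hello, ");

    // 1. object::instanceMethod - the receiver is fixed when the reference is evaluated
    static final Function<String, String> BOUND_REF = GREETER::greet;
    static final Function<String, String> BOUND_LAMBDA = x -> GREETER.greet(x);

    // 2. Class::staticMethod - all arguments are passed to the static method
    static final BiFunction<Integer, Integer, Integer> STATIC_REF = Math::max;
    static final BiFunction<Integer, Integer, Integer> STATIC_LAMBDA = (x, y) -> Math.max(x, y);

    // 3. Class::instanceMethod - the first argument becomes the receiver
    static final Function<String, Integer> UNBOUND_REF = String::length;
    static final Function<String, Integer> UNBOUND_LAMBDA = x -> x.length();

    public static void main(String[] args) {
        System.out.println(BOUND_REF.apply("JVM") + " | " + BOUND_LAMBDA.apply("JVM"));
        System.out.println(STATIC_REF.apply(3, 7) + " | " + STATIC_LAMBDA.apply(3, 7));
        System.out.println(UNBOUND_REF.apply("lambda") + " | " + UNBOUND_LAMBDA.apply("lambda"));
    }
}
```

In each pair, the reference and the lambda produce the same result for the same input.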

Even so, method references are implemented slightly differently (more on this below).

In essence, lambda expressions behave like anonymous classes with a single method, but they are implemented differently. A detailed analysis of their implementation follows.


A brief excerpt from the article on how lambdas work under the hood of the JVM:

Anonymous inner classes have undesirable characteristics that can affect the performance of your application. First, the compiler generates a new class file for each anonymous inner class. Generating many class files is undesirable because each class file must be loaded and verified before use, which slows down application startup. If lambdas were translated into anonymous inner classes, you would get a new class file for each lambda. Each of those classes would also have to be loaded and would take up memory, so this approach would increase your application's memory consumption.

Instead of generating a separate class for each lambda expression at compile time, Java 8 relies on the invokedynamic bytecode instruction, added in Java 7. An invokedynamic instruction is linked to a bootstrap method, which creates the implementation of the lambda expression the first time the instruction is executed.

The translation of a lambda expression into bytecode is performed in two steps:

  1. An invokedynamic call site (the lambda factory) is generated, which, when invoked, returns an instance of the functional interface to which the lambda is converted.
  2. The body of the lambda expression is converted into a method that will be invoked through the invokedynamic instruction.

For method references, almost the same thing happens as for lambdas, except that javac does not generate a synthetic method: the target method already exists in some class, whereas the body of a lambda has no method of its own until the compiler creates one. The compiler can therefore refer to the desired method directly.
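
This difference can be observed with reflection. The sketch below assumes the code is compiled with javac, which currently gives lambda bodies synthetic names starting with lambda$; that naming is an implementation detail and not guaranteed by the language specification. All class and method names are hypothetical:

```java
import java.lang.reflect.Method;
import java.util.function.Function;

class SyntheticMethodCheck {
    // Holder whose field is initialized with a lambda.
    static class WithLambda {
        Function<String, Integer> f = s -> Integer.parseInt(s);
    }

    // Holder whose field is initialized with the equivalent method reference.
    static class WithMethodRef {
        Function<String, Integer> f = Integer::parseInt;
    }

    // Counts compiler-generated methods that look like desugared lambda bodies.
    // Assumption: javac names them "lambda$..." and marks them synthetic.
    static long lambdaBodies(Class<?> type) {
        long count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("WithLambda:    " + lambdaBodies(WithLambda.class));
        System.out.println("WithMethodRef: " + lambdaBodies(WithMethodRef.class));
    }
}
```

With javac, the class containing the lambda is expected to report one such method and the class containing the method reference none, because the reference points straight at Integer.parseInt.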

To illustrate the first step, let's look at the bytecode generated from a simple class containing a lambda:

import java.util.function.Function;

public class Lambda {
    Function<String, Integer> f = s -> Integer.parseInt(s);
}

The constructor of this class compiles to the following bytecode:

 0: aload_0
 1: invokespecial #1 // Method java/lang/Object."<init>":()V
 4: aload_0
 5: invokedynamic #2, 0 // InvokeDynamic
                  #0:apply:()Ljava/util/function/Function;
10: putfield #3 // Field f:Ljava/util/function/Function;
13: return

How the second step is performed depends on whether the lambda expression is non-capturing (the lambda does not access any variables defined outside its body) or capturing (the lambda accesses variables defined outside its body).

Note: a closure is a lambda expression that uses a variable declared outside of that expression.

In the first case, the lambda expression is simply turned into a static method with exactly the same signature as the lambda expression, declared in the same class where the lambda expression is used. For example, the lambda expression declared in the Lambda class above can be converted into a method like this:

static Integer lambda$1(String s) {
    return Integer.parseInt(s);
}

Capturing lambda expressions (closures) are a bit more complicated, because the captured variables must be passed to the method that implements the body of the lambda expression along with the formal arguments of the lambda expression. In this case, the general strategy is to prepend one additional argument for each captured variable to the arguments of the lambda expression. Let's look at a practical example:

int offset = 100;
Function<String, Integer> f = s -> Integer.parseInt(s) + offset; 

The corresponding method implementation will be generated like this:

static Integer lambda$1(int offset, String s) {
    return Integer.parseInt(s) + offset;
}
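
Both translation strategies can be checked by inspecting the parameter types of the generated methods. This is a sketch that assumes javac's current naming scheme, lambda$<enclosing method>$<index>, which is an implementation detail; the class and method names are hypothetical:

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.function.Function;

class DesugaredSignatures {
    // Non-capturing lambda: uses only its own parameter.
    static Function<String, Integer> nonCapturing() {
        return s -> Integer.parseInt(s);
    }

    // Capturing lambda: uses the local variable offset from the enclosing scope.
    static Function<String, Integer> capturing(int offset) {
        return s -> Integer.parseInt(s) + offset;
    }

    // Returns the parameter types of the synthetic method generated for the
    // lambda inside the given enclosing method.
    // Assumption: javac names it "lambda$<enclosingMethod>$<index>".
    static Class<?>[] lambdaParameterTypes(String enclosingMethod) {
        String prefix = "lambda$" + enclosingMethod + "$";
        for (Method m : DesugaredSignatures.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith(prefix)) {
                return m.getParameterTypes();
            }
        }
        throw new IllegalStateException("no synthetic lambda method for " + enclosingMethod);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lambdaParameterTypes("nonCapturing")));
        System.out.println(Arrays.toString(lambdaParameterTypes("capturing")));
        System.out.println(capturing(100).apply("5"));
    }
}
```

With javac, the capturing variant is expected to list int before String, matching the extra leading offset argument described above.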

There is also a quote from Brian Goetz, taken from this answer on the English Stack Overflow:

When the compiler encounters a lambda expression, it first lowers (desugars) the lambda body into a method whose argument list and return type match those of the lambda, possibly with some additional arguments (for values captured from the lexical scope, if any). At the point where the lambda expression would be captured, it generates an invokedynamic call site, which, when invoked, returns an instance of the functional interface to which the lambda is being converted. This call site is called the lambda factory for a given lambda. The dynamic arguments to the lambda factory are the values captured from the lexical scope. The bootstrap method of the lambda factory is a standardized method in the Java runtime library, called the lambda metafactory.

Method references are handled in the same way as lambda expressions, except that most method references do not need to be desugared into a new method; we can simply load a constant method handle for the referenced method and pass it to the metafactory.
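
What the quote describes can be imitated by hand with the public java.lang.invoke API. The following is a minimal sketch of roughly what the invokedynamic bootstrap does for Integer::parseInt converted to Function<String, Integer>; it is not the exact code the JVM runs, and the class and method names are hypothetical:

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

class MetafactoryDemo {
    @SuppressWarnings("unchecked")
    static Function<String, Integer> linkParseInt() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();

            // Handle for the already existing method; no synthetic method is needed.
            MethodHandle parseInt = lookup.findStatic(
                    Integer.class, "parseInt",
                    MethodType.methodType(int.class, String.class));

            // Ask the lambda metafactory for a call site (the lambda factory).
            CallSite factory = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                            // interface method name
                    MethodType.methodType(Function.class),              // factory: () -> Function, nothing captured
                    MethodType.methodType(Object.class, Object.class),  // erased signature of apply
                    parseInt,                                           // implementation method
                    MethodType.methodType(Integer.class, String.class)  // signature after type instantiation
            );

            // Invoking the factory returns the functional interface instance.
            return (Function<String, Integer>) factory.getTarget().invoke();
        } catch (Throwable t) {
            throw new IllegalStateException("linking failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkParseInt().apply("42"));
    }
}
```

For a bound reference such as System.out::println, the factory type would additionally take the receiver as a parameter, which corresponds to the captured values mentioned in the quote.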


UPD: I highly recommend this article from Habr: Parsing lambda expressions in Java

I used two articles as sources: "Java 8 Lambdas - A Peek Under the Hood" and an IBM article about lambdas.

Author: Anton Sorokin, 2020-11-17 11:45:12