What is interning and how to use it


What is interning? What is it used for? When should it be used, and what are the possible pitfalls?

Author: andreycha, 2015-04-19


Interning is a technique for storing only one copy of many identical objects. In C# and Java it applies to strings, and in Java also to small boxed numbers.

Let's look at strings as an example. When you call string.Intern(s) in C# or s.intern() in Java, you get back a string with the same content as s. Moreover, interning any string with that content is guaranteed to return that very same object. String constants (literals) are also interned automatically.

However, strings obtained in other ways, for example via StringBuilder or concatenation, are not interned, at least in the current versions of these languages. (The compiler can fold a concatenation whose operands are known at compile time into a single constant, which is then interned like any other literal, so you should not rely on concatenated strings being distinct objects either.)
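To illustrate the compile-time folding mentioned above, here is a small Java sketch (the class and method names are made up for this example). According to the Java Language Specification, a string concatenation creates a new String object unless it is a constant expression, and constant expressions are interned just like literals.

```java
public class ConcatInterning {
    // A constant variable: final, String-typed, initialized with a constant expression.
    static final String CONST_PART = "12";

    // The operand is only known at run time, so the result is a newly created String.
    static String concatAtRuntime(String part) {
        return part + "3";
    }

    // A constant expression: folded by the compiler and interned like a literal.
    static String concatConstant() {
        return CONST_PART + "3";
    }

    public static void main(String[] args) {
        String runtime = concatAtRuntime("12");
        System.out.println(runtime.equals("123"));     // true
        System.out.println(runtime == "123");          // false
        System.out.println(concatConstant() == "123"); // true
        System.out.println(runtime.intern() == "123"); // true
    }
}
```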

Example:

// C# (top-level statements)
using System;

Console.WriteLine(object.ReferenceEquals("123", "123"));                            // True
Console.WriteLine(object.ReferenceEquals(string.Intern("12" + "3"), "123"));        // True
char[] chars = { '1', '2', '3' };
Console.WriteLine(object.ReferenceEquals(new string(chars), new string(chars)));    // False
Console.WriteLine(object.ReferenceEquals(new string(chars), "123"));                // False
Console.WriteLine(object.ReferenceEquals(string.Intern(new string(chars)), "123")); // True

// Java (e.g. in jshell)
System.out.println("123" == "123");                          // true
System.out.println(("12" + "3").intern() == "123");          // true
System.out.println(new String("123") == new String("123"));  // false
System.out.println(new String("123") == "123");              // false
System.out.println(new String("123").intern() == "123");     // true

This means that interned strings can be compared by reference: with object.ReferenceEquals in C# or with == in Java.

When Intern()/intern() is called, the runtime searches the pool of interned strings for one equal to the argument. If such a string is found, it is returned; if not, the argument itself is added to the pool and returned.
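As a rough model of this lookup-or-add step, here is a hypothetical Java class. SimpleInterner is not part of any library, and the real pools inside the JVM and the CLR are implemented natively and differ in many details. The sketch uses ConcurrentHashMap.putIfAbsent so that two threads interning equal strings still end up sharing one object.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of an intern pool (not how the JVM or CLR implement it).
public class SimpleInterner {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the pooled string equal to s; if there is none yet, adds s itself and returns it.
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        SimpleInterner interner = new SimpleInterner();
        String a = new String("abc");
        String b = new String("abc");
        System.out.println(a == b);                                   // false
        System.out.println(interner.intern(a) == interner.intern(b)); // true
        System.out.println(interner.intern(b) == a);                  // true: a was added first
    }
}
```

Note that the map holds strong references, so nothing added to this pool is ever released.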


What can this be used for? For example, you can reduce the memory consumption of a program that holds a large number of strings with many duplicates: say, a huge XML file consisting of almost identical records, or a huge program text in some programming language. In such cases, interning the strings can sometimes reduce memory consumption: for example, all occurrences of the keyword while become the same object.
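A minimal Java sketch of the deduplication effect, assuming the strings arrive as fresh objects (simulated here with new String(...)). InternDedup, internAll and countDistinctObjects are hypothetical names. Objects are counted by identity using an IdentityHashMap, so equal but separate strings are counted separately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternDedup {
    // Replaces every string with its canonical copy from the JVM's intern pool.
    static List<String> internAll(List<String> strings) {
        List<String> result = new ArrayList<>(strings.size());
        for (String s : strings) {
            result.add(s.intern());
        }
        return result;
    }

    // Counts distinct objects by reference (==), not by content (equals).
    static int countDistinctObjects(List<String> strings) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(strings);
        return identities.size();
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        for (String p : new String[] {"while", "x", "while", "y", "while", "x"}) {
            raw.add(new String(p)); // a fresh object, as if just parsed from a file
        }
        System.out.println(countDistinctObjects(raw));            // 6
        System.out.println(countDistinctObjects(internAll(raw))); // 3
    }
}
```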

Attention! By itself, a string read from a file is not interned, even if it is equal to some interned string.
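This is easy to check in Java. The sketch below stands in for a file with a StringReader (firstLine is a hypothetical helper): the line it returns equals the literal "while" but is a different object until intern() is called.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadNotInterned {
    // Returns the first line of the given text, read the same way a file would be.
    static String firstLine(String fileContents) {
        try (BufferedReader reader = new BufferedReader(new StringReader(fileContents))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String word = firstLine("while\n");
        System.out.println(word.equals("while"));     // true: same content
        System.out.println(word == "while");          // false: a different object
        System.out.println(word.intern() == "while"); // true: the pooled literal
    }
}
```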

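Note, however, that you cannot "de-intern" a string: in .NET, the memory taken by interned strings is generally not released until the runtime shuts down, even when the program no longer needs them. (Modern JVMs can garbage-collect interned strings that are no longer referenced.) Therefore, keep in mind that interning strings can also have a negative effect on the memory consumption of the program!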

Therefore, if you decide to use interning in your program, be sure to measure the memory consumption and make sure that the optimization really improves the situation! (This, of course, applies to almost all optimizations.)

In addition, interning a string involves a lookup in a global data structure, which will probably require a global lock. Therefore, several threads that intern strings heavily will contend for a shared resource.

Another advantage of interned strings is that they can be compared faster. For example, if you are parsing program text and both the keywords and the scanned tokens are interned, you can compare them by reference, which is of course much faster than comparing them character by character.
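A sketch of that idea in Java, assuming a naive whitespace tokenizer (InternedTokens, tokenize and countKeyword are hypothetical names). Every token is interned once when it is scanned; after that, keyword checks are plain reference comparisons.

```java
import java.util.ArrayList;
import java.util.List;

public class InternedTokens {
    // Splits source text on whitespace and interns every token once, at scan time.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String t : source.trim().split("\\s+")) {
            tokens.add(t.intern());
        }
        return tokens;
    }

    // Counts occurrences of a keyword using reference comparison only.
    // Correct only if both the tokens and the keyword are interned (literals always are).
    static int countKeyword(List<String> tokens, String keyword) {
        int count = 0;
        for (String t : tokens) {
            if (t == keyword) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("while x while y if z");
        System.out.println(countKeyword(tokens, "while")); // 2
        System.out.println(countKeyword(tokens, "if"));    // 1
    }
}
```

The == check works only because both sides are interned; a token that skipped intern() would never match.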


In .NET you can control, at the assembly level, whether string constants are interned automatically. By default, as mentioned above, string constants are interned, but you can mark an assembly as not requiring string-literal interning by applying the CompilationRelaxationsAttribute with the CompilationRelaxations.NoStringInterning flag.


In Java, besides strings, boxed numbers are also interned in a sense: boxed values of type Integer and Long in the range -128 to 127, as well as all Boolean and Byte values, are taken from a pool of shared objects. Example:

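Integer x = 1;               // autoboxing calls Integer.valueOf(1), which returns a shared object
Integer y = 1;
Integer z = new Integer(1);  // the constructor always creates a new object (deprecated since Java 9)

System.out.println(x == y);  // true
System.out.println(y == z);  // false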
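The cache covers only a small range of values, which makes == on boxed numbers a common pitfall. The following hypothetical helpers show the difference. The Java Language Specification guarantees shared objects only for int values from -128 to 127, so the result for 1000 depends on the JVM and its settings.

```java
public class BoxedCache {
    // Autoboxing uses Integer.valueOf; for -128..127 both boxes are the same object.
    static boolean sameObject(int a, int b) {
        Integer x = a;
        Integer y = b;
        return x == y;
    }

    // Value comparison that is correct for any boxed Integer, cached or not.
    static boolean sameValue(Integer x, Integer y) {
        return x.equals(y);
    }

    public static void main(String[] args) {
        System.out.println(sameObject(127, 127));   // true: guaranteed by the cache
        System.out.println(sameObject(1000, 1000)); // implementation-dependent, usually false
        System.out.println(sameValue(1000, 1000));  // true
    }
}
```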
Author: VladD, 2016-01-29 14:28:18