This post was part of a series that I later improved and consolidated. I recommend checking the updated version.
There are a number of ways to classify data types in computer science. Of all of them, I find that the difference between value data types and reference data types is a useful classification for the daily life of application programmers – knowing the differences results in fewer bugs, less time to understand code, and more confidence to sleep well at night.
One way to think about them is to consider what the variable contains for each data type:
- Value data types store their payload as the contents of the variable.
- Reference data types store an identifier as the contents of the variable, and that identifier is a reference to the actual payload in an external structure.
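To make the two definitions concrete, here is a minimal sketch that models variable contents in plain JavaScript. It assumes a hypothetical `heap` Map standing in for the external structure. It also uses two hypothetical helpers, `allocate` and `payloadOf`. Real engines expose nothing like this, so treat it as a mental model only.

```javascript
// Hypothetical external structure: maps identifiers to payloads.
// This is a mental model, not how any real engine exposes memory.
const heap = new Map();
let nextId = 1;

// Hypothetical helper: stores a payload in the heap and returns its identifier.
function allocate(payload) {
  const id = `ref#${nextId++}`;
  heap.set(id, payload);
  return id;
}

// A value variable stores the payload itself as its content.
const foo = { kind: 'value', content: 42 };

// A reference variable stores only an identifier as its content.
const bar = { kind: 'reference', content: allocate(42) };

// Hypothetical helper: reads a variable's payload, following the
// identifier into the heap when the variable holds a reference.
function payloadOf(variable) {
  return variable.kind === 'value' ? variable.content : heap.get(variable.content);
}

console.log(foo.content);     // 42 (the payload itself)
console.log(bar.content);     // ref#1 (an identifier, not the payload)
console.log(payloadOf(bar));  // 42 (payload found through the identifier)
```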
Let’s say the FOO variable is a value data type and its payload is 42, while the BAR variable is a reference data type and also has 42 as its payload. A visual representation of this might look like:
We are usually interested in the payload of a variable (in green), not in its metadata (in red). Yet fundamental operations in the languages we use every day behave differently depending on whether the variable's content is a value or a reference.
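In JavaScript, numbers behave as value data types and plain objects as reference data types. As a result, assignment and strict equality give different results for each. The short sketch below uses arbitrary variable names; the behavior it shows is standard JavaScript.

```javascript
// Value data type: assignment copies the payload itself.
let a = 42;
let b = a;
b += 1;          // changes only b's own copy
console.log(a);  // 42

// Reference data type: assignment copies the identifier,
// so both variables lead to the same payload.
const c = { payload: 42 };
const d = c;
d.payload += 1;          // mutates the shared payload
console.log(c.payload);  // 43

// Strict equality compares variable contents:
// payloads for values, identifiers for references.
console.log(42 === 42);                            // true
console.log({ payload: 42 } === { payload: 42 });  // false: two different identifiers
console.log(c === d);                              // true: same identifier
```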
In terms of memory management, value data types and reference identifiers are commonly assigned a fixed amount of memory and live in a region called the stack. The reference payload, on the other hand, usually doesn't have a fixed amount of memory assigned, so it can grow as needed. It tends to be stored in a different region, sometimes called the heap. This is a generalization, and the details depend heavily on the language and its implementation. Still, some form of this distinction exists because we want fast, simple operations on an unbounded amount of data. Operating on fixed-size variables is simpler and faster, while dynamic memory allocation makes better use of limited memory: it's a space/time tradeoff.
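JavaScript does not let us inspect the stack or the heap directly. We can still observe the consequence described above: the payload behind a reference can grow while the identifier held by every variable stays the same. Here is a minimal sketch with arbitrary names. The comment about reallocation is a hedge, since storage strategy is engine-specific.

```javascript
const list = [];     // list holds a reference to an (empty) array payload
const alias = list;  // alias holds the same reference

// Grow the payload. The engine may reallocate its storage behind the scenes,
// but the reference held by both variables does not change.
for (let i = 0; i < 10000; i++) {
  list.push(i);
}

console.log(alias === list);  // true: still the same reference
console.log(alias.length);    // 10000: the growth is visible through both variables
```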
Boxing and unboxing
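Languages with both value and reference data types tend to provide ways to convert values into references, and vice versa. This is called boxing and unboxing.
JavaScript, for example, pairs some of its primitives with wrapper objects: the string primitive and the String object, the number primitive and the Number object, and the boolean primitive and the Boolean object. It also boxes a primitive automatically whenever we access a property or method on it. This is commonly called autoboxing.
This is a source of confusion, and here is why: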
```javascript
var foo = 'meaning of life';
// Defines foo as a primitive string.
// To define it as the reference object String we'd do:
// var foo = new String('meaning of life');

foo.toUpperCase();
// This yields 'MEANING OF LIFE'.
// Although foo is a primitive, we can use the object's methods
// thanks to autoboxing.
// We could think of it as a type conversion in other languages:
// ((String) foo).toUpperCase();

foo.constructor === String;
// This yields true.
// When we access a property or method belonging to the String object,
// foo is automatically boxed, so it behaves like the object.

foo instanceof String;
// This yields false.
// instanceof does not box its left-hand operand, so foo stays in its
// natural (unboxed) state and a primitive is never an instance of String.

typeof foo;
// This yields 'string'.
// In this case foo is also in its natural (unboxed) state,
// so we are asking the system what kind of variable it is.
```
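Autoboxing happens implicitly, but the same conversions can also be done explicitly. The sketch below uses the standard `Object()` function to box a primitive and `valueOf()` to unbox it. The variable names are arbitrary.

```javascript
const primitive = 'meaning of life';

// Boxing: Object() wraps a primitive in its corresponding wrapper object.
const boxed = Object(primitive);
console.log(typeof boxed);             // object
console.log(boxed instanceof String);  // true

// Unboxing: valueOf() returns the primitive payload inside the wrapper.
const unboxed = boxed.valueOf();
console.log(typeof unboxed);         // string
console.log(unboxed === primitive);  // true

// The same works for the other pairs.
console.log(Object(42) instanceof Number, Object(true) instanceof Boolean);  // true true

// Each box is a new reference, even around the same primitive.
console.log(Object(primitive) === Object(primitive));  // false
```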
A note about references vs. pointers
Some may argue that "reference" is just the name object-oriented languages gave to the old pointer data type. They are different things, though. The way I tell them apart is by picturing what the variables contain. References contain an identifier of the payload in an external structure, while pointers point to the content of another variable.
If, for example, a language allowed us to define a variable called Z as a pointer to X, visually it might look like this:
Although the difference between pointers and references might be subtle, it has deep implications for how operations work with them.
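JavaScript has no pointer type, so the sketch below emulates one. A hypothetical `z` object exposes a getter and setter bound to the variable `x` itself, which is roughly what reading and writing through a pointer does. Contrast it with references: reassigning a variable that holds a reference only changes that variable's own content.

```javascript
// References: y receives a copy of x's identifier.
let x = { payload: 42 };
let y = x;
y = { payload: 0 };      // replaces y's content with a new identifier...
console.log(x.payload);  // 42: ...x still refers to the original payload

// Emulated pointer (hypothetical): the accessors read and write
// the variable x itself, not the payload x happens to refer to.
const z = {
  get target() { return x; },
  set target(value) { x = value; },
};
z.target = { payload: 7 };  // writing through z changes the content of x
console.log(x.payload);     // 7
```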
We application programmers are mostly interested in the payload of variables. Our programs, however, consist of moving variables around with operations such as equality checks, copying, and passing arguments to other functions. These operations depend on the nature of the data they work with, so we need a deep understanding of their inner workings. That will be the topic of the next post in the series.
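As a small preview of why argument passing depends on the kind of data, here is a minimal sketch with two hypothetical helpers. A function receives a copy of the variable's content: the payload for a number, the identifier for an object.

```javascript
// Hypothetical helper: receives a copy of the number payload.
function incrementNumber(n) {
  n += 1;  // changes only the parameter's own copy
  return n;
}

// Hypothetical helper: receives a copy of the object's identifier.
function incrementPayload(obj) {
  obj.payload += 1;  // follows the copied identifier to the shared payload
}

let count = 42;
incrementNumber(count);
console.log(count);  // 42: the caller's value is untouched

const box = { payload: 42 };
incrementPayload(box);
console.log(box.payload);  // 43: the caller sees the change
```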