- Value and reference data types.
- Identity and copy operations.
- Use case: How comparing things is faster and simpler with immutability.
- Passing arguments to functions (to be published soon).
Value & reference data types
Let’s say we are using a hypothetical language and we define two variables, x and y, whose value is the same: 42. x is a value data type and y is a reference data type. How would they be different?
The difference lies in how these types behave when we operate on them (checking for identity, copying, passing them to functions, etc.):
- value data types operate with the value of the variable
- reference data types operate with a reference to the actual value
The way I picture this is something like:
For a programmer, this separation seems inconvenient at first sight: our code is supposed to work with the values of the variables, not with an internal reference to them. So why is it that most programming languages have value and reference data types? The reason is memory.
Think about what happens every time our code makes a copy of a variable: it needs more memory. For structures created to hold big data, it is more memory-efficient for the program to copy the reference instead of the actual value. Programming is a space-time bound activity: we want to operate quickly on potentially big data structures without running out of memory. Achieving that goal requires trade-offs, and one that most languages make is having both value and reference data types.
Now, let’s talk JavaScript.
The language defines the following data types: undefined, null, number, boolean, string, symbol and object (newer editions of the standard also add bigint); note that array is a particular kind of object. It considers every data type but object a primitive. Unlike objects, primitives in JavaScript are immutable and don’t have properties or methods of their own. Although the JavaScript standard doesn’t explicitly distinguish value data types from reference data types, the distinction is implicit in the way it defines the operations of the language. It is safe to say that:
- primitive types behave like value data types
- the object type behaves like a reference data type, and so do its sub-types, such as array
Boxing and unboxing
Languages with both value and reference data types tend to provide ways to convert values into references, and vice versa. This is called boxing and unboxing. Commonly, each value type has a reference counterpart, and languages perform boxing and unboxing automatically in some situations.
JavaScript has reference data types for the corresponding value data types:
Type | Primitive (value) | Object (reference) |
---|---|---|
string | var str = 'The meaning of life.'; | var str = new String( 'The meaning of life.' ); |
number | var number = 42; | var number = new Number( 42 ); |
boolean | var bool = true; | var bool = new Boolean( true ); |
JavaScript primitives don’t have methods or extra properties like the reference objects have. Yet they are automatically boxed to the equivalent reference object whenever we try to use one of its methods or properties. This is why the following works:
// str is a primitive string.
// Uppercasing yields 'THE MEANING OF LIFE.'.
var str = 'The meaning of life.';
str.toUpperCase();
Although str is a primitive, we can use object methods such as toUpperCase thanks to auto-boxing. We can think of it as the equivalent of a type conversion in other languages: ((String) str).toUpperCase(). Auto-boxing is also at the root of some confusing behavior in JavaScript:
// str is a primitive string.
var str = 'The meaning of life.';
// str is automatically boxed to the object
// when a property or method belonging
// to the String object is used.
//
// The following can be thought of as
// doing an explicit cast:
// ((String) str).constructor === String.
//
str.constructor === String; // True.
// str is in its natural state (unboxed),
// so this tests a primitive, which is
// not an instance of the String object.
str instanceof String; // False.
// str is in its natural state (unboxed),
// so this asks the system
// what kind of variable it is.
typeof str; // Yields 'string'.
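To make the difference between a primitive and its boxed counterpart more concrete, here is a small sketch that boxes a value by hand and unboxes it again. The variable names are only illustrative; the calls themselves (new String, valueOf, new Boolean) are standard JavaScript.

```javascript
// A primitive string and its explicitly boxed counterpart.
var primitive = 'The meaning of life.';
var boxed = new String( 'The meaning of life.' );

typeof primitive; // 'string'
typeof boxed;     // 'object'

// The boxed value is a real String object.
boxed instanceof String; // true

// Unboxing: valueOf() gives back the primitive.
boxed.valueOf() === primitive; // true

// Strict equality does not unbox, so an object is never
// identical to a primitive.
boxed === primitive; // false

// Two boxed values are two different references,
// even if they wrap the same primitive.
new String( 'a' ) === new String( 'a' ); // false

// A common pitfall: every object is truthy,
// including a boxed false.
var boxedFalse = new Boolean( false );
Boolean( boxedFalse ); // true
```

This is one reason why explicit boxing is rarely used in everyday code: the automatic boxing shown above is usually all we need.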
Identity and copy operations
Let’s review how identity and copy operations behave with value and reference data types. Remember that they’ll use the value for value data types, and the reference for reference data types.
Value data types
Let’s say we have the following value variables:
var x = 42;
var y = 42;
x === y; // True.
If we copy one into the other, the result is the same:
var x = 42;
var y = x;
x === y; // True.
y = 23;
x === y; // False.
Reference data types
Let’s say now that we define the following reference variables:
// Creates a new reference @x1.
var x = {
'title': 'The dispossessed',
'genre': 'Science Fiction'
};
// Creates a new reference @y1.
var y = {
'title': 'The dispossessed',
'genre': 'Science Fiction'
};
x === y; // False: @x1 is not equal to @y1.
When we create reference data type variables, they are assigned a new reference, regardless of whether their value is the same as that of other existing variables. The equality operation yields false in this case because it compares different references (@x1 is not @y1), not the actual values.
What if we copied them?
// Creates new reference @x1.
var x = {
'title': 'The dispossessed',
'genre': 'Science Fiction'
};
var y = x; // Copies @x1 reference.
x === y; // True: @x1 equals @x1.
So far, so good. What would happen when the value is changed?
// Change some content.
x['title'] = 'Bellwether';
// Still true.
// The references haven't changed.
x === y;
// Yields 'Bellwether'.
console.log( y['title'] );
But if we do:
// Creates new reference @x1.
var x = {
'title': 'The dispossessed',
'genre': 'Science Fiction'
};
var y = x; // Copies @x1 reference.
x === y; // True.
// Creates new reference @x2
// and stores it in the x variable.
x = {
'title': 'Bellwether',
'genre': 'Science Fiction'
};
x === y; // False: @x2 is not equal to @x1.
// Yields 'Bellwether'.
console.log( x['title'] );
// Yields 'The dispossessed'.
console.log( y['title'] );
For reference data types, equality and copy operations work with the reference of the variable and don’t take the extra step of finding and working with the actual value. These are called shallow operations. Deep operations, on the other hand, do the extra lookup and work with the actual value. Languages usually have shallow/deep equality checks and shallow/deep copy operations. JavaScript, in particular, doesn’t provide a built-in mechanism for deep equality checks, which is left for developers to implement; for deep copies, modern runtimes offer structuredClone, but for a long time that was left to developers too.
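As an illustration of the extra work a deep equality check has to do, here is a minimal sketch. deepEqual is a hypothetical helper, not a built-in; it assumes plain objects and arrays without cycles, and it deliberately ignores cases such as Date, Map, Set or NaN.

```javascript
// Hypothetical helper: compares two values by content
// instead of by reference.
function deepEqual( a, b ) {
	// Same reference or same primitive value.
	if ( a === b ) {
		return true;
	}
	// From here on, both must be non-null objects.
	if (
		typeof a !== 'object' || a === null ||
		typeof b !== 'object' || b === null
	) {
		return false;
	}
	// An array is never equal to a plain object.
	if ( Array.isArray( a ) !== Array.isArray( b ) ) {
		return false;
	}
	var keysA = Object.keys( a );
	var keysB = Object.keys( b );
	if ( keysA.length !== keysB.length ) {
		return false;
	}
	// Every own property must exist in both and be deeply equal.
	return keysA.every( function( key ) {
		return Object.prototype.hasOwnProperty.call( b, key ) &&
			deepEqual( a[ key ], b[ key ] );
	} );
}

var first = { 'title': 'The dispossessed', 'genre': 'Science Fiction' };
var second = { 'title': 'The dispossessed', 'genre': 'Science Fiction' };

first === second;           // false: different references.
deepEqual( first, second ); // true: same content.
```

Note that the deep check has to visit every property of both objects, whereas the shallow one is a single comparison.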
Nested reference data types
There’s a JavaScript idiom to create new objects by reusing parts of existing ones: Object.assign( target, ...sources ). It shallow-copies every own enumerable property of the source objects into the target object. If the target already has a property with the same name, it is overwritten. In the example below, we assign a new reference to the variable y, whose own properties are the ones present in the object x.
// Creates reference @x1
// and stores it in variable x.
var x = {
'title': 'The dispossessed',
'genre': 'Science Fiction'
};
// Creates new reference @y1,
// and stores it in variable y.
// Note it has the same properties
// as the x object.
var y = Object.assign( {}, x );
// False: reference @x1 is not equal to @y1.
x === y;
// True: values are equal.
x['title'] === y['title'];
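To see the overwriting rule in action, here is a small sketch that merges two source objects into an empty target. The defaults and book objects are made up for the example.

```javascript
// Two hypothetical source objects with a property in common.
var defaults = { 'genre': 'Science Fiction', 'language': 'English' };
var book = { 'title': 'The dispossessed', 'genre': 'Sci-Fi' };

var target = {};
var result = Object.assign( target, defaults, book );

// Object.assign mutates the target and returns it.
result === target; // true

// Sources are applied left to right, so later ones win.
result['genre'];    // 'Sci-Fi'
result['language']; // 'English'
result['title'];    // 'The dispossessed'

// The sources themselves are left untouched.
defaults['genre']; // 'Science Fiction'
```

Passing an empty object as the target, as in the example above, is what keeps the existing objects from being modified.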
This works fine for objects whose own properties are value data types, such as string or number, but it gets muddy if any property is a reference.
For example:
var x = {
'title': 'The dispossessed',
'genre': 'Science Fiction',
'author': {
'name': 'Ursula K. Le Guin',
'born': '1929-10-29'
}
};
// New object:
//
// * the reference will be new
// * the value would be created
// by shallow copying x's own properties
//
var y = Object.assign( {}, x );
x === y; // False.
// Compare value properties:
x['title'] === y['title']; // True.
x['genre'] === y['genre']; // True.
// Compare reference properties:
x['author'] === y['author']; // True.
Note how both the x and y objects hold the same reference for the author property, so they point to the same value. We have two different objects with some shared parts:
If we change some properties but not the author reference, both x and y will still share the same author value:
y['title'] = 'Bellwether';
y['author']['name'] = 'Connie Willis';
y['author']['born'] = '1945-12-31';
// Value variables have diverged.
x['title'] === y['title']; // False.
// The author reference hasn't changed,
// so both objects point to same values.
// These all are true.
x['author'] === y['author'];
x['author']['name'] === y['author']['name'];
x['author']['born'] === y['author']['born'];
For both objects to be completely separate entities, we need to replace the shared author reference in one of them with a new one. For example:
// Creates a new reference @author2.
x['author'] = {
'name': 'Ursula K. Le Guin',
'born': '1929-10-29'
};
// Author reference has changed,
// so @author2 is not equal to @author1.
x['author'] === y['author']; // False.
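An alternative, sketched below, is to copy the nested object at the same time as the outer one, so the two objects never share it in the first place. This is only an illustration of the idea: it goes one level deep and has to be repeated by hand for every nested reference.

```javascript
var x = {
	'title': 'The dispossessed',
	'genre': 'Science Fiction',
	'author': {
		'name': 'Ursula K. Le Guin',
		'born': '1929-10-29'
	}
};

// Shallow copy of x, plus a shallow copy of its author,
// which overwrites the shared author reference.
var y = Object.assign( {}, x, {
	'author': Object.assign( {}, x['author'] )
} );

x['author'] === y['author']; // false: separate references.

// Changing the copy no longer affects the original.
y['author']['name'] = 'Connie Willis';
x['author']['name']; // 'Ursula K. Le Guin'
```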
When working with reference data types, the copy and equality operations may be misleading. In the next section, we shall talk about one of the tricks we have for dealing with this issue: making reference data types immutable.
How comparing things is faster and simpler with immutability
In the previous sections, I wrote about the nature of value and reference data types, and the differences between shallow and deep operations. In particular, the fact that we need to rely on deep operations to compare things is a major source of complexity in our code. But we can do better.
Comparing mutable structures
When working with mutable data structures (structures that can be modified), determining whether something has actually changed is not so straightforward:
var film = {
'title': 'Pirates of the Caribbean',
'released': 2003
};
// At some point, we receive an object
// and one of its properties
// might have changed. How do we know?
var newFilm = doSomething( film );
// What does a shallow equality yield?
film === newFilm;
If we mutate objects, a shallow equality check isn’t enough to tell objects apart unless we know the internals of the doSomething function:
- film and newFilm references may be equal, but the values might have been updated.
- film and newFilm references may be different, but their values might be equal.
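Both situations can be reproduced with a small sketch. The two functions below are hypothetical stand-ins for doSomething: one mutates the object it receives, the other returns an unchanged copy.

```javascript
var film = {
	'title': 'Pirates of the Caribbean',
	'released': 2003
};

// Hypothetical: changes the given object in place
// and returns the very same reference.
var renameInPlace = function( film ) {
	film['title'] = 'Pirates of the Caribbean: The Curse of the Black Pearl';
	return film;
};

// Hypothetical: returns a new object whose values
// are identical to the given one.
var copyUnchanged = function( film ) {
	return Object.assign( {}, film );
};

// Different references, equal values.
var copied = copyUnchanged( film );
copied === film;                   // false
copied['title'] === film['title']; // true

// Equal references, updated values.
var renamed = renameInPlace( film );
renamed === film; // true, although the title has changed.
```

In both cases the shallow check gives the wrong answer to the question "has anything changed?".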
Comparing immutable structures
In JavaScript, primitives (numbers, strings, …) are immutable, and reference data types (object, arrays, …) are not. Mutable structures are the reason why comparing things is difficult, so what if we worked with reference data types as if they were immutable?
Let’s see how this would work:
- If something changes, do not mutate the given object; create a new one with the updated properties instead. As the new and the old objects have different references, a shallow equality check will tell them apart.
var film = {
'title': 'Pirates of the Caribbean',
'released': 2003
};
var doSomething = function( film ) {
// ...
// Something has changed,
// so return new reference.
return Object.assign(
{},
film,
{'title': 'Pirates of the Caribbean: The Curse of the Black Pearl'}
);
};
var newFilm = doSomething( film );
film === newFilm; // False.
- If nothing changes, return the same object. Because the reference is the same, the shallow equality check will yield true.
var film = {
'title': 'Pirates of the Caribbean',
'released': 2003
};
var doSomething = function( film ) {
// ...
// Everything stays the same,
// return same reference.
return film;
};
var newFilm = doSomething( film );
film === newFilm; // True.
It is easier to tell what has changed when reference data types are immutable because we can leverage the shallow equality operations. As a side-effect, it takes less effort to build a whole lot of systems that depend on calculating differences: undo/redo operations, memoization and cache invalidation, state machines, frameworks to build interfaces with the immediate mode paradigm, etc.
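Memoization is perhaps the easiest of these to sketch. The memoizeLast helper below is hypothetical, not part of any library: it remembers the last argument by reference and only recomputes when it receives a different one. It is safe only under the assumption that objects are never mutated.

```javascript
// Hypothetical helper: caches the result for the last argument,
// using a shallow (reference) equality check.
var memoizeLast = function( fn ) {
	var hasRun = false;
	var lastArg;
	var lastResult;
	return function( arg ) {
		if ( ! hasRun || arg !== lastArg ) {
			lastResult = fn( arg );
			lastArg = arg;
			hasRun = true;
		}
		return lastResult;
	};
};

// Count how many times the wrapped function really runs.
var calls = 0;
var label = memoizeLast( function( film ) {
	calls += 1;
	return film['title'] + ' (' + film['released'] + ')';
} );

var film = {
	'title': 'Pirates of the Caribbean',
	'released': 2003
};

label( film ); // Computes the label.
label( film ); // Same reference: served from the cache.

// A change produces a new object, so the cache is invalidated.
var newFilm = Object.assign( {}, film, {
	'title': 'Pirates of the Caribbean: The Curse of the Black Pearl'
} );
label( newFilm ); // Computes again.

calls; // 2
```

If film were mutated in place instead, the helper would keep returning the stale label, which is exactly the problem described in the previous section.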
Coda
One of the reasons I started this series of posts was to explain how using immutable reference data types is one of the tricks at the core of Redux and React. Their success is teaching us a valuable lesson: immutability and pure functions are the core ideas of the current cycle of building applications, just as the separation between API and interface was the dominant idea of the previous cycle.
I mentioned this some time ago but, at the time, I wasn’t fully aware of how quickly these ideas would spread to other areas of the industry or how that would force us to gain a deeper understanding of language fundamentals. I’m glad they did because I believe that investing in core concepts is what really matters to stay relevant and make smart decisions in the long term.