Reverse-engineering MAT-files
Part 3: Classes with attributes
In part 2 of this series we discovered how
simple classes are stored in MAT v7.3. We did so by looking at a few
very simple classes and examining how our MAT-file changes when we, say,
alter the class structure a bit, or when we add more objects of the same
class.
The simple examples we considered merely scratch the surface of what is
possible in MATLAB’s class system. MATLAB supports many features
typical of an object-oriented programming language, such as abstract
classes, inheritance, private and protected attributes, operator
overloading, and so on and so forth.
Luckily for us, most of these features do not seem to influence
the way that MATLAB stores the object in a MAT-file. There are, however,
a few exceptions to this, and the goal of this post is to highlight
these exceptions.
More concretely, we will look at classes exhibiting the following
features:
- Default values
- Constant and dependent values
- Properties with specified type and/or size
Preliminary: Arrays of classes
Before looking at classes, we need to talk about arrays.
Arrays of primitive objects such as doubles and chars are stored in a
straightforward way. Consider the following example.
>> x = rand(1, 2, 3);
>> save('out.mat', 'x', '-v7.3');
Our MAT-file consists of a single 🔢 dataset x:
julia> read(h["x"])
1×2×3 Array{Float64, 3}:
[:, :, 1] =
0.815771 0.804843
[:, :, 2] =
0.360404 0.913814
[:, :, 3] =
0.288316 0.165002
The only exception to this logic is if our array has size zero in some
dimension. In MATLAB, an ‘empty’ array of size 3 by 0 is
different from an array of size 0 by 3. Yet we evidently cannot
distinguish them by their content, since there is no content!
Here’s what MATLAB does instead.
>> x = rand(9001, 0, 9002);
>> save('out.mat', 'x', '-v7.3');
julia> read(h["x"])
3-element Vector{UInt64}:
0x0000000000002329
0x0000000000000000
0x000000000000232a
If at least one of the sizes of an array is zero, MATLAB falls back to
saving the size of the array in the form of 64-bit integers (see Note 1).
The type that the elements would have had, if there were any, can be
inferred from the MATLAB_class attribute. This allows us to distinguish,
for instance, a 3×0 character array from a 3×0 array of doubles.
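To make the fallback concrete, here is a minimal Julia sketch of how a reader might rebuild an empty array from the stored sizes. The function name rebuild_empty and the small class-to-type table are hypothetical; they are not part of HDF5.jl or of MATLAB. The sketch also assumes the caller already knows that the dataset holds sizes rather than content, for instance from the MATLAB_empty attribute that shows up in the file trees later in this post.

```julia
# Hypothetical lookup from the MATLAB_class attribute to a Julia element type.
# Only two classes are listed; a real reader would need many more.
const ELTYPES = Dict("double" => Float64, "char" => Char)

# Rebuild an empty array from the UInt64 sizes stored in the dataset.
function rebuild_empty(sizes::AbstractVector{<:Integer}, matlab_class::AbstractString)
    dims = Int.(sizes)
    any(iszero, dims) || error("expected at least one zero dimension")
    T = ELTYPES[matlab_class]
    return Array{T}(undef, dims...)
end

# The three integers read from out.mat in the example above.
x = rebuild_empty(UInt64[0x2329, 0x0, 0x232a], "double")
println(size(x))    # (9001, 0, 9002)
println(eltype(x))  # Float64
```

The same call with "char" and the sizes 3 and 0 would give a 3×0 character array, which is exactly the distinction that the MATLAB_class attribute makes possible.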
Now what would happen if we take, say, a 1×2×3 array of objects?
We define the following simple class which we’ll use throughout
this post.
classdef MyClass
    properties
        Foo
    end
    methods
        function obj = MyClass(x)
            obj.Foo = x;
        end
    end
end
Now consider the following:
x = [MyClass(9001) MyClass(9002) MyClass(9003) MyClass(9004) MyClass(9005) MyClass(9006)];
x = reshape(x, 1, 2, 3);
save('out.mat', 'x', '-v7.3');
So far, our objects had always been represented by a length-six vector
of integers, of which the last two encode the object ID and the class ID.
So should we now expect a 6×1×2×3 array of integers instead?
julia> read(h["x"])
12×1 Matrix{UInt32}:
0xdd000000
0x00000003
0x00000001
0x00000002
0x00000003
0x00000001
0x00000002
0x00000003
0x00000004
0x00000005
0x00000006
0x00000001
This is altogether different. The second through fifth integers form a
component which we haven’t elaborated on before. This component
indicates the size of the array of objects. Specifically, it tells us
that we’re dealing with a 3-dimensional array of size 1×2×3. After
that, the object ID of each object in the array is listed in
column-major order, and finally comes the class ID of the objects.
Until this point, objects have always been represented with six
integers, of which the first four seemed fixed. In hindsight,
you’ll be able to recognise that the second, third and fourth
integers simply told us that the object was a 2-dimensional array of
size 1×1.
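The layout just described can be sketched as a small Julia decoder. The names decode_object_ref and OBJECT_MAGIC are made up for this illustration, and the sketch assumes that every object reference follows the pattern observed so far: the leading 0xdd000000, the number of dimensions, the sizes, the object IDs, and a single class ID.

```julia
# Leading value of every object reference seen in this series.
const OBJECT_MAGIC = 0xdd000000

# Split a raw object reference into array size, object IDs and class ID.
function decode_object_ref(raw::AbstractArray{<:Integer})
    v = vec(raw)                       # accept both vectors and N×1 matrices
    v[1] == OBJECT_MAGIC || error("not an object reference")
    nd = Int(v[2])                     # number of dimensions
    dims = Int.(v[3:2+nd])             # size along each dimension
    n = prod(dims)                     # number of objects in the array
    length(v) == 3 + nd + n || error("unexpected length")
    ids = reshape(Int.(v[3+nd:2+nd+n]), dims...)  # column-major, like MATLAB
    return (dims = dims, object_ids = ids, class_id = Int(v[end]))
end

# The twelve integers read from /x above.
raw = UInt32[0xdd000000, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1]
ref = decode_object_ref(raw)
println(ref.dims)                 # [1, 2, 3]
println(ref.object_ids[1, 2, 3])  # 6
println(ref.class_id)             # 1
```

Feeding it a six-integer reference from part 2 yields a 1×1 array holding a single object ID, which is the "in hindsight" reading given above.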
Before moving on to the next section, I leave you with one example. Can
you guess how MATLAB encodes the following? For a hint, see
Note 2.
>> x = MyClass.empty; % This returns a 0×0 array of objects of type MyClass
>> save('out.mat', 'x', '-v7.3');
Default values
Consider the following class.
classdef MyClassWithDefault
    properties
        Bar = MyClass(9001)
    end
end
As you’d probably expect, if I create a variable
x = MyClassWithDefault(), the field Bar will obtain the default value
MyClass(9001).
Let’s take a look at the MAT-file that we get upon saving x.
🗂️ HDF5.File: (read-only) out.mat
├─ 📂 #refs#
│ ├─ 🔢 a
│ │ ├─ 🏷️ MATLAB_class
│ │ └─ 🏷️ MATLAB_empty
│ ├─ 🔢 b
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 c
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 d
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 e
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 f
│ │ ├─ 🏷️ H5PATH
│ │ ├─ 🏷️ MATLAB_class
│ │ └─ 🏷️ MATLAB_empty
│ ├─ 📂 g
│ │ ├─ 🏷️ H5PATH
│ │ ├─ 🏷️ MATLAB_class
│ │ └─ 🔢 Bar
│ │   ├─ 🏷️ H5PATH
│ │   └─ 🏷️ MATLAB_class
│ └─ 🔢 h
│   ├─ 🏷️ H5PATH
│   ├─ 🏷️ MATLAB_class
│   └─ 🏷️ MATLAB_empty
├─ 📂 #subsystem#
│ └─ 🔢 MCOS
│   ├─ 🏷️ MATLAB_class
│   └─ 🏷️ MATLAB_object_decode
└─ 🔢 x
  ├─ 🏷️ MATLAB_class
  └─ 🏷️ MATLAB_object_decode
Interestingly, /#refs#/g is now a 📂 group rather than a 🔢 dataset.
Here’s what’s inside /#refs#:
Dataset | MATLAB_class | Value
/#refs#/a | "canonical empty" | 0x0000000000000000, 0x0000000000000000
/#refs#/b | "uint8" | 256×1 Matrix{UInt8} (content omitted)
/#refs#/c | "double" | 9001.0
/#refs#/d | "int32" | 0, 0, 0
/#refs#/e | "cell" | Ref to /#refs#/f, Ref to /#refs#/g, Ref to /#refs#/h
/#refs#/f | "struct" | 0x0000000000000001, 0x0000000000000000
/#refs#/g | "struct" | "Bar" => [0xdd000000, 0x00000002, 0x00000001, 0x00000001, 0x00000002, 0x00000002]
/#refs#/h | "struct" | 0x0000000000000001, 0x0000000000000000
/#refs#/g/Bar encodes the default value of the field Bar of the first
class. As we learned in part 2, the six UInt32s are MATLAB’s way of
telling you that this default value is itself an object, with object ID
2 and class ID 2. In turn, the value of the field Foo of this object is
stored in /#refs#/c.
As you may expect, /#refs#/b is what ties things together.
00000000: 0300 0000 0400 0000 5000 0000 8000 0000 ........P.......
00000010: 8000 0000 c800 0000 e800 0000 0001 0000 ................
00000020: 0000 0000 0000 0000 4d79 436c 6173 7357 ........MyClassW
00000030: 6974 6844 6566 6175 6c74 0042 6172 0046 ithDefault.Bar.F
00000040: 6f6f 004d 7943 6c61 7373 0000 0000 0000 oo.MyClass......
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0100 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0400 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 0000 0000 0100 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0100 0000 0200 0000 ................
000000b0: 0200 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0200 0000 0100 0000 0000 0000 0000 0000 ................
000000d0: 0000 0000 0000 0000 0100 0000 0300 0000 ................
000000e0: 0100 0000 0000 0000 0000 0000 0000 0000 ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
The red component still lists all the sizes that we identified in
part 2, the green component lists the names of the classes and fields,
and the blue component lists which of these names are classes. But
something interesting happens in the cyan component (the sextets) and
the yellow component (the triplets). Using the shorthand that we
introduced earlier, it’s tempting to represent the components as
follows:
100011 200021 0000 1-310
This is not correct. The cyan and yellow components are separated by
two rather than four zeros, so the third and fourth zero aren’t
separators. Instead, they are ‘empty’ triplets, which serve
specifically to indicate that the value of this object is the default.
So our shorthand representation should be
100011 200021 00 0-0 1-310
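As a small aside, the names block of /#refs#/b can be pulled out of the raw bytes with a few lines of Julia. This is only a sketch: the function parse_names is hypothetical, and the layout it assumes is simply read off the dump above (ten little-endian UInt32 words, of which the second is the number of names and the third is the offset where the names end). The bytes are rebuilt by hand here so that the example runs without the file.

```julia
# Extract the null-terminated names from the start of /#refs#/b.
# Assumed layout: 40 bytes of UInt32 words, then the names, which run up to
# the first offset. `reinterpret` uses host byte order (little-endian on
# common hardware, matching the dump).
function parse_names(bytes::Vector{UInt8})
    words = reinterpret(UInt32, bytes[1:40])
    n_names = Int(words[2])
    names_end = Int(words[3])
    raw = String(bytes[41:names_end])
    parts = split(raw, '\0'; keepempty = false)  # drops the zero padding too
    length(parts) == n_names || error("name count mismatch")
    return String.(parts)
end

# The first 0x50 bytes of the dump above, rebuilt by hand.
header = UInt32[3, 4, 0x50, 0x80, 0x80, 0xc8, 0xe8, 0x100, 0, 0]
name_bytes = Vector{UInt8}("MyClassWithDefault\0Bar\0Foo\0MyClass\0")
bytes = vcat(reinterpret(UInt8, header), name_bytes, zeros(UInt8, 5))

println(parse_names(bytes))
```

The position of a name in the returned list is what the numbers in the later components refer to, counting from 1.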
Now let’s turn to the next MAT-file.
>> x = MyClassWithDefault();
>> x.Bar.Foo = 9002;
>> save('out.mat', 'x', '-v7.3');
We again start with an object x = MyClassWithDefault(), but this time,
we change the value of x.Bar.Foo from the default 9001 to 9002. You
might then expect to get the same MAT-file, but with a 9002 in
/#refs#/c.
If only things were that easy.
🗂️ HDF5.File: (read-only) out.mat
├─ 📂 #refs#
│ ├─ 🔢 a
│ │ ├─ 🏷️ MATLAB_class
│ │ └─ 🏷️ MATLAB_empty
│ ├─ 🔢 b
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 c
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 d
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 e
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 f
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 g
│ │ ├─ 🏷️ H5PATH
│ │ └─ 🏷️ MATLAB_class
│ ├─ 🔢 h
│ │ ├─ 🏷️ H5PATH
│ │ ├─ 🏷️ MATLAB_class
│ │ └─ 🏷️ MATLAB_empty
│ ├─ 🔢 i
│ │ ├─ 🏷️ H5PATH
│ │ ├─ 🏷️ MATLAB_class
│ │ └─ 🏷️ MATLAB_empty
│ └─ 📂 j
│   ├─ 🏷️ H5PATH
│   ├─ 🏷️ MATLAB_class
│   └─ 🔢 Bar
│     ├─ 🏷️ H5PATH
│     └─ 🏷️ MATLAB_class
├─ 📂 #subsystem#
│ └─ 🔢 MCOS
│   ├─ 🏷️ MATLAB_class
│   └─ 🏷️ MATLAB_object_decode
└─ 🔢 x
  ├─ 🏷️ MATLAB_class
  └─ 🏷️ MATLAB_object_decode
There are now ten members of /#refs# rather than eight. Inside, we find
the following:
Dataset | MATLAB_class | Value
/#refs#/a | "canonical empty" | 0x0000000000000000, 0x0000000000000000
/#refs#/b | "uint8" | 312×1 Matrix{UInt8} (content omitted)
/#refs#/c | "double" | 9002.0
/#refs#/d | "uint32" | 0xdd000000, 0x00000002, 0x00000001, 0x00000001, 0x00000002, 0x00000001
/#refs#/e | "double" | 9001.0
/#refs#/f | "int32" | 0, 0, 0
/#refs#/g | "cell" | Ref to /#refs#/h, Ref to /#refs#/i, Ref to /#refs#/j
/#refs#/h | "struct" | 0x0000000000000001, 0x0000000000000000
/#refs#/i | "struct" | 0x0000000000000001, 0x0000000000000000
/#refs#/j | "struct" | "Bar" => [0xdd000000, 0x00000002, 0x00000001, 0x00000001, 0x00000003, 0x00000001]
Take a good look at the values and compare them to the previous table.
- For whatever reason, MATLAB has now reversed the ordering of the
  classes. MyClass comes first, and then comes MyClassWithDefault. We
  will also see this in the green component of /#refs#/b in a moment.
  Luckily for us, we don’t need to know why MATLAB orders classes in a
  certain way, as it does not impact our ability to parse the file.
- The default value of Bar hasn’t been replaced; rather, the new value
  has simply been added to a new dataset in /#refs#.
- In fact, our MAT-file now contains three objects rather than two:
  - The object x, with object ID 1 and class ID 2, sits in /x (not
    shown above).
  - The object x.Bar, with object ID 2 and class ID 1, sits in
    /#refs#/d.
  - The default value of x.Bar, with object ID 3 and class ID 1, sits
    in /#refs#/j under the key "Bar". This object never actually gets
    loaded; it’s just hanging around in our file for no good reason.
Here’s /#refs#/b:
00000000: 0300 0000 0400 0000 5000 0000 8000 0000 ........P.......
00000010: 8000 0000 e000 0000 1801 0000 3801 0000 ............8...
00000020: 0000 0000 0000 0000 4261 7200 466f 6f00 ........Bar.Foo.
00000030: 4d79 436c 6173 7300 4d79 436c 6173 7357 MyClass.MyClassW
00000040: 6974 6844 6566 6175 6c74 0000 0000 0000 ithDefault......
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0300 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0400 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 0000 0000 0200 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0100 0000 0300 0000 ................
000000b0: 0100 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0200 0000 0100 0000 0100 0000 0000 0000 ................
000000d0: 0000 0000 0000 0000 0300 0000 0200 0000 ................
000000e0: 0000 0000 0000 0000 0100 0000 0100 0000 ................
000000f0: 0100 0000 0100 0000 0100 0000 0200 0000 ................
00000100: 0100 0000 0000 0000 0100 0000 0200 0000 ................
00000110: 0100 0000 0200 0000 0000 0000 0000 0000 ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000130: 0000 0000 0000 0000 ........
Next, I’ll show you the cyan and yellow components in compact form,
juxtaposed against the previous MAT-file.
Previous: 100011 200021 00 0-0 1-310
New: 200013 100021 100032 00 1-111 1-210 1-212
There are now three sextets and three triplets. From left to right, the
three sextets correspond to:
- The object x.
- The object x.Bar.
- The default value of x.Bar, which happens to be an object.
The corresponding three triplets correspond to:
- The value of x.Bar.
- The value of x.Bar.Foo.
- The value of the Foo field of the default value of x.Bar.
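To check this reading of the triplets, here is a hedged Julia sketch that resolves the three shorthand groups of the new file. The helper resolve and both lookup lists are hypothetical. In particular, treating the first number of a triplet as a 1-based index into the names of /#refs#/b, and the third number as a 0-based index into the value datasets that follow /#refs#/b, is an assumption that happens to fit this file; the middle number is left uninterpreted.

```julia
# Names in the order they appear in the second /#refs#/b dump.
field_names = ["Bar", "Foo", "MyClass", "MyClassWithDefault"]
# Value datasets in the order they follow /#refs#/b (assumed 0-based target
# of the third number in each triplet).
values_at = ["/#refs#/c", "/#refs#/d", "/#refs#/e"]

# Turn a shorthand group such as "1-210" into (field name, dataset) pairs.
function resolve(group::AbstractString)
    count_part, rest = split(group, '-')
    n = parse(Int, count_part)               # number of triplets in the group
    d = [parse(Int, c) for c in rest]        # one digit per stored number
    length(d) == 3n || error("expected $n triplets")
    return [(field_names[d[3i-2]], values_at[d[3i]+1]) for i in 1:n]
end

for g in ["1-111", "1-210", "1-212"]
    println(g, " => ", resolve(g))
end
```

Under these assumptions the three groups resolve to Bar in /#refs#/d, Foo in /#refs#/c (the 9002) and Foo in /#refs#/e (the 9001), which matches the list above.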
For the sake of clarity we briefly look at one more example. Consider the
following scenario.
classdef MyClassWithManyDefaults
    properties
        Bar1 = 9001
        Bar2 = 9002
        Bar3 = 9003
    end
end
>> x = MyClassWithManyDefaults();
>> x.Bar3 = 9004;
>> save('out.mat', 'x', '-v7.3');
When we do this, we find all the default values of
MyClassWithManyDefaults in the 📂 group /#refs#/g, and we additionally
find the value 9004 stored in /#refs#/c. Perhaps surprisingly, the cyan
and yellow components are as follows.
100011 00 1-110
Since none of our objects is left entirely at its default value,
there’s no 0-0. Yet MATLAB leaves implicit the fact that x has three
fields rather than one; this has to be inferred from the default values
as stored in /#refs#/g.
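For a reader of the file, this amounts to starting from the defaults and overriding them with whatever the triplets point at. A minimal Julia sketch of that merge, with both dictionaries written out by hand rather than read from the file:

```julia
# Defaults as found in the group /#refs#/g, and the single value that the
# property triplet points at (/#refs#/c).
defaults = Dict("Bar1" => 9001.0, "Bar2" => 9002.0, "Bar3" => 9003.0)
stored   = Dict("Bar3" => 9004.0)

# Later arguments win in `merge`, so stored values override defaults.
x = merge(defaults, stored)

for key in sort(collect(keys(x)))
    println(key, " = ", x[key])
end
```

The result has all three fields, with Bar3 equal to 9004 and the other two taken from the defaults.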
I’ll leave you with one more example to think about for yourself.
The following example is perfectly legal in MATLAB.
classdef MyClassWithRandomDefault
    properties
        Bar = rand(3, 3)
    end
end
This class has a default value for its field Bar, but the default value
isn’t really fixed. So suppose I initialise an object of type
MyClassWithRandomDefault, and I replace its field Bar with a freshly
generated 3-by-3 matrix of random numbers. Will MATLAB consider it the
default value or will it store both the initial value and the replaced
one?
Constant and dependent values
We can be rather brief here. Constant and dependent values don’t get
stored.
To illustrate, consider the following example.
classdef MyClassWithConstant
    properties
        Foo1
    end
    properties (Constant)
        Foo2 = MyClass(9002)
    end
end
>> x = MyClassWithConstant();
>> x.Foo1 = 9001;
>> save('out.mat', 'x', '-v7.3');
Looking through /#refs#, we see no reference to the value 9002.
Moreover, when we look at /#refs#/b, we find the following.
00000000: 0300 0000 0200 0000 4800 0000 6800 0000 ........H...h...
00000010: 6800 0000 9800 0000 b000 0000 c000 0000 h...............
00000020: 0000 0000 0000 0000 466f 6f31 004d 7943 ........Foo1.MyC
00000030: 6c61 7373 5769 7468 436f 6e73 7461 6e74 lassWithConstant
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0200 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000080: 0100 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0100 0000 0100 0000 0000 0000 0000 0000 ................
000000a0: 0100 0000 0100 0000 0100 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
From the perspective of /#refs#/b, MyClassWithConstant has only a
single field, Foo1; the constant field Foo2 isn’t referenced anywhere
in the file.
Constant values disappear even in more complex cases:
classdef MyClassWithConstant2
    properties (Constant)
        Foo = {MyClass(9001), 'Hello, world!', rand(100, 100)};
    end
end
When saving an object of class MyClassWithConstant2, the object will
show up but it will appear to have no content whatsoever.
Our discussion on constant properties applies verbatim to dependent
properties. They neither get referenced nor stored in the MAT-file.
Properties with specified type and/or size
When defining a class, MATLAB allows you to specify the type and size of
the properties. For example:
classdef MyClassWithFixedSizeDouble
    properties
        Bar (1, 2, :, 4) double
    end
end
The field Bar of this class must be a 4-dimensional array of doubles,
where the size of the array along the first, second, and fourth
dimension is restricted to 1, 2, and 4, respectively.
x = MyClassWithFixedSizeDouble();
x.Bar = rand(1, 2, 3, 4);
save('out.mat', 'x', '-v7.3');
Brief aside: When we run MyClassWithFixedSizeDouble() without any
arguments, MATLAB calls double() without any arguments 1×2×3×4 times,
giving you a zero array in x.Bar.
Here’s what we find under /#refs#:
Dataset | MATLAB_class | Value
/#refs#/a | "canonical empty" | 0x0000000000000000, 0x0000000000000000
/#refs#/b | "uint8" | 192×1 Matrix{UInt8} (content omitted)
/#refs#/c | "double" | 1×2×3×4 Array{Float64, 4} (content omitted)
/#refs#/d | "int32" | 0, 0
/#refs#/e | "cell" | Ref to /#refs#/f, Ref to /#refs#/g
/#refs#/f | "struct" | 0x0000000000000001, 0x0000000000000000
/#refs#/g | "struct" | "Bar" => [0x0000000000000001, 0x0000000000000002, 0x0000000000000000, 0x0000000000000004]
Our table looks very similar to the ones for the objects with default
values that we studied earlier. This is not a coincidence. The general
pattern is as follows.
When a field, say Bar, is constrained to be of type D and of size
n1×n2×n3×..., MATLAB will give Bar a default value. This default value
is an array of size n1×n2×n3×... where each element is D() (see
Note 3). If the size is unconstrained in some direction, the size of
the default value in that direction will be zero. But, as we discussed
in the preliminary, arrays that are of size zero in some direction
cannot be represented by their content because there is no content, and
so in that case, MATLAB falls back to representing the array in terms
of its size. This is why /#refs#/g/Bar has value [1, 2, 0, 4].
So to be clear: If we had required the field Bar to have size 1×2×3×4,
/#refs#/g/Bar would contain a 4-dimensional array consisting of 24
zeros, and not [1, 2, 3, 4].
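The rule can be summarised in a few lines of Julia. This is a sketch under stated assumptions: the function default_for is hypothetical, it handles only the type double, and it models what ends up in the file rather than anything MATLAB exposes.

```julia
# Size specification as in the classdef: integers are fixed sizes and `:`
# marks an unconstrained dimension.
function default_for(spec)
    dims = [s isa Colon ? 0 : Int(s) for s in spec]
    if any(iszero, dims)
        return UInt64.(dims)          # fallback: only the size is stored
    end
    return zeros(Float64, dims...)    # otherwise the content itself is stored
end

println(default_for((1, 2, :, 4)) == UInt64[1, 2, 0, 4])  # true
println(size(default_for((1, 2, 3, 4))))                  # (1, 2, 3, 4)
```

The first call mirrors the [1, 2, 0, 4] found in /#refs#/g/Bar; the second gives the array of 24 zeros that a fully fixed size would produce.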
Two exceptional cases should be considered.
- What if we specify the type but not the size? In this case, the
  default value will have size 0×0.
- What if we specify the size but not the type? In this case, the
  default value will consist of doubles.
I should point out that there are other ways to implicitly specify the
type of a property. For instance, consider the following.
classdef MyClassWithSmallNumber
    properties
        Bar {mustBeLessThanOrEqual(Bar, 2)}
    end
end
It is implied by {mustBeLessThanOrEqual(Bar, 2)} that Bar must be a
number. As such, MyClassWithSmallNumber will have a 0×0 default value
for Bar, in the same way that it would have if we had declared Bar to
be double.
Before closing off, here’s the last example for you to think about.
classdef MyClassWithClass
    properties
        Bar MyClass
    end
    methods
        function obj = MyClassWithClass(x)
            obj.Bar = x;
        end
    end
end
>> x = MyClassWithClass(MyClass(9001));
>> save('out.mat', 'x', '-v7.3');
The default value of the field Bar resides in /#refs#/i/Bar. Based on
the information in this post, can you guess what it looks like?
In the next post, we’ll use our knowledge
to analyse how some common built-in classes are saved. We will
particularly spend some time on strings, as they will reveal new phenomena
that we haven’t yet encountered.
- Note 1: We’ve seen that /#refs#/a, if it exists, is of type
  "canonical empty" and contains the data
  0x0000000000000000
  0x0000000000000000
  We now understand that this is literally how MATLAB encodes an array
  of size 0×0.
- Note 2: Hint: No fallback is required for objects.
- Note 3: D requires a zero-argument default constructor for this to
  work. If it doesn’t have one, and you call the constructor of the
  enclosing class without arguments, say C(), MATLAB will give you an
  error about not being able to construct an object of class D.