Reverse-engineering MAT-files
Part 3: Classes with attributes
In part 2 of this series we discovered how simple classes are stored in MAT v7.3. We did so by looking at a few very simple classes and examining how our MAT-file changes when we, say, alter the class structure a bit, or when we add more objects of the same class.
The simple examples we considered merely scratch the surface of what is possible in MATLAB’s class system. MATLAB supports many features typical of an object-oriented programming language, such as abstract classes, inheritance, private and protected attributes, operator overloading, and so on and so forth.
Luckily for us, most of these features do not seem to influence the way that MATLAB stores the object in a MAT-file. There are, however, a few exceptions to this, and the goal of this post is to highlight these exceptions.
More concretely, we will look at classes exhibiting the following features:
- Default values
- Constant and dependent values
- Properties with specified type and/or size
Preliminary: Arrays of classes
Before looking at classes, we need to talk about arrays.
Arrays of primitive objects such as doubles and chars are stored in a straightforward way. Consider the following example.
>> x = rand(1, 2, 3);
>> save('out.mat', 'x', '-v7.3');
Our MAT-file consists of a single 🔢 dataset x:
julia> read(h["x"])
1×2×3 Array{Float64, 3}:
[:, :, 1] =
0.815771 0.804843
[:, :, 2] =
0.360404 0.913814
[:, :, 3] =
0.288316 0.165002
The only exception to this logic is if our array has size zero in some dimension. In MATLAB, an ‘empty’ array of size 3 by 0 is different from an array of size 0 by 3. Yet we evidently cannot distinguish them by their content, since there is no content! Here’s what MATLAB does instead.
>> x = rand(9001, 0, 9002);
>> save('out.mat', 'x', '-v7.3');
julia> read(h["x"])
3-element Vector{UInt64}:
0x0000000000002329
0x0000000000000000
0x000000000000232a
If at least one of the sizes of an array is zero, MATLAB does a ‘fallback’ by saving the size of the array in the form of 64-bit integers (see Note 1). The type that the elements would have had if there were elements can be inferred from the MATLAB_class attribute. This allows us to distinguish, for instance, a 3×0 character array from a 3×0 array of doubles.
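To make the fallback concrete, here is a minimal Python sketch of the decision a reader of the file has to make. The function name `interpret_dataset` and the boolean `is_empty_flag` are hypothetical; I'm assuming the flag would be derived from the MATLAB_empty attribute that shows up on such datasets in the file listings later in this post.

```python
def interpret_dataset(values, matlab_class, is_empty_flag):
    """Describe what a dataset holds.

    values        -- the numbers read from the dataset, as a flat list
    matlab_class  -- the MATLAB_class attribute, e.g. "double" or "char"
    is_empty_flag -- hypothetical: True if the dataset is marked as empty
    """
    if is_empty_flag:
        # Fallback case: the numbers are the dimensions, not the content.
        dims = tuple(int(v) for v in values)
        return {"class": matlab_class, "size": dims, "content": []}
    # Regular case: the numbers are the content itself.
    return {"class": matlab_class, "size": None, "content": list(values)}


# The three 64-bit integers from the example above.
empty = interpret_dataset([0x2329, 0x0, 0x232A], "double", True)
print(empty["size"])  # (9001, 0, 9002)
```

The element type never comes from the numbers themselves; it is carried along separately, which is exactly why a 3×0 char array and a 3×0 double array stay distinguishable.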
Now what would happen if we take, say, a 1×2×3 array of objects?
We define the following simple class which we’ll use throughout this post.
classdef MyClass
properties
Foo
end
methods
function obj = MyClass(x)
obj.Foo = x;
end
end
end
Now consider the following:
x = [MyClass(9001) MyClass(9002) MyClass(9003) MyClass(9004) MyClass(9005) MyClass(9006)];
x = reshape(x, 1, 2, 3);
save('out.mat', 'x', '-v7.3');
So far, our objects have always been represented by a length-six vector of integers, of which the last two encode the object ID and the class ID. So should we now expect a 6×1×2×3 array of integers instead?
julia> read(h["x"])
12×1 Matrix{UInt32}:
0xdd000000
0x00000003
0x00000001
0x00000002
0x00000003
0x00000001
0x00000002
0x00000003
0x00000004
0x00000005
0x00000006
0x00000001
This is altogether different. The second through fifth integers (0x00000003, 0x00000001, 0x00000002, 0x00000003) form a component which we haven’t elaborated on before. This component indicates the size of the array of objects. Specifically, it tells us that we’re dealing with a 3-dimensional array of size 1×2×3. After that, the object ID of each object in the array is listed in column-major order, and finally, the class ID of the objects is specified.
Until this point, objects have always been represented with six integers, of which the first four seemed fixed. In hindsight, you’ll be able to recognise that the second, third and fourth integers simply told us that the object was a 2-dimensional array of size 1×1.
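The layout described above can be sketched in a few lines of Python. This is only my reading of the two examples in this post; the function name `decode_object_reference` is made up, and I have not checked it against cases such as empty object arrays.

```python
def decode_object_reference(words):
    """Split the UInt32 vector of an object (array) into its parts.

    Assumed layout: marker 0xdd000000, number of dimensions, the size
    along each dimension, one object ID per element, and the class ID.
    """
    if words[0] != 0xDD000000:
        raise ValueError("not an object reference")
    ndims = words[1]
    dims = tuple(words[2:2 + ndims])
    count = 1
    for d in dims:
        count *= d
    ids_start = 2 + ndims
    object_ids = list(words[ids_start:ids_start + count])  # column-major order
    class_id = words[ids_start + count]
    return dims, object_ids, class_id


# The 1x2x3 array of MyClass objects from above.
array_words = [0xDD000000, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1]
print(decode_object_reference(array_words))

# A single object is just the special case of a 1x1 array.
scalar_words = [0xDD000000, 2, 1, 1, 2, 2]
print(decode_object_reference(scalar_words))
```

Treating the scalar case as a 1×1 array means one code path handles both the six-integer vectors from part 2 and the longer vectors seen here.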
Before moving on to the next section, I leave you with one example. Can you guess how MATLAB encodes the following? For a hint, see Note 2.
>> x = MyClass.empty; % This returns a 0×0 array of objects of type MyClass
>> save('out.mat', 'x', '-v7.3');
Default values
Consider the following class.
classdef MyClassWithDefault
properties
Bar = MyClass(9001)
end
end
As you’d probably expect, if I create a variable
x = MyClassWithDefault(), the field Bar will
obtain the default value MyClass(9001).
Let’s take a look at the MAT-file that we get upon saving
x.
🗂️ HDF5.File: (read-only) out.mat
├─ 📂 #refs#
│  ├─ 🔢 a
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  ├─ 🔢 b
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 c
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 d
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 e
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 f
│  │  ├─ 🏷️ H5PATH
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  ├─ 📂 g
│  │  ├─ 🏷️ H5PATH
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🔢 Bar
│  │     ├─ 🏷️ H5PATH
│  │     └─ 🏷️ MATLAB_class
│  └─ 🔢 h
│     ├─ 🏷️ H5PATH
│     ├─ 🏷️ MATLAB_class
│     └─ 🏷️ MATLAB_empty
├─ 📂 #subsystem#
│  └─ 🔢 MCOS
│     ├─ 🏷️ MATLAB_class
│     └─ 🏷️ MATLAB_object_decode
└─ 🔢 x
   ├─ 🏷️ MATLAB_class
   └─ 🏷️ MATLAB_object_decode
Interestingly, /#refs#/g is now a 📂 group rather than
a 🔢 dataset. Here’s what’s inside
/#refs#:
| Dataset | MATLAB_class | Value |
|---|---|---|
| /#refs#/a | "canonical empty" | 0x0000000000000000 0x0000000000000000 |
| /#refs#/b | "uint8" | 256×1 Matrix{UInt8} (Content omitted) |
| /#refs#/c | "double" | 9001.0 |
| /#refs#/d | "int32" | 0 0 0 |
| /#refs#/e | "cell" | Ref to /#refs#/f, Ref to /#refs#/g, Ref to /#refs#/h |
| /#refs#/f | "struct" | 0x0000000000000001 0x0000000000000000 |
| /#refs#/g | "struct" | "Bar" => [ 0xdd000000 0x00000002 0x00000001 0x00000001 0x00000002 0x00000002 ] |
| /#refs#/h | "struct" | 0x0000000000000001 0x0000000000000000 |
/#refs#/g/Bar encodes the default value of the field Bar of the first class. As we learned in part 2, the six UInt32’s are MATLAB’s way of telling you that this default value is itself an object, with object ID 2 and class ID 2. In turn, the value of the field Foo of this object is stored in /#refs#/c.
As you may expect, /#refs#/b is what ties things together.
00000000: 0300 0000 0400 0000 5000 0000 8000 0000  ........P.......
00000010: 8000 0000 c800 0000 e800 0000 0001 0000  ................
00000020: 0000 0000 0000 0000 4d79 436c 6173 7357  ........MyClassW
00000030: 6974 6844 6566 6175 6c74 0042 6172 0046  ithDefault.Bar.F
00000040: 6f6f 004d 7943 6c61 7373 0000 0000 0000  oo.MyClass......
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0100 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0400 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0100 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0100 0000 0200 0000  ................
000000b0: 0200 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0200 0000 0100 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0100 0000 0300 0000  ................
000000e0: 0100 0000 0000 0000 0000 0000 0000 0000  ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
The red component still lists all the sizes that we identified in part 2, the green component lists the names of the classes and fields, and the blue component lists which of these names are classes. But something interesting happens in the cyan and yellow components. Using the shorthand that we introduced previously, it’s tempting to represent the components as follows:
100011 200021 0000 1-310
This is not correct. The cyan and yellow components are separated by two rather than four zeros, so the third and fourth zeros aren’t separators. Instead, they form an ‘empty’ set of triplets, which specifically serves to indicate that this object holds nothing but default values. So our shorthand representation should be
100011 200021 00 0-0 1-310
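For readers who prefer code over shorthand, here is a rough Python sketch of how the yellow component could be split into one block per object. The layout is inferred purely from the dumps in this post: two zero words, then for each object a count followed by that many triplets. The padding rule (a block is padded with one zero word to an even number of words) is my own assumption; it reproduces the 0-0 entry, but I haven't seen an object with two or more stored fields to confirm it. The name `parse_property_blocks` is hypothetical.

```python
def parse_property_blocks(words, n_objects):
    """Read one block of triplets per object from the yellow component.

    Assumed layout (inferred, not documented): two zero words, then per
    object a count n followed by n triplets, padded with a zero word so
    that every block spans an even number of words.
    """
    if list(words[:2]) != [0, 0]:
        raise ValueError("expected the two-zero separator")
    pos = 2
    blocks = []
    for _ in range(n_objects):
        n = words[pos]
        pos += 1
        triplets = [tuple(words[pos + 3 * i:pos + 3 * i + 3]) for i in range(n)]
        pos += 3 * n
        if (1 + 3 * n) % 2 == 1:
            pos += 1  # assumed padding word, e.g. the second 0 in "0-0"
        blocks.append(triplets)
    return blocks


# Shorthand "00 0-0 1-310", written out as UInt32 words.
yellow = [0, 0, 0, 0, 1, 3, 1, 0]
print(parse_property_blocks(yellow, 2))  # [[], [(3, 1, 0)]]
```

An empty list for the first object is how this sketch expresses "nothing stored, everything is default".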
Now let’s turn to the next MAT-file.
>> x = MyClassWithDefault();
>> x.Bar.Foo = 9002;
>> save('out.mat', 'x', '-v7.3');
We again start with an object x = MyClassWithDefault(), but
this time, we change the value of x.Bar.Foo from the
default 9001 to 9002. Then you may expect to
get the same MAT-file, but with a 9002 in
/#refs#/c.
If only things were that easy.
🗂️ HDF5.File: (read-only) out.mat
├─ 📂 #refs#
│  ├─ 🔢 a
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  ├─ 🔢 b
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 c
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 d
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 e
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 f
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 g
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 h
│  │  ├─ 🏷️ H5PATH
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  ├─ 🔢 i
│  │  ├─ 🏷️ H5PATH
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  └─ 📂 j
│     ├─ 🏷️ H5PATH
│     ├─ 🏷️ MATLAB_class
│     └─ 🔢 Bar
│        ├─ 🏷️ H5PATH
│        └─ 🏷️ MATLAB_class
├─ 📂 #subsystem#
│  └─ 🔢 MCOS
│     ├─ 🏷️ MATLAB_class
│     └─ 🏷️ MATLAB_object_decode
└─ 🔢 x
   ├─ 🏷️ MATLAB_class
   └─ 🏷️ MATLAB_object_decode
There are now ten members of /#refs# rather than
eight. Inside, we find the following:
| Dataset | MATLAB_class | Value |
|---|---|---|
| /#refs#/a | "canonical empty" | 0x0000000000000000 0x0000000000000000 |
| /#refs#/b | "uint8" | 312×1 Matrix{UInt8} (Content omitted) |
| /#refs#/c | "double" | 9002.0 |
| /#refs#/d | "uint32" | 0xdd000000 0x00000002 0x00000001 0x00000001 0x00000002 0x00000001 |
| /#refs#/e | "double" | 9001.0 |
| /#refs#/f | "int32" | 0 0 0 |
| /#refs#/g | "cell" | Ref to /#refs#/h, Ref to /#refs#/i, Ref to /#refs#/j |
| /#refs#/h | "struct" | 0x0000000000000001 0x0000000000000000 |
| /#refs#/i | "struct" | 0x0000000000000001 0x0000000000000000 |
| /#refs#/j | "struct" | "Bar" => [ 0xdd000000 0x00000002 0x00000001 0x00000001 0x00000003 0x00000001 ] |
Take a good look at the values and compare them to the previous table.
- For whatever reason, MATLAB has now reversed the ordering of the classes. MyClass comes first, and then comes MyClassWithDefault. We will also see this in the green component of /#refs#/b in a moment. Luckily for us, we don’t need to know why MATLAB orders classes in a certain way, as it does not impact our ability to parse the file.
- The default value of Bar hasn’t been replaced; rather, the new value has simply been added to a new dataset in /#refs#.
- In fact, our MAT-file now contains three objects rather than two:
  - The object x, with object ID 1 and class ID 2, sits in /x (not shown above).
  - The object x.Bar, with object ID 2 and class ID 1, sits in /#refs#/d.
  - The default value of x.Bar, with object ID 3 and class ID 1, sits in /#refs#/j under the key "Bar". This object never actually gets loaded; it’s just hanging around in our file for no good reason.
Here’s /#refs#/b:
00000000: 0300 0000 0400 0000 5000 0000 8000 0000  ........P.......
00000010: 8000 0000 e000 0000 1801 0000 3801 0000  ............8...
00000020: 0000 0000 0000 0000 4261 7200 466f 6f00  ........Bar.Foo.
00000030: 4d79 436c 6173 7300 4d79 436c 6173 7357  MyClass.MyClassW
00000040: 6974 6844 6566 6175 6c74 0000 0000 0000  ithDefault......
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0300 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0400 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0200 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0100 0000 0300 0000  ................
000000b0: 0100 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0200 0000 0100 0000 0100 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0300 0000 0200 0000  ................
000000e0: 0000 0000 0000 0000 0100 0000 0100 0000  ................
000000f0: 0100 0000 0100 0000 0100 0000 0200 0000  ................
00000100: 0100 0000 0000 0000 0100 0000 0200 0000  ................
00000110: 0100 0000 0200 0000 0000 0000 0000 0000  ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000                      ........
Next, I’ll show you the cyan and yellow components in compact form juxtaposed against the previous MAT-file.
Previous: 100011 200021 00 0-0 1-310
New:      200013 100021 100032 00 1-111 1-210 1-212
There are now three sextets and triplets. From left to right, the three sextets correspond to:
- The object x.
- The object x.Bar.
- The default value of x.Bar, which happens to be an object.

The three corresponding triplets encode:
- The value of x.Bar.
- The value of x.Bar.Foo.
- The value of the Foo field of the default value of x.Bar.
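The correspondence between sextets, triplets and names can be spelled out in a short Python sketch. I'm assuming here that the first entry of a sextet is the class ID, that the first entry of a triplet is a 1-based index into the list of names from the green component, and that the blue component maps class 1 to name 3 and class 2 to name 4 (as the words at offsets 0x64 and 0x74 of the dump suggest). The helper `describe` is made up for illustration.

```python
# Names in the order they appear in the green component of /#refs#/b.
names = ["Bar", "Foo", "MyClass", "MyClassWithDefault"]

# Class ID -> 1-based index into names (assumed reading of the blue component).
class_name_index = {1: 3, 2: 4}

# Shorthand "200013 100021 100032" and "1-111 1-210 1-212", written out.
sextets = [(2, 0, 0, 0, 1, 3), (1, 0, 0, 0, 2, 1), (1, 0, 0, 0, 3, 2)]
triplets = [(1, 1, 1), (2, 1, 0), (2, 1, 2)]


def describe(sextets, triplets, names, class_name_index):
    """Pair each object with its class name and the field its triplet sets."""
    rows = []
    for sextet, triplet in zip(sextets, triplets):
        class_name = names[class_name_index[sextet[0]] - 1]
        field_name = names[triplet[0] - 1]
        rows.append((class_name, field_name))
    return rows


for class_name, field_name in describe(sextets, triplets, names, class_name_index):
    print(f"object of class {class_name} stores field {field_name}")
```

Pairing by position with `zip` simply mirrors the left-to-right correspondence described above; a real parser would presumably use the indices inside the sextets instead.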
For the sake of clarity we briefly look at one more example. Consider the following scenario.
classdef MyClassWithManyDefaults
properties
Bar1 = 9001
Bar2 = 9002
Bar3 = 9003
end
end
>> x = MyClassWithManyDefaults();
>> x.Bar3 = 9004;
>> save('out.mat', 'x', '-v7.3');
When we do this, we find all the default values of
MyClassWithManyDefaults in the 📂 group
/#refs#/g, and we additionally find the value
9004 stored in /#refs#/c. Perhaps
surprisingly, the cyan and
yellow components are as follows.
100011 00 1-110
Since x is no longer entirely at its default values, there’s no 0-0. Yet MATLAB leaves implicit the fact that x has three fields rather than one; this has to be inferred from the default values stored in /#refs#/g.
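In other words, anyone loading this file has to overlay the explicitly stored fields on top of the defaults. A minimal Python sketch of that step, with plain dictionaries standing in for what would be read from /#refs#/g and /#refs#/c (the function name `reconstruct_properties` is hypothetical, and this is how a reader might do it rather than a claim about MATLAB's own loader):

```python
def reconstruct_properties(defaults, stored):
    """Overlay explicitly stored fields on top of the class defaults."""
    properties = dict(defaults)  # start from every default value
    properties.update(stored)    # stored values take precedence
    return properties


# Default values as found in the group /#refs#/g.
defaults = {"Bar1": 9001.0, "Bar2": 9002.0, "Bar3": 9003.0}
# The single field named by the triplet, with its value from /#refs#/c.
stored = {"Bar3": 9004.0}

x = reconstruct_properties(defaults, stored)
print(x)  # {'Bar1': 9001.0, 'Bar2': 9002.0, 'Bar3': 9004.0}
```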
I’ll leave you with one more example to think about for yourself. The following example is perfectly legal in MATLAB.
classdef MyClassWithRandomDefault
properties
Bar = rand(3, 3)
end
end
This class has a default value for its field Bar, but the
default value isn’t really fixed. So suppose I initialise an
object of type MyClassWithRandomDefault, and I replace its
field Bar with a freshly generated 3-by-3 matrix of random
numbers. Will MATLAB consider it the default value or will it store both
the initial value and the replaced one?
Constant and dependent values
We can be rather brief here. Constant and dependent values don’t get stored.
To illustrate, consider the following example.
classdef MyClassWithConstant
properties
Foo1
end
properties (Constant)
Foo2 = MyClass(9002)
end
end
>> x = MyClassWithConstant();
>> x.Foo1 = 9001;
>> save('out.mat', 'x', '-v7.3');
Looking through /#refs#, we see no reference to the value
9002. Moreover, when we look at /#refs#/b, we
find the following.
00000000: 0300 0000 0200 0000 4800 0000 6800 0000  ........H...h...
00000010: 6800 0000 9800 0000 b000 0000 c000 0000  h...............
00000020: 0000 0000 0000 0000 466f 6f31 004d 7943  ........Foo1.MyC
00000030: 6c61 7373 5769 7468 436f 6e73 7461 6e74  lassWithConstant
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0200 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0100 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0100 0000 0100 0000 0000 0000 0000 0000  ................
000000a0: 0100 0000 0100 0000 0100 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
From the perspective of /#refs#/b,
MyClassWithConstant has only a single field,
Foo1; the constant field Foo2 isn’t
referenced anywhere in the file.
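If you'd rather check this programmatically than by eye, here is a small Python sketch that pulls the names out of the first 72 bytes of the dump above. The layout it relies on is read off the dumps in this post rather than from any specification: the second 32-bit word seems to hold the number of names, the third the offset at which the names end, and the names themselves appear to start at byte 40. The function name `read_names` is hypothetical.

```python
import struct

# First 0x48 bytes of /#refs#/b, copied from the hex dump above.
HEADER_HEX = (
    "0300000002000000" "4800000068000000"
    "6800000098000000" "b0000000c0000000"
    "0000000000000000" "466f6f31004d7943"
    "6c61737357697468" "436f6e7374616e74"
    "0000000000000000"
)
blob = bytes.fromhex(HEADER_HEX)


def read_names(blob):
    """Extract the null-terminated names from the start of the blob."""
    # Assumed: word at offset 4 = number of names, word at offset 8 = end of names.
    n_names, names_end = struct.unpack_from("<II", blob, 4)
    raw = blob[40:names_end]  # assumed: names always start at byte 40
    return [s.decode("ascii") for s in raw.split(b"\x00")[:n_names]]


names = read_names(blob)
print(names)            # ['Foo1', 'MyClassWithConstant']
print("Foo2" in names)  # False
```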
Constant values disappear even in more complex cases:
classdef MyClassWithConstant2
properties (Constant)
Foo = {MyClass(9001), 'Hello, world!', rand(100, 100)};
end
end
When saving an object of class MyClassWithConstant2, the
object will show up but it will appear to have no content whatsoever.
Our discussion on constant properties applies verbatim to dependent properties. They neither get referenced nor stored in the MAT-file.
Properties with specified type and/or size
When defining a class, MATLAB allows you to specify the type and size of the properties. For example:
classdef MyClassWithFixedSizeDouble
properties
Bar (1, 2, :, 4) double
end
end
The field Bar of this class must be a 4-dimensional array of doubles, where the size of the array along the first, second, and fourth dimension is restricted to 1, 2, and 4, respectively.
x = MyClassWithFixedSizeDouble();
x.Bar = rand(1, 2, 3, 4);
save('out.mat', 'x', '-v7.3');
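Brief aside: When we run MyClassWithFixedSizeDouble() without any arguments, MATLAB fills x.Bar with a default array whose elements are each the result of calling double() without any arguments, that is, zeros. Because the third dimension is unconstrained here, that default array is empty, of size 1×2×0×4; with a fully specified size of 1×2×3×4 it would contain 24 zeros.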
Here’s what we find under /#refs#:
| Dataset | MATLAB_class | Value |
|---|---|---|
| /#refs#/a | "canonical empty" | 0x0000000000000000 0x0000000000000000 |
| /#refs#/b | "uint8" | 192×1 Matrix{UInt8} (Content omitted) |
| /#refs#/c | "double" | 1×2×3×4 Array{Float64, 4} (Content omitted) |
| /#refs#/d | "int32" | 0 0 |
| /#refs#/e | "cell" | Ref to /#refs#/f, Ref to /#refs#/g |
| /#refs#/f | "struct" | 0x0000000000000001 0x0000000000000000 |
| /#refs#/g | "struct" | "Bar" => [ 0x0000000000000001 0x0000000000000002 0x0000000000000000 0x0000000000000004 ] |
Our table looks very similar to the objects with default values we studied earlier. Indeed, this is not a coincidence. The general pattern is as follows.
When a field, say Bar, is constrained to be of type D and of size n1×n2×n3×..., MATLAB will give Bar a default value. This default value is an array of size n1×n2×n3×... where each element is D() (see Note 3). If the size is unconstrained in some direction, the size of the default value in that direction will be zero. But, as we discussed in the preliminary, arrays that are of size zero in some direction cannot be represented by their content because there is no content, and so in that case, MATLAB falls back to representing the array in terms of its size. This is why /#refs#/g/Bar has value [1, 2, 0, 4].
So to be clear: If we had required the field Bar to have
size 1×2×3×4, /#refs#/g/Bar would contain a
4-dimensional array consisting of 24 zeros, and not
[1, 2, 3, 4].
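The rule can be summarised in a short Python sketch for a property of type double. Both function names are hypothetical, `None` stands in for MATLAB's `:`, and the zeros are what double() returns; none of this is taken from MATLAB itself, it merely restates the pattern described above.

```python
def default_shape(size_spec):
    """Shape of the default value; None marks an unconstrained dimension."""
    return tuple(0 if d is None else d for d in size_spec)


def stored_form(size_spec):
    """What we would expect to find in the file for a double property."""
    shape = default_shape(size_spec)
    if 0 in shape:
        # No content to store, so the size itself is written instead.
        return ("dims", list(shape))
    count = 1
    for d in shape:
        count *= d
    # Fully constrained: the content is stored, one double() per element.
    return ("content", [0.0] * count)


print(stored_form((1, 2, None, 4)))  # ('dims', [1, 2, 0, 4])
kind, values = stored_form((1, 2, 3, 4))
print(kind, len(values))             # content 24
```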
Two exceptional cases should be considered.
- What if we specify the type but not the size? In this case, the default value will have size 0×0.
- What if we specify the size but not the type? In this case, the default value will consist of doubles.
I should point out that there are other ways to implicitly specify the type of a property. For instance, consider the following.
classdef MyClassWithSmallNumber
properties
Bar {mustBeLessThanOrEqual(Bar, 2)}
end
end
It is implied by {mustBeLessThanOrEqual(Bar, 2)} that Bar must be a number. As such, MyClassWithSmallNumber will have a 0×0 default value for Bar in the same way that it would have if we had declared Bar to be double.
Before closing off, here’s the last example for you to think about.
classdef MyClassWithClass
properties
Bar MyClass
end
methods
function obj = MyClassWithClass(x)
obj.Bar = x;
end
end
end
>> x = MyClassWithClass(MyClass(9001));
>> save('out.mat', 'x', '-v7.3');
The default value of the field Bar resides in
/#refs#/i/Bar. Based on the information in this post, can
you guess what it looks like?
In the next post, we’ll use our knowledge to analyse how some
common built-in classes are saved. We will particularly spend some
time on strings, as they will reveal new phenomena that we
haven’t yet encountered.
Actually scrap that. I never got to writing it. Sorry!