Reverse-engineering MAT-files
Part 3: Classes with attributes

In part 2 of this series we discovered how simple classes are stored in MAT v7.3. We did so by looking at a few very simple classes and examining how our MAT-file changes when we, say, alter the class structure a bit, or when we add more objects of the same class.

The simple examples we considered merely scratch the surface of what is possible in MATLAB’s class system. MATLAB supports many features typical of an object-oriented programming language, such as abstract classes, inheritance, private and protected attributes, operator overloading, and so on and so forth.

Luckily for us, most of these features do not seem to influence the way that MATLAB stores the object in a MAT-file. There are, however, a few exceptions to this, and the goal of this post is to highlight these exceptions.

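More concretely, we will look at classes exhibiting the following features: default values, constant and dependent properties, and properties with a specified type and/or size.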

Preliminary: Arrays of classes

Before looking at classes, we need to talk about arrays.

Arrays of primitive objects such as doubles and chars are stored in a straightforward way. Consider the following example.

>> x = rand(1, 2, 3);
>> save('out.mat', 'x', '-v7.3');

Our MAT-file consists of a single 🔢 dataset x:

julia> read(h["x"])
1×2×3 Array{Float64, 3}:
[:, :, 1] =
 0.815771  0.804843

[:, :, 2] =
 0.360404  0.913814

[:, :, 3] =
 0.288316  0.165002

The only exception to this logic is if our array has size zero in some dimension. In MATLAB, an ‘empty’ array of size 3 by 0 is different from an array of size 0 by 3. Yet we evidently cannot distinguish them by their content, since there is no content! Here’s what MATLAB does instead.

>> x = rand(9001, 0, 9002);
>> save('out.mat', 'x', '-v7.3');
julia> read(h["x"])
3-element Vector{UInt64}:
 0x0000000000002329
 0x0000000000000000
 0x000000000000232a

If at least one of the sizes of an array is zero, MATLAB does a ‘fallback’ by saving the size of the array in the form of 64-bit integers (see Note 1). The type that the elements would have had if there were elements can be inferred from the MATLAB_class attribute. This allows us to distinguish, for instance, a 3×0 character array from a 3×0 array of doubles.
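As a minimal sketch of this fallback in Python, assume the dataset has already been read into a list of integers, together with its MATLAB_class attribute and a flag saying whether it is empty (presumably the MATLAB_empty attribute, although that is my assumption). The function name describe_dataset and the dictionary layout are hypothetical.

```python
def describe_dataset(values, matlab_class, is_empty):
    """Return the class, shape and content of a primitive dataset.

    For an empty array the stored values are not content but the size
    vector, so we reinterpret them as dimensions.
    """
    if is_empty:
        shape = tuple(int(v) for v in values)
        return {"class": matlab_class, "shape": shape, "data": []}
    # Non-empty case (flattened for brevity): the values are the content.
    return {"class": matlab_class, "shape": (len(values),), "data": list(values)}


# The example above: rand(9001, 0, 9002) is stored as three UInt64 values.
info = describe_dataset([0x2329, 0x0, 0x232A], "double", is_empty=True)
print(info["class"], info["shape"])
```

The same size vector with MATLAB_class "char" would describe an empty character array instead, which is how the two cases stay distinguishable.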

Now what would happen if we take, say, a 1×2×3 array of objects?

We define the following simple class which we’ll use throughout this post.

classdef MyClass
   properties
       Foo
   end
   methods
       function obj = MyClass(x)
           obj.Foo = x;
       end
   end
end

Now consider the following:

x = [MyClass(9001) MyClass(9002) MyClass(9003) MyClass(9004) MyClass(9005) MyClass(9006)];
x = reshape(x, 1, 2, 3);
save('out.mat', 'x', '-v7.3');

So far, our objects have always been represented by a length-six vector of integers, of which the last two encode the object ID and the class ID. So should we now expect a 6×1×2×3 array of integers instead?

julia> read(h["x"])
12×1 Matrix{UInt32}:
 0xdd000000
 0x00000003
 0x00000001
 0x00000002
 0x00000003
 0x00000001
 0x00000002
 0x00000003
 0x00000004
 0x00000005 
 0x00000006
 0x00000001

This is altogether different. The second through fifth entries (3, 1, 2, 3) form a new component which we haven’t elaborated on before. This component indicates the size of the array of objects. Specifically, it tells us that we’re dealing with a 3-dimensional array of size 1×2×3. After that, the object ID of each object in the array is listed in column-major order, and finally, the class ID of the objects is specified.

Until this point, objects have always been represented with six integers, of which the first four seemed fixed. In hindsight, you’ll be able to recognise that the second, third and fourth integers simply told us that the object was a 2-dimensional array of size 1×1.
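The layout described above can be sketched in Python as follows. This is only an illustration of the pattern seen in these examples: the function name and the constant OBJECT_MARKER are my own, and treating 0xdd000000 as a marker is an assumption based on it being the first value in every example.

```python
OBJECT_MARKER = 0xDD000000  # first value in every example; the name is hypothetical


def decode_object_reference(words):
    """Split an object reference into (dims, object_ids, class_id).

    Assumed layout: marker, number of dimensions, one size per dimension,
    one object ID per element (column-major order), and the class ID.
    """
    words = [int(w) for w in words]
    if len(words) < 3 or words[0] != OBJECT_MARKER:
        raise ValueError("not an object reference")
    ndims = words[1]
    dims = tuple(words[2:2 + ndims])
    count = 1
    for d in dims:
        count *= d
    ids_start = 2 + ndims
    if len(words) != ids_start + count + 1:
        raise ValueError("length does not match the declared size")
    object_ids = words[ids_start:ids_start + count]
    class_id = words[-1]
    return dims, object_ids, class_id


# The 1x2x3 array of MyClass objects from above.
array_ref = [0xDD000000, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1]
print(decode_object_reference(array_ref))

# A single object, as in part 2: a 2-dimensional array of size 1x1.
scalar_ref = [0xDD000000, 2, 1, 1, 2, 2]
print(decode_object_reference(scalar_ref))
```

The scalar case falls out of the general one: the six integers are simply what this layout produces for a 1×1 array.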

Before moving on to the next section, I leave you with one example. Can you guess how MATLAB encodes the following? For a hint, see Note 2.

>> x = MyClass.empty; % This returns a 0×0 array of objects of type MyClass
>> save('out.mat', 'x', '-v7.3');

Default values

Consider the following class.

classdef MyClassWithDefault
   properties
       Bar = MyClass(9001)
   end
end

As you’d probably expect, if I create a variable x = MyClassWithDefault(), the field Bar will obtain the default value MyClass(9001).

Let’s take a look at the MAT-file that we get upon saving x.

🗂️ HDF5.File: (read-only) out.mat
├─ 📂 #refs#
│  ├─ 🔢 a
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  ├─ 🔢 b
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 c
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 d
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 e
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 f
│  │  ├─ 🏷️ H5PATH
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  ├─ 📂 g
│  │  ├─ 🏷️ H5PATH
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🔢 Bar
│  │     ├─ 🏷️ H5PATH
│  │     └─ 🏷️ MATLAB_class
│  └─ 🔢 h
│     ├─ 🏷️ H5PATH
│     ├─ 🏷️ MATLAB_class
│     └─ 🏷️ MATLAB_empty
├─ 📂 #subsystem#
│  └─ 🔢 MCOS
│     ├─ 🏷️ MATLAB_class
│     └─ 🏷️ MATLAB_object_decode
└─ 🔢 x
   ├─ 🏷️ MATLAB_class
   └─ 🏷️ MATLAB_object_decode

Interestingly, /#refs#/g is now a 📂 group rather than a 🔢 dataset. Here’s what’s inside /#refs#:

Dataset      MATLAB_class        Value
/#refs#/a    "canonical empty"   [0x0000000000000000, 0x0000000000000000]
/#refs#/b    "uint8"             256×1 Matrix{UInt8} (content omitted)
/#refs#/c    "double"            9001.0
/#refs#/d    "int32"             [0, 0, 0]
/#refs#/e    "cell"              [Ref to /#refs#/f, Ref to /#refs#/g, Ref to /#refs#/h]
/#refs#/f    "struct"            [0x0000000000000001, 0x0000000000000000]
/#refs#/g    "struct"            "Bar" => [0xdd000000, 0x00000002, 0x00000001, 0x00000001, 0x00000002, 0x00000002]
/#refs#/h    "struct"            [0x0000000000000001, 0x0000000000000000]

/#refs#/g/Bar encodes the default value of the field Bar of the first class. As we learned in part 2, the six UInt32s are MATLAB’s way of telling you that this default value is itself an object, with object ID 2 and class ID 2. In turn, the value of the field Foo of this object is stored in /#refs#/c.

As you may expect, /#refs#/b is what ties things together.

00000000: 0300 0000 0400 0000 5000 0000 8000 0000  ........P.......
00000010: 8000 0000 c800 0000 e800 0000 0001 0000  ................
00000020: 0000 0000 0000 0000 4d79 436c 6173 7357  ........MyClassW
00000030: 6974 6844 6566 6175 6c74 0042 6172 0046  ithDefault.Bar.F
00000040: 6f6f 004d 7943 6c61 7373 0000 0000 0000  oo.MyClass......
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0100 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0400 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0100 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0100 0000 0200 0000  ................
000000b0: 0200 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0200 0000 0100 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0100 0000 0300 0000  ................
000000e0: 0100 0000 0000 0000 0000 0000 0000 0000  ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
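To make the start of this dump easier to follow, here is a rough Python sketch that reads its first 0x50 bytes. It assumes, based only on what the dump looks like, that the data begins with two little-endian 32-bit integers followed by eight 32-bit offsets, and that the null-terminated names start at byte 40 and run up to the first offset. The function name parse_head is hypothetical.

```python
import struct

# First 0x50 bytes of /#refs#/b, copied from the dump above.
HEAD = bytes.fromhex("".join("""
0300 0000 0400 0000 5000 0000 8000 0000
8000 0000 c800 0000 e800 0000 0001 0000
0000 0000 0000 0000 4d79 436c 6173 7357
6974 6844 6566 6175 6c74 0042 6172 0046
6f6f 004d 7943 6c61 7373 0000 0000 0000
""".split()))


def parse_head(data):
    """Return (first_word, name_count, offsets, names) under the assumed layout."""
    first_word, name_count = struct.unpack_from("<2I", data, 0)
    offsets = struct.unpack_from("<8I", data, 8)
    # Names are null-terminated ASCII strings; trailing zeros are padding.
    names_blob = data[40:offsets[0]]
    names = [s.decode("ascii") for s in names_blob.split(b"\x00")[:name_count]]
    return first_word, name_count, offsets, names


first_word, name_count, offsets, names = parse_head(HEAD)
print(name_count, [hex(o) for o in offsets])
print(names)
```

The offsets mark where each later component of the dump begins, which is handy when comparing two MAT-files whose name lists have different lengths.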

The red component (bytes 0x00 to 0x27) still lists all the sizes that we identified in part 2, the green component (0x28 to 0x4f) lists the names of the classes and fields, and the blue component (0x50 to 0x7f) lists which of these names are classes. But something interesting happens in the cyan and yellow components, which begin at 0x80. Using the shorthand that we introduced earlier in this series, it’s tempting to represent the components as follows:

100011 200021 0000 1-310

This is not correct. The cyan and yellow components are separated by two rather than four zeros, so the third and fourth zeros aren’t separators. Instead, they form an ‘empty’ triplet block, which specifically serves to indicate that the corresponding object only has default values. So our shorthand representation should be

100011 200021 00 0-0 1-310
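The grouping can be checked mechanically. The Python sketch below walks over bytes 0xc8 to 0xe7 of the dump and prints them in the shorthand. It rests on two assumptions of mine that are consistent with the dumps in this post but not confirmed anywhere: each block consists of a count followed by that many triplets, and each block is padded with a zero to a multiple of 8 bytes. The function names are hypothetical.

```python
import struct

# Bytes 0xc8 to 0xe7 of the /#refs#/b dump above.
REGION = bytes.fromhex("".join("""
0000 0000 0000 0000
0000 0000 0000 0000 0100 0000 0300 0000
0100 0000 0000 0000
""".split()))


def read_property_blocks(region):
    """Split the region into blocks of 32-bit words: count, triplets, padding."""
    words = struct.unpack("<%dI" % (len(region) // 4), region)
    pos = 2  # skip the two leading zeros (the "00" separator)
    blocks = []
    while pos < len(words):
        start = pos
        count = words[pos]
        pos += 1 + 3 * count
        if pos % 2:  # assumed padding to an 8-byte boundary
            pos += 1
        blocks.append(words[start:pos])
    return blocks


def shorthand(block):
    """Render a block as count, dash, remaining words."""
    return "%d-%s" % (block[0], "".join(str(w) for w in block[1:]))


blocks = read_property_blocks(REGION)
print("00 " + " ".join(shorthand(b) for b in blocks))
```

Under these assumptions the empty block is just a count of zero plus one padding word, which is why it shows up as 0-0 rather than as a second separator.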

Now let’s turn to the next MAT-file.

>> x = MyClassWithDefault();
>> x.Bar.Foo = 9002;
>> save('out.mat', 'x', '-v7.3');

We again start with an object x = MyClassWithDefault(), but this time, we change the value of x.Bar.Foo from the default 9001 to 9002. Then you may expect to get the same MAT-file, but with a 9002 in /#refs#/c.

If only things were that easy.

🗂️ HDF5.File: (read-only) out.mat
├─ 📂 #refs#
│  ├─ 🔢 a
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  ├─ 🔢 b
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 c
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 d
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 e
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 f
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 g
│  │  ├─ 🏷️ H5PATH
│  │  └─ 🏷️ MATLAB_class
│  ├─ 🔢 h
│  │  ├─ 🏷️ H5PATH
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  ├─ 🔢 i
│  │  ├─ 🏷️ H5PATH
│  │  ├─ 🏷️ MATLAB_class
│  │  └─ 🏷️ MATLAB_empty
│  └─ 📂 j
│     ├─ 🏷️ H5PATH
│     ├─ 🏷️ MATLAB_class
│     └─ 🔢 Bar
│        ├─ 🏷️ H5PATH
│        └─ 🏷️ MATLAB_class
├─ 📂 #subsystem#
│  └─ 🔢 MCOS
│     ├─ 🏷️ MATLAB_class
│     └─ 🏷️ MATLAB_object_decode
└─ 🔢 x
   ├─ 🏷️ MATLAB_class
   └─ 🏷️ MATLAB_object_decode

There are now ten members of /#refs# rather than eight. Inside, we find the following:

Dataset      MATLAB_class        Value
/#refs#/a    "canonical empty"   [0x0000000000000000, 0x0000000000000000]
/#refs#/b    "uint8"             312×1 Matrix{UInt8} (content omitted)
/#refs#/c    "double"            9002.0
/#refs#/d    "uint32"            [0xdd000000, 0x00000002, 0x00000001, 0x00000001, 0x00000002, 0x00000001]
/#refs#/e    "double"            9001.0
/#refs#/f    "int32"             [0, 0, 0]
/#refs#/g    "cell"              [Ref to /#refs#/h, Ref to /#refs#/i, Ref to /#refs#/j]
/#refs#/h    "struct"            [0x0000000000000001, 0x0000000000000000]
/#refs#/i    "struct"            [0x0000000000000001, 0x0000000000000000]
/#refs#/j    "struct"            "Bar" => [0xdd000000, 0x00000002, 0x00000001, 0x00000001, 0x00000003, 0x00000001]

Take a good look at the values and compare them to the previous table.

Here’s /#refs#/b:

00000000: 0300 0000 0400 0000 5000 0000 8000 0000  ........P.......
00000010: 8000 0000 e000 0000 1801 0000 3801 0000  ............8...
00000020: 0000 0000 0000 0000 4261 7200 466f 6f00  ........Bar.Foo.
00000030: 4d79 436c 6173 7300 4d79 436c 6173 7357  MyClass.MyClassW
00000040: 6974 6844 6566 6175 6c74 0000 0000 0000  ithDefault......
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0300 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0400 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0200 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0100 0000 0300 0000  ................
000000b0: 0100 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0200 0000 0100 0000 0100 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0300 0000 0200 0000  ................
000000e0: 0000 0000 0000 0000 0100 0000 0100 0000  ................
000000f0: 0100 0000 0100 0000 0100 0000 0200 0000  ................
00000100: 0100 0000 0000 0000 0100 0000 0200 0000  ................
00000110: 0100 0000 0200 0000 0000 0000 0000 0000  ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000                      ........

Next, I’ll show you the cyan and yellow components in compact form juxtaposed against the previous MAT-file.

Previous: 100011 200021        00 0-0 1-310 
New:      200013 100021 100032 00 1-111 1-210 1-212

There are now three sextets and three triplets. From left to right, the three sextets correspond to:

  1. The object x.
  2. The object x.Bar.
  3. The default value of x.Bar, which happens to be an object.

The three corresponding triplets represent:

  1. The value of x.Bar.
  2. The value of x.Bar.Foo.
  3. The value of the Foo field of the default value of x.Bar.
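The two lists above can be reproduced with a few lines of Python. This sketch assumes that a triplet reads as (name index, kind, value index), that name indices are 1-based into the name list of this second /#refs#/b dump, and that value indices are 0-based into the datasets /#refs#/c, /#refs#/d and /#refs#/e. These readings match the tables in this post, but they are my interpretation, and the names NAMES, VALUES and resolve are hypothetical.

```python
# Names in the order they appear in the second /#refs#/b dump.
NAMES = ["Bar", "Foo", "MyClass", "MyClassWithDefault"]

# Contents of /#refs#/c, /#refs#/d and /#refs#/e from the second table.
VALUES = [9002.0, "object reference (object ID 2, class ID 1)", 9001.0]

# The triplets 1-111, 1-210 and 1-212 without their leading counts.
TRIPLETS = [(1, 1, 1), (2, 1, 0), (2, 1, 2)]


def resolve(triplet, names, values):
    """Map a triplet to (field name, stored value) under the assumed reading."""
    name_index, _kind, value_index = triplet
    return names[name_index - 1], values[value_index]


resolved = [resolve(t, NAMES, VALUES) for t in TRIPLETS]
for field, value in resolved:
    print(field, "=>", value)
```

Read this way, the first triplet points x.Bar at the object reference in /#refs#/d, the second gives x.Bar.Foo the value 9002, and the third gives the Foo field of the default object the value 9001.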

For the sake of clarity we briefly look at one more example. Consider the following scenario.

classdef MyClassWithManyDefaults
   properties
       Bar1 = 9001
       Bar2 = 9002
       Bar3 = 9003
   end
end
>> x = MyClassWithManyDefaults();
>> x.Bar3 = 9004;
>> save('out.mat', 'x', '-v7.3');

When we do this, we find all the default values of MyClassWithManyDefaults in the 📂 group /#refs#/g, and we additionally find the value 9004 stored in /#refs#/c. Perhaps surprisingly, the cyan and yellow components are as follows.

100011 00 1-110

Since none of our objects have a default value, there’s no 0-0 — yet MATLAB leaves implicit the fact that x has three fields rather than one; instead, this has to be inferred from the default values as stored in /#refs#/g.

I’ll leave you with one more example to think about for yourself. The following example is perfectly legal in MATLAB.

classdef MyClassWithRandomDefault
   properties
       Bar = rand(3, 3)
   end
end

This class has a default value for its field Bar, but the default value isn’t really fixed. So suppose I initialise an object of type MyClassWithRandomDefault, and I replace its field Bar with a freshly generated 3-by-3 matrix of random numbers. Will MATLAB consider it the default value or will it store both the initial value and the replaced one?

Constant and dependent values

We can be rather brief here. Constant and dependent values don’t get stored.

To illustrate, consider the following example.

classdef MyClassWithConstant
   properties
       Foo1
   end
   properties (Constant)
       Foo2 = MyClass(9002)
   end
end
>> x = MyClassWithConstant();
>> x.Foo1 = 9001;
>> save('out.mat', 'x', '-v7.3');

Looking through /#refs#, we see no reference to the value 9002. Moreover, when we look at /#refs#/b, we find the following.

00000000: 0300 0000 0200 0000 4800 0000 6800 0000  ........H...h...
00000010: 6800 0000 9800 0000 b000 0000 c000 0000  h...............
00000020: 0000 0000 0000 0000 466f 6f31 004d 7943  ........Foo1.MyC
00000030: 6c61 7373 5769 7468 436f 6e73 7461 6e74  lassWithConstant
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0200 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0100 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0100 0000 0100 0000 0000 0000 0000 0000  ................
000000a0: 0100 0000 0100 0000 0100 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

From the perspective of /#refs#/b, MyClassWithConstant has only a single field, Foo1; the constant field Foo2 isn’t referenced anywhere in the file.

Constant values disappear even in more complex cases:

classdef MyClassWithConstant2
   properties (Constant)
       Foo = {MyClass(9001), 'Hello, world!', rand(100, 100)};
   end
end

When saving an object of class MyClassWithConstant2, the object will show up but it will appear to have no content whatsoever.

Our discussion on constant properties applies verbatim to dependent properties. They neither get referenced nor stored in the MAT-file.

Properties with specified type and/or size

When defining a class, MATLAB allows you to specify the type and size of the properties. For example:

classdef MyClassWithFixedSizeDouble
   properties
       Bar (1, 2, :, 4) double
   end
end

The field Bar of this class must be a 4-dimensional array of doubles, where the size of the array along the first, second, and fourth dimension is restricted to 1, 2, and 4, respectively.

x = MyClassWithFixedSizeDouble();
x.Bar = rand(1, 2, 3, 4);
save('out.mat', 'x', '-v7.3');

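Brief aside: When we run MyClassWithFixedSizeDouble() without any arguments, MATLAB fills x.Bar with a default array whose elements are double(), that is, zero. Because the third dimension is unconstrained here, this default array has size 1×2×0×4 and is therefore empty.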

Here’s what we find under /#refs#:

Dataset      MATLAB_class        Value
/#refs#/a    "canonical empty"   [0x0000000000000000, 0x0000000000000000]
/#refs#/b    "uint8"             192×1 Matrix{UInt8} (content omitted)
/#refs#/c    "double"            1×2×3×4 Array{Float64, 4} (content omitted)
/#refs#/d    "int32"             [0, 0]
/#refs#/e    "cell"              [Ref to /#refs#/f, Ref to /#refs#/g]
/#refs#/f    "struct"            [0x0000000000000001, 0x0000000000000000]
/#refs#/g    "struct"            "Bar" => [0x0000000000000001, 0x0000000000000002, 0x0000000000000000, 0x0000000000000004]

Our table looks very similar to the objects with default values we studied earlier. Indeed, this is not a coincidence. The general pattern is as follows.

When a field, say Bar, is constrained to be of type D and of size n1×n2×n3×..., MATLAB will give Bar a default value. This default value is an array of size n1×n2×n3×... where each element is D() (see Note 3). If the size is unconstrained in some direction, the size of the default value in that direction will be zero. But, as we discussed in the preliminary, arrays that are of size zero in some direction cannot be represented by their content because there is no content, and so in that case, MATLAB falls back to representing the array in terms of its size. This is why /#refs#/g/Bar has value [1, 2, 0, 4].

So to be clear: If we had required the field Bar to have size 1×2×3×4, /#refs#/g/Bar would contain a 4-dimensional array consisting of 24 zeros, and not [1, 2, 3, 4].
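The rule can be summarised in a short Python sketch. It models only the case where the property type is double, so that each default element is 0.0, and it represents an unconstrained dimension (the ":" in the class definition) as None. The function name stored_default and the returned dictionary are hypothetical.

```python
def stored_default(size_spec):
    """Describe what ends up in the defaults group for a double property.

    size_spec is a tuple of sizes, with None for an unconstrained dimension.
    """
    dims = tuple(0 if d is None else d for d in size_spec)
    if 0 in dims:
        # Empty default: no content to store, so the size vector is stored.
        return {"kind": "size vector", "value": list(dims)}
    count = 1
    for d in dims:
        count *= d
    # Non-empty default: the content itself is stored, here all zeros.
    return {"kind": "content", "shape": dims, "value": [0.0] * count}


# Bar (1, 2, :, 4) double, as in MyClassWithFixedSizeDouble.
print(stored_default((1, 2, None, 4)))

# Bar (1, 2, 3, 4) double, the fully constrained variant.
full = stored_default((1, 2, 3, 4))
print(full["kind"], full["shape"], len(full["value"]))
```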

Two exceptional cases should be considered.

I should point out that there are other ways to implicitly specify the type of a property. For instance, consider the following.

classdef MyClassWithSmallNumber
   properties
      Bar {mustBeLessThanOrEqual(Bar, 2)}
   end
end

It is implied by {mustBeLessThanOrEqual(Bar, 2)} that Bar must be a number. As such, the field Bar of MyClassWithSmallNumber will have a 0×0 default value in the same way that it would have if we had declared Bar to be double.

Before closing off, here’s the last example for you to think about.

classdef MyClassWithClass
   properties
       Bar MyClass
   end
   methods
       function obj = MyClassWithClass(x)
           obj.Bar = x;
       end
   end
end
>> x = MyClassWithClass(MyClass(9001));
>> save('out.mat', 'x', '-v7.3');

The default value of the field Bar resides in /#refs#/i/Bar. Based on the information in this post, can you guess what it looks like?

In the next post, we’ll use our knowledge to analyse how some common built-in classes are saved. We will particularly spend some time on strings, as they will reveal new phenomena that we haven’t yet encountered.

Footnotes

  1. We’ve seen that /#refs#/a, if it exists, is of type "canonical empty" and contains the data
    0x0000000000000000
    0x0000000000000000
    We now understand that this is literally how MATLAB encodes an array of size 0×0.
  2. Hint: No fallback is required for objects.
  3. D requires a zero-argument default constructor for this to work. If it doesn’t have one, and you call the constructor C() of the class C that contains the field, MATLAB will give you an error about not being able to construct an object of class D.