MATLAB language pet peeves
MATLAB is a proprietary programming language targeted toward linear algebra and rapid development. Its design dates back to the 1980s and has evolved over the decades. But having used many other programming languages in my career, I find that the pain points of MATLAB show up clearly.
- Weakly typed Booleans
-
The logical type represents a Boolean value, which is separate from the numeric type. However, the logical values false and true behave exactly as the numeric values 0 and 1 (respectively) in most computations:
    disp(true + true);  % Prints 2
    disp(false * 5);    % Prints 0
(This behavior is the same in C, C++, JavaScript, Python; but disallowed in Java, C#.)
An important difference between logical and numeric types shows up when we take a vector subscript of an array/matrix:
    array = -2 : +2;
    sub0 = [true, true, true, true, true];
    sub1 = [1, 1, 1, 1, 1];
    disp(array(sub0));  % Equals [-2, -1, 0, 1, 2]
    disp(array(sub1));  % Equals [-2, -2, -2, -2, -2]
    sub2 = [false, true, false, true, true];
    sub3 = [0, 1, 0, 1, 1];
    disp(array(sub2));  % Equals [-1, 1, 2]
    disp(array(sub3));  % Runtime error
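If the zeros and ones come from a numeric computation, one way to get masking behavior is to convert them explicitly. A minimal sketch, reusing array and sub3 from the example above (maskIdx and positions are names introduced here for illustration):

```matlab
array = -2 : +2;
sub3 = [0, 1, 0, 1, 1];    % Numeric zeros and ones, as in the example above

% logical() turns nonzero entries into true, so the vector acts as a mask.
maskIdx = logical(sub3);
disp(array(maskIdx));      % Equals [-1, 1, 2]

% find() returns the positions of the nonzero entries, usable as numeric subscripts.
positions = find(sub3);
disp(array(positions));    % Equals [-1, 1, 2]
```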
- Minimum 2 dimensions
-
    x = 8;
    y = [2, 7];
    z = [1, 4; 5, 0];
    w = cat(3, [0 1; 2 3], [4 5; 6 7]);
    disp(size(x));  % Equals [1 1]
    disp(size(y));  % Equals [1 2]
    disp(size(z));  % Equals [2 2]
    disp(size(w));  % Equals [2 2 2]
Scalar numbers are matrices, and vectors are matrices; in other words, every numerical object has at least 2 dimensions. The extra degenerate dimensions are impure, and can lead to silent mistakes such as unintentionally extending along a dimension. Also, for many applications that only need 1-dimensional arrays, there is a needless distinction between row vectors and column vectors. Other linear algebra systems like NumPy for Python don’t suffer from this problem, and can work with 0-dimensional, 1D, 2D, 3D, etc. arrays.
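One possible illustration of such a silent mistake, sketched here with made-up data: a reduction like sum() picks the first non-singleton dimension, so a matrix that happens to have only one row gets summed along the row instead of down the columns.

```matlab
A = [1 2 3; 4 5 6];   % 2-by-3 matrix
B = [1 2 3];          % 1-by-3 matrix, e.g. the same kind of data with one row

disp(ndims(8));       % Prints 2, even for a scalar

% Without a dimension argument, sum() works along the first non-singleton dimension.
disp(sum(A));         % Equals [5 7 9] (column sums)
disp(sum(B));         % Equals 6 (the single row collapses to a scalar)

% Naming the dimension explicitly keeps the behavior consistent.
disp(sum(B, 1));      % Equals [1 2 3]
```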
- Nesting means concatenation
-
    x = [1, 2];
    x = [x, [3, 4]];
    disp(x);  % Equals [1 2 3 4]
    y = [5; 6];
    y = [y; [7; 8]];
    disp(y);  % Equals [5; 6; 7; 8]
Syntactically nesting a matrix into another one yields a new matrix representing their concatenation. This can be used as an idiom for appending or prepending one or more elements to a matrix. But this is conceptually impure, and other languages like Python and JavaScript will give you a properly nested array if you try to use this syntax.
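For comparison, the closest thing to true nesting in MATLAB is a cell array, written with braces instead of brackets. A small sketch (the names flat and nested are introduced here):

```matlab
flat = [[1, 2], [3, 4]];      % Brackets concatenate
disp(numel(flat));            % Prints 4

nested = {[1, 2], [3, 4]};    % Braces build a cell array holding two vectors
disp(numel(nested));          % Prints 2
disp(nested{2});              % Equals [3 4]
```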
- Matrix extension
-
    temp = [5];
    temp(3) = 9;
    disp(temp);  % Prints [5 0 9]
Setting an element beyond the bounds of a vector/matrix will silently extend it. This makes it easy to mask mistakes in index calculations. JavaScript also has this same misfeature. Java and Python will crash consistently. C/C++ will exhibit undefined behavior (may work correctly, may corrupt data or crash).
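One way to guard against this, sketched under the assumption that growing the vector is never intended at that point in the code, is to check the index before assigning (idx stands for a hypothetical miscalculated index):

```matlab
temp = [5];
idx = 3;   % Hypothetical result of a faulty index calculation

try
    % Refuse to write outside the current bounds instead of growing the vector.
    assert(idx >= 1 && idx <= numel(temp), 'Index %d is out of bounds.', idx);
    temp(idx) = 9;
catch err
    disp(err.message);   % Prints: Index 3 is out of bounds.
end

disp(temp);   % Still prints 5
```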
- End-relative indexing
-
    temp = [15, 36, 27, 98];
    disp(temp(2 : end-1));  % Equals [36, 27]
    range = 2 : end-1;      % Syntax error
    disp(temp(range));      % Will not work
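A possible workaround for the failing lines above, sketched here, is to build the range from an explicit length function, which is exactly the verbosity that end is meant to avoid:

```matlab
temp = [15, 36, 27, 98];

% numel(temp) plays the role that end plays inside a subscript.
range = 2 : numel(temp) - 1;
disp(temp(range));   % Equals [36, 27]
```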
Indexing a vector/matrix relative to the end (without using a length function) is convenient. But this use of the keyword end is confined to subscripts and cannot be used as a standalone value.
- Irregular indexing syntax
-
    temp = [1 2; 3 4];
    disp(temp(:, :));  % Prints [1 2; 3 4]

    function result = foo
        result = [5 6; 7 8];
    end

    disp(foo()(:, :));    % Error: ()-indexing must appear last in an index expression.
    disp((foo)(:, :));
    disp((foo())(:, :));  % Error: Unbalanced or unexpected parenthesis or bracket.
The matrix indexing/slicing syntax can’t be used everywhere; sometimes you need to introduce an intermediate variable. This might be similar to the distinction between lvalues and rvalues in C and C++. This could also suggest that the MATLAB language isn’t designed properly with a recursive grammar.
- Setting function values
-
    function [a, b] = foo(c, d)
        a = c;
        b = d;
        % Cannot write: return [c, d];
    end
A function returns values by assigning output variables, instead of returning an expression’s value. This is reminiscent of old languages like Pascal and Visual Basic, where the value to return is assigned to the function’s name. Newer languages like C, Java, Python, Ruby, etc. do not do this.
- Multiple function syntaxes
-
A zero-argument function can be declared without parentheses:
    function foo
    end

    function foo()  % Equivalent
    end
A zero-result function can have no result array:
    function foo
    end

    function [] = foo  % Equivalent
    end
A one-result function can have no result array brackets:
    function result = foo
    end

    function [result] = foo  % Equivalent
    end
The examples above contrast with how in most languages, there is only one way to declare a function with a certain number of arguments and return values. (One exception is C++, where a zero-argument function can optionally be declared with void as the list of arguments.)
- Conflated variables and functions
-
A zero-argument (nullary) function can be invoked without parentheses:
    function result = foo()
        result = 6;
    end

    disp(foo);    % Prints 6
    disp(foo());  % Prints 6
A variable can be “invoked” with pointless parentheses:
    a = 5;
    b = {3; 0};
    c = datetime('today');
    disp(a);
    disp(a());  % Same
    disp(b);
    disp(b());  % Same
    disp(c);
    disp(c());  % Same
Most other languages make strong distinctions between the function itself and the value returned by a function. Also, no other language I know of allows a non-function-type variable to be “called” with parentheses.
One consequence of this is that in MATLAB code, today and now are effectively magic variables whose values change even without explicitly reassigning them. Or are they actually I/O functions, who knows?
- Conflated calls and indexing
-
Calling a function looks exactly the same as indexing into a vector/matrix:
    function result = bar(a, b)
        result = a + b;
    end

    qux = [4 5; 6 7];
    disp(bar(0, 1));  % Function call
    disp(qux(2, 1));  % Matrix subscript
By contrast, most popular languages use parentheses (()) for function invocations and square brackets ([]) for array subscripts.
- Output-sensitive function calls
-
MATLAB functions can behave differently and return different values depending on the number of output elements that the caller wants to receive. An example is polyfit:
    xs = [3 1 4 1 5 9];
    ys = [2 7 1 8 2 8];
    deg = 2;
    [p0] = polyfit(xs, ys, deg);
    [p1, S1] = polyfit(xs, ys, deg);
    [p2, S2, mu2] = polyfit(xs, ys, deg);
    % p0 and p1 are the same, but different from p2
In most other languages, the way that the caller receives the result of a function call cannot affect the behavior of the function being called. One would normally expect the result [p0] to be a prefix of [p1, S1], which in turn should be a prefix of [p2, S2, mu2], but this is not the case in MATLAB.
- Commands vs. functions
-
    mkdir Hello
    mkdir('World');
Some features are available as commands with unquoted strings (like Command Arg1 Arg2 ...), whereas some features are available as functions taking proper expressions as values (like Function(Arg1, Arg2, ...)). This distinction of syntax is arbitrary and jarring.
- Can’t pass by reference
-
    function foo(m)
        m(1, 1) = 5;
    end

    mat = [0 1; 2 3];
    disp(mat(1, 1));  % Prints 0
    foo(mat);
    disp(mat(1, 1));  % Still prints 0
When passing a matrix into a function, a copy is made. It’s not possible to change a value and make it visible to the caller, except by returning a new value and having the caller explicitly reassign a variable. Although the lack of pass-by-reference reduces confusion, it makes it harder to modularize algorithms that need to update data in place.
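The return-and-reassign pattern mentioned above can be sketched as follows. setCorner is a hypothetical helper name, and placing a local function at the end of a script assumes R2016b or later (in older releases it would go in its own file, setCorner.m):

```matlab
mat = [0 1; 2 3];
mat = setCorner(mat);   % The caller reassigns the returned copy
disp(mat(1, 1));        % Prints 5

% Hypothetical helper: modifies its own copy and returns it.
function m = setCorner(m)
    m(1, 1) = 5;
end
```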
- Strings are weird
-
    u = 'abc';
    v = ['abc'];         % Identical to u
    w = [v, 'def'];      % Equal to 'abcdef'
    x = ['ghi'; 'jkl'];  % OK, column of 2 strings
    y = ['mn'; 'opqr'];  % Error: Mismatched row lengths
    z = {'mn'; 'opqr'};  % OK, cell array
Strings essentially behave like row vectors of characters, not like atomic values. This is why a list of strings is usually chosen to be represented as a cell array, not as a matrix.
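A short sketch of what "row vector of characters" means in practice, reusing the variables from the example above:

```matlab
u = 'abc';
disp(size(u));          % Equals [1 3]
disp(u(2));             % Prints b
disp(u + 1);            % Equals [98 99 100], the character codes plus one

x = ['ghi'; 'jkl'];     % 2-by-3 character matrix
disp(size(x));          % Equals [2 3]
disp(x(2, :));          % Prints jkl

z = {'mn'; 'opqr'};     % Cell array holding strings of different lengths
disp(numel(z{2}));      % Prints 4
```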
- Cell arrays are weird
-
    a = cell(2, 2);
    a{1, 1} = 5;
    disp(a{1, 1});  % 1*1 matrix, i.e. [5]
    disp(a(1, 1));  % 1*1 cell array, i.e. {[5]}
Cell arrays are a recursive/nestable heterogeneous data structure, unlike flat matrices of numbers. Indexing into a cell array requires some care to distinguish between getting a sub-array versus getting an element value.
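The difference between the two kinds of indexing becomes visible as soon as the result is used in a computation. A minimal sketch continuing the example above (total is a name introduced here):

```matlab
a = cell(2, 2);
a{1, 1} = 5;

disp(class(a{1, 1}));   % Prints double (the element value)
disp(class(a(1, 1)));   % Prints cell (a 1-by-1 sub-array)

total = a{1, 1} + 1;    % Works and gives 6
% a(1, 1) + 1 would raise an error, because + is not defined for cell arrays.
```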
- Dummy overloading arguments
-
Calculating the maximum (or minimum) of a matrix M along a certain dimension d uses the syntax max(M, [], d), with [] being a constant dummy argument. This contrasts with other summarization functions like sum, which uses the notation sum(M, d) to operate on a certain dimension. The dummy argument is necessary to disambiguate from the 2-argument form max(A, B), which calculates the maximum of each pair of elements between matrices A and B.
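The three call forms can be compared side by side on a small matrix (the values are made up for illustration):

```matlab
M = [1 5; 7 3];

disp(max(M, [], 1));   % Equals [7 5]: maximum along dimension 1
disp(max(M, [], 2));   % Equals [5; 7]: maximum along dimension 2
disp(sum(M, 2));       % Equals [6; 10]: sum needs no dummy argument
disp(max(M, 2));       % Equals [2 5; 7 3]: elementwise maximum of M and 2
```

Forgetting the [] therefore does not fail; it silently computes something else.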
- Semicolon suppresses printing
-
Many kinds of statements will print the value of the statement if it doesn’t end in a semicolon. For example, variable/array assignments and function calls are printed. Adding a semicolon suppresses the print, and it is rare to print values except when designing and debugging code. Although the semicolons make MATLAB code look more like C, they are a form of syntactic salt to work around the annoying default behavior of printing almost every value. A similar example can be found in Windows batch scripts, where every command is printed unless it’s prefixed with @.
- 1-based array indexing
-
    temp = [99, 47];
    disp(temp(1));  % Prints 99
A few languages oriented to beginners will count arrays from 1, such as BASIC, Lua, and Mathematica. Most serious languages count from 0, including C, C++, C#, D, Java, JavaScript, Python, Ruby, assembly, and the list goes on. There are many great reasons to start indexes at 0. One of them is that if the array is represented by a pointer to its head element, then stepping forward by 0 slots will give you the head element. Another reason is that multi-dimensional indexing is like pixels[y * width + x], but in 1-based indexing the code would be like pixels[(y - 1) * width + x].
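In MATLAB itself the same off-by-one correction appears, with the twist that matrices are stored column by column, so the correction lands on the column index. A small sketch with made-up data, checked against the built-in sub2ind():

```matlab
M = [11 12 13; 21 22 23];   % 2 rows, 3 columns
height = size(M, 1);
row = 2;
col = 3;

% Linear index of (row, col) in column-major storage with 1-based indexes.
idx = (col - 1) * height + row;
disp(idx);                           % Prints 6
disp(M(idx));                        % Prints 23
disp(sub2ind(size(M), row, col));    % Prints 6
```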
- Inclusive end index
-
    temp = [31, 41, 59, 26, 53];
    disp(temp(2 : 4));  % Equals [41, 59, 26]
For ranges, using inclusive end indexes looks more natural but makes length calculations harder. The number of elements in array(start : end) is end - start + 1. Python takes the opposite approach where end indexes are exclusive, which makes it easy to write ranges like array[start : start + len].
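The MATLAB counterpart of that start-plus-length idiom needs an explicit - 1. A minimal sketch (start, len, and slice are names introduced here):

```matlab
temp = [31, 41, 59, 26, 53];
start = 2;
len = 3;

% Inclusive end index: the last element sits at start + len - 1.
slice = temp(start : start + len - 1);
disp(slice);          % Equals [41, 59, 26]
disp(numel(slice));   % Prints 3
```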
- Optional commas between columns
-
Space-separated data is too easy to misuse. For example, [x y] and [x -y] are both row vectors of length 2, but [x-y] and [x - y] each only have a single element. Separating data with spaces causes subtle mistakes in other languages too:
    // C code example
    const char *strings[] = {
        "Alpha",
        "Bravo"  // Forgot comma
        "Charlie",
    };
    // So the array is effectively
    // {"Alpha", "BravoCharlie"}

    # Python code example
    strings = [
        "Alpha",
        "Bravo"  # Forgot comma
        "Charlie",
    ]
    # So the array is effectively
    # ["Alpha", "BravoCharlie"]
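The MATLAB cases described above can be checked directly. In this sketch x and y are arbitrary sample values; writing the comma or the parentheses explicitly removes the dependence on spacing:

```matlab
x = 5;
y = 2;

disp(numel([x y]));     % Prints 2
disp(numel([x -y]));    % Prints 2: parsed as [x, -y]
disp(numel([x-y]));     % Prints 1: parsed as x - y
disp(numel([x - y]));   % Prints 1

disp([x, -y]);          % Equals [5 -2]
disp([(x - y)]);        % Equals 3
```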
- Standard library in global namespace
-
Although MATLAB has packages and namespaces, the enormous set of all standard library functions is dumped into the global namespace. In a sufficiently large user application, it is not hard to declare a variable whose name shadows an existing function.
Other languages like C and JavaScript also suffer from a bloated global namespace. Python has a large number of functions and types defined in the global namespace, but they don’t grow over time because new work goes into modules.
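A sketch of how such shadowing can play out, assuming the code runs as a script or at the command line (sum is chosen here only because it is a tempting variable name):

```matlab
sum = 10;                   % A variable that shadows the built-in sum()

try
    total = sum([1 2 3]);   % Now treated as indexing into the variable
catch err
    disp(err.message);      % Reports an out-of-range index
end

clear sum;                  % Removing the variable makes the function visible again
total = sum([1 2 3]);
disp(total);                % Prints 6
```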
- No compound assignment operators
-
    x = 0;
    x = x + 1;
    x = x - 2;
    x = x * 5;

    matrixLongName = [1 2; 3 4];
    fooLongName = 5;
    barLongName = 3;
    matrixLongName(fooLongName-barLongName, barLongName*14) = ...
        matrixLongName(fooLongName-barLongName, barLongName*14) / 6;
MATLAB doesn’t have compound assignment operators, which increases the redundancy of common code idioms. Major languages like C, C++, C#, Java, JavaScript, Python, Ruby all have operators like +=, *=, &=, etc. (Also the C family has operators like ++ as a shorthand for += 1.)
- Esoteric operators
-
Some infrequently used operations have single-character operators that are too convenient. But they hurt code readability and consume syntax elements that could be used for better things. For example, \ means mldivide(), and ' means ctranspose().
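A short sketch of the operators next to their named equivalents, with made-up values. Note that ' is the conjugate transpose (ctranspose), while the plain transpose() is spelled .' as an operator:

```matlab
A = [2 0; 0 4];
b = [2; 8];

x = A \ b;                          % Solves A * x = b
disp(x);                            % Equals [1; 2]
disp(isequal(x, mldivide(A, b)));   % Prints 1

z = [1+2i, 3-4i];
disp(z');                           % Conjugate transpose: [1-2i; 3+4i]
disp(isequal(z', ctranspose(z)));   % Prints 1
disp(z.');                          % Plain transpose: [1+2i; 3-4i]
```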
- No index-of function
-
    temp = [16, 39, 25, 84, 50, 47];
    i = indexof(temp, 25);  % Hypothetical function
    disp(i);                % Should print 3
    j = find(temp == 25);   % Actual code
    disp(j);                % Prints 3
Finding the index of a value in an array is a basic task, yet there is no direct way to do it. Instead, find() with == serves as a replacement.
- Integer to datetime
-
    k = 736829;
    dt = datetime(datestr(k));
    disp(dt);  % Prints 14-May-2017
Converting from an integer to a datetime object is needlessly painful. The standard library provides functions for many of the other conversions (e.g. datetime to int, datetime to string), but not a direct conversion from integer to datetime. Instead, we have to take an indirect route by turning it into a string and then parsing the string.
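One caveat to the paragraph above: the datetime documentation describes a 'ConvertFrom' option that appears to cover this case without the string round trip. The sketch below assumes that option is available in the release being used; whether it works in R2015b was not checked here.

```matlab
k = 736829;   % Serial date number, as in the example above

% Assumption: the 'ConvertFrom', 'datenum' option exists in this release.
dt = datetime(k, 'ConvertFrom', 'datenum');
disp(dt);     % Should show 14-May-2017 (the display may also include a time)
```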
- Date format strings
-
The format strings for the functions datetime(), datenum(), and datestr() are inconsistent. For example, a two-digit month is denoted as 'MM' in datetime() but 'mm' in datestr() and datenum().
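The mismatch can be seen by formatting the same date both ways. A minimal sketch, using the serial date number from the previous section:

```matlab
k = 736829;   % Serial date number for 14-May-2017

% datestr(): lowercase mm is the month.
disp(datestr(k, 'yyyy-mm-dd'));    % Prints 2017-05-14

% datetime(): uppercase MM is the month; lowercase mm would mean minutes.
dt = datetime(2017, 5, 14, 'Format', 'yyyy-MM-dd');
disp(char(dt));                    % Prints 2017-05-14
```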
- CSV file I/O madness
-
csvwrite() is limited to writing 5 significant base-10 digits. This low precision makes the output only suitable for display, not for saving values that will be read by future computations.
The functions dlmread() and dlmwrite() essentially replicate and exceed the functionality of csvread() and csvwrite(), respectively.
The 4 aforementioned functions all use 0-based indexing for row and column ranges, unlike the 1-based convention of MATLAB in general.
csvread() and dlmread() can only read CSV data from a file, not from a string in memory. Hence there is no convenient way to download a CSV file from the web and parse it. The workaround is either to save the CSV text to a temporary file and read it with a convenient function, or to keep the string in memory and manually parse each line using textscan().
importdata() can parse text tables that have a mix of character data and numeric data, and return both types of data separately. By contrast, dlmread() can skip text rows and columns, but only parses the remaining numeric data and returns that. Once again, a more general function appears to supersede the features of an existing function.
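The precision issue from the first point can be sketched as a round trip through a temporary file. The data values and variable names are made up, and '%.17g' is one choice of format that should be enough to preserve double-precision values:

```matlab
data = [pi, exp(1); 1/3, 1e-10];
fname = [tempname, '.csv'];    % Temporary file path

csvwrite(fname, data);         % Default precision: about 5 significant digits
lossy = csvread(fname);

dlmwrite(fname, data, 'precision', '%.17g');   % Comma-delimited by default
exact = dlmread(fname);

delete(fname);

disp(max(abs(lossy(:) - data(:))));   % Noticeably nonzero
disp(max(abs(exact(:) - data(:))));   % Zero or negligible
```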
- Tables are slow
-
    tic;
    n = 300;
    x = zeros(n, 4);
    for i = 1 : n
        for j = 1 : 4
            x(i, j) = i * j;
        end
    end
    toc;  % About 10 milliseconds
    % Similar run time if matrix is replaced by cell array

    tic;
    n = 300;
    t = zeros(n, 1);
    y = table(t, t, t, t);
    for i = 1 : n
        for j = 1 : 4
            y{i, j} = i * j;
        end
    end
    toc;  % About 1000 milliseconds
In one application I worked on, MATLAB’s tables were a tidy way to organize the data and avoid coding errors compared to matrices with raw column number indexing. But after tentatively rewriting the code to use a table, the performance was abysmal with more than 10× slowdown. So I ended up using a structure of columns instead, which provides some degree of organization and no loss of performance.
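A structure of columns could look roughly like the sketch below. The field names width and height are hypothetical, not taken from the original application; each field holds one full column, so elements are addressed by name plus row number.

```matlab
n = 300;

% Hypothetical column names; every field is an n-by-1 numeric column.
data = struct('width', zeros(n, 1), 'height', zeros(n, 1));

for i = 1 : n
    data.width(i) = i * 1;
    data.height(i) = i * 2;
end

disp(data.height(3));   % Prints 6
```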
- Additional notes
-
These examples were tested on MATLAB R2015b. But the links to online documentation are always current (year at the time of writing).
MATLAB was the language used in the most courses of my undergraduate computer science program at the University of Toronto – not Java, Python, or C. This was due to my curriculum being oriented toward numerical analysis and a bit of machine learning.
Despite my repeated exposure to MATLAB at school, you can probably guess that it’s one of my least favorite languages. And since I cannot justify paying for a license, MATLAB is simply not in my daily toolbox. I only get sporadic exposure to it through work for clients.
A blog post with a similar theme to mine but different details: Nikolaus Rath: MATLAB is a terrible programming language
A forum thread: MATLAB Central: What frustrates you about MATLAB?
A blog post about MATLAB’s impact on developers: neuroplausible: I hate Matlab: How an IDE, a language, and a mentality harm