Understanding matrices in a tweet

May 18, 2024 by Dhrumil

This is the first piece in the series of posts/tweets I will write about mathematical concepts related to ML. Fortunately, a tweet is not restricted to just 140 characters now. The idea behind writing these concepts as tweets is that it forces you to write in a natural human language. You cannot write equations or complicated representations, just pure thoughts, so that it is easily understandable. I may use images occasionally to talk about things visually, though.

Let us talk about quantifying things around us. There are many ways we can quantify or measure something. The simplest form of measuring a quantity is called a scalar. For example, the number of apples in a basket, or the population of a city, or the weight of a human body. All of these are real-life concepts that can be quantified by just a numerical value. It could be a real number or an integer. Scalars help us talk about real-life concepts using precise numerical values.

But not everything around us can be measured with just a single numerical value. Sometimes you need additional information or context. Let's say you want to quantify your performance in a video game. The video game itself has several levels (like Level 1, Level 2, and so on). Let's say you earned 350 points for Level 1, 200 points for Level 2, and 250 points for Level 3. The number of points you earned in each level is a scalar quantity, but your performance across all the levels of the game by itself is a concept of its own. So the way you measure it is by representing all the points you earned at all levels, in a particular order, like: [350, 200, 250]. Note that the order in which you represent the numbers matters a lot. [350, 200, 250] is not the same as [350, 250, 200]. Now we know that your performance with a video game can be quantified by a collection of scalar quantities. We can call this collection of scalar values vectors. Vectors help us represent quantities on a complex higher-dimensional space, adding more context to the metric. For example, you can represent a color of a pixel via a vector like [255, 120, 300] representing the RGB values. So vectors are a collection a scalar quantities in a particular order. Each quantity or item can have multiple vectors representing different levels of information. For example, a pixel color representation can be [255, 120, 300] as a vector containing RGB values, and the same pixel can have a color value represented by [255, 120, 300, 0.5] denoting the RGB and a saturation value. Both vectors belong to the same pixel, but represent information in different dimensions.

Now let's say you want to quantify not just one pixel, but 1000 pixels of your LCD screen. How do you do that? Simple. You have 1000 unique vectors for each pixel, each representing an RGB value for a unique pixel. This collection of vectors is called a matrix. Note that the matrix does not represent a single pixel, but 1000 such pixels, belonging to the same LCD screen. Matrices (plural for matrix) are 2-dimensional and have rows and columns, just like tables. This matrix representing these 1000 pixels will have 1000 rows representing each pixel, and 3 columns representing the RGB values for each pixel.

[
250, 200, 100
250, 234, 56
54, 90, 101
...
...
101, 65, 90
]

But your LCD doesn't just have 1000 pixels, right? It has 1980 x 1080 pixels in a 2-D plane. Hence, to represent all the pixels on your screen, you need an additional dimension. A matrix with more than 2 dimensions (more than just rows and columns) is called a tensor. Generally, every type of quantity can be called a tensor. A scalar is a 0-dimensional tensor, a vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and tensors in general can have any number of dimensions. Each new dimension added to a tensor provides us with additional information and gives us an additional context for that item.

Matrices are quite fundamental to machine learning. Consider a table or a matrix where the rows represent different houses in a city block and the columns represent: the number of rooms, the area in sq. km, and the number of storeys for a house. Collectively, this matrix represents housing data for that city block. You can add more rows to this matrix, thus adding more instances or observations of houses from that city block. You can also add more columns to represent unique features for each house (like whether the house has a garage or the size of the kitchen). Machine learning algorithms work with multiple such matrices, each representing unique information, and mathematically operate them together to discover new information. Just like you can add two numbers to get a combined aggregate value, you can add/multiply or operate two matrices together to get a new combined value. One of the most important things to note about matrices is that the order in which the scalars within a vector and vectors within a matrix is super important. Changing the order will tamper with the true information that the matrix represents.

The idea for this post was to set up a base for the reader to intuitively understand why we have matrices the way we have them. In the next post, I will talk about something that is fundamental and one of the most important concepts upon which all of modern AI technology is built: matrix multiplication.