How matrix multiplication leads to artificial intelligence

May 28, 2024 by Dhrumil

Okay, so if you have a basic background in machine learning and linear algebra, you know that matrices are involved in artificial neural networks. But why do we multiply matrices, and what comes out of it?

Consider two matrices, A and B. In matrix A, each row represents a food recipe and each column an ingredient used in those recipes. In matrix B, each row represents one of those same ingredients and each column a piece of nutritional information. For simplicity, matrices A and B are illustrated in the image below.

[Image: Matrix A (recipes × ingredients) and Matrix B (ingredients × nutritional information)]

We all know the formula for matrix multiplication: C_ij = Σ_k A_ik · B_kj — each entry of the product C is the dot product of a row of A with a column of B.

If we use the formula for multiplying our two matrices, we get a third matrix with the following information.

[Image: Matrix C (recipes × nutritional information)]

The third matrix maps food recipes to their respective nutritional information, giving us a new piece of information: a relational mapping between recipes and nutrition. The core principle to note here is that when we multiply matrices, we take information in one dimensional space and produce new information in a different dimensional space. Our original matrices represent information in a certain way. By multiplying them, we transform that information into a new representation in a different 'space' where underlying patterns or relationships might become clearer and thus more 'learnable' for an AI system.
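The recipe example can be sketched in a few lines of NumPy. All the ingredient amounts and nutrition values below are made up purely for illustration:

```python
import numpy as np

# Matrix A: rows = recipes, columns = amount of each ingredient used.
# (Hypothetical numbers, just to make the multiplication concrete.)
#            flour  sugar  eggs
A = np.array([
    [2.0, 1.0, 3.0],   # pancakes
    [1.0, 2.0, 2.0],   # cake
])

# Matrix B: rows = the same ingredients, columns = nutrition per unit.
#            calories  protein
B = np.array([
    [100.0, 3.0],      # flour
    [ 50.0, 0.0],      # sugar
    [ 70.0, 6.0],      # eggs
])

# C maps recipes directly to nutritional information.
C = A @ B
print(C)
# Pancakes row: 2*100 + 1*50 + 3*70 = 460 calories; 2*3 + 1*0 + 3*6 = 24 protein
```

Notice that the "ingredients" dimension disappears in the product: C lives in a new space, recipes × nutrition, exactly as described above.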

Just because you can multiply two matrices does not mean their product will be highly informative. In this example, we deliberately named the columns of matrix A to match the rows of matrix B, but that won't always be the case. So is there a way to know whether a particular matrix multiplication will result in useful information? Not really, but some properties of matrices relate directly to the information within them.

The rank of a matrix is the number of linearly independent rows or columns it has. Put simply, it represents the number of unique dimensions across which the matrix carries unique information. Our matrix A has a certain rank; if we added more genuinely distinct recipes and more genuinely distinct ingredient details, the resulting matrix would probably have a higher rank than the current one. When you multiply two matrices, though, the rank of the product can be at most the smaller of the two input ranks — rank(AB) ≤ min(rank(A), rank(B)) — and can never exceed either of them. In the case of matrices A and B, the number of fundamentally distinct nutritional profiles you find for your recipes cannot be greater than the number of fundamentally distinct ingredient combinations present in your recipes (defined by matrix A's structure), nor greater than the number of fundamentally distinct nutritional patterns offered by the ingredients themselves (defined by matrix B's structure). Essentially, the "variety" (or rank) of nutritional outcomes is capped by the initial "variety" available in how your recipes are composed and how your ingredients are nutritionally defined.
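You can check this rank bound numerically. Here is a small sketch using random matrices (random matrices are almost always full-rank, so the shapes alone determine the input ranks):

```python
import numpy as np

rng = np.random.default_rng(0)

A = rng.standard_normal((4, 6))   # 4x6 random matrix, rank 4
B = rng.standard_normal((6, 3))   # 6x3 random matrix, rank 3

C = A @ B

rank_A = np.linalg.matrix_rank(A)
rank_B = np.linalg.matrix_rank(B)
rank_C = np.linalg.matrix_rank(C)

# The product's rank never exceeds the smaller of the two input ranks.
print(rank_A, rank_B, rank_C)
assert rank_C <= min(rank_A, rank_B)
```

However many rows C has, its "variety" is capped at 3 here, because B only offers 3 independent directions of information.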

So as of now, we have one important insight:

Multiplying two matrices produces new information in a new dimensional space. This transformation of previously known information can potentially help us learn the inherent patterns more clearly.

But how does all of this help create artificial intelligence? What is intelligence anyway? One common way to define it is the ability to learn patterns and relationships from information and then use that learning to make predictions or generate new, relevant information. The large language models (LLMs) we use today seem "intelligent" because they've learned complex patterns from vast amounts of text data and can generate coherent and contextually appropriate new text.

In a neural network, these learned patterns are primarily stored in what are called weights. A neural network is structured into layers, and each layer contains multiple interconnected processing units, often called neurons. The collection of weights for a specific layer forms a kind of "transformation instruction set", and collectively the network's weights can be thought of as an "intelligence matrix".

When information (say, from the previous layer or the initial input) flows into a layer, it undergoes a transformation. This transformation typically involves a matrix multiplication: the input data (represented as a vector or matrix) is multiplied by the layer's weight matrix. After this, a bias value is often added, and then an activation function is applied to each resulting element. This entire operation produces the output for that layer, which then becomes the input for the next. So, a single neuron contributes to this process by performing a weighted sum of its inputs, adding a bias, and applying an activation, but the collective action of all neurons in a layer is efficiently handled using matrix multiplication.
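A single layer's forward pass can be sketched in a few lines. The variable names, sizes, and the choice of ReLU as the activation are illustrative assumptions, not something prescribed by the article:

```python
import numpy as np

def relu(x):
    # Element-wise activation: keep positives, zero out negatives.
    return np.maximum(0.0, x)

rng = np.random.default_rng(42)

# A batch of 4 input vectors, each with 5 features.
x = rng.standard_normal((4, 5))

# The layer's weight matrix and bias (random here; learned in practice).
W = rng.standard_normal((5, 3))   # transforms 5-dim inputs to 3-dim outputs
b = np.zeros(3)

# One layer: matrix multiply, add bias, apply activation element-wise.
y = relu(x @ W + b)
print(y.shape)   # (4, 3)
```

The single `x @ W` handles every neuron in the layer at once: each column of W holds one neuron's weights, so the matrix product computes all the weighted sums simultaneously.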

The matrix multiplication step transforms the input data into a new representation. The dimensions of this new representation (the output of the layer) can be smaller, larger, or the same as the input, depending on the design of the weight matrix for that layer. So, while sometimes dimensions are reduced to focus on key features, it's not always about going into "lower and lower dimensions." Instead, it's about transforming the information into successively more abstract or useful representations that help the network understand intricate relationships.

As you train a neural network, you are essentially fine-tuning these weight values across all layers. Through many cycles of computation and adjustment, the goal is to arrive at a set of weights that allows the network to accurately capture the underlying patterns in the data it was fed. This highly tuned set of weights is what enables the neural network to make good predictions or generate insightful outputs.
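The adjustment cycle can be illustrated with the simplest possible case: a single weight matrix tuned by gradient descent on made-up data. Everything here (the data, the learning rate, the iteration count) is a hypothetical toy, but the loop is the essence of training:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data generated from a known weight matrix, plus a little noise.
true_W = np.array([[2.0], [-1.0]])
x = rng.standard_normal((100, 2))
y = x @ true_W + 0.01 * rng.standard_normal((100, 1))

# Start from random weights and adjust them over many cycles.
W = rng.standard_normal((2, 1))
lr = 0.1
for _ in range(200):
    pred = x @ W                      # forward pass: matrix multiplication
    grad = x.T @ (pred - y) / len(x)  # gradient of the mean squared error
    W -= lr * grad                    # nudge weights toward lower error

print(W.ravel())   # converges close to [2, -1]
```

After enough cycles, the learned W captures the pattern hidden in the data, which is exactly what the tuned weights of a full network do at much larger scale.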

Thus, using simple mathematical tools such as matrix multiplication and its properties, you can create systems that are kind of intelligent.