Teaching Computers the “Vibe”
In my last post, we saw how AI breaks language down into tokens, basically turning words into numbers. But here's the problem: to a computer, the number for “dog” and the number for “cat” are just two different integers. It has no idea that both are pets or why they should be connected to each other. To solve this, we use vector embeddings, which let the AI capture semantic similarity.
Are you confused? That's okay, I was too. Let's start with the basics. What exactly is an embedding? At its simplest, an embedding is a dense list of numbers (a vector) that can represent unstructured data like text or images. Unlike a bare token ID, it carries information about the meaning of the data. When two words have a similar meaning, their vectors sit close to each other in this high-dimensional space, where closeness is usually measured with a distance or with cosine similarity. For example, the vector for “dog” will be closer to the vector for “cat” than to the vector for “airplane”, because dogs and cats are both pets and therefore more closely related.
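To make “close to each other” a bit more concrete, here is a minimal sketch in Python. The three-dimensional vectors are made up by hand for illustration (real embeddings come from a trained model and have far more dimensions), and cosine similarity is just one common way to measure closeness.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (an assumption for this example, not real model output).
# The dimensions loosely stand for: [animal-ness, pet-ness, machine-ness].
toy_embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.1],
    "airplane": [0.1, 0.0, 0.9],
}

dog_cat = cosine_similarity(toy_embeddings["dog"], toy_embeddings["cat"])
dog_airplane = cosine_similarity(toy_embeddings["dog"], toy_embeddings["airplane"])

# "dog" should score higher with "cat" than with "airplane".
print(f"dog vs cat:      {dog_cat:.3f}")
print(f"dog vs airplane: {dog_airplane:.3f}")
```

With these toy numbers, “dog” and “cat” point in almost the same direction, while “dog” and “airplane” do not.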
Very cool, right? However, it wasn't always like this. Before modern embeddings, a common method was TF-IDF, which stands for “term frequency-inverse document frequency”. It scores a word by how often it appears in a document, weighted down if the word shows up in many documents. That made it great for basic search, but it disregards grammar and word order completely. Then, with Word2Vec (2013), came a major breakthrough. The philosophy, borrowed from the linguist J. R. Firth, was: “You shall know a word by the company it keeps.” Word2Vec learns word relationships by looking at a word's neighbors in a sentence, which allowed a model to pick up meaning from raw text at scale.
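Here is a small sketch of why TF-IDF is blind to word order. It uses one common textbook variant of the formula (term frequency divided by document length, times log(N / document frequency)). Real libraries use slightly different variants, and the function name `tf_idf` and the example sentences are my own.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document, using a simple TF-IDF variant."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

docs = ["dog bites man", "man bites dog", "the cat sleeps"]
scores = tf_idf(docs)

# Same words, different order, very different meaning, identical scores.
print(scores[0] == scores[1])
```

The first two sentences mean very different things, yet TF-IDF gives them exactly the same weights, because it only looks at which words occur and how often.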
Now that we have embeddings, we can actually perform math on them! This is sometimes called semantic arithmetic. Because vectors capture relationships, researchers found that if you take the vector for “King”, subtract “Man”, and add “Woman”, the word whose vector lies closest to the result is…you guessed it: “Queen”. This suggests that the model does not just memorize words, but encodes concepts like gender and royalty as directions in the vector space.
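The King/Queen example can be replayed with a toy setup. The two-dimensional vectors below are invented so that one axis stands for royalty and the other for maleness. Real embeddings are learned, not hand-written, and real analogy results are only approximately this clean. The helper `nearest_word` is a hypothetical name for this sketch.

```python
import math

# Hand-made toy vectors: [royalty, maleness]. An assumption for illustration only.
toy = {
    "king":   [0.9, 0.9],
    "queen":  [0.9, 0.1],
    "man":    [0.1, 0.9],
    "woman":  [0.1, 0.1],
    "prince": [0.8, 0.9],
    "apple":  [0.0, 0.5],
}

def nearest_word(target, vocabulary, exclude):
    """Return the word whose vector has the smallest Euclidean distance to target."""
    candidates = {w: v for w, v in vocabulary.items() if w not in exclude}
    return min(candidates, key=lambda w: math.dist(candidates[w], target))

# king - man + woman, computed dimension by dimension.
result = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]

# As is usual for this trick, the three input words are excluded from the search.
nearest = nearest_word(result, toy, exclude={"king", "man", "woman"})
print(nearest)
```

Subtracting “man” removes the maleness component, adding “woman” keeps royalty untouched, and the closest remaining word is “queen”.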
This works because embeddings have many dimensions, often hundreds or more, and together these dimensions give every concept its own mathematical signature. Why does it matter, you might ask? Embeddings are the foundation of much of what we do with AI today. They are the hidden power behind recommendation systems: when amazon.com suggests yet another great product, embeddings are often part of how that happens. They are also used in many other areas, such as image recognition and fraud detection. Most importantly, they are the first step in the retrieval-augmented generation (RAG) pipeline. Without embeddings, the AI could not search through a company's SharePoint by meaning to find the right answer, because it would not know which chunks of text are relevant to the user's question.
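As a rough sketch of that retrieval step: every chunk of text gets a vector, the user's question gets a vector, and the chunks are ranked by similarity to the question. In a real pipeline an embedding model would produce these vectors. Here the chunks, the vectors, and the helper `top_k` are all made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document chunks with hand-made vectors (not real model output).
chunks = [
    ("Vacation requests must be submitted two weeks in advance.", [0.9, 0.1, 0.0]),
    ("The office printer is on the second floor.", [0.1, 0.9, 0.1]),
    ("Expense reports are due at the end of each month.", [0.1, 0.1, 0.9]),
]

# Hypothetical question and its (also hand-made) vector.
query_text = "How do I request time off?"
query_vector = [0.8, 0.2, 0.1]

def top_k(query_vector, chunks, k=1):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

best = top_k(query_vector, chunks, k=1)
print(best[0])
```

Notice that the question and the winning chunk share almost no words (“time off” vs. “vacation”). The match comes from the vectors being close, which is exactly what keyword counting cannot do.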
You now understand the importance of embeddings, which take the integers from tokenization and give them semantic meaning. In my next post, you'll learn how businesses use these embeddings to build “RAGS to RICHES” systems that can chat with their own private data.

















