Some of the words and phrases you will most commonly come across as you go deeper into the world of machine learning, chatbots, generative tools and other forms of artificial intelligence.
- Natural Language Processing (NLP): This is the field of study that focuses on the interaction between computers and humans through natural language. The goal of NLP is to read, decipher, understand, and make sense of human language in a valuable way
- Artificial Intelligence (AI): AI refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions
- Machine Learning (ML): ML is a type of AI that provides systems the ability to automatically learn and improve from experience without being explicitly programmed
- Transformers: This is a type of ML model introduced in a paper titled “Attention is All You Need”. Transformers have been particularly effective in NLP tasks, and the GPT models (including ChatGPT) are based on the Transformer architecture
- Attention Mechanism: In the context of ML, attention mechanisms help models focus on specific aspects of the input data. They are a key part of Transformer models
- Fine-tuning: This is a process of taking a pre-trained model (like GPT) and training it further on a specific task. In the case of ChatGPT, it’s fine-tuned on a dataset of conversations
- Tokenization: In NLP, tokenization is the process of breaking down text into words, phrases, symbols, or other meaningful elements called tokens (see the sketch after this list)
- Sequence-to-Sequence Models: These are types of ML models that transform an input sequence into an output sequence. ChatGPT can be viewed as a kind of sequence-to-sequence model, where the input sequence is a conversation history and the output sequence is the model’s response
- Function Calling: In programming, a function call is the act of invoking a function that has been previously defined. In the context of AI models like ChatGPT, function calling refers to the model returning structured output (typically JSON) that names a developer-defined function and the arguments to pass to it, so the application can run that function and feed the result back to the model
- API: An API, or Application Programming Interface, is a set of rules and protocols for building and interacting with software applications. OpenAI provides an API that developers can use to interact with ChatGPT
- Prompt Engineering: This refers to the practice of crafting effective prompts to get the desired output from language models like GPT
- Context Window: This refers to the number of recent tokens (input and output) that the model considers when generating a response
- Deep Learning: This is a subfield of ML that focuses on algorithms inspired by the structure and function of the brain, called artificial neural networks
- Neural Networks: In AI, these are computing systems with interconnected nodes, inspired by biological neural networks, which constitute the brain of living beings
- BERT (Bidirectional Encoder Representations from Transformers): This is a Transformer-based machine learning technique for NLP tasks developed by Google. Unlike GPT, BERT is bidirectional, making it ideal for tasks that require understanding context from both the left and the right of a word
- Supervised Learning: This is a type of machine learning where the model is trained on a labeled dataset, i.e., a dataset where the correct output is known
- Unsupervised Learning: In contrast to supervised learning, unsupervised learning involves training a model on a dataset where the correct output is not known
- Semi-Supervised Learning: This is a machine learning approach where a small amount of the data is labeled, and the large majority is unlabeled. This approach combines aspects of both supervised and unsupervised learning
- Reinforcement Learning: This is a type of machine learning where an agent learns to make decisions by taking actions in an environment to achieve a goal. The agent receives rewards or penalties for the actions it takes, and it learns to maximize the total reward over time
- Generative Models: These are models that can generate new data instances that resemble the training data. ChatGPT is an example of a generative model
- Discriminative Models: In contrast to generative models, discriminative models learn the boundary between classes in the training data. They are typically used for classification tasks
- Backpropagation: This is a method used in artificial neural networks to calculate the gradient of the loss function with respect to the weights in the network
- Loss Function: In ML, this is a method of evaluating how well a specific algorithm models the given data. If the predictions deviate too much from the actual results, the loss function outputs a very large number. It’s used during the training phase to update the weights
- Overfitting: This happens when a statistical model or ML algorithm captures the noise of the data. It occurs when the model is too complex relative to the amount and noise of the training data
- Underfitting: This is the opposite of overfitting. It occurs when the model is too simple to capture the underlying structure of the data
- Regularisation: This is a technique used to prevent overfitting by adding a penalty term to the loss function
- Hyperparameters: These are the parameters of the learning algorithm itself, not derived through training, that need to be set before training starts
- Epoch: One complete pass through the entire training dataset
- Batch Size: The number of training examples in one forward/backward pass (one epoch consists of multiple batches)
- Learning Rate: This is a hyperparameter that determines the step size at each iteration while moving toward a minimum of a loss function
- Activation Function: In a neural network, the activation function determines the output of a neuron by applying a (typically non-linear) function to the weighted sum of its inputs plus a bias
- ReLU (Rectified Linear Unit): This is a type of activation function that is used in the hidden layers of a neural network. It outputs the input directly if it is positive, else, it will output zero
- Sigmoid Function: This is an activation function that maps any real-valued input to a value between 0 and 1, and is often used for binary classification
- Softmax Function: This is an activation function used in the output layer of a neural network for multi-class classification problems. It converts a vector of numbers into a vector of probabilities, where the probabilities sum up to one (see the sketch after this list)
- Bias and Variance: Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm. Variance is error due to too much complexity in the learning algorithm
- Bias Node: In neural networks, a bias node is an additional neuron added to each pre-output layer that stores the value of one
- Gradient Descent: This is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient
- Stochastic Gradient Descent (SGD): This is a variant of gradient descent, where instead of using the entire data set to compute the gradient at each step, you use only one example
- Adam Optimizer: Adam is a replacement optimization algorithm for stochastic gradient descent for training deep learning models
- Data Augmentation: This is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data
- Transfer Learning: This is a research problem in ML that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem
- Multilayer Perceptron (MLP): This is a class of feedforward artificial neural network that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer
- Convolutional Neural Networks (CNNs): These are deep learning algorithms that can process structured grid data like an image, and are used in image recognition and processing
- Recurrent Neural Networks (RNNs): These are a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows them to use their internal state (memory) to process sequences of inputs
- Long Short-Term Memory (LSTM): This is a special kind of RNN, capable of learning long-term dependencies, and is used in deep learning because of its promising performance
- Encoder-Decoder Structure: This is a type of neural network design pattern. In an encoder-decoder structure, the encoder processes the input data and the decoder takes the output of the encoder and produces the final output
- Word Embedding: This is the collective name for a set of language modeling and feature learning techniques in NLP where words or phrases from the vocabulary are mapped to vectors of real numbers
- Embedding Layer: This is a layer in a neural network that turns positive integers (indexes) into dense vectors of fixed size, typically used to find word embeddings
- Beam Search: This is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set
- Temperature (in the context of AI models): This is a parameter in language models like GPT-3 that controls the randomness of predictions by scaling the logits before applying softmax (see the sketch after this list)
- Autoregressive Models: Models that predict each new value in a sequence from the values that came before it. In classical time-series analysis this relationship is a linear function of past values plus a noise term; in language modeling, the model predicts each token conditioned on the tokens generated so far. ChatGPT is an example of an autoregressive model
- Zero-Shot Learning: This refers to the ability of a machine learning model to understand and act upon tasks that it has not seen during training
- One-Shot Learning: This is a concept in machine learning where the learning algorithm is required to classify objects based on a single example of each new class
- Few-Shot Learning: This is a concept in machine learning where the learning algorithm must generalise from only a small number of examples of each new class
- Language Model: A type of model used in NLP that can predict the next word in a sequence given the words that precede it
- Perplexity: A metric used to judge the quality of a language model. Lower perplexity values indicate better language model performance (see the sketch after this list)
- Named Entity Recognition (NER): An NLP task that identifies named entities in text, such as names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
- Sentiment Analysis: An NLP task that determines the emotional tone behind words to gain an understanding of the attitudes, opinions, and emotions of a speaker or writer
- Dialog Systems: Systems that can converse with human users in natural language. ChatGPT is an example of a dialog system
- Seq2Seq Models: Models that convert sequences from one domain (e.g., sentences in English) to sequences in another domain (e.g., the same sentences translated to French)
- Data Annotation: The process of labeling or categorizing data, often used to create training data for machine learning models
- Pre-training: The first phase in training large language models like GPT-3, where the model learns to predict the next word in a sentence. This phase is unsupervised and uses a large corpus of text
- Knowledge Distillation: A process where a smaller model is trained to reproduce the behavior of a larger model (or an ensemble of models), with the aim of creating a model with comparable predictive performance but lower computational complexity
- Capsule Networks (CapsNets): A type of artificial neural network that can better model hierarchical relationships, and are better suited to tasks that require understanding of spatial hierarchies between features
- Bidirectional LSTM (BiLSTM): A variation of the LSTM that can improve model performance on sequence classification problems
- Attention Models: Models that can focus on specific information to improve the results of complex tasks
- Self-Attention: A method in attention models where the model weighs each word in the input sequence against all the other words, to better understand how the words relate to one another within the sentence
- Transformer Models: Models that use self-attention mechanisms, often used in understanding the context of words in a sentence
- Generative Pre-trained Transformer (GPT): A large transformer-based language model with billions of parameters, trained on a large corpus of text from the internet
- Multimodal Models: AI models that can understand inputs from different data types like text, image, sound, etc.
- Datasets: Collections of data. In machine learning, datasets are used to train and test models
- Training Set: The portion of the dataset used to train a machine learning model
- Validation Set: The portion of the dataset used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters
- Test Set: The portion of the dataset used to provide an unbiased evaluation of a final model fit on the training dataset
- Cross-Validation: A resampling procedure used to evaluate machine learning models on a limited data sample
- Word2Vec: A group of related models that are used to produce word embeddings
- GloVe (Global Vectors for Word Representation): An unsupervised learning algorithm for obtaining vector representations for words
- TF-IDF (Term Frequency-Inverse Document Frequency): A numerical statistic that reflects how important a word is to a document in a collection or corpus
- Bag of Words (BoW): A representation of text that describes the occurrence of words within a document, disregarding grammar and word order (see the sketch after this list)
- n-grams: Contiguous sequences of n items from a given sample of text or speech. When working with text, an n-gram could be a sequence of words, letters, or even sentences
- Skip-grams: A variant of n-grams where the components (words, letters) need not be consecutive in the text under consideration, but may leave gaps that are skipped over
- Levenshtein Distance: A string metric for measuring the difference between two sequences, also known as edit distance. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other (see the sketch after this list)
- Part-of-Speech Tagging (POS Tagging): The process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context
- Stop Words: Commonly used words (such as “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query
- Stemming: The process of reducing inflected (or sometimes derived) words to their word stem, base or root form
- Lemmatization: Similar to stemming, but takes into consideration the morphological analysis of the words. The lemma, or dictionary form of a word, is used instead of just stripping suffixes
- Word Sense Disambiguation: The ability to identify the meaning of words in context in a computational manner. This is a challenging problem in NLP because it’s difficult for a machine to understand context in the way a human can
- Syntactic Parsing: The process of analysing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar
- Semantic Analysis: The process of understanding the meaning of a text, including its literal meaning and the meaning that the speaker or writer intends to convey
- Pragmatic Analysis: Understanding the text in terms of the actions that the speaker or writer intends to perform with the text
- Topic Modeling: A type of statistical model used for discovering the abstract “topics” that occur in a collection of documents
- Latent Dirichlet Allocation (LDA): A generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar
- Sentiment Score: A measure used in sentiment analysis that reflects the emotional tone of a text. The score typically ranges from -1 (very negative) to +1 (very positive)
- Entity Extraction: The process of identifying and classifying key elements from text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
- Coreference Resolution: The task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction
- Chatbot: A software application used to conduct an on-line chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent
- Turn-taking: In the context of conversation, turn-taking is the manner in which orderly conversation is normally carried out. In a chatbot or conversational AI, it refers to the model’s ability to understand when to respond and when to wait for more input
- Anaphora Resolution: This is a task of coreference resolution that focuses on resolving what a particular pronoun or a noun phrase refers to
- Conversational Context: The context in which a conversation is taking place. This includes the broader situation, the participants’ shared knowledge, and the rules and conventions of conversation
- Paraphrasing: The process of restating the meaning of a text using different words. This can be useful in NLP for tasks like data augmentation, or for improving the diversity of chatbot responses
- Document Summarization: The process of shortening a text document with software, in order to create a summary with the major points of the original document. It is an important application of NLP that can be used to condense large amounts of information
- Automatic Speech Recognition (ASR): Technology that converts spoken language into written text. This can be used for voice command applications, transcription services, and more
- Text-to-Speech (TTS): The process of creating synthetic speech by converting text into spoken voice output
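To make some of these terms more concrete, the short Python sketches below illustrate a handful of them. They are illustrative only; the function names, toy data and parameter values are invented for the examples rather than taken from any particular library.

First, a minimal sketch of tokenization: a toy word-level tokenizer that lower-cases text, splits it into tokens and maps each token to an integer id. Real language models use more sophisticated subword tokenizers.

```python
import re

def tokenize(text):
    # Toy word-level tokenizer: lower-case and split on runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = tokenize("Tokenization breaks text into tokens!")
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}  # token -> integer id
ids = [vocab[tok] for tok in tokens]
print(tokens)  # ['tokenization', 'breaks', 'text', 'into', 'tokens']
print(ids)
```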
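A sketch of calling a language model through an API. This assumes the OpenAI chat completions HTTP endpoint and the `requests` package; the exact URL, request fields and response format may change over time, so treat it as an outline rather than a reference.

```python
import requests

API_KEY = "sk-..."  # placeholder; a real API key is required

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Explain tokenization in one sentence."}],
        "temperature": 0.7,  # lower values make the output more deterministic
    },
)
print(response.json()["choices"][0]["message"]["content"])
```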
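A sketch of the ReLU, sigmoid and softmax activation functions, including the temperature parameter that scales the logits before softmax is applied (a higher temperature gives a flatter, more random distribution).

```python
import math

def relu(x):
    # ReLU: pass positive values through unchanged, clamp negatives to zero.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits, temperature=1.0):
    # Scale the logits by the temperature, exponentiate, then normalise so the
    # outputs form a probability distribution that sums to one.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(relu(-3.0), sigmoid(0.0))           # 0.0 0.5
print(softmax(logits, temperature=1.0))   # peaked distribution
print(softmax(logits, temperature=2.0))   # flatter distribution -> more random sampling
```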
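A sketch of mini-batch stochastic gradient descent fitting a single weight to toy data, showing how the learning rate, batch size and epochs fit together. The dataset and hyperparameter values are made up for the example.

```python
import random

random.seed(0)
# Toy dataset: y is roughly 3x, so the ideal weight is about 3.
data = [(x, 3.0 * x + random.uniform(-0.2, 0.2)) for x in range(10)]

w = 0.0                # the single weight being learned
learning_rate = 0.01   # step size for each update
batch_size = 4
epochs = 10            # one epoch = one full pass over the training set

for epoch in range(epochs):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Gradient of the mean squared error loss with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad  # step in the direction of steepest descent
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    print(f"epoch {epoch + 1}: w = {w:.3f}, loss = {loss:.4f}")
```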
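A sketch of word embeddings: each word maps to a vector, and related words end up pointing in similar directions. The three-dimensional vectors here are invented for illustration; real embeddings (Word2Vec, GloVe, or an embedding layer) are learned from data and have hundreds of dimensions.

```python
import math

embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower: unrelated words
```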
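A sketch of perplexity: the exponential of the average negative log-probability a language model assigns to the correct next tokens. The probability values below are invented; a model that puts more probability on the right tokens scores a lower perplexity.

```python
import math

def perplexity(token_probabilities):
    # Exponential of the mean negative log-probability; lower is better.
    n = len(token_probabilities)
    avg_neg_log = -sum(math.log(p) for p in token_probabilities) / n
    return math.exp(avg_neg_log)

confident_model = [0.40, 0.25, 0.60, 0.30]  # higher probabilities on the true tokens
uncertain_model = [0.05, 0.02, 0.10, 0.04]
print(perplexity(confident_model))  # lower perplexity
print(perplexity(uncertain_model))  # higher perplexity
```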
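A sketch of splitting a dataset into training, validation and test sets, followed by k-fold cross-validation over the training portion. The 80/10/10 split and k = 5 are arbitrary choices for the example.

```python
import random

examples = list(range(100))  # stand-in for 100 labelled examples
random.seed(0)
random.shuffle(examples)

# 80/10/10 split into training, validation and test sets.
train, val, test = examples[:80], examples[80:90], examples[90:]

# 5-fold cross-validation: each fold takes a turn as the held-out evaluation set.
k = 5
fold_size = len(train) // k
for i in range(k):
    held_out = train[i * fold_size:(i + 1) * fold_size]
    remainder = train[:i * fold_size] + train[(i + 1) * fold_size:]
    print(f"fold {i}: train on {len(remainder)} examples, evaluate on {len(held_out)}")
```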
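A sketch of bag-of-words counts, bigrams (n-grams with n = 2) and a from-scratch TF-IDF weight, using two made-up documents.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
]

# Bag of words: word counts per document, ignoring order.
bags = [Counter(doc) for doc in docs]
print(bags[0])  # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

# Bigrams: consecutive pairs of words.
print(list(zip(docs[0], docs[0][1:])))  # [('the', 'cat'), ('cat', 'sat'), ...]

def tf_idf(term, doc_index):
    # Term frequency in the document times the (log) inverse document frequency.
    tf = bags[doc_index][term] / sum(bags[doc_index].values())
    df = sum(1 for bag in bags if term in bag)  # number of documents containing the term
    return tf * math.log(len(docs) / df)

print(tf_idf("mat", 0))  # "mat" appears only in doc 0, so it gets a non-zero weight
print(tf_idf("the", 0))  # "the" appears in every document, so its weight is 0
```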
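Finally, a sketch of the Levenshtein (edit) distance using the classic dynamic-programming recurrence.

```python
def levenshtein(a, b):
    # Minimum number of single-character insertions, deletions or substitutions
    # needed to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```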
Contact Mondatum if you would like advice, guidance and support on how to get the best out of machine learning and generative AI tools like Midjourney, Stable Diffusion and ChatGPT – https://www.mondatum.com/contact
Source: Geeky Gadgets