
A list of 100 machine learning terms to know


Some of the words and phrases you will most commonly come across as you go deeper into the world of machine learning, chatbots, generative tools and other forms of artificial intelligence.

  1. Natural Language Processing (NLP): This is the field of study that focuses on the interaction between computers and humans through natural language. The goal of NLP is to read, decipher, understand, and make sense of human language in a valuable way
  2. Artificial Intelligence (AI): AI refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions
  3. Machine Learning (ML): ML is a type of AI that provides systems the ability to automatically learn and improve from experience without being explicitly programmed
  4. Transformers: This is a type of ML model introduced in a paper titled “Attention is All You Need”. Transformers have been particularly effective in NLP tasks, and the GPT models (including ChatGPT) are based on the Transformer architecture
  5. Attention Mechanism: In the context of ML, attention mechanisms help models focus on specific aspects of the input data. They are a key part of Transformer models
  6. Fine-tuning: This is a process of taking a pre-trained model (like GPT) and training it further on a specific task. In the case of ChatGPT, it’s fine-tuned on a dataset of conversations
  7. Tokenization: In NLP, tokenization is the process of breaking down text into words, phrases, symbols, or other meaningful elements called tokens (a simple example follows this list)
  8. Sequence-to-Sequence Models: These are types of ML models that transform an input sequence into an output sequence. ChatGPT can be viewed as a kind of sequence-to-sequence model, where the input sequence is a conversation history and the output sequence is the model’s response
  9. Function Calling: In programming, a function call is the process of invoking a function that has been previously defined. In the context of AI models like ChatGPT, function calling refers to the model returning a structured request to invoke a developer-defined function or tool (for example, to look up data), rather than replying with free-form text
  10. API: An API, or Application Programming Interface, is a set of rules and protocols for building and interacting with software applications. OpenAI provides an API that developers can use to interact with ChatGPT (an illustrative request is sketched after this list)
  11. Prompt Engineering: This refers to the practice of crafting effective prompts to get the desired output from language models like GPT
  12. Context Window: This refers to the number of recent tokens (input and output) that the model considers when generating a response
  13. Deep Learning: This is a subfield of ML that focuses on algorithms inspired by the structure and function of the brain, called artificial neural networks
  14. Neural Networks: In AI, these are computing systems with interconnected nodes, inspired by the biological neural networks found in animal brains
  15. BERT (Bidirectional Encoder Representations from Transformers): This is a Transformer-based machine learning technique for NLP tasks developed by Google. Unlike GPT, BERT is bidirectional, making it ideal for tasks that require understanding context from both the left and the right of a word
  16. Supervised Learning: This is a type of machine learning where the model is trained on a labeled dataset, i.e., a dataset where the correct output is known
  17. Unsupervised Learning: In contrast to supervised learning, unsupervised learning involves training a model on a dataset where the correct output is not known
  18. Semi-Supervised Learning: This is a machine learning approach where a small amount of the data is labeled, and the large majority is unlabeled. This approach combines aspects of both supervised and unsupervised learning
  19. Reinforcement Learning: This is a type of machine learning where an agent learns to make decisions by taking actions in an environment to achieve a goal. The agent receives rewards or penalties for the actions it takes, and it learns to maximize the total reward over time
  20. Generative Models: These are models that can generate new data instances that resemble the training data. ChatGPT is an example of a generative model
  21. Discriminative Models: In contrast to generative models, discriminative models learn the boundary between classes in the training data. They are typically used for classification tasks
  22. Backpropagation: This is a method used in artificial neural networks to calculate the gradient of the loss function with respect to the weights in the network
  23. Loss Function: In ML, this is a method of evaluating how well a specific algorithm models the given data. If the predictions deviate too much from the actual results, the loss function returns a very large value. It’s used during the training phase to update the weights
  24. Overfitting: This happens when a statistical model or ML algorithm captures the noise of the data. It occurs when the model is too complex relative to the amount and noise of the training data
  25. Underfitting: This is the opposite of overfitting. It occurs when the model is too simple to capture the underlying structure of the data
  26. Regularisation: This is a technique used to prevent overfitting by adding a penalty term to the loss function
  27. Hyperparameters: These are the parameters of the learning algorithm itself, not derived through training, that need to be set before training starts
  28. Epoch: One complete pass through the entire training dataset
  29. Batch Size: The number of training examples in one forward/backward pass (one epoch consists of multiple batches)
  30. Learning Rate: This is a hyperparameter that determines the step size at each iteration while moving toward a minimum of a loss function
  31. Activation Function: In a neural network, the activation function determines whether a neuron should be activated or not by calculating the weighted sum and adding bias
  32. ReLU (Rectified Linear Unit): This is a type of activation function that is used in the hidden layers of a neural network. It outputs the input directly if it is positive; otherwise, it outputs zero
  33. Sigmoid Function: This is an activation function that maps any real-valued input to a value between 0 and 1, which makes it useful when outputs need to be read as probabilities
  34. Softmax Function: This is an activation function used in the output layer of a neural network for multi-class classification problems. It converts a vector of numbers into a vector of probabilities, where the probabilities sum up to one (a short code sketch of softmax with temperature scaling follows this list)
  35. Bias and Variance: Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm. Variance is error due to too much complexity in the learning algorithm
  36. Bias Node: In neural networks, a bias node is an additional neuron added to each pre-output layer; it always holds the value one, allowing the layer’s output to be shifted independently of its inputs
  37. Gradient Descent: This is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient (a minimal example follows this list)
  38. Stochastic Gradient Descent (SGD): This is a variant of gradient descent, where instead of using the entire data set to compute the gradient at each step, you use only one example
  39. Adam Optimizer: Adam is an optimization algorithm that extends stochastic gradient descent with adaptive, per-parameter learning rates, and is widely used for training deep learning models
  40. Data Augmentation: This is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data
  41. Transfer Learning: This is a research problem in ML that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem
  42. Multilayer Perceptron (MLP): This is a class of feedforward artificial neural networks that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer
  43. Convolutional Neural Networks (CNNs): These are deep learning algorithms that can process structured grid data like an image, and are used in image recognition and processing
  44. Recurrent Neural Networks (RNNs): These are a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows them to use their internal state (memory) to process sequences of inputs
  45. Long Short-Term Memory (LSTM): This is a special kind of RNN, capable of learning long-term dependencies, and is used in deep learning because of its promising performance
  46. Encoder-Decoder Structure: This is a type of neural network design pattern. In an encoder-decoder structure, the encoder processes the input data and the decoder takes the output of the encoder and produces the final output
  47. Word Embedding: This is the collective name for a set of language modeling and feature learning techniques in NLP where words or phrases from the vocabulary are mapped to vectors of real numbers
  48. Embedding Layer: This is a layer in a neural network that turns positive integers (indexes) into dense vectors of fixed size, typically used to learn word embeddings
  49. Beam Search: This is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set
  50. Temperature (in the context of AI models): This is a parameter in language models like GPT-3 that controls the randomness of predictions by scaling the logits before applying softmax (see the softmax sketch after this list)
  51. Autoregressive Models: These are models in which each new value in a sequence is predicted from the values that came before it. ChatGPT is an example of an autoregressive model: it generates text one token at a time, conditioned on the tokens produced so far
  52. Zero-Shot Learning: This refers to the ability of a machine learning model to understand and act upon tasks that it has not seen during training
  53. One-Shot Learning: This is a concept in machine learning where the learning algorithm is required to classify objects based on a single example of each new class
  54. Few-Shot Learning: This is a setting in which a model must learn to perform a task from only a small number of labeled examples of each new class or task
  55. Language Model: A type of model used in NLP that can predict the next word in a sequence given the words that precede it
  56. Perplexity: A metric used to judge the quality of a language model. Lower perplexity values indicate better language model performance (a small worked example follows this list)
  57. Named Entity Recognition (NER): An NLP task that identifies named entities in text, such as names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
  58. Sentiment Analysis: An NLP task that determines the emotional tone behind words to gain an understanding of the attitudes, opinions, and emotions of a speaker or writer
  59. Dialog Systems: Systems that can converse with human users in natural language. ChatGPT is an example of a dialog system
  60. Seq2Seq Models: Models that convert sequences from one domain (e.g., sentences in English) to sequences in another domain (e.g., the same sentences translated to French)
  61. Data Annotation: The process of labeling or categorizing data, often used to create training data for machine learning models
  62. Pre-training: The first phase in training large language models like GPT-3, where the model learns to predict the next word in a sentence. This phase is unsupervised and uses a large corpus of text
  63. Knowledge Distillation: A process where a smaller model is trained to reproduce the behavior of a larger model (or an ensemble of models), with the aim of creating a model with comparable predictive performance but lower computational complexity
  64. Capsule Networks (CapsNets): A type of artificial neural network that can better model hierarchical relationships, and is better suited to tasks that require understanding of spatial hierarchies between features
  65. Bidirectional LSTM (BiLSTM): A variation of the LSTM that processes the input sequence in both forward and backward directions, which can improve model performance on sequence classification problems
  66. Attention Models: Models that can focus on specific information to improve the results of complex tasks
  67. Self-Attention: A mechanism in which each word in the input sequence attends to every other word in the same sequence, so the model can weigh how much each of the other words matters when building its representation (a small numerical sketch follows this list)
  68. Transformer Models: Models that use self-attention mechanisms, often used in understanding the context of words in a sentence
  69. Generative Pre-trained Transformer (GPT): A large transformer-based language model with billions of parameters, trained on a large corpus of text from the internet
  70. Multimodal Models: AI models that can understand inputs from different data types like text, image, sound, etc.
  71. Datasets: Collections of data. In machine learning, datasets are used to train and test models
  72. Training Set: The portion of the dataset used to train a machine learning model (a simple train/validation/test split is sketched after this list)
  73. Validation Set: The portion of the dataset used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters
  74. Test Set: The portion of the dataset used to provide an unbiased evaluation of a final model fit on the training dataset
  75. Cross-Validation: A resampling procedure used to evaluate machine learning models on a limited data sample
  76. Word2Vec: A group of related models that are used to produce word embeddings
  77. GloVe (Global Vectors for Word Representation): An unsupervised learning algorithm for obtaining vector representations for words
  78. TF-IDF (Term Frequency-Inverse Document Frequency): A numerical statistic that reflects how important a word is to a document in a collection or corpus (a worked example follows this list)
  79. Bag of Words (BoW): A representation of text that describes the occurrence of words within a document, ignoring grammar and word order (a short example follows this list)
  80. n-grams: Contiguous sequences of n items from a given sample of text or speech. When working with text, an n-gram could be a sequence of words, letters, or even sentences
  81. Skip-grams: A variant of n-grams where the components (words, letters) need not be consecutive in the text under consideration, but may leave gaps that are skipped over
  82. Levenshtein Distance: A string metric for measuring the difference between two sequences, also known as edit distance. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other (an implementation follows this list)
  83. Part-of-Speech Tagging (POS Tagging): The process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context
  84. Stop Words: Commonly used words (such as “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query
  85. Stemming: The process of reducing inflected (or sometimes derived) words to their word stem, base or root form
  86. Lemmatization: Similar to stemming, but takes into consideration the morphological analysis of the words. The lemma, or dictionary form of a word, is used instead of just stripping suffixes
  87. Word Sense Disambiguation: The ability to identify the meaning of words in context in a computational manner. This is a challenging problem in NLP because it’s difficult for a machine to understand context in the way a human can
  88. Syntactic Parsing: The process of analysing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar
  89. Semantic Analysis: The process of understanding the meaning of a text, including its literal meaning and the meaning that the speaker or writer intends to convey
  90. Pragmatic Analysis: Understanding the text in terms of the actions that the speaker or writer intends to perform with the text
  91. Topic Modeling: A type of statistical model used for discovering the abstract “topics” that occur in a collection of documents
  92. Latent Dirichlet Allocation (LDA): A generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar
  93. Sentiment Score: A measure used in sentiment analysis that reflects the emotional tone of a text. The score typically ranges from -1 (very negative) to +1 (very positive)
  94. Entity Extraction: The process of identifying and classifying key elements from text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
  95. Coreference Resolution: The task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction
  96. Chatbot: A software application used to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent
  97. Turn-taking: In the context of conversation, turn-taking is the manner in which orderly conversation is normally carried out. In a chatbot or conversational AI, it refers to the model’s ability to understand when to respond and when to wait for more input
  98. Anaphora Resolution: This is a subtask of coreference resolution that focuses on resolving what a particular pronoun or noun phrase refers to
  99. Conversational Context: The context in which a conversation is taking place. This includes the broader situation, the participants’ shared knowledge, and the rules and conventions of conversation
  100. Paraphrasing: The process of restating the meaning of a text using different words. This can be useful in NLP for tasks like data augmentation, or for improving the diversity of chatbot responses
  101. Document Summarization: The process of shortening a text document with software, in order to create a summary with the major points of the original document. It is an important application of NLP that can be used to condense large amounts of information
  102. Automatic Speech Recognition (ASR): Technology that converts spoken language into written text. This can be used for voice command applications, transcription services, and more
  103. Text-to-Speech (TTS): The process of creating synthetic speech by converting text into spoken voice output
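
To make a few of the terms above more concrete, the short Python sketches below are simplified illustrations rather than production code. This first one shows word-level tokenization (term 7) with a regular expression; models such as ChatGPT actually use subword tokenizers (for example byte-pair encoding), but the idea of turning text into discrete tokens is the same.

```python
import re

def tokenize(text):
    # Split text into word and punctuation tokens.
    # Real LLM tokenizers use subword schemes (e.g. byte-pair encoding),
    # but the principle of mapping text to discrete tokens is the same.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Tokenization breaks text into tokens!"))
# ['tokenization', 'breaks', 'text', 'into', 'tokens', '!']
```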
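
Term 10 mentions the OpenAI API. The sketch below shows what a minimal HTTP request to a chat-completion style endpoint can look like using the third-party requests library. The endpoint URL, model name and payload fields follow OpenAI's publicly documented chat completions API, but details change between API versions, so treat this as an illustrative assumption rather than a definitive recipe; OPENAI_API_KEY is a placeholder for your own key.

```python
import os
import requests

# Illustrative only: endpoint, model name and payload fields follow OpenAI's
# publicly documented chat completions API and may change over time.
API_KEY = os.environ.get("OPENAI_API_KEY", "sk-...")  # placeholder key

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Explain overfitting in one sentence."}],
        "temperature": 0.7,
    },
    timeout=30,
)
print(response.json()["choices"][0]["message"]["content"])
```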
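
Terms 34 and 50 fit together neatly in code: softmax turns a vector of logits into probabilities, and temperature rescales the logits before softmax to control how random the resulting distribution is. A minimal NumPy sketch:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Scale logits by the temperature, then normalise into probabilities.
    # Lower temperature -> sharper (more deterministic) distribution,
    # higher temperature -> flatter (more random) distribution.
    scaled = logits / temperature
    exps = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits, temperature=1.0))  # roughly [0.66, 0.24, 0.10]
print(softmax(logits, temperature=0.5))  # sharper: mass concentrates on the largest logit
```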
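
Term 37 (gradient descent) can be shown on a one-parameter toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2(w - 3). The loop below repeatedly steps against the gradient; the function and learning rate are arbitrary choices for illustration.

```python
# Minimise f(w) = (w - 3)^2 with plain gradient descent.
learning_rate = 0.1
w = 0.0  # arbitrary starting point

for step in range(50):
    gradient = 2 * (w - 3)      # derivative of (w - 3)^2
    w -= learning_rate * gradient  # step against the gradient

print(round(w, 4))  # converges towards the minimum at w = 3.0
```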
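
Term 56 (perplexity) is the exponential of the average negative log-probability a language model assigns to the tokens that actually occurred. The probabilities below are made up purely for illustration:

```python
import math

# Hypothetical probabilities a language model assigned to each actual next token.
token_probs = [0.25, 0.10, 0.60, 0.05]

avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_neg_log_prob)
print(round(perplexity, 2))  # lower is better; a perfect model would score 1.0
```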
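
Term 67 (self-attention) boils down to scaled dot-product attention: each position's output is a weighted average of value vectors, with weights given by softmax(Q K^T / sqrt(d_k)). The projection matrices and toy inputs below are random placeholders, not learned weights.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each output row is a weighted average of the value vectors in V,
    # weighted by softmax(Q K^T / sqrt(d_k)).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy "sentence" of 3 tokens, each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
# In self-attention, queries, keys and values are all derived from the same
# input X via projection matrices (random placeholders here, not learned weights).
Wq, Wk, Wv = rng.normal(size=(4, 4)), rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
print(scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv))
```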
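
Terms 72 to 75 describe how a dataset is divided. A common pattern, sketched below with synthetic NumPy data, is a 70/15/15 split into training, validation and test sets; the exact proportions are a conventional choice, not a rule.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))        # 100 synthetic examples, 5 features
y = rng.integers(0, 2, size=100)     # synthetic binary labels

# Shuffle the indices, then carve off roughly 70% / 15% / 15%.
indices = rng.permutation(len(X))
train_idx, val_idx, test_idx = np.split(indices, [70, 85])

X_train, y_train = X[train_idx], y[train_idx]  # fit the model here
X_val, y_val = X[val_idx], y[val_idx]          # tune hyperparameters here
X_test, y_test = X[test_idx], y[test_idx]      # report final performance here
print(len(X_train), len(X_val), len(X_test))   # 70 15 15
```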
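
Term 78 (TF-IDF) multiplies a term's frequency within one document by the log of the inverse of how many documents contain it, so words that are common everywhere score low. A from-scratch sketch on a toy corpus (libraries such as scikit-learn provide tuned implementations):

```python
import math

documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]
tokenized = [doc.split() for doc in documents]

def tf_idf(term, doc_tokens, corpus):
    # Term frequency: how often the term appears in this document.
    tf = doc_tokens.count(term) / len(doc_tokens)
    # Inverse document frequency: rarer terms across the corpus score higher.
    docs_with_term = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf

print(tf_idf("cat", tokenized[0], tokenized))  # "cat" appears in 2 of 3 documents
print(tf_idf("mat", tokenized[0], tokenized))  # "mat" is rarer, so it scores higher
```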
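
Terms 79 and 80 are easiest to see side by side: a bag of words throws away word order and keeps counts, while n-grams keep short runs of adjacent tokens.

```python
from collections import Counter

sentence = "the quick brown fox jumps over the lazy dog"
tokens = sentence.split()

# Bag of words: word counts with no notion of order.
bag_of_words = Counter(tokens)
print(bag_of_words)  # Counter({'the': 2, 'quick': 1, ...})

def ngrams(items, n):
    # Contiguous sequences of n tokens.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

print(ngrams(tokens, 2))  # bigrams: ('the', 'quick'), ('quick', 'brown'), ...
```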
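
Term 82 (Levenshtein distance) has a classic dynamic-programming implementation; the sketch below is the textbook version, not optimised for very long strings.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance:
    # dp[i][j] = edits needed to turn a[:i] into b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i              # delete all i characters
    for j in range(len(b) + 1):
        dp[0][j] = j              # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(a)][len(b)]

print(levenshtein("kitten", "sitting"))  # 3
```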

Contact Mondatum if you would like advice, guidance and support on how to get the best out of machine learning and generative AI tools like Midjourney, Stable Diffusion and ChatGPT – https://www.mondatum.com/contact

Source: Geeky Gadgets


