Some of the words and phrases you will most commonly come across as you go deeper into the world of machine learning, chatbots, generative tools and other forms of artificial intelligence.
- Natural Language Processing (NLP): This is the field of study that focuses on the interaction between computers and humans through natural language. The goal of NLP is to read, decipher, understand, and make sense of human language in a valuable way
- Artificial Intelligence (AI): AI refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions
- Machine Learning (ML): ML is a type of AI that provides systems the ability to automatically learn and improve from experience without being explicitly programmed
- Transformers: This is a type of ML model introduced in a paper titled “Attention is All You Need”. Transformers have been particularly effective in NLP tasks, and the GPT models (including ChatGPT) are based on the Transformer architecture
- Attention Mechanism: In the context of ML, attention mechanisms help models focus on specific aspects of the input data. They are a key part of Transformer models
- Fine-tuning: This is a process of taking a pre-trained model (like GPT) and training it further on a specific task. In the case of ChatGPT, it’s fine-tuned on a dataset of conversations
- Tokenization: In NLP, tokenization is the process of breaking down text into words, phrases, symbols, or other meaningful elements called tokens (see the sketch after this list)
- Sequence-to-Sequence Models: These are types of ML models that transform an input sequence into an output sequence. ChatGPT can be viewed as a kind of sequence-to-sequence model, where the input sequence is a conversation history and the output sequence is the model’s response
- Function Calling: In programming, a function call is the act of invoking a function that has been previously defined. In the context of AI models like ChatGPT, function calling refers to the model returning structured output (typically JSON) that names a developer-defined function and the arguments to pass to it, so the application can run that function and feed the result back to the model
- API: An API, or Application Programming Interface, is a set of rules and protocols for building and interacting with software applications. OpenAI provides an API that developers can use to interact with ChatGPT
- Prompt Engineering: This refers to the practice of crafting effective prompts to get the desired output from language models like GPT
- Context Window: This refers to the number of recent tokens (input and output) that the model considers when generating a response
- Deep Learning: This is a subfield of ML that focuses on algorithms inspired by the structure and function of the brain, called artificial neural networks
- Neural Networks: In AI, these are computing systems with interconnected nodes, inspired by biological neural networks, which constitute the brain of living beings
- BERT (Bidirectional Encoder Representations from Transformers): This is a Transformer-based machine learning technique for NLP tasks developed by Google. Unlike GPT, BERT is bidirectional, making it ideal for tasks that require understanding context from both the left and the right of a word
- Supervised Learning: This is a type of machine learning where the model is trained on a labeled dataset, i.e., a dataset where the correct output is known
- Unsupervised Learning: In contrast to supervised learning, unsupervised learning involves training a model on a dataset where the correct output is not known
- Semi-Supervised Learning: This is a machine learning approach where a small amount of the data is labeled, and the large majority is unlabeled. This approach combines aspects of both supervised and unsupervised learning
- Reinforcement Learning: This is a type of machine learning where an agent learns to make decisions by taking actions in an environment to achieve a goal. The agent receives rewards or penalties for the actions it takes, and it learns to maximize the total reward over time
- Generative Models: These are models that can generate new data instances that resemble the training data. ChatGPT is an example of a generative model
- Discriminative Models: In contrast to generative models, discriminative models learn the boundary between classes in the training data. They are typically used for classification tasks
- Backpropagation: This is a method used in artificial neural networks to calculate the gradient of the loss function with respect to the weights in the network
- Loss Function: In ML, this is a method of evaluating how well a specific algorithm models the given data. If the predictions deviate too much from the actual results, the loss function outputs a very large number. It’s used during the training phase to update the weights
- Overfitting: This happens when a statistical model or ML algorithm captures the noise of the data. It occurs when the model is too complex relative to the amount and noise of the training data
- Underfitting: This is the opposite of overfitting. It occurs when the model is too simple to capture the underlying structure of the data
- Regularisation: This is a technique used to prevent overfitting by adding a penalty term to the loss function
- Hyperparameters: These are the parameters of the learning algorithm itself, not derived through training, that need to be set before training starts
- Epoch: One complete pass through the entire training dataset
- Batch Size: The number of training examples in one forward/backward pass (one epoch consists of multiple batches)
- Learning Rate: This is a hyperparameter that determines the step size at each iteration while moving toward a minimum of a loss function
- Activation Function: In a neural network, the activation function determines the output of a neuron by applying a (typically non-linear) function to the weighted sum of its inputs plus a bias
- ReLU (Rectified Linear Unit): This is a type of activation function that is used in the hidden layers of a neural network. It outputs the input directly if it is positive, else, it will output zero
- Sigmoid Function: This is an activation function that maps any real-valued input to a value between 0 and 1, and is often used for binary classification
- Softmax Function: This is an activation function used in the output layer of a neural network for multi-class classification problems. It converts a vector of numbers into a vector of probabilities, where the probabilities sum up to one (see the sketch after this list)
- Bias and Variance: Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm. Variance is error due to too much complexity in the learning algorithm
- Bias Node: In neural networks, a bias node is an additional neuron added to each pre-output layer that stores the value of one
- Gradient Descent: This is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient
- Stochastic Gradient Descent (SGD): This is a variant of gradient descent, where instead of using the entire data set to compute the gradient at each step, you use only one example
- Adam Optimizer: Adam is a replacement optimization algorithm for stochastic gradient descent for training deep learning models
- Data Augmentation: This is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data
- Transfer Learning: This is a research problem in ML that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem
- Multilayer Perceptron (MLP): This is a class of feedforward artificial neural network that consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer
- Convolutional Neural Networks (CNNs): These are deep learning algorithms that can process structured grid data like an image, and are used in image recognition and processing
- Recurrent Neural Networks (RNNs): These are a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows them to use their internal state (memory) to process sequences of inputs
- Long Short-Term Memory (LSTM): This is a special kind of RNN, capable of learning long-term dependencies, and is used in deep learning because of its promising performance
- Encoder-Decoder Structure: This is a type of neural network design pattern. In an encoder-decoder structure, the encoder processes the input data and the decoder takes the output of the encoder and produces the final output
- Word Embedding: This is the collective name for a set of language modeling and feature learning techniques in NLP where words or phrases from the vocabulary are mapped to vectors of real numbers
- Embedding Layer: This is a layer in a neural network that turns positive integers (indexes) into dense vectors of fixed size, typically used to find word embeddings
- Beam Search: This is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set
- Temperature (in the context of AI models): This is a parameter in language models like GPT-3 that controls the randomness of predictions by scaling the logits before applying softmax (see the sketch after this list)
- Autoregressive Models: Models that predict each new value in a sequence from the values that came before it. In classical time-series analysis this relationship is a linear function of past values plus a noise term; in language modeling, the model predicts each token conditioned on the tokens generated so far. ChatGPT is an example of an autoregressive model
- Zero-Shot Learning: This refers to the ability of a machine learning model to understand and act upon tasks that it has not seen during training
- One-Shot Learning: This is a concept in machine learning where the learning algorithm is required to classify objects based on a single example of each new class
- Few-Shot Learning: This is a concept in machine learning where the learning algorithm must generalise from only a small number of examples of each new class
- Language Model: A type of model used in NLP that can predict the next word in a sequence given the words that precede it
- Perplexity: A metric used to judge the quality of a language model. Lower perplexity values indicate better language model performance (see the sketch after this list)
- Named Entity Recognition (NER): An NLP task that identifies named entities in text, such as names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
- Sentiment Analysis: An NLP task that determines the emotional tone behind words to gain an understanding of the attitudes, opinions, and emotions of a speaker or writer
- Dialog Systems: Systems that can converse with human users in natural language. ChatGPT is an example of a dialog system
- Seq2Seq Models: Models that convert sequences from one domain (e.g., sentences in English) to sequences in another domain (e.g., the same sentences translated to French)
- Data Annotation: The process of labeling or categorizing data, often used to create training data for machine learning models
- Pre-training: The first phase in training large language models like GPT-3, where the model learns to predict the next word in a sentence. This phase is unsupervised and uses a large corpus of text
- Knowledge Distillation: A process where a smaller model is trained to reproduce the behavior of a larger model (or an ensemble of models), with the aim of creating a model with comparable predictive performance but lower computational complexity
- Capsule Networks (CapsNets): A type of artificial neural network that can better model hierarchical relationships, and are better suited to tasks that require understanding of spatial hierarchies between features
- Bidirectional LSTM (BiLSTM): A variation of the LSTM that can improve model performance on sequence classification problems
- Attention Models: Models that can focus on specific information to improve the results of complex tasks
- Self-Attention: A method in attention models where the model weighs each word in the input sequence against all the other words, to better understand how the words relate to one another within the sentence
- Transformer Models: Models that use self-attention mechanisms, often used in understanding the context of words in a sentence
- Generative Pre-trained Transformer (GPT): A large transformer-based language model with billions of parameters, trained on a large corpus of text from the internet
- Multimodal Models: AI models that can understand inputs from different data types like text, image, sound, etc.
- Datasets: Collections of data. In machine learning, datasets are used to train and test models
- Training Set: The portion of the dataset used to train a machine learning model
- Validation Set: The portion of the dataset used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters
- Test Set: The portion of the dataset used to provide an unbiased evaluation of a final model fit on the training dataset
- Cross-Validation: A resampling procedure used to evaluate machine learning models on a limited data sample
- Word2Vec: A group of related models that are used to produce word embeddings
- GloVe (Global Vectors for Word Representation): An unsupervised learning algorithm for obtaining vector representations for words
- TF-IDF (Term Frequency-Inverse Document Frequency): A numerical statistic that reflects how important a word is to a document in a collection or corpus
- Bag of Words (BoW): A representation of text that describes the occurrence of words within a document, disregarding grammar and word order (see the sketch after this list)
- n-grams: Contiguous sequences of n items from a given sample of text or speech. When working with text, an n-gram could be a sequence of words, letters, or even sentences
- Skip-grams: A variant of n-grams where the components (words, letters) need not be consecutive in the text under consideration, but may leave gaps that are skipped over
- Levenshtein Distance: A string metric for measuring the difference between two sequences, also known as edit distance. The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other (see the sketch after this list)
- Part-of-Speech Tagging (POS Tagging): The process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context
- Stop Words: Commonly used words (such as “a”, “an”, “in”) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query
- Stemming: The process of reducing inflected (or sometimes derived) words to their word stem, base or root form
- Lemmatization: Similar to stemming, but takes into consideration the morphological analysis of the words. The lemma, or dictionary form of a word, is used instead of just stripping suffixes
- Word Sense Disambiguation: The ability to identify the meaning of words in context in a computational manner. This is a challenging problem in NLP because it’s difficult for a machine to understand context in the way a human can
- Syntactic Parsing: The process of analysing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar
- Semantic Analysis: The process of understanding the meaning of a text, including its literal meaning and the meaning that the speaker or writer intends to convey
- Pragmatic Analysis: Understanding the text in terms of the actions that the speaker or writer intends to perform with the text
- Topic Modeling: A type of statistical model used for discovering the abstract “topics” that occur in a collection of documents
- Latent Dirichlet Allocation (LDA): A generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar
- Sentiment Score: A measure used in sentiment analysis that reflects the emotional tone of a text. The score typically ranges from -1 (very negative) to +1 (very positive)
- Entity Extraction: The process of identifying and classifying key elements from text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
- Coreference Resolution: The task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction
- Chatbot: A software application used to conduct an on-line chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent
- Turn-taking: In the context of conversation, turn-taking is the manner in which orderly conversation is normally carried out. In a chatbot or conversational AI, it refers to the model’s ability to understand when to respond and when to wait for more input
- Anaphora Resolution: This is a task of coreference resolution that focuses on resolving what a particular pronoun or a noun phrase refers to
- Conversational Context: The context in which a conversation is taking place. This includes the broader situation, the participants’ shared knowledge, and the rules and conventions of conversation
- Paraphrasing: The process of restating the meaning of a text using different words. This can be useful in NLP for tasks like data augmentation, or for improving the diversity of chatbot responses
- Document Summarization: The process of shortening a text document with software, in order to create a summary with the major points of the original document. It is an important application of NLP that can be used to condense large amounts of information
- Automatic Speech Recognition (ASR): Technology that converts spoken language into written text. This can be used for voice command applications, transcription services, and more
- Text-to-Speech (TTS): The process of creating synthetic speech by converting text into spoken voice output
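To make some of these terms more concrete, the short Python sketches below illustrate a handful of them. They are illustrative only; the function names, toy data and parameter values are invented for the examples rather than taken from any particular library.

First, a minimal sketch of tokenization: a toy word-level tokenizer that lower-cases text, splits it into tokens and maps each token to an integer id. Real language models use more sophisticated subword tokenizers.

```python
import re

def tokenize(text):
    # Toy word-level tokenizer: lower-case and split on runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = tokenize("Tokenization breaks text into tokens!")
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}  # token -> integer id
ids = [vocab[tok] for tok in tokens]
print(tokens)  # ['tokenization', 'breaks', 'text', 'into', 'tokens']
print(ids)
```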
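A sketch of calling a language model through an API. This assumes the OpenAI chat completions HTTP endpoint and the `requests` package; the exact URL, request fields and response format may change over time, so treat it as an outline rather than a reference.

```python
import requests

API_KEY = "sk-..."  # placeholder; a real API key is required

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Explain tokenization in one sentence."}],
        "temperature": 0.7,  # lower values make the output more deterministic
    },
)
print(response.json()["choices"][0]["message"]["content"])
```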
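A sketch of the ReLU, sigmoid and softmax activation functions, including the temperature parameter that scales the logits before softmax is applied (a higher temperature gives a flatter, more random distribution).

```python
import math

def relu(x):
    # ReLU: pass positive values through unchanged, clamp negatives to zero.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits, temperature=1.0):
    # Scale the logits by the temperature, exponentiate, then normalise so the
    # outputs form a probability distribution that sums to one.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(relu(-3.0), sigmoid(0.0))           # 0.0 0.5
print(softmax(logits, temperature=1.0))   # peaked distribution
print(softmax(logits, temperature=2.0))   # flatter distribution -> more random sampling
```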
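A sketch of mini-batch stochastic gradient descent fitting a single weight to toy data, showing how the learning rate, batch size and epochs fit together. The dataset and hyperparameter values are made up for the example.

```python
import random

random.seed(0)
# Toy dataset: y is roughly 3x, so the ideal weight is about 3.
data = [(x, 3.0 * x + random.uniform(-0.2, 0.2)) for x in range(10)]

w = 0.0                # the single weight being learned
learning_rate = 0.01   # step size for each update
batch_size = 4
epochs = 10            # one epoch = one full pass over the training set

for epoch in range(epochs):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Gradient of the mean squared error loss with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad  # step in the direction of steepest descent
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    print(f"epoch {epoch + 1}: w = {w:.3f}, loss = {loss:.4f}")
```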
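A sketch of word embeddings: each word maps to a vector, and related words end up pointing in similar directions. The three-dimensional vectors here are invented for illustration; real embeddings (Word2Vec, GloVe, or an embedding layer) are learned from data and have hundreds of dimensions.

```python
import math

embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high: related words
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower: unrelated words
```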
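A sketch of perplexity: the exponential of the average negative log-probability a language model assigns to the correct next tokens. The probability values below are invented; a model that puts more probability on the right tokens scores a lower perplexity.

```python
import math

def perplexity(token_probabilities):
    # Exponential of the mean negative log-probability; lower is better.
    n = len(token_probabilities)
    avg_neg_log = -sum(math.log(p) for p in token_probabilities) / n
    return math.exp(avg_neg_log)

confident_model = [0.40, 0.25, 0.60, 0.30]  # higher probabilities on the true tokens
uncertain_model = [0.05, 0.02, 0.10, 0.04]
print(perplexity(confident_model))  # lower perplexity
print(perplexity(uncertain_model))  # higher perplexity
```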
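A sketch of splitting a dataset into training, validation and test sets, followed by k-fold cross-validation over the training portion. The 80/10/10 split and k = 5 are arbitrary choices for the example.

```python
import random

examples = list(range(100))  # stand-in for 100 labelled examples
random.seed(0)
random.shuffle(examples)

# 80/10/10 split into training, validation and test sets.
train, val, test = examples[:80], examples[80:90], examples[90:]

# 5-fold cross-validation: each fold takes a turn as the held-out evaluation set.
k = 5
fold_size = len(train) // k
for i in range(k):
    held_out = train[i * fold_size:(i + 1) * fold_size]
    remainder = train[:i * fold_size] + train[(i + 1) * fold_size:]
    print(f"fold {i}: train on {len(remainder)} examples, evaluate on {len(held_out)}")
```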
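A sketch of bag-of-words counts, bigrams (n-grams with n = 2) and a from-scratch TF-IDF weight, using two made-up documents.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
]

# Bag of words: word counts per document, ignoring order.
bags = [Counter(doc) for doc in docs]
print(bags[0])  # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

# Bigrams: consecutive pairs of words.
print(list(zip(docs[0], docs[0][1:])))  # [('the', 'cat'), ('cat', 'sat'), ...]

def tf_idf(term, doc_index):
    # Term frequency in the document times the (log) inverse document frequency.
    tf = bags[doc_index][term] / sum(bags[doc_index].values())
    df = sum(1 for bag in bags if term in bag)  # number of documents containing the term
    return tf * math.log(len(docs) / df)

print(tf_idf("mat", 0))  # "mat" appears only in doc 0, so it gets a non-zero weight
print(tf_idf("the", 0))  # "the" appears in every document, so its weight is 0
```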
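Finally, a sketch of the Levenshtein (edit) distance using the classic dynamic-programming recurrence.

```python
def levenshtein(a, b):
    # Minimum number of single-character insertions, deletions or substitutions
    # needed to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```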
Contact Mondatum if you would like advice, guidance and support on how to get the best out of machine learning and generative AI tools like Midjourney, Stable Diffusion and ChatGPT – https://www.mondatum.com/contact
Source: Geeky Gadgets