Essential AI Glossary: Understand Key Terms and Concepts

Dive into the world of AI with our A-Z glossary. From ‘Accuracy’ to ‘Zero-Shot Learning’, unravel the mysteries of AI terms. What will you discover?

The field of artificial intelligence is evolving at a rapid pace, with new technologies, concepts, and terminology emerging constantly. For professionals, students, and anyone with an interest in AI, keeping up with these advances can feel overwhelming. Our AI Glossary aims to demystify key terms and provide accessible explanations of important AI concepts.

With clear definitions and insightful context, this glossary breaks down the jargon and complex ideas that can make AI seem impenetrable. We cut through the complexity to offer straightforward descriptions tailored to varying levels of technical knowledge. Readers can utilise this resource as a reference point to grasp fundamentals, as well as a launch pad for further learning.

Approachable yet comprehensive, the glossary comprises important terminology and concepts needed to participate in our increasingly AI-driven world. We invite you to explore key aspects of artificial intelligence through clear explanations and real-world examples. Whether you are new to AI or an experienced practitioner, we hope you will find this a valuable reference for expanding your understanding of this rapidly evolving field.

The glossary aims to illuminate rather than intimidate. Let us be your guide as you navigate the fascinating landscape of artificial intelligence.

A to Z Artificial Intelligence Glossary

A

Accuracy:

A measure of the proportion of correct predictions made by a machine learning model, expressed as a percentage.

Action:

The decision or choice made by an agent in a given state, which can influence the subsequent state and reward.

Activation Distribution:

The distribution of the output values of a neuron or layer, which can affect the representational capacity and convergence of the network.

Activation Function:

A non-linear function applied to the output of a neuron in a neural network, used to introduce non-linearity and improve the expressiveness and performance of the model.
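
To make the idea concrete, here is a minimal NumPy sketch of three common activation functions applied element-wise to a neuron’s raw outputs (the function names are just illustrative):

```python
import numpy as np

def relu(x):
    # Zero for negative inputs, identity for positive inputs
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any real value into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes any real value into the range (-1, 1)
    return np.tanh(x)

z = np.array([-2.0, -0.5, 0.0, 1.5])   # raw neuron outputs (pre-activations)
print(relu(z), sigmoid(z), tanh(z), sep="\n")
```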

AdaBoost:

A specific type of boosting that assigns higher weights to misclassified examples and trains subsequent models on these weighted examples in order to improve their performance.

Adagrad:

An adaptive learning rate optimization algorithm that adjusts the learning rate for each parameter based on the historical gradient updates, which can improve performance on sparse or noisy datasets.

Adam:

An adaptive learning rate optimization algorithm that combines the advantages of momentum and Adagrad by using moving averages of the gradients and their squares to adjust the learning rate for each parameter, which can improve convergence and robustness.
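
As a rough sketch of the idea (not a production implementation), a single Adam update step in NumPy might look like the following, where `lr`, `beta1`, `beta2`, and `eps` are the usual hyperparameters:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Moving average of the gradients (momentum term)
    m = beta1 * m + (1 - beta1) * grad
    # Moving average of the squared gradients (adaptive scaling term)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates (important early in training, when t is small)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update with a per-parameter effective learning rate
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(3)
m, v = np.zeros(3), np.zeros(3)
theta, m, v = adam_step(theta, grad=np.array([0.1, -0.2, 0.3]), m=m, v=v, t=1)
print(theta)
```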

Agent:

The entity or algorithm that is responsible for taking actions in an environment in order to achieve a goal.

Anchor:

A pre-defined bounding box that is used as a reference for generating candidate regions in an image.

Anomaly Detection:

A type of machine learning task in which the goal is to identify rare or unusual events or patterns in data, which may indicate errors, fraud, or other types of anomalies.

Audio Augmentation:

A type of data augmentation that involves applying transformations or perturbations to audio data, such as time stretching, pitch shifting, or adding noise.

Autoencoder:

A type of neural network trained to reconstruct its input from a compressed internal representation. It can be used for unsupervised anomaly detection by flagging data points that the learned representation reconstructs poorly.

Average Pooling:

A type of pooling operation that calculates the average value from a rectangular region of the input.

B

Backpropagation:

A method for training deep neural networks that involves adjusting the weights of the network based on the error between the predicted output and the actual output.

Bag of Words with Term Frequency-Inverse Document Frequency (TF-IDF):

A popular representation of text data that captures the frequency of words in a document, while also taking into account how common or rare the words are across the entire corpus.
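
A small illustration, assuming a recent version of scikit-learn is available, of turning a toy corpus into a TF-IDF weighted bag-of-words matrix:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)   # sparse matrix: documents x vocabulary

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(tfidf.toarray().round(2))            # TF-IDF weight of each word per document
```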

Bag of Words:

A representation of text that counts the frequency of each word in a document.

Bagging:

A technique used for ensemble learning that involves training multiple models on different subsets of the training data, and combining their predictions by taking the average or majority vote.

Batch Gradient Descent:

A variant of gradient descent that computes the gradient of the cost function over the entire training dataset at each iteration, which is slower and more memory-intensive but produces exact, less noisy gradient estimates.

Batch Normalization:

A regularization technique that involves normalizing the inputs to each layer of a neural network, in order to reduce the internal covariate shift and improve training stability.

Bayesian Optimization:

A hyperparameter tuning technique that involves constructing a probabilistic model of the performance of the model as a function of the hyperparameters, and using this model to guide the search for optimal hyperparameters.

Bias-Variance Tradeoff:

A fundamental tradeoff in machine learning between the ability of a model to fit the training data well, or have low bias, and the ability of the model to generalize well to new data, or have low variance.

Binarization:

A technique for converting an image into a binary format, where each pixel is either black or white, in order to separate text from the background.

Binary Classification:

A type of classification that involves predicting one of two possible outcomes, such as “yes” or “no”.

Binary Cross-Entropy:

A common loss function used for binary classification tasks that measures the difference between the predicted probabilities of the two classes and the true labels.
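
For a handful of predictions, the binary cross-entropy can be computed directly; a minimal NumPy sketch:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions to avoid log(0)
    p = np.clip(y_pred, eps, 1 - eps)
    # Average negative log-likelihood of the true labels
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.4])     # predicted probabilities of class 1
print(binary_cross_entropy(y_true, y_pred))  # smaller is better
```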

Binning:

A type of discretization that involves grouping continuous features into discrete bins or intervals based on their values.

Biometric:

A unique physical or behavioral characteristic of a person that can be used for identification, such as a fingerprint, iris pattern, or facial features.

Boosting:

A technique used for ensemble learning that involves training multiple models sequentially, where each subsequent model focuses on improving the performance of the previous model by giving more weight to misclassified examples.

C

Categorical Cross-Entropy:

A common loss function used for multi-class classification tasks that measures the difference between the predicted probabilities of each class and the true labels.

Character Segmentation:

The process of identifying individual characters in an image of text, in order to recognize each character separately.

Classification:

A type of machine learning task in which the goal is to predict a categorical or discrete output, such as a label or a class.

Cluster Center:

A point or vector that represents the “center” of a cluster in feature space.

Clustering Validity:

A measure of how well a clustering algorithm identifies meaningful and useful clusters in the data, rather than random or spurious clusters.

Clustering:

A type of machine learning task in which the goal is to group similar data points or objects together into clusters or subgroups.

Confusion Matrix:

A matrix that summarizes the number of true positives, true negatives, false positives, and false negatives in a classification task.
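
A quick way to see the four counts for a binary problem, sketched in plain NumPy (scikit-learn’s `confusion_matrix` computes the same thing):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives

print([[tn, fp],
       [fn, tp]])   # rows = actual class, columns = predicted class
```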

Convolutional Layer:

A layer in a CNN that performs convolutional operations on the input data.

Convolutional Neural Network (CNN):

A type of neural network that is commonly used for image classification and object detection tasks.

Cost Function:

A function used to measure the difference between the predicted and true output of a machine learning model, often used as the objective function for optimization algorithms.

Cross-Validation:

A technique used to evaluate the performance of a machine learning model on a limited dataset by splitting the data into training and validation sets, and repeatedly training and evaluating the model on different splits.
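
Assuming scikit-learn is available, a 5-fold cross-validation run might look like this sketch (the synthetic dataset simply stands in for real data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic dataset standing in for real data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)
# Train and evaluate on 5 different train/validation splits
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```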

Cutout:

A data augmentation technique that involves randomly masking out or “cutting out” rectangular regions of an image, in order to encourage the model to focus on other parts of the image.

Cycle-Consistent Adversarial Networks (CycleGANs):

A type of GAN that can be used for image-to-image translation and data augmentation by learning to map between different domains of image data.

D

Data Augmentation:

A technique used to increase the size and diversity of a dataset by applying transformations or perturbations to the input data, in order to improve the generalization performance of machine learning models.

Data Cleaning:

The process of identifying and correcting or removing errors, inconsistencies, or missing values in a dataset.

Data Preprocessing:

The process of cleaning, transforming, and preparing raw data for machine learning tasks.

Data Transformation:

The process of converting raw data into a format that is suitable for machine learning algorithms, such as scaling, normalization, or one-hot encoding.

Decision Boundary:

The boundary or threshold between different classes in a classification problem, which separates one class from another.

Deep Learning:

A subfield of machine learning that focuses on developing models with many layers, allowing for more complex representations of input data.

Deep Reinforcement Learning:

A type of reinforcement learning that uses deep neural networks to approximate the policy or value function, and is capable of learning complex and high-dimensional tasks.

Density-Based Clustering:

A clustering algorithm that involves identifying clusters as high-density regions in the data space, separated by low-density regions.

Dimensionality Reduction:

A technique used to reduce the number of input features or dimensions in a dataset, in order to improve performance, reduce noise, make clustering more efficient, and facilitate visualization of the data in a lower-dimensional space.

Discretization:

The process of converting continuous features into discrete or categorical features, in order to make them more suitable for machine learning algorithms.

Doc2Vec:

A technique for representing entire documents as dense vectors, based on the distribution of words in the document.

Domain Adaptation:

The process of adapting a pretrained model to a different domain or distribution of data, in order to improve performance on a target task.

Domain Knowledge:

The knowledge or expertise of a subject matter expert or domain specialist, which can be used to create or select relevant features for machine learning tasks.

Dropout Layer:

A layer in a neural network that applies dropout to the input or output, or both, of the layer, in order to reduce overfitting and improve the robustness of the model.

Dropout Mask:

A binary mask that randomly selects which neurons or connections to drop out during training, and scales the remaining activations to maintain the expected activation.

Dropout Rate:

The probability of dropping out a neuron or connection during training, typically set between 0.2 and 0.5.

Dropout:

A regularization technique in which randomly selected nodes in a neural network are ignored during training, to prevent overfitting.
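
A minimal NumPy sketch of (inverted) dropout applied to a layer’s activations during training:

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    if not training:
        return activations            # no dropout at test time
    keep_prob = 1.0 - rate
    # Random binary mask: 1 = keep the neuron, 0 = drop it
    mask = (np.random.rand(*activations.shape) < keep_prob).astype(activations.dtype)
    # Scale up the kept activations so the expected value is unchanged (inverted dropout)
    return activations * mask / keep_prob

a = np.ones((2, 4))
print(dropout(a, rate=0.5))
```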

E

Early Stopping:

A regularization technique that involves monitoring the performance of a machine learning model on a validation set during training, and stopping the training process when the performance stops improving or starts to degrade.

Elastic Net Regularization:

A combination of L1 and L2 regularization that can be used to balance the benefits of both types of regularization.

ELU (Exponential Linear Unit):

A variant of ReLU that applies a smooth exponential curve to negative input values, producing small negative outputs instead of zeros, which can improve the robustness and performance of the model.

Embedded Methods:

Feature selection methods that incorporate feature selection into the training process of a machine learning algorithm, in order to select the most relevant features for the task.

Embedding:

A mathematical representation of a face that is derived from the facial features, and can be used to compare and identify faces.

Encoder-Decoder Architecture:

A type of neural network architecture that consists of an encoder network that downsamples an input image, followed by a decoder network that upsamples the resulting feature map to generate a segmentation mask.

Encoding:

The process of converting categorical or nominal features into a numerical format that is suitable for machine learning algorithms, such as one-hot encoding or label encoding.

Ensemble Learning:

A machine learning technique that involves combining multiple models to improve the generalization performance and accuracy of a prediction task.

Ensemble of Networks:

A technique that involves training multiple neural networks with different random dropout masks, and averaging their predictions at test time, which can further improve the performance and robustness of the model.

Ensemble Size:

The number of models used in an ensemble, which can affect the bias-variance tradeoff and the generalization performance of the ensemble.

Entity:

A real-world object or concept that has a name, such as a person, organization, or location.

Environment:

The external system or process that an agent interacts with in order to achieve a goal.

F

F1 Score:

The harmonic mean of precision and recall, which provides a balanced measure of the model’s performance.
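
From the precision and recall values, the F1 score is straightforward to compute; a small sketch:

```python
def f1_score(precision, recall):
    # Harmonic mean: low if either precision or recall is low
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(precision=0.8, recall=0.6))  # ≈ 0.686
```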

Face Alignment:

The process of normalizing the orientation and position of a face in an image, in order to improve the accuracy of subsequent face recognition tasks.

Face Detection:

The process of identifying and locating one or more faces in an image or video.

Face Recognition:

The process of using machine learning techniques to identify and verify the identity of a person based on their facial features.

Facial Landmarks:

Specific points on a face, such as the corners of the eyes and the tip of the nose, that can be used to identify and track facial features.

Faster R-CNN:

A type of object detection model that uses a separate network for region proposal and object detection.

Feature Engineering:

The process of creating new features or modifying existing features in order to improve the performance of machine learning models.

Feature Extraction:

A transfer learning approach that involves using a pretrained model as a fixed feature extractor, and training a new classifier on top of the extracted features.

Feature Importance:

A measure of the relative importance or relevance of each input feature to a machine learning task, which can be used for feature selection and interpretation.

Feature Scaling:

The process of rescaling input features to a common scale in order to improve the performance of machine learning algorithms.

Feature Selection:

The process of selecting a subset of input features for machine learning tasks, in order to improve performance and reduce the risk of overfitting.

Feedforward Neural Network:

A type of neural network in which the information flows in one direction, from input to output, without any loops or feedback connections.

Filter Methods:

Feature selection methods that rely on statistical tests or correlation measures to evaluate the relevance of input features to a machine learning task.

Fine-Tuning:

The process of taking a pretrained model and adapting it to a new task by training it on a smaller, task-specific dataset.

G

Gated Recurrent Unit (GRU):

A simplified version of an LSTM layer that uses fewer parameters.

Gazetteer:

A list or database of named entities, such as a list of city names or a database of company names, that can be used to improve the accuracy of NER models.

Generalization Performance:

The ability of a machine learning model to perform well on new or unseen data, beyond the training data used to optimize its parameters.

Generative Adversarial Networks (GANs):

A type of neural network architecture that can be used for data augmentation by learning to generate new examples that are similar to the training data.

Global Interpretability:

The ability to explain the overall behavior or decision-making process of a machine learning model across the entire input space.

GloVe (Global Vectors for Word Representation):

Another popular word embedding technique that is based on factorizing the co-occurrence matrix of words in a corpus.

GPT (Generative Pre-trained Transformer):

A type of language generation model that uses a transformer-based architecture to generate high-quality text data.

Gradient Boosting:

A specific type of boosting that involves training decision trees sequentially, where each subsequent tree is trained on the residuals of the previous tree and aims to correct its mistakes.

Gradient Descent:

An optimization algorithm that iteratively adjusts a model’s parameters in the direction of the negative gradient of the loss function, in order to minimize the difference between predicted and actual values.
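
A toy sketch of gradient descent fitting a single slope parameter to minimize mean squared error (all names and values here are illustrative):

```python
import numpy as np

# Toy data generated from y = 3x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3 * x + rng.normal(0, 0.1, 100)

w = 0.0                # parameter to learn
lr = 0.1               # learning rate (step size)
for _ in range(200):
    y_pred = w * x
    # Gradient of the mean squared error with respect to w
    grad = np.mean(2 * (y_pred - y) * x)
    w -= lr * grad     # step in the direction that reduces the loss
print(w)               # should end up close to 3
```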

Gradient-Based Optimization:

A hyperparameter tuning technique that involves using gradient descent or other optimization algorithms to directly optimize the hyperparameters of a machine learning model.

Gradient:

The vector of partial derivatives of a function with respect to its parameters, which gives the direction and rate of steepest increase at a given point; its negative gives the direction of steepest descent used by optimization algorithms.

Grid Search:

A hyperparameter tuning technique that involves systematically searching over a pre-defined space of hyperparameters and evaluating the performance of the model for each combination.
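
Assuming scikit-learn is available, a grid search over two SVM hyperparameters could look like this sketch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Every combination of these values is trained and cross-validated
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```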

H

Hidden Layer:

Layers in a feedforward neural network that are not directly connected to the input or output layers.

Hidden State:

The output of a recurrent layer that is passed to the next time-step.

Hierarchical Clustering:

A clustering algorithm that involves building a tree-like structure of nested clusters, which can be visualized as a dendrogram.

Hinge Loss:

A loss function commonly used for support vector machine (SVM) models that penalizes misclassifications with a linearly increasing penalty as the margin between the predicted and true output decreases.

Huber Loss:

A loss function that is a hybrid of MSE and absolute error loss, which is less sensitive to outliers than MSE.

Hyperparameter Tuning:

The process of optimizing the hyperparameters of a machine learning model in order to achieve the best performance on a given task.

Hyperparameters:

Parameters of a machine learning model that are set before training and affect the learning process, such as the learning rate, number of hidden layers, number of neurons in each layer, regularization strength, or batch size.

I

Image Augmentation:

A type of data augmentation that involves applying transformations or perturbations to image data, such as rotation, scaling, flipping, cropping, or adding noise.

Image Processing:

The process of using machine learning techniques to analyze and manipulate digital images.

Image Segmentation:

The process of dividing an image into multiple segments, each of which corresponds to a different object or region of interest.

Individual Conditional Expectation (ICE) Plot:

A variation of the PDP that shows the relationship between a specific input feature and the output of a machine learning model for each individual instance in the dataset.

Inertia:

A measure of the within-cluster sum of squares, which can be used to evaluate the quality of a clustering.

Input Layer:

The layer of a feedforward neural network that receives the input data.

Instance Segmentation:

A type of image segmentation that identifies and assigns a unique label to each instance of an object in an image, such as “car 1”, “car 2”, etc.

Interaction Terms:

New features that are created by combining two or more existing features, in order to capture interactions or relationships between features.

Intersection over Union (IoU):

A measure of overlap between two bounding boxes, used to evaluate the accuracy of object detection models.
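
A small helper, written as a sketch, that computes IoU for two axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```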

Inverted Dropout:

A variant of dropout that scales the retained activations up during training (dividing by the keep probability), so that no rescaling is needed at test time, the expected activation stays the same, and the model does not come to rely on the dropout noise.

Isolation Forest:

A commonly used algorithm for unsupervised anomaly detection that involves randomly partitioning data points into isolation trees, and identifying anomalies as data points with a short average path length to the root node.

J

K

K-Means Clustering:

A commonly used clustering algorithm that involves iteratively partitioning data points into k clusters, where k is the number of clusters specified by the user.
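
Assuming scikit-learn is available, clustering a toy dataset into three groups is a short fit; a quick sketch:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three natural groupings
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)      # cluster index assigned to each point

print(labels[:10])
print(kmeans.cluster_centers_)      # the learned cluster centers
print(kmeans.inertia_)              # within-cluster sum of squares
```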

Kernel (Filter):

A small matrix that is applied to the input data during convolutional operations.

KL Divergence:

A loss function used for measuring the difference between two probability distributions, often used in training generative models.

L

L1 Regularization:

A type of regularization that adds a penalty term to the training objective of a machine learning algorithm, which encourages sparsity in the model weights and can be used for feature selection.

L2 Regularization:

A type of regularization that adds a penalty term to the training objective of a machine learning algorithm, which encourages small weights and can be used to prevent overfitting.

Language Generation:

The process of using machine learning models to generate text that is similar in style and content to human-written text.

Leaky ReLU:

A variant of ReLU that adds a small non-zero slope to the negative part of the input, which can prevent dead neurons and improve the performance on certain tasks.

Learning Rate Schedule:

A strategy used to anneal or adjust the learning rate during training, based on heuristics or performance metrics, in order to improve the convergence and stability of the optimization algorithm.

Learning Rate:

A hyperparameter used in gradient descent and other optimization algorithms that controls the step size or scaling factor of the gradient update at each iteration.

Lemmatization:

A more sophisticated alternative to stemming that reduces words to their dictionary form (lemma), taking into account the context and part of speech of the word being processed.

Lexicon-Based Approach:

A technique for performing sentiment analysis that uses pre-defined lists of words or phrases that are associated with positive or negative sentiment.

LIME (Local Interpretable Model-agnostic Explanations):

A method for local model interpretability that involves generating perturbations of the input data and fitting a local linear model to the resulting predictions.

Linear Regression:

A type of regression that involves fitting a straight line to a set of data points in order to make predictions.

Local Interpretability:

The ability to explain the predictions made by a machine learning model for a specific input instance or subset of instances.

Long Short-Term Memory (LSTM):

A type of recurrent layer that is designed to better capture long-term dependencies in sequential data.

Loss Function:

A mathematical function that measures the difference between the predicted output of a machine learning model and the true output, and is used as a measure of the model’s performance during training.

M

Machine Learning Approach:

A technique for performing sentiment analysis that involves training a machine learning model to recognize patterns in text that are associated with positive or negative sentiment.

Mask:

A binary image that indicates which pixels in an input image belong to a particular segment or object.

Max Pooling:

A type of pooling operation that selects the maximum value from a rectangular region of the input.

Mean Squared Error (MSE):

A commonly used loss function in regression tasks that measures the average squared difference between predicted and actual values.

Meta-Learning:

A type of transfer learning that involves training a machine learning model to learn how to learn, by adapting to a series of related tasks.

Mini-Batch Gradient Descent:

A variant of gradient descent that computes the gradient of the cost function over a small random subset of the training data at each iteration, which balances the advantages and disadvantages of batch and stochastic gradient descent.

Mixup:

A data augmentation technique that involves linearly interpolating between pairs of training examples and their corresponding labels, in order to create new examples that lie on the line segment between them.

Model Complexity:

The degree of flexibility or expressiveness of a machine learning model, often controlled by the number of parameters or the depth of the architecture.

Model Evaluation:

The process of assessing the performance of machine learning models on a given task.

Model Explainability:

A broader term that encompasses both model interpretability and model transparency, which refers to the ability to understand and explain the behavior of a machine learning model to a human audience.

Model Interpretability:

The degree to which the internal workings of a machine learning model can be understood and explained.

Momentum:

A hyperparameter used in gradient descent and other optimization algorithms that controls the influence of the previous gradient update on the current update, which can help accelerate convergence and avoid local minima.

Multi-Class Classification:

A type of machine learning task in which the model is trained to predict one of three or more possible outcomes, such as positive, negative, or neutral sentiment.

Multi-Label Classification:

A type of classification that involves predicting multiple possible outcomes for a single input, such as predicting the presence of multiple objects in an image.

Multi-Task Learning:

A machine learning approach that involves training a single model on multiple tasks, in order to improve performance on all tasks.

Multiple Regression:

A type of regression that involves predicting a numerical output based on multiple input variables or features.

N

N-grams:

A technique for capturing the context of words in text data by considering sequences of N consecutive words.

Named Entity Recognition (NER):

A subtask of natural language processing that involves identifying and classifying named entities in text data, such as people, places, and organizations.

Named Entity Type:

A category or class of named entities, such as person, organization, or location.

Named Entity:

A word or phrase that refers to a specific entity in the real world, such as a person, organization, or location.

Natural Language Generation (NLG):

A subfield of language generation that focuses on generating text that is grammatically and semantically correct.

Negative Log-Likelihood (NLL):

A loss function commonly used in maximum likelihood estimation to measure the difference between the predicted and true probability distributions.

Neural Network Depth:

The number of layers in a neural network.

Neural Network:

A type of machine learning model that consists of layers of interconnected nodes, or neurons, that perform simple computations and collectively learn to approximate complex functions, inspired by the structure and function of the human brain.

Non-Maximum Suppression (NMS):

A technique for removing redundant object detections by keeping only the highest-scoring detections and suppressing others that are too close or overlap with higher-scoring detections.

Normalization:

A type of feature scaling that rescales input features to a range of 0 to 1, in order to improve the performance of machine learning algorithms that rely on distance or similarity measures.

O

Object Detection:

The process of identifying and localizing objects in an image.

Object Localization:

The process of identifying the location of an object in an image or video.

One-Hot Encoding:

A technique for representing categorical data, such as words or phrases, as binary vectors.
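
A tiny NumPy sketch of one-hot encoding three categories:

```python
import numpy as np

categories = ["cat", "dog", "bird"]
labels = ["dog", "cat", "bird", "dog"]

# Map each category to a column index, then set that column to 1
index = {c: i for i, c in enumerate(categories)}
one_hot = np.zeros((len(labels), len(categories)), dtype=int)
for row, label in enumerate(labels):
    one_hot[row, index[label]] = 1

print(one_hot)
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 0]]
```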

One-Shot Learning:

A type of machine learning task in which the model must recognize new instances of a class after seeing only one (or very few) labeled examples of that class.

Optical Character Recognition (OCR):

The process of using machine learning techniques to recognize and convert printed or handwritten text in images or documents into machine-readable text.

Optimization Algorithm:

A family of algorithms used for finding the minimum of a function, including gradient descent, stochastic gradient descent, and their variants.

Output Layer:

The layer of a feedforward neural network that produces the final output of the model.

Overfitting:

A phenomenon in which a model performs well on the training data but poorly on new, unseen data, often because it has memorized noise or outliers in the training data.

P

Padding:

The process of adding extra border pixels around the input data, often so that the output of the convolution operation has the same spatial dimensions as the input.

Partial Dependence Plot (PDP):

A graphical representation of the relationship between a specific input feature and the output of a machine learning model, while holding all other input features constant.

Pixel:

The smallest unit of an image that can be displayed or manipulated.

Polarity:

The positive or negative nature of the sentiment expressed in a piece of text.

Policy:

The strategy or rule that an agent uses to select actions based on the current state.

Polynomial Features:

New features that are created by raising existing features to a power or by multiplying multiple features together, in order to capture nonlinear relationships between features.

Polynomial Regression:

A type of regression that involves fitting a polynomial function to a set of data points in order to make predictions.

Pooling Layer:

A layer in a CNN that reduces the dimensions of the input by aggregating nearby values.

Precision-Recall Curve:

A graphical representation of the trade-off between precision and recall for different threshold values in a binary classification task.

Precision:

A measure of the proportion of true positive predictions made by a machine learning model, among all positive predictions.

Preprocessing:

The process of cleaning and enhancing an image or document before performing OCR, in order to improve the accuracy of text recognition.

Pretrained Model:

A machine learning model that has been trained on a large dataset for a specific task, and can be used as a starting point for transfer learning.

Principal Component Analysis (PCA):

A type of dimensionality reduction technique that involves transforming the input features into a lower-dimensional space that captures the most important variation in the data.
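
Assuming scikit-learn is available, projecting a dataset down to two principal components looks like this sketch:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)     # 4 input features per sample

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)      # each sample now described by 2 components

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance captured by each component
```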

Q

Q-Learning:

A commonly used algorithm for reinforcement learning that involves learning an action-value function by iteratively updating estimates of the expected cumulative reward for each action in each state.
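
The core of tabular Q-learning is a single update rule; here is a minimal sketch (the environment details are omitted and the names are illustrative):

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # table of action-value estimates

alpha, gamma = 0.1, 0.9               # learning rate and discount factor

def q_update(state, action, reward, next_state):
    # Target: immediate reward plus discounted best value of the next state
    target = reward + gamma * np.max(Q[next_state])
    # Move the current estimate a small step toward the target
    Q[state, action] += alpha * (target - Q[state, action])

q_update(state=0, action=1, reward=1.0, next_state=2)
print(Q)
```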

R

R-squared:

A measure of how well a regression model fits the data, typically ranging from 0 to 1, with higher values indicating a better fit.

Random Forest:

A specific type of ensemble model that uses bagging to train multiple decision trees on different random subsets of the training data (and typically random subsets of features at each split), and combines their predictions by majority vote or averaging.

Random Search:

A hyperparameter tuning technique that involves randomly sampling from a pre-defined distribution of hyperparameters and evaluating the performance of the model for each sample.

Recall:

A measure of the fraction of correctly identified positive instances out of all actual positive instances.

Receiver Operating Characteristic (ROC) Curve:

A graphical representation of the trade-off between true positive rate and false positive rate for different threshold values in a binary classification task.

Recurrent Layer:

A layer in an RNN that contains loops to allow information to persist across time-steps.

Recurrent Neural Network (RNN):

A type of neural network that is commonly used for language generation tasks, as it is able to capture the sequential structure of language.

Recursive Feature Elimination (RFE):

A type of wrapper method that recursively eliminates the least relevant features from a model and evaluates the resulting performance, in order to select the most important features.

Region Proposal:

A method for generating candidate regions in an image that may contain objects.

Regression:

A type of machine learning task in which the goal is to predict a continuous value or numerical output, such as a price or a quantity.

Regularization:

A technique used to prevent overfitting in machine learning models, by adding a penalty term to the loss function that encourages simpler or more generalizable models.

Reinforcement Learning:

A type of machine learning task in which an agent learns to take actions in an environment in order to maximize a reward signal.

ReLU (Rectified Linear Unit):

A popular activation function used in the hidden layers of neural networks, which sets the output to zero for negative input values and keeps the output linear for positive input values.

Reward:

The feedback signal that an agent receives for its actions, which can be positive or negative and can influence the agent’s future behavior.

RGB (Red, Green, Blue):

A color model used in digital image processing, where each color is represented as a combination of three primary colors.

Rule-Based Approach:

A technique for performing NER that relies on a set of predefined rules or patterns to identify named entities in text.

S

Semantic Segmentation:

A type of image segmentation that assigns a semantic label to each pixel in an image, such as “road”, “sky”, or “person”.

Semi-Supervised Anomaly Detection:

A type of anomaly detection that uses labeled data to train a model to identify normal behavior, and then uses unsupervised methods to identify anomalies that deviate significantly from the normal behavior.

Sentiment Analysis:

The process of using machine learning techniques to identify the sentiment or emotional tone of a piece of text, such as a review or social media post.

Sentiment:

The emotional tone or attitude expressed in a piece of text, such as positive, negative, or neutral.

Sequence Generation:

A type of text generation that involves generating a sequence of words or tokens based on a starting point or prompt.

SHAP (SHapley Additive exPlanations):

A method for global and local model interpretability that involves computing Shapley values, which represent the contribution of each input feature to the difference between the model’s prediction for a given instance and the average prediction across all instances.

Siamese Network:

A type of neural network that is commonly used for face recognition, as it is able to learn similarities and differences between pairs of faces.

Sigmoid:

A common activation function used in the output layer of binary classification tasks, which maps the output to a probability between 0 and 1.

Silhouette Score:

A measure of the similarity of a data point to its own cluster compared to other clusters, which can be used to evaluate the quality of a clustering.

Single Shot Detector (SSD):

A type of object detection model that uses a single network to predict object class and location.

Softmax:

An activation function used in the output layer of multi-class classification tasks, which maps the output to a probability distribution over the classes.
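
A short NumPy sketch of softmax, using the usual subtract-the-max trick for numerical stability:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability (does not change the result)
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

scores = np.array([2.0, 1.0, 0.1])    # raw model outputs for three classes
probs = softmax(scores)
print(probs, probs.sum())             # probabilities summing to 1
```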

Stacking:

A technique used for ensemble learning that involves training multiple models and using their predictions as input features for a higher-level model, which learns how to combine them to make the final prediction.

Standardization:

A type of feature scaling that rescales input features to have zero mean and unit variance, in order to improve the performance of machine learning algorithms that assume features are centred and on comparable scales.

State:

The current condition or situation of an environment, which can influence the rewards that an agent receives for its actions.

Stemming:

The process of reducing words to their root form, by removing suffixes and prefixes.

Stochastic Gradient Descent (SGD):

A variant of gradient descent that randomly samples a subset of the training data at each iteration to estimate the gradient, which can be faster and more memory-efficient for large datasets.

Stop Words:

Common words that are often removed from text during preprocessing, as they do not carry much meaning (e.g. “a”, “the”, “and”).

Stride:

The step size used to move the kernel across the input data during convolutional operations.

Subjectivity:

The degree to which a piece of text expresses a personal opinion or feeling, as opposed to objective information.

Supervised Anomaly Detection:

A type of anomaly detection that uses labeled data to train a model to identify anomalies based on known examples of normal and abnormal behavior.

Support Vector Machine (SVM):

A type of algorithm that is commonly used for binary and multi-class classification tasks, by identifying the decision boundary that maximizes the margin between different classes.

Swish:

An activation function that combines the non-linearity of ReLU with the smoothness of the sigmoid, and has been shown to improve the performance of neural networks on certain tasks.

T

Tagging:

The process of assigning a label to each word in a text document based on its part of speech or named entity type.

Tanh:

A common activation function used in the hidden layers of neural networks, which is similar to the sigmoid but maps the output to a range between -1 and 1.

Term Frequency-Inverse Document Frequency (TF-IDF):

A technique for weighting the importance of words in a document based on how frequently they appear in the document and how rare they are in the overall corpus.

Test Data:

The subset of the dataset used to evaluate the performance of a machine learning model, consisting of input-output pairs that the model has not seen during training.

Test Set:

The subset of data used to evaluate the final performance of a machine learning model, after it has been trained and validated.

Text Augmentation:

A type of data augmentation that involves applying transformations or perturbations to text data, such as synonym replacement, word insertion, or random deletion.

Text Generation:

A type of language generation that involves generating text from scratch, rather than filling in the blanks or completing a task.

Text Preprocessing:

The process of cleaning and transforming raw text data into a format that is suitable for machine learning models.

Text Representation:

The process of converting raw text data into a numerical format that can be used as input to machine learning models.

Tokenization:

The process of splitting text into individual tokens, such as words or phrases.

Training Data:

The subset of the dataset used to train a machine learning model, consisting of input-output pairs used to adjust the model’s parameters and optimize its performance.

Training Set:

The subset of data used to train a machine learning model.

Transfer Learning:

The process of leveraging knowledge gained from one machine learning task to improve performance on a different but related task.

Triplet Loss:

A loss function commonly used for training siamese networks or other similarity-based models, that encourages the model to learn representations that place similar inputs close together and dissimilar inputs far apart.

U

U-Net:

A specific type of encoder-decoder architecture that is commonly used for biomedical image segmentation.

Underfitting:

A phenomenon that occurs when a machine learning model is too simple or lacks expressiveness, resulting in poor performance on both the training and test data.

Univariate Selection:

A type of filter method that evaluates the relevance of each input feature independently, using statistical tests such as ANOVA or chi-squared tests.

Unsupervised Anomaly Detection:

A type of anomaly detection that does not require labeled data, and instead relies on identifying patterns of data that deviate significantly from the norm or expected behavior.

V

Validation Data:

The subset of the training data used to evaluate the performance of a machine learning model during training, to detect and prevent overfitting.

Validation Set:

The subset of data used to evaluate the performance of a machine learning model during the training process, and to tune hyperparameters.

Value Function:

A function that estimates the expected cumulative reward that an agent can achieve from a given state, and can be used to evaluate and improve the agent’s policy.

Vanishing Gradient Problem:

A problem that can occur during backpropagation in deep neural networks when the gradient becomes too small to be useful for adjusting the weights.

Video Augmentation:

A type of data augmentation that involves applying transformations or perturbations to video data, such as cropping, flipping, or adding noise.

W

Weight Decay:

A type of L2 regularization that adds a penalty term to the training objective of a machine learning algorithm, which encourages small weights and can be used to prevent overfitting.

Word Embedding:

A technique for representing words as dense vectors in a continuous vector space, such that similar words are close together in the vector space.

Word2Vec:

A popular word embedding technique that is based on predicting the context in which words occur in text.

Wrapper Methods:

Feature selection methods that evaluate the performance of a machine learning algorithm using different subsets of input features, and select the subset that results in the best performance.

X

XGBoost:

A popular implementation of gradient boosting that uses a number of advanced techniques, such as regularization, parallel processing, and tree pruning, to improve performance and scalability.

Y

Z

Zero-Shot Learning:

A type of transfer learning that involves training a machine learning model to recognize new objects or patterns without any labeled examples, by leveraging knowledge about related concepts or attributes.