Unsupervised, Semi-Supervised, Supervised Learning

Semi-Supervised:

In computer science, semi-supervised learning is a class of machine learning techniques that make use of both labeled and unlabeled data for training - typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. The acquisition of labeled data for a learning problem often requires a skilled human agent to manually classify training examples. The cost associated with the labeling process thus may render a fully labeled training set infeasible, whereas acquisition of unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value.

One example of a semi-supervised learning technique is co-training, in which two or possibly more learners are each trained on a set of examples, but with each learner using a different, and ideally independent, set of features for each example.
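The idea can be illustrated with a minimal sketch. It assumes each example has two one-dimensional "views" and uses a nearest-centroid learner per view; the function names, the confidence measure, and the toy data are all hypothetical and chosen only for illustration. In each round, each view's learner pseudo-labels the unlabeled example it is most confident about and adds it to the shared labeled set.

```python
# Co-training sketch: two nearest-centroid learners, each seeing one feature "view".
# All names and data here are hypothetical, chosen only for illustration.

def fit_centroids(values, labels):
    """Return {label: mean feature value} for a single 1-D view."""
    sums, counts = {}, {}
    for v, y in zip(values, labels):
        sums[y] = sums.get(y, 0.0) + v
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_margin(centroids, v):
    """Predict the nearest centroid's label; the margin is a crude confidence score."""
    ranked = sorted(centroids, key=lambda y: abs(v - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(v - centroids[runner_up]) - abs(v - centroids[best])
    return best, margin

def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            model = fit_centroids([x[view] for x, _ in labeled],
                                  [y for _, y in labeled])
            # Rank the pool by this view's confidence and pseudo-label the top examples.
            scored = sorted(pool,
                            key=lambda x: predict_with_margin(model, x[view])[1],
                            reverse=True)
            for x in scored[:per_round]:
                label, _ = predict_with_margin(model, x[view])
                labeled.append((x, label))
                pool.remove(x)
    # Final per-view models trained on the enlarged labeled set.
    models = [fit_centroids([x[v] for x, _ in labeled], [y for _, y in labeled])
              for v in (0, 1)]
    return models, labeled

labeled = [((0.0, 0.1), "a"), ((1.0, 0.9), "b")]
unlabeled = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.2), (0.8, 0.7)]
models, final = co_train(labeled, unlabeled)
```

Real co-training setups typically use stronger learners and add several confident examples per class in each round; the sketch keeps only the control flow.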

An alternative approach is to model the joint probability distribution of the features and the labels. For the unlabeled data the labels can then be treated as 'missing data'. It is common to use the expectation-maximization (EM) algorithm to maximize the likelihood of the model.
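A minimal sketch of this approach follows, under strong simplifying assumptions that are not part of the text above: two classes, one-dimensional features, Gaussian class-conditional densities with a fixed shared standard deviation, and hypothetical toy data. Labeled points contribute to their known class with weight 1, while unlabeled points contribute through posterior class probabilities computed in the E-step.

```python
import math

def semi_supervised_em(labeled, unlabeled, iters=50, sigma=1.0):
    """Fit a two-class 1-D Gaussian mixture with fixed shared std `sigma`.

    labeled: list of (x, y) with y in {0, 1}; unlabeled: list of x.
    Returns (means, priors).
    """
    # Initialise the class means from the labeled examples alone.
    means = []
    for k in (0, 1):
        xs = [x for x, y in labeled if y == k]
        means.append(sum(xs) / len(xs))
    priors = [0.5, 0.5]

    for _ in range(iters):
        # E-step: posterior class responsibilities for the unlabeled points.
        resp = []
        for x in unlabeled:
            w = [priors[k] * math.exp(-0.5 * ((x - means[k]) / sigma) ** 2)
                 for k in (0, 1)]
            total = w[0] + w[1]
            resp.append([w[0] / total, w[1] / total])
        # M-step: labeled points count with weight 1 for their known class.
        n = len(labeled) + len(unlabeled)
        for k in (0, 1):
            weight = (sum(1.0 for _, y in labeled if y == k)
                      + sum(r[k] for r in resp))
            total_x = (sum(x for x, y in labeled if y == k)
                       + sum(r[k] * x for r, x in zip(resp, unlabeled)))
            means[k] = total_x / weight
            priors[k] = weight / n
    return means, priors

labeled = [(-2.0, 0), (2.0, 1)]
unlabeled = [-2.5, -1.5, -2.2, 1.8, 2.4, 2.1, 1.9]
means, priors = semi_supervised_em(labeled, unlabeled)
```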

Supervised:

Supervised learning can generate models of two types. Most commonly, supervised learning generates a global model that maps input objects to desired outputs. In some cases, however, the map is implemented as a set of local models (such as in case-based reasoning or the nearest neighbor algorithm).
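As an illustration of the local-model case, here is a small k-nearest-neighbor sketch. No global function is fitted; each prediction is computed from the stored training examples closest to the query. The function name, distance measure, and toy data are assumptions made for the example.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label).

    Classify `query` by majority vote among its k nearest training examples.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in the plane.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
near_origin = knn_predict(train, (0.5, 0.5))
near_five = knn_predict(train, (5.5, 5.5))
```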

In order to solve a given problem of supervised learning (e.g. learning to recognize handwriting) one has to consider various steps:

  1. Determine the type of training examples. Before doing anything else, the engineer should decide what kind of data is to be used as an example. For instance, this might be a single handwritten character, an entire handwritten word, or an entire line of handwriting.
  2. Gather a training set. The training set needs to be characteristic of the real-world use of the function. Thus, a set of input objects is gathered and corresponding outputs are also gathered, either from human experts or from measurements.
  3. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality, but it should be large enough to accurately predict the output.
  4. Determine the structure of the learned function and corresponding learning algorithm. For example, the engineer may choose to use artificial neural networks or decision trees.
  5. Complete the design. The engineer then runs the learning algorithm on the gathered training set. Parameters of the learning algorithm may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation. After parameter adjustment and learning, the performance of the algorithm may be measured on a test set that is separate from the training set.
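The last step can be sketched as follows. The example assumes a one-dimensional k-nearest-neighbor classifier whose parameter k is chosen on a validation set, with the test set used only once at the end; the data, the candidate values of k, and the function names are hypothetical.

```python
def knn_1d(train, x, k):
    """Majority vote among the k training points nearest to x (labels are 0 or 1)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    ones = sum(label for _, label in nearest)
    return 1 if 2 * ones > len(nearest) else 0

def accuracy(train, data, k):
    """Fraction of (x, y) pairs in `data` that the k-NN rule predicts correctly."""
    return sum(knn_1d(train, x, k) == y for x, y in data) / len(data)

def select_k(train, validation, candidates=(1, 3, 5)):
    """Pick the k with the best validation accuracy (ties go to the earliest candidate)."""
    return max(candidates, key=lambda k: accuracy(train, validation, k))

# Hypothetical 1-D data: label 1 when x >= 5, with one noisy training label at x = 2.
points = [(float(x), int(x >= 5)) for x in range(10)]
train = [(x, 1 - y if x == 2.0 else y) for x, y in points]
validation = [(x + 0.5, y) for x, y in points[:9]]   # held out from training
test = [(x + 0.25, y) for x, y in points]            # touched only once, at the end

best_k = select_k(train, validation)
test_accuracy = accuracy(train, test, best_k)
```

Keeping the test set out of the selection loop is what makes the final accuracy an honest estimate: the validation score has already been used to choose k and is therefore optimistically biased.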

Classification, in which the desired outputs are discrete class labels, is one of the most common forms of supervised learning. A wide range of classifiers is available, each with its strengths and weaknesses. Classifier performance depends greatly on the characteristics of the data to be classified. No single classifier works best on all problems, an observation referred to as the 'no free lunch' theorem. Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine it. Determining a suitable classifier for a given problem is, however, still more an art than a science.

Widely used classifiers include neural networks (multi-layer perceptrons), support vector machines, k-nearest neighbors, Gaussian mixture models, Gaussian classifiers, naive Bayes, decision trees, and RBF classifiers.

Empirical risk minimization

The goal of supervised learning of a global model is to find a function g, given a set of points of the form (x, g(x)).

It is assumed that the points at which the behavior of g is known form a sample of independent and identically distributed random variables, drawn according to an unknown probability distribution p from a larger, possibly infinite, population. Furthermore, one assumes the existence of a task-specific loss function L of type

L : Y \times Y \to \mathbb{R}^{+}

where Y is the codomain of g and L maps into the nonnegative real numbers (further restrictions may be placed on L). The quantity L(z, y) is the loss incurred by predicting z as the value of g at a given point when the true value is y.

The risk associated with a function f is then defined as the expectation of the loss function, as follows:

R(f) = \sum_i L(f(x_i), g(x_i)) \; p(x_i)

if the probability distribution p is discrete (the analogous continuous case employs a definite integral and a probability density function).
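For reference, the continuous analogue can be written as below. This assumes p denotes a probability density over the input space, written here as X (a symbol introduced only for this illustration):

```latex
R(f) = \int_{X} L(f(x), g(x)) \, p(x) \, dx
```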

The goal is now to find a function f* among a fixed subclass of functions for which the risk R(f*) is minimal.

However, since the behavior of g is generally only known for a finite set of points (x_1, y_1), ..., (x_n, y_n), one can only approximate the true risk, for example with the empirical risk:

\tilde{R}_n(f) = \frac{1}{n} \sum_{i=1}^n L(f(x_i), y_i)

Selecting the function f* that minimizes the empirical risk is known as the principle of empirical risk minimization. Statistical learning theory investigates under what conditions empirical risk minimization is admissible and how good the approximations can be expected to be.
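A minimal sketch of the principle, assuming a small finite class of candidate functions and squared loss (both are illustrative choices, as are the names and the noise-free toy sample): the empirical risk of each candidate is computed on the sample, and the candidate with the smallest value is selected.

```python
def empirical_risk(f, sample, loss):
    """Average loss of f over a finite sample of (x, y) pairs."""
    return sum(loss(f(x), y) for x, y in sample) / len(sample)

def squared_loss(z, y):
    """Loss incurred by predicting z when the true value is y."""
    return (z - y) ** 2

def erm(candidates, sample, loss):
    """Return the candidate function with the smallest empirical risk."""
    return min(candidates, key=lambda f: empirical_risk(f, sample, loss))

# Hypothetical function class: lines through the origin, f_a(x) = a * x.
slopes = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
candidates = [(lambda x, a=a: a * x) for a in slopes]

# Hypothetical sample generated by g(x) = 2x (noise-free to keep the result obvious).
sample = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]

f_star = erm(candidates, sample, squared_loss)
```

With noisy samples or a very rich function class, the minimizer of the empirical risk can differ substantially from the minimizer of the true risk, which is the gap that statistical learning theory studies.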

Active Learning

There are situations in which unlabeled data is abundant but labeling data is expensive. In such a scenario the learning algorithm can actively query the user or teacher for labels. This type of supervised learning is called active learning. Since the learner chooses the examples, the number of examples needed to learn a concept can often be much lower than the number required in normal supervised learning. With this approach there is a risk that the algorithm might focus on unimportant or even invalid examples.
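The saving in labels can be illustrated with a deliberately simple sketch: learning a one-dimensional threshold. It assumes that labels are 0 below an unknown threshold and 1 at or above it, and that the smallest and largest pool points carry labels 0 and 1 respectively. The learner always queries the pool point nearest the middle of the interval where the label is still uncertain, which amounts to a binary search. The `oracle` callable stands in for the user or teacher; all names and data are hypothetical.

```python
def active_learn_threshold(pool, oracle, budget):
    """Learn a 1-D threshold by repeatedly querying the most uncertain pool point.

    pool: unlabeled x values; oracle(x) -> 0 or 1 plays the user/teacher.
    Assumes oracle(min(pool)) == 0 and oracle(max(pool)) == 1.
    Returns (estimated_threshold, number_of_queries).
    """
    pool = sorted(pool)
    lo, hi = pool[0], pool[-1]          # current bracket around the threshold
    labeled = {lo: oracle(lo), hi: oracle(hi)}
    queries = 2
    while queries < budget:
        # Only points strictly between the bracket ends are still uncertain.
        inside = [x for x in pool if lo < x < hi]
        if not inside:
            break
        mid = (lo + hi) / 2
        x = min(inside, key=lambda v: abs(v - mid))
        labeled[x] = oracle(x)
        queries += 1
        if labeled[x] == 1:
            hi = x
        else:
            lo = x
    return (lo + hi) / 2, queries

# Hypothetical pool of 101 points and a teacher whose true threshold is 6.35.
pool = [i / 10 for i in range(101)]
oracle = lambda x: int(x >= 6.35)
threshold, queries = active_learn_threshold(pool, oracle, budget=20)
```

Labeling the whole pool would take 101 queries, whereas the bracket here shrinks by roughly half with each query.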

Unsupervised:

Unsupervised learning is a type of machine learning in which manual labels of inputs are not used. It is distinguished from supervised learning approaches, which learn how to perform a task, such as classification or regression, using a set of human-prepared examples.

One form of unsupervised learning is clustering, which need not be probabilistic. Adaptive resonance theory (ART), for example, allows the number of clusters to vary with problem size and lets the user control the degree of similarity between members of the same cluster by means of a user-defined constant called the vigilance parameter. ART networks are also used for many pattern recognition tasks, such as automatic target recognition and seismic signal processing. The first version of ART was "ART1", developed by Carpenter and Grossberg (1988).
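The role of the vigilance parameter can be illustrated with a simplified sketch in the spirit of ART1 for binary inputs. It is not the full network dynamics: it assumes the commonly described fast-learning simplification, in which a matching category's prototype is replaced by its intersection with the input. The function name, the `beta` constant in the choice function, and the toy patterns are assumptions made for this example.

```python
def art1_cluster(patterns, vigilance=0.7, beta=1.0):
    """Cluster binary tuples; returns (assignments, prototypes)."""
    prototypes, assignments = [], []
    for x in patterns:
        norm_x = sum(x)

        def choice(j):
            # Choice function |x AND w| / (beta + |w|) used to rank categories.
            w = prototypes[j]
            overlap = sum(a & b for a, b in zip(x, w))
            return overlap / (beta + sum(w))

        chosen = None
        for j in sorted(range(len(prototypes)), key=choice, reverse=True):
            overlap = sum(a & b for a, b in zip(x, prototypes[j]))
            # Vigilance test: the match must cover enough of the input.
            if norm_x > 0 and overlap / norm_x >= vigilance:
                chosen = j
                break
        if chosen is None:
            prototypes.append(tuple(x))       # commit a new category
            chosen = len(prototypes) - 1
        else:
            # Fast learning: the prototype shrinks to its intersection with the input.
            prototypes[chosen] = tuple(a & b for a, b in zip(x, prototypes[chosen]))
        assignments.append(chosen)
    return assignments, prototypes

patterns = [(1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0),
            (0, 0, 0, 1, 1, 1), (0, 0, 0, 0, 1, 1)]
assignments, prototypes = art1_cluster(patterns, vigilance=0.6)

# The same two inputs end up in one cluster or two, depending on vigilance.
loose, _ = art1_cluster([(1, 1, 0, 0), (1, 1, 1, 1)], vigilance=0.5)
strict, _ = art1_cluster([(1, 1, 0, 0), (1, 1, 1, 1)], vigilance=0.8)
```

Raising the vigilance makes the match test harder to pass, so more categories are created and the members of each category are more similar to one another.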
