| Computational Intelligence - Learning with Neural Methods on Structured Data |
|
Neural Gas is an unsupervised learning algorithm which provides
a density based prototype representation of given data. The key idea is to calculate a proximity ranking of all prototypes to a given training pattern, and to adopt the prototype locations into the direction of this pattern by steps proportional to an amount of exponential rank: the closest prototype is adapted most down to the farthest which is changed least. This update is performed many times for all training patterns. Unlike the SOM above, there is no topology restriction, like a reduction to a two-dimensional grid.
A 2D toy example is demonstrated by the animation on the left, for which the
3841 letter pixels are approximated by 50 prototypes (some animation
frames are skipped).
But what, if data live in non-Euclidean space, thus proximity has
to be defined differently. Or what, if data have a discrete
structure without intrinsic natural order, like nominal symbol
sequences?
My approach is to instead of representing the data to learn
the data context: convert symbols into a unary code, which can
be interpreted as the stimulation of specialized neurons. Now
classify this activity in the presence of previously trained
internal activations of the neurons, which is a recursive evaluation
scheme: for each external stimulus exists an however good or
bad ranking of internal neural activity. This activity is then
taken as context to be again classified by the neurons, leading to
a new internal activation to be classified, and so on.
One benefit is a conversion of data structure into an
activation space like [0;1]N to which the Euclidean
norm or more sophisticated measures like the Kullback Leibler
divergence can be applied. Another benefit relates to the kind
of data representation: prototypes cannot take impossible intermediate
interpolation states between the training data, because not the
structure itself but its elements' activation contexts are learned. Questions of interests are: What's a good proportion of external and internal activations? Which complexity of sequences can be represented? How deep has recursion to be carried out? And there are many more, like efficient coding of large nets and reduction of runtime complexity. |