Since age 15 or so Prof. Jürgen Schmidhuber's
(IDSIA &
USI &
SUPSI &
TUM)
main scientific ambition has been to build an
optimal scientist,
then retire. In 2028 they will force him to retire anyway. By then he shall
be able to buy hardware providing more
raw computing power
than his brain. Will the proper self-improving software lag far
behind? If so he'd be surprised. This optimism is driving his
research on mathematically sound, general purpose universal
learning machines and
Artificial Intelligence,
in particular the New AI,
which is relevant not only for
robotics but also for
physics
and music.
State-of-the-art Artificial Recurrent Neural Networks
(1989-2009).
Most work in machine learning focuses on machines
with reactive
behavior. RNNs, however, are more general sequence processors
inspired by human brains. They have adaptive
feedback connections and are
in principle as powerful as any computer.
Until recently, however, RNNs could not learn to look far
back into the past. But our novel RNN called "Long Short-Term
Memory" (LSTM) overcomes the fundamental problems of traditional
RNNs, and efficiently learns to solve many previously unlearnable tasks,
including: recognition of certain context-sensitive languages;
reinforcement learning in partially observable worlds;
metalearning of fast online learning algorithms;
music composition;
aspects of speech recognition; and time series prediction. For example,
our LSTM RNNs outperform all other known methods on the difficult
problem of recognizing unsegmented cursive handwriting;
they won several recent handwriting competitions.
They learn through gradient descent and/or evolution.
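For readers who want the mechanics, here is a minimal sketch of one LSTM step in Python/NumPy (illustrative names and shapes only, not the original code; the forget gate shown here was added to LSTM by Gers et al. around 2000):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4h, d+h) and b shape (4h,), so the
    pre-activations split evenly into input gate, forget gate, output
    gate, and candidate values. The cell state c is the protected
    'long short-term memory' that carries error signals across long lags.
    """
    z = W @ np.concatenate([x, h_prev]) + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # gated memory update
    h = sigmoid(o) * np.tanh(c)                        # exposed hidden state
    return h, c
```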
Gödel machine:
An old dream of computer scientists is to build an optimally
efficient universal problem solver. The
Gödel machine
can be implemented on a traditional computer and solves
any given computational problem in an optimal fashion inspired by Kurt
Gödel's celebrated self-referential formulas (1931).
It starts with an axiomatic description of itself,
and we may plug in any utility function, such as the expected
future reward of a robot.
Using an efficient proof searcher,
the Gödel machine will rewrite any part of its software
(including the proof searcher)
as soon as it has found
a proof that this will improve its future performance,
given the utility function and the typically limited computational resources.
Self-rewrites are globally optimal (no local maxima!) since provably none
of all the alternative rewrites and proofs (those that could be found by
continuing the proof search) are worth waiting for.
Summary. FAQ.
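A conceptual sketch of the control flow, with `solver`, `proof_searcher`, `rewrite`, and `utility_axioms` as hypothetical placeholders (a real Gödel machine requires a full formal proof calculus over its own axiomatized description):

```python
def godel_machine(solver, proof_searcher, utility_axioms):
    """Conceptual control flow only; all objects here are placeholders.
    The machine interleaves ordinary problem solving with proof search,
    and executes a self-rewrite only once it is provably useful.
    """
    while True:
        solver.work_on_problem()                      # ordinary problem solving
        candidate = proof_searcher.next_candidate()   # search proof space
        if candidate is None:
            continue
        rewrite, proof = candidate
        if proof.shows_improvement(utility_axioms):   # provably beneficial?
            # the rewrite may change any part, including the proof searcher:
            solver, proof_searcher = rewrite.apply(solver, proof_searcher)
```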
Optimal Ordered Problem Solver.
OOPS solves one task after another, through search for
solution-computing programs. The incremental method optimally
exploits solutions to earlier tasks when possible - compare the principles
of Levin's optimal universal search.
OOPS can temporarily rewrite its own search procedure, efficiently
searching for faster search methods (metasearching or
metalearning).
It is applicable to problems of optimization or prediction.
Talk slides.
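As a hedged illustration of the underlying search principle, here is a simplified Levin-style search in Python; `run` and `is_solution` are assumed to be supplied by the caller, and OOPS's key ingredient - bias-optimal reuse of earlier solutions - is omitted:

```python
from itertools import product

def levin_search(run, is_solution, alphabet="01", max_phase=15):
    """Simplified Levin-style universal search (illustrative). In phase i,
    each program p with len(p) <= i gets a time budget of 2**(i - len(p))
    steps, so shorter programs are tried earlier and for longer.
    run(p, steps) and is_solution(out) come from the caller.
    """
    for i in range(1, max_phase + 1):
        for length in range(1, i + 1):
            for symbols in product(alphabet, repeat=length):
                p = "".join(symbols)
                out = run(p, steps=2 ** (i - length))
                if out is not None and is_solution(out):
                    return p
    return None
```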
Super Omegas and Generalized Kolmogorov Complexity and
Algorithmic Probability.
Kolmogorov's complexity K(x) of a bitstring x is the length of the
shortest program that computes x and halts. Solomonoff's
algorithmic probability of x is the probability that a randomly
guessed program computes x. Chaitin's Omega is the halting probability
of a Turing machine with random input (Omega is known as
the "number of wisdom" because it compactly encodes all mathematical truth).
Schmidhuber generalized
all of this
to non-halting but converging programs. This led to
the shortest possible formal descriptions and to non-enumerable but limit-computable
measures and Super Omegas, and even has consequences for computable universes and
optimal inductive inference. Slides.
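In standard notation (with U a universal prefix Turing machine, ℓ(p) the length of program p, and x* any string starting with x), the three quantities above are:

\[ K(x) = \min\{\ell(p) : U(p) = x\}, \qquad M(x) = \sum_{p:\,U(p)=x*} 2^{-\ell(p)}, \qquad \Omega = \sum_{p:\,U(p)\ \mathrm{halts}} 2^{-\ell(p)}. \]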
Universal Learning Algorithms.
There is a theoretically optimal way of
predicting the future, given the past.
It can be used to define an optimal (though
noncomputable)
rational agent that maximizes
its expected reward in almost arbitrary environments sampled
from computable probability distributions.
This work represents the first mathematically
sound theory of universal artificial intelligence - most
previous work on AI was either heuristic or very limited.
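As a hedged sketch of the formal core: with Solomonoff's M from above, prediction is by conditioning,

\[ M(x_{t+1} = b \mid x_{1:t}) = \frac{M(x_{1:t}b)}{M(x_{1:t})}, \]

and the total expected squared prediction error stays finite for any computable data source; the optimal agent (Hutter's AIXI) picks actions maximizing reward expected under a comparable universal mixture over all computable environments.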
Speed Prior.
Occam's Razor: prefer simple solutions to complex ones.
But what exactly does "simple" mean? According to tradition, something
is simple if it has a short description or program, that is,
low Kolmogorov complexity.
This leads to Solomonoff's & Levin's miraculous
probability measure which yields optimal though noncomputable predictions,
given past observations. The Speed Prior
is different though: it is a new simplicity measure based on
the fastest way of describing objects, not the shortest.
Unlike the traditional one, it leads to near-optimal computable predictions,
and provokes unusual prophecies concerning the future of our universe.
Talk slides.
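The flavor of the construction, as a hedged sketch: Levin's closely related time-bounded complexity

\[ Kt(x) = \min_p \{\ell(p) + \log_2 t(p) : U(p) = x \text{ within } t(p) \text{ steps}\} \]

charges a program for its runtime as well as its length; the Speed Prior likewise lets the probability of x decay with the time the fastest method needs to compute x, which is what keeps the resulting predictions computable.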
In the Beginning was the Code.
In 1996 Schmidhuber wrote the first paper
about all possible computable universes. His
`Great Programmer'
is consistent with Zuse's
thesis (1967) of computable physics, against which there is no
physical evidence, contrary to common belief. If everything is computable, then
which exactly is our universe's program? It turns out that the simplest program
computes all universes,
not just ours. Later work (2000) on
Algorithmic Theories of Everything analyzed all the
universes with limit-computable probabilities as well as the very
limits of formal describability. This paper led to the above-mentioned
generalizations of algorithmic information and
probability and Super Omegas as well as the
Speed Prior. The searchable
"everything" mailing list archive
contains numerous discussions of this and related work.
See also comments on Wolfram's 2002 book
and a letter
on randomness in physics (Nature 439, 2006).
Talk slides.
Learning Robots.
Some hardwired robots achieve impressive feats.
But they do not learn like babies do.
Unfortunately, traditional
reinforcement learning algorithms
are limited to simple reactive behavior and do not
work well for
realistic robots.
Hence robot learning requires novel methods for
learning to identify important past events and memorize them until needed.
Schmidhuber's group is focusing on the above-mentioned
recurrent neural networks,
RNN evolution,
the Optimal Ordered Problem Solver,
and Policy Gradients.
They are collaborating with CSEM
on attentive sensing and hierarchical control,
with UniBW on robot cars, and with TUM-AM on
humanoids learning to walk.
New IDSIA projects
on developmental robotics with curious adaptive humanoids
and DLR's artificial hands
are starting in 2009.
Artificial Evolution.
State-of-the-art methods for network evolution
co-evolve all
neurons in parallel (excellent results in various
applications).
EVOLINO
outperforms previous methods on several
supervised learning tasks, and yields
the first recurrent support vector machines.
Probabilistic
incremental program evolution evolves
computer programs through probabilistic templates instead
of program populations (first approach to evolving entire
soccer team strategies from scratch).
As an undergrad Schmidhuber also implemented the first
genetic programming system with
loops and variable-length code (1987, see below).
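The flavor of weight evolution can be sketched with a simple evolution strategy in Python (a generic stand-in only: EVOLINO also computes optimal linear output weights analytically, and the co-evolutionary methods above evolve individual neurons rather than whole weight vectors):

```python
import numpy as np

def evolve_rnn_weights(fitness, dim, pop=50, elite=10, sigma=0.1, gens=100, seed=0):
    """Generic neuroevolution sketch: an evolution strategy on flat RNN
    weight vectors. fitness(w) is assumed to unroll the RNN that w
    parameterizes on the task and return a score (higher is better).
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    for _ in range(gens):
        candidates = mean + sigma * rng.standard_normal((pop, dim))
        scores = np.array([fitness(w) for w in candidates])
        best = candidates[np.argsort(scores)[-elite:]]  # keep top performers
        mean = best.mean(axis=0)                        # recombine by averaging
    return mean
```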
Interestingness & Active Exploration & Artificial Curiosity & Theory of Surprise
(1990-2009).
Schmidhuber's
curious learning agents
like to go where they
expect to learn
something. They are driven by intrinsic motivation,
losing interest in both predictable and unpredictable things.
It is a basis for much of the recent work in Developmental Robotics since 2004.
According to Schmidhuber's formal theory of creativity,
art and science and humor are just
by-products of the desire to create / discover more data that is
predictable or compressible in hitherto unknown ways!
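A minimal sketch of the intrinsic reward, assuming a hypothetical `predictor` object with `loss` and `update` methods:

```python
def curiosity_reward(predictor, observation, target):
    """Intrinsic reward as learning progress (illustrative sketch).

    Rewarding the *improvement* of the predictor makes both fully
    predictable data (nothing left to learn) and pure noise (nothing
    learnable) boring, matching the description above.
    """
    error_before = predictor.loss(observation, target)
    predictor.update(observation, target)   # one learning step on the new data
    error_after = predictor.loss(observation, target)
    return error_before - error_after       # compression/prediction progress
```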
Learning attentive vision.
Humans and other biological systems use sequential gaze shifts for
pattern recognition. This can be much more efficient than
fully parallel approaches to vision. In 1990 we built an
artificial fovea controlled by an adaptive neural controller. Without
a teacher, it learns to find targets
in a visual scene, and to track moving targets.
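A hedged sketch of such a glimpse loop, with `scene` and `controller` as hypothetical interfaces:

```python
import numpy as np

def glimpse_loop(controller, scene, fovea_xy, steps=10):
    """Sequential gaze-shift sketch. scene.crop(xy) is assumed to return a
    small high-resolution patch at xy, and controller(patch, xy) a gaze
    shift. The 1990 system learned such a controller without a teacher;
    this loop only shows the sensing cycle.
    """
    trace = [tuple(fovea_xy)]
    for _ in range(steps):
        patch = scene.crop(fovea_xy)         # foveal view: cheap, sequential
        shift = controller(patch, fovea_xy)  # decide where to look next
        fovea_xy = np.clip(fovea_xy + shift, 0, scene.size - 1)
        trace.append(tuple(fovea_xy))
    return trace
```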
Reinforcement Learning
in partially observable worlds.
Just like humans, reinforcement learners are supposed to
maximize expected pleasure and
minimize expected pain. Most traditional work is limited to
reactive mappings from sensory inputs to actions.
Our approaches (1989-2003) for
partially observable environments
are more general: they learn how to use memory and internal states,
sometimes through evolution of RNNs.
The first universal reinforcement learner
is optimal if we ignore computation time,
and here
is one that is optimal if we don't.
Non-linear ICA.
Pattern recognition works better on non-redundant
data with independent components. Schmidhuber's
Predictability Minimization
(1992) was the first non-linear neural algorithm for learning to encode
redundant inputs in this way. It is based on co-evolution of predictors and
feature detectors
that fight each other: the detectors try to extract features
that make them unpredictable. His neural
history compressors
(1991) compactly encode sequential data. And
Lococode unifies regularization
and unsupervised learning.
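A minimal PyTorch sketch of the two Predictability Minimization objectives (illustrative only; the 1992 system used different architectures and training details):

```python
import torch

def pm_losses(codes, predictors):
    """codes: (batch, n) activations of n feature detectors for a batch of
    inputs. predictors[i] learns to predict unit i from the other n-1 units,
    while the encoder that produced `codes` is trained to defeat it, pushing
    the code units toward statistical independence. In practice use two
    optimizers and step each only on its own loss, as in GAN training.
    """
    n = codes.shape[1]
    pred_loss = codes.new_zeros(())
    for i in range(n):
        others = torch.cat([codes[:, :i], codes[:, i + 1:]], dim=1)
        guess = predictors[i](others).squeeze(-1)
        pred_loss = pred_loss + ((guess - codes[:, i]) ** 2).mean()
    return pred_loss, -pred_loss  # predictors minimize it, encoder maximizes it
```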
Metalearning Machines / Learning to Learn / Self-Improvement.
Can we construct metalearning algorithms that learn better
learning algorithms? This question has been a main drive of
Schmidhuber's research since his 1987
diploma thesis.
In 1993 he introduced
self-referential weight matrices, and in
1994 self-modifying policies trained by the
"success- story algorithm"
(talk slides). His first bias-optimal metalearner
was the above-mentioned
Optimal Ordered Problem Solver (2002),
and the ultimate metalearner is the
Gödel machine (2003).
Financial Forecasting.
Our most lucrative neural network application
employs a second-order method
for finding the simplest model of stock market training data.
Automatic Subgoal Generators and Hierarchical Learning.
There is no teacher providing useful intermediate subgoals
for our reinforcement learning systems. In the early 1990s
Schmidhuber introduced
gradient-based
(pictures)
adaptive subgoal generators; later also
discrete ones.
Program Evolution and Genetic Programming.
As an undergrad Schmidhuber used Genetic Algorithms
to evolve
computer
programs on a Symbolics LISP machine at
SIEMENS AG.
Two years later this was still novel: In 1987 he published
the world's 2nd paper
on "Genetic Programming" (the first was Cramer's in 1985)
and the first paper on Meta-Genetic
Programming.
Learning Economies
with Credit Conservation.
In the late 1980s Schmidhuber developed the first credit-conserving
reinforcement learning system based on market
principles, and also the
first neural one.
Fast weights instead of recurrent nets.
A slowly changing feedforward neural net learns to quickly
manipulate short-term memory
in quickly changing synapses of
another net. More fast weights.
Evolution of fast weight control.
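A minimal sketch of the idea, assuming a Hebb-like outer-product write (one of several update schemes in the papers; names are illustrative):

```python
import numpy as np

def fast_weight_step(pre, post, F, x, eta=0.1):
    """Fast-weight sketch: one net's activations reprogram another's weights.

    F: current fast weight matrix (the short-term memory).
    pre, post: activation vectors emitted by the slow net; their outer
    product serves as a Hebb-like write operation.
    x: current input to the fast net.
    """
    F = F + eta * np.outer(post, pre)  # write: rapid synaptic change
    y = np.tanh(F @ x)                 # read: fast net's response to x
    return F, y
```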
Neural Heat Exchanger.
Like a physical heat exchanger,
but with neurons instead of liquid.
Perceptions warm up, expectations cool down.
Complexity-Based Theory of Beauty.
In 1997 Schmidhuber claimed: among several patterns classified as "comparable"
by some subjective observer, the subjectively most beautiful
is the one with the simplest description, given the
observer's particular method for encoding and memorizing it.
Exemplary applications include
low-complexity faces
and Low-Complexity Art,
the computer-age equivalent of minimal art (Leonardo, 1997).
A low- complexity artwork both
`looks right' and is computable
by a short program; a typical observer should be
able to see its simplicity.
Artificial Ants.
IDSIA's Artificial Ant Algorithms are multiagent
optimizers that use local search techniques
and communicate via artificial
pheromones that evaporate over time. They broke several important
benchmark world records. This work received numerous
reviews in venues such as
Nature, Science, Scientific American, TIME,
the NY Times, and Der Spiegel. It led to an IDSIA
spin-off company called
ANTOPTIMA.
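A minimal Ant System sketch for the traveling salesman problem in Python (illustrative; IDSIA's ant algorithms add further refinements such as the local search mentioned above):

```python
import numpy as np

def ant_system(dist, ants=20, iters=100, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """dist: (n, n) symmetric matrix of positive distances. Ants build tours
    guided by pheromone**alpha * (1/dist)**beta; pheromone evaporates at
    rate rho and is deposited in proportion to tour quality.
    """
    rng = np.random.default_rng(seed)
    n = len(dist)
    tau = np.ones((n, n))               # pheromone trails
    eta = 1.0 / (dist + np.eye(n))      # heuristic desirability (avoid /0)
    best_tour, best_len = None, np.inf
    for _ in range(iters):
        tau *= (1.0 - rho)              # evaporation over time
        for _ in range(ants):
            tour = [rng.integers(n)]
            while len(tour) < n:
                i = tour[-1]
                w = (tau[i] ** alpha) * (eta[i] ** beta)
                w[tour] = 0.0           # never revisit a city
                tour.append(rng.choice(n, p=w / w.sum()))
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):          # deposit pheromone along the tour
                a, b = tour[k], tour[(k + 1) % n]
                tau[a, b] += 1.0 / length
                tau[b, a] += 1.0 / length
    return best_tour, best_len
```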
Computer history speedup.
Schmidhuber's law: each new breakthrough
comes in roughly half the time since the previous one - Omega point around 2040.
Is history converging? Again? (2006)
Highlights of robot car history
Resilient, self-modeling machines discussed
here (Science 316, p 688),
First Pow(d)ered Flight (Nature 421 p 689),
Colossus (Nature 441 p 25),
Telephone (Science 319, p. 1759).
Brave New World of Scientific Publishing (2001)
Men who left their mark:
Einstein (general relativity, 1915),
Zuse (first computer, 1935-41),
Gödel (limits of math and computation, 1931),
Turing (Turing machine, 1936: Nature
429, p 501),
Gauss (mathematician of the millennium),
Leibniz (inventor of the bit),
Schickard (father of the computer age),
Darwin (Nature vol 452, p 530),
Haber & Bosch (Haber-Bosch process, 1913:
most influential invention of the 20th century),
Archimedes (greatest scientist ever?)