First published: 2018/01/02 (2 years ago) Abstract: Although deep learning has historical roots going back decades, neither the
term "deep learning" nor the approach was popular just over five years ago,
when the field was reignited by papers such as Krizhevsky, Sutskever and
Hinton's now classic (2012) deep network model of Imagenet. What has the field
discovered in the five subsequent years? Against a background of considerable
progress in areas such as speech recognition, image recognition, and game
playing, and considerable enthusiasm in the popular press, I present ten
concerns for deep learning, and suggest that deep learning must be supplemented
by other techniques if we are to reach artificial general intelligence.
Deep Learning has a number of shortcomings.
(1) Requires a lot of data: Humans can learn abstract concepts with far less training data than current deep learning needs. E.g. once we are told what an “adult” is, we can answer questions like “How many adults are there at home?” or “Is he an adult?” without much data. Convolutional networks handle translational invariance, but need a lot more data, more filters or different architectures to cope with other transformations.
(2) Lack of transfer: Most claims that Deep RL helps with transfer are ambiguous. Consider DeepMind's claim of concept learning in Breakout, such as digging a tunnel through a wall, which was soon shown to be false by experiments from Vicarious that added a wall in the middle and raised the Y coordinate of the paddle. Current attempts at transfer rely on correlations between the training sequences and the test scenario, which are bound to fail when the scenario is tweaked.
(3) Hierarchical structure is not learnt: Deep learning learns correlations, which are non-hierarchical in nature. So a sentence like “Salman Khan, who was an excellent driver, died in a car accident” is never represented as a major clause (Salman Khan died in a car accident) plus a minor clause (who was an excellent driver). Subtleties like these cannot be captured by an RNN, even though hierarchical RNNs try to capture obvious hierarchies such as letters -> words -> sentences. If hierarchies were captured in Deep RL, transfer in Breakout would have been easy, which is not the case.
(4) Poor inference in language: Sentences with subtle differences, like “John promised Mary to leave” and “John promised to leave Mary”, are treated as the same by deep learning. This causes major problems during inference, because questions that require combining several sentences fail.
(5) Not transparent: Knowing why a neural network made a certain decision helps with debugging and is beneficial in medical diagnosis systems, where it is critical to explain the reasoning behind a result.
(6) No priors or commonsense reasoning: Humans rely on commonsense reasoning (if A is the father of B, A is older than B) and priors (the laws of physics). Deep learning is not designed to incorporate these. With the heavy interest in end-to-end learning from raw data, such attempts have been discouraged.
(7) Deep learning is correlation, not causation: Causality, analogical reasoning and other abstract concepts associated with the left brain are not dealt with by deep learning.
(8) Lacks generalization outside the training distribution: It fails in scenarios where the nature of the data keeps changing. E.g. stock prediction.
(9) Easily fooled: E.g. parking signs mistaken for refrigerators, a turtle mistaken for a rifle.
These can be addressed by:
(1) Unsupervised learning: Build systems that can set their own goals, use abstract knowledge (priors, affordances such as the fact that objects can be used in many ways, etc.) and solve problems at a high level (like symbolic AI).
(2) Symbolic AI: Deep learning does what the primary sensory cortex does, taking raw inputs and converting them into low-level representations. Symbolic AI builds abstract concepts such as causal and analogical reasoning, which is what the prefrontal cortex does. Humans make decisions based on these abstract concepts.
First published: 2018/01/11 (2 years ago) Abstract: Current machine learning systems operate, almost exclusively, in a
statistical, or model-free mode, which entails severe theoretical limits on
their power and performance. Such systems cannot reason about interventions and
retrospection and, therefore, cannot serve as the basis for strong AI. To
achieve human level intelligence, learning machines need the guidance of a
model of reality, similar to the ones used in causal inference tasks. To
demonstrate the essential role of such models, I will present a summary of
seven tasks which are beyond reach of current machine learning systems and
which have been accomplished using the tools of causal modeling.
The paper gives an overview of the importance of causality in AI and highlights its key aspects. The current state of AI deals only with association, i.e. curve fitting of data, without the need for a model. This is far from human-like intelligence: humans hold a mental representation that is updated from time to time using data and queried with “What if?” questions. To incorporate this, one needs to add two more layers on top of the curve-fitting module: interventions (What if I do this?) and counterfactuals (What if I had done this instead?). Interventions are represented by P(y|do(x)), where do(x) is an action that sets X to x and thereby changes the behavior of certain variables, so previously collected data is not sufficient on its own to estimate it. Counterfactuals are represented by P(y_x|x',y'), where x',y' are what was actually observed and the goal is to determine the probability that Y would have been y had X been x.
Pearl suggests using Structural Causal Models (SCM) for interventions and counterfactuals. An SCM takes a query (association, intervention or counterfactual) and a graphical model (based on assumptions) and builds an estimand (a mathematical recipe). The estimand takes data and produces an estimate (the answer) with a measure of confidence. The assumptions are fine-tuned based on data.
Causal models offer several advantages:
(1) Graphical models make the assumptions easy to read, providing transparency. With the help of d-separation they also make it easy to check the dependencies implied by the model against the data, providing testability.
(2) Causal models support mediation analysis, which identifies the mechanisms that transmit a cause to its effect, for explainability.
(3) Current transfer learning approaches work at the association level and therefore cannot identify the mechanisms that are affected by changes.
(4) Causality provides tools to recover causal relationships when data has missing attributes, unlike statistical analysis, which provides tools only when values are missing at random, i.e. independent of other variables.
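To make the gap between association and intervention concrete, here is a minimal sketch on a toy SCM. Everything in it is an assumption for illustration and does not come from the paper: the graph (Z -> X, Z -> Y, X -> Y with Z as a confounder), the probabilities, and the function names. It computes P(Y=1|X=1) by ordinary conditioning and P(Y=1|do(X=1)) with the back-door adjustment, which is one possible estimand for this particular graph.

```python
from itertools import product

# Hypothetical toy SCM (all numbers are made up): Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}


def p_x_given_z(x, z):
    # Z makes X=1 more likely.
    p1 = 0.8 if z == 1 else 0.2
    return p1 if x == 1 else 1.0 - p1


def p_y_given_xz(y, x, z):
    # Both X and Z raise the chance of Y=1.
    p1 = 0.1 + 0.3 * x + 0.5 * z
    return p1 if y == 1 else 1.0 - p1


def joint(x, y, z):
    # Observational joint distribution implied by the graph.
    return P_Z[z] * p_x_given_z(x, z) * p_y_given_xz(y, x, z)


def p_y_given_x(y, x):
    # Association: condition on having *seen* X = x.
    num = sum(joint(x, y, z) for z in (0, 1))
    den = sum(joint(x, yy, z) for yy, z in product((0, 1), repeat=2))
    return num / den


def p_y_do_x(y, x):
    # Intervention: back-door adjustment over the confounder Z.
    return sum(P_Z[z] * p_y_given_xz(y, x, z) for z in (0, 1))


observed = p_y_given_x(1, 1)
intervened = p_y_do_x(1, 1)
print(f"P(Y=1 | X=1)     = {observed:.2f}")
print(f"P(Y=1 | do(X=1)) = {intervened:.2f}")
```

In this toy model the two quantities differ because seeing X=1 also carries information about Z, while setting X=1 does not. A curve fitted to P(y|x) alone would therefore answer the wrong question.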
First published: 2016/09/18 (3 years ago) Abstract: Deep reinforcement learning (DRL) brings the power of deep neural networks to
bear on the generic task of trial-and-error learning, and its effectiveness has
been convincingly demonstrated on tasks such as Atari video games and the game
of Go. However, contemporary DRL systems inherit a number of shortcomings from
the current generation of deep learning techniques. For example, they require
very large datasets to work effectively, entailing that they are slow to learn
even when such datasets are available. Moreover, they lack the ability to
reason on an abstract level, which makes it difficult to implement high-level
cognitive functions such as transfer learning, analogical reasoning, and
hypothesis-based reasoning. Finally, their operation is largely opaque to
humans, rendering them unsuitable for domains in which verifiability is
important. In this paper, we propose an end-to-end reinforcement learning
architecture comprising a neural back end and a symbolic front end with the
potential to overcome each of these shortcomings. As proof-of-concept, we
present a preliminary implementation of the architecture and apply it to
several variants of a simple video game. We show that the resulting system --
though just a prototype -- learns effectively, and, by acquiring a set of
symbolic rules that are easily comprehensible to humans, dramatically
outperforms a conventional, fully neural DRL system on a stochastic variant of
DRL has a lot of disadvantages, such as large data requirements, slow learning, difficult interpretation, difficult transfer, no causality, and analogical reasoning done at a statistical rather than an abstract level. These can be overcome by adding a symbolic front end on top of the DL layer before feeding it to the RL agent. The symbolic front end gives the advantages of generalization over a smaller state space, flexible predicate length and easier combination of predicate expressions. DL, unlike purely symbolic reasoning, avoids manual feature creation. Hence DL combined with symbolic reasoning might be the way to progress toward AGI. State space reduction in symbolic reasoning is achieved by using object interactions (object positions and object types) as the state representation. Although certain assumptions are made in the process, such as that objects of the same type behave similarly, symbolic reasoning lets one better understand causal relations in terms of actions, object interactions and reward.
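As a rough illustration of what an object-interaction state could look like, the sketch below turns a list of detected objects into (type pair, relative position) tuples for all pairs within a distance cutoff. The `Obj` tuple, the function name, the Euclidean metric and the alphabetical ordering of type pairs are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Obj(NamedTuple):
    kind: str  # object type, e.g. "agent" (hypothetical labels)
    x: int
    y: int


Interaction = Tuple[Tuple[str, str], Tuple[int, int]]


def build_interactions(objects: List[Obj], max_dist: float) -> List[Interaction]:
    """Return (type pair, relative position) for every nearby object pair."""
    interactions = []
    for a, b in combinations(objects, 2):
        dx, dy = b.x - a.x, b.y - a.y
        # Assumed metric: Euclidean distance between object positions.
        if (dx * dx + dy * dy) ** 0.5 > max_dist:
            continue
        # Order the pair by type name so both orderings share one key.
        if a.kind > b.kind:
            a, b, dx, dy = b, a, -dx, -dy
        interactions.append(((a.kind, b.kind), (dx, dy)))
    return interactions


frame = [Obj("agent", 0, 0), Obj("plus", 1, 0), Obj("minus", 5, 5)]
near = build_interactions(frame, max_dist=2.0)
print(near)
```

The state depends only on object types and relative offsets, not on absolute pixel positions, which is what keeps the state space small.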
Broadly, the pipeline consists of:
(1) CNN layer: raw pixels to representation.
(2) Salient pixel identification: pixels whose CNN activations are above a certain threshold.
(3) Object type identification: objects of a similar kind are identified using the activation spectra of salient pixels.
(4) Object tracking: the same object is identified in consecutive time steps using spatial closeness (objects can move only a small distance between consecutive frames) and similar neighbors (objects of different types can be placed close to each other, so spatial closeness alone cannot identify the same object).
(5) Building symbolic interactions: relative object positions are used for all pairs of objects located within a certain maximal distance. Relative object position is necessary to capture object dynamics. The maximal distance threshold makes learning quicker, even though it may lead to a locally optimal policy.
(6) RL agent: object interactions are used as states in the Q-learning update. Instead of using all object interactions in a frame as one state, the number of states is further reduced by treating interactions between two types as independent of other types and doing a separate Q-learning update for each type pair. The intuition is to view a frame as a set of independent object type interactions. The action chosen at a state is then the one that maximizes the sum of Q values across all type pairs.
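The per-type-pair Q-learning step can be sketched as follows. This is a minimal tabular illustration under stated assumptions, not the authors' implementation: the class name, the action set, the hyperparameter values and the format of `transitions` (one `(type_pair, state, next_state)` tuple per tracked interaction) are all hypothetical, and exploration is left out.

```python
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right")  # hypothetical action set


class TypePairQ:
    """One tabular Q-function per object type pair (illustrative sketch)."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        # q[type_pair][(interaction_state, action)] -> Q value, default 0.0
        self.q = defaultdict(lambda: defaultdict(float))

    def total_q(self, interactions, action):
        # Sum Q values over every interaction present in the frame.
        return sum(self.q[pair][(state, action)] for pair, state in interactions)

    def best_action(self, interactions):
        # Greedy choice: the action that maximizes the summed Q values.
        return max(ACTIONS, key=lambda a: self.total_q(interactions, a))

    def update(self, transitions, action, reward):
        # Standard Q-learning update, applied separately to each type pair.
        for pair, state, next_state in transitions:
            table = self.q[pair]
            best_next = max(table[(next_state, a)] for a in ACTIONS)
            td_error = reward + self.gamma * best_next - table[(state, action)]
            table[(state, action)] += self.alpha * td_error


agent = TypePairQ(alpha=0.5, gamma=0.9)
pair = ("agent", "plus")
# The agent moved right, the "plus" object went from offset (1, 0) to (0, 0).
agent.update([(pair, (1, 0), (0, 0))], "right", reward=1.0)
choice = agent.best_action([(pair, (1, 0))])
print(choice)
```

Because each type pair keeps its own table, an interaction seen in one frame layout contributes the same Q values in any other layout where that relative position reappears.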
The results claim that DRL with symbolic reasoning shows transfer of policies: after first training on an evenly spaced grid world and then being used on a randomly spaced grid world, it reaches a performance close to 70%, whereas DQN achieves 50% even after training for 1000 epochs with an epoch length of 100.