Pavan Ravishankar's profile - ShortScience.org


Deep Learning: A Critical Appraisal
Gary Marcus
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.AI, cs.LG, stat.ML, 97R40, I.2.0; I.2.6

Summary by Pavan Ravishankar 5 years ago

Deep Learning has a number of shortcomings.

(1) Requires a lot of data: Humans can learn abstract concepts from far less training data than current deep learning needs. E.g. once we are told what an “adult” is, we can answer questions such as “How many adults are at home?” or “Is he an adult?” without many examples. Convolutional networks handle translational invariance, but recognizing other transformations requires a lot more data, more filters, or different architectures.

(2) Lack of transfer: Most claims that deep RL helps with transfer are ambiguous. Consider DeepMind's claim that its Breakout agent learned concepts such as digging a tunnel through a wall, which was soon contradicted by Vicarious's experiments that added a wall in the middle of the screen and raised the Y coordinate of the paddle. Current attempts at transfer rely on correlations between the training sequences and the test scenario, which are bound to fail when the scenario is tweaked.

(3) Hierarchical structure is not learnt: Deep learning learns correlations, which are non-hierarchical in nature. So a sentence like “Salman Khan, who was an excellent driver, died in a car accident” is not represented as a main clause (“Salman Khan died in a car accident”) with an embedded clause (“who was an excellent driver”). Subtleties like these cannot be captured by an RNN, even though hierarchical RNNs try to capture obvious hierarchies like letters -> words -> sentences. If hierarchies were captured in deep RL, transfer would have been easy in Breakout, which is not the case.

(4) Poor inference in language: Sentences with subtle differences, like “John promised Mary to leave” and “John promised to leave Mary”, are treated as the same by deep learning. This causes major problems during inference, because questions that require combining several sentences fail.

(5) Not transparent: Deep learning does not explain why a network made a particular decision. Such explanations would help with debugging and would be beneficial in medical diagnosis systems, where it is critical to be able to reason about the methodology.

(6) No priors or commonsense reasoning: Humans operate with commonsense reasoning (if A is the dad of B, A is older than B) and priors (the laws of physics). Deep learning is not tailored to incorporate these. With the heavy interest in end-to-end learning from raw data, such attempts have been discouraged.

(7) Deep learning is correlation, not causation: Causality, analogical reasoning and other abstract “left brain” concepts are not handled by deep learning.

(8) Lacks generalization outside the training distribution: It fails in scenarios where the nature of the data keeps varying, e.g. stock prediction.

(9) Easily fooled: E.g. parking signs mistaken for refrigerators, a turtle mistaken for a rifle.
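
The point about generalization outside the training distribution can be illustrated with a toy example. The sketch below is not from the paper: it swaps the neural network for an ordinary least-squares line (a deliberately simple stand-in for any curve fitter), fits it to y = x^2 on inputs from [0, 1], and then queries it at x = 3. The target function, the ranges and all names are illustrative assumptions.

```python
# Toy illustration (assumption, not from the paper): a curve fitter that looks
# fine inside its training range can be badly wrong outside of it.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def target(x):
    """Hypothetical ground-truth function the model never sees directly."""
    return x * x


# Training data only covers the interval [0, 1].
train_x = [i / 100 for i in range(101)]
train_y = [target(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)


def predict(x):
    return slope * x + intercept


# Worst error on the training range versus error at a point far outside it.
in_dist_error = max(abs(predict(x) - target(x)) for x in train_x)
out_dist_error = abs(predict(3.0) - target(3.0))

print(f"max error on [0, 1]: {in_dist_error:.3f}")
print(f"error at x = 3:      {out_dist_error:.3f}")
```

The fit is acceptable where training data exists and degrades sharply away from it, because nothing in the fitted correlation encodes the underlying rule.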

These shortcomings can be addressed by:
(1) Unsupervised learning: Build systems that can set their own goals, use abstract knowledge (priors, affordances such as the fact that objects can be used in many ways, etc.) and solve problems at a high level (like symbolic AI).

(2) Symbolic AI: Deep learning does what the primary sensory cortex does: it takes raw inputs and converts them into low-level representations. Symbolic AI builds abstract concepts such as causal and analogical reasoning, which is what the prefrontal cortex does. Humans make decisions based on these abstract concepts.


Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution
Judea Pearl
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.LG, cs.AI, stat.ML

Summary by Pavan Ravishankar 5 years ago

The paper gives an overview of the importance of causality in AI and highlights its key aspects. The current state of AI deals only with association, i.e. curve fitting of data, without the need for a model. This is far from human-like intelligence: humans have a mental representation that is updated from time to time using data and queried with “What if?” questions. To incorporate this, one needs to add two more layers on top of the curve-fitting module: interventions (“What if I do this?”) and counterfactuals (“What if I had done this?”).

Interventions are represented by P(y|do(x)), where do(x) means that action x is performed, changing the behavior of certain variables, so that data collected before the action cannot by itself be used to estimate it. Counterfactuals are represented by P(y_x|x',y'), where x' and y' are what was actually observed and the goal is to determine the probability that Y would have been y had X been x.

Pearl suggests using Structural Causal Models (SCMs) for interventions and counterfactuals. An SCM takes a query (association, intervention or counterfactual) and a graphical model (based on assumptions) and builds an estimand (a mathematical recipe). The estimand takes data and produces an estimate (the answer) with a confidence level. The assumptions are refined based on the data.

Causal models provide a lot of advantages:
(1) Graphical models make the assumptions easier to read, thereby providing transparency. With the help of d-separation, they also make it easier to check the dependencies implied by the model against the data, thereby providing testability.
(2) Causal models help in mediation analysis, which identifies the mechanisms that transmit changes from cause to effect, for explainability.
(3) Current transfer learning approaches operate at the association level and so cannot identify the mechanisms that are affected by changes.
(4) Causality provides tools to recover causal relationships when data has missing attributes, unlike statistical analysis, which provides tools only when values are missing at random, i.e. independent of other variables.
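
The gap between association, P(y|x), and intervention, P(y|do(x)), can be made concrete with a small simulation. The sketch below is an illustration, not code from the paper: it assumes a hypothetical three-variable SCM in which a confounder Z influences both X and Y, and all probabilities and names are made up. It also assumes the back-door adjustment formula, P(y|do(x)) = sum over z of P(y|x,z) P(z), as the estimand, which applies here because Z blocks the only back-door path from X to Y.

```python
import random

# Hypothetical SCM (assumption for illustration): Z -> X, Z -> Y, X -> Y.


def sample(rng, do_x=None):
    """Draw one (z, x, y) unit; do_x overrides X's own mechanism."""
    z = rng.random() < 0.5
    if do_x is None:
        x = rng.random() < (0.8 if z else 0.2)  # X listens to Z
    else:
        x = bool(do_x)                          # intervention cuts Z -> X
    y = rng.random() < 0.1 + 0.3 * x + 0.5 * z
    return int(z), int(x), int(y)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rng = random.Random(0)
observed = [sample(rng) for _ in range(200_000)]

# Layer 1, association: condition on X = 1 in observational data.
p_assoc = mean(y for z, x, y in observed if x == 1)

# Layer 2, intervention, estimated from the SAME observational data
# through the back-door estimand: sum_z P(y | x, z) * P(z).
p_adjusted = 0.0
for z_val in (0, 1):
    p_z = mean(1 if z == z_val else 0 for z, _, _ in observed)
    p_y_given_xz = mean(y for z, x, y in observed if x == 1 and z == z_val)
    p_adjusted += p_y_given_xz * p_z

# Ground truth for comparison: actually run the experiment do(X = 1).
intervened = [sample(rng, do_x=1) for _ in range(200_000)]
p_do = mean(y for _, _, y in intervened)

print(f"P(Y=1 | X=1)          ~ {p_assoc:.3f}")
print(f"back-door estimate    ~ {p_adjusted:.3f}")
print(f"P(Y=1 | do(X=1)) sim. ~ {p_do:.3f}")
```

Under these made-up parameters the conditional probability is inflated by the confounder, while the adjusted estimate agrees with the simulated experiment. This mirrors the query, graph, estimand, estimate flow described above.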


Towards Deep Symbolic Reinforcement Learning
Marta Garnelo and Kai Arulkumaran and Murray Shanahan
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.AI, cs.LG

Summary by Pavan Ravishankar 5 years ago

DRL has a lot of disadvantages, such as large data requirements, slow learning, difficult interpretation, difficult transfer, no causality, and analogical reasoning done at a statistical rather than an abstract level. These can be overcome by adding a symbolic front end on top of the DL layer before feeding its output to the RL agent. A symbolic front end gives the advantages of generalization over a smaller state space, flexible predicate length and easier combination of predicate expressions. DL, unlike pure symbolic reasoning, avoids manual creation of features. Hence DL combined with symbolic reasoning might be the way forward for AGI. State space reduction in symbolic reasoning is carried out by using object interactions (object positions and object types) for state representation. Although certain assumptions are made in the process, such as that objects of the same type behave similarly, symbolic reasoning lets one better understand causal relations in terms of actions, object interactions and reward.

Broadly, the pipeline consists of:
(1) CNN layer: raw pixels to representation.
(2) Salient pixel identification: pixels whose CNN activations are above a certain threshold.
(3) Identifying objects of the same kind using the activation spectra of the salient pixels.
(4) Tracking object motion by identifying the same object in consecutive time steps, using spatial closeness (objects can move only a small distance between consecutive frames) and similar neighbors (objects of different types can be placed close to each other, so spatial closeness alone cannot identify the same object).
(5) Building symbolic interactions from relative object positions for all pairs of objects located within a certain maximal distance. Relative object position is necessary to capture object dynamics. The maximal distance threshold makes learning quicker, even though it may lead to a locally optimal policy.
(6) The RL agent uses object interactions as states in the Q-learning update. Instead of using all object interactions in a frame as one state, the number of states is further reduced by treating interactions between two types as independent of other types and doing a separate Q-learning update for each type pair. The intuition is to look at a frame as a set of independent object-type interactions. The action chosen at a state is then the one that maximizes the sum of Q values across all type pairs.

The results claim that DRL with symbolic reasoning shows transfer of policies: an agent first trained on an evenly spaced grid world and then used on a randomly spaced grid world reaches a performance close to 70%, whereas DQN reaches 50% even after training for 1000 epochs with an epoch length of 100.

Pavan Ravishankar

sciscore: 3