Learning to Reason: End-to-End Module Networks for Visual Question Answering
Paper summary: The paper presents a modular neural architecture for visual question answering. A seq2seq component reads the textual question and predicts a sequence of neural modules (e.g. find() and compare()); these modules are then dynamically assembled into a network and trained end-to-end. The model achieves good results on three separate benchmarks that focus on reasoning about the image. https://i.imgur.com/iOkSh8y.png
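To make the "predict a module sequence, then assemble it" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy image, the hand-written layout, and the helper `run_layout` are all hypothetical, the modules `find`, `and_` and `count` are simple stand-ins for learned neural modules, and the postfix (stack-based) layout encoding is an assumption made for illustration. In the real model the layout would come from the seq2seq predictor rather than being written by hand.

```python
from typing import List, Optional, Tuple

Attention = List[float]  # one weight per image region

# Hypothetical toy "image": each region is described by a set of labels.
IMAGE = [{"red", "cube"}, {"blue", "sphere"}, {"red", "sphere"}]


def find(word: str) -> Attention:
    """Attend to every region whose labels contain the word."""
    return [1.0 if word in region else 0.0 for region in IMAGE]


def and_(a: Attention, b: Attention) -> Attention:
    """Intersect two attention maps region by region."""
    return [min(x, y) for x, y in zip(a, b)]


def count(a: Attention) -> int:
    """Turn an attention map into an answer: the number of attended regions."""
    return int(round(sum(a)))


# Module name -> (number of attention inputs it consumes, function).
MODULES = {"find": (0, find), "and": (2, and_), "count": (1, count)}


def run_layout(layout: List[Tuple[str, Optional[str]]]):
    """Assemble and execute a layout given in postfix order using a stack."""
    stack: list = []
    for name, word in layout:
        arity, module = MODULES[name]
        if len(stack) < arity:
            raise ValueError(f"module {name!r} needs {arity} inputs")
        split = len(stack) - arity
        inputs, stack = stack[split:], stack[:split]
        # Zero-input modules read a word from the question instead.
        stack.append(module(word) if arity == 0 else module(*inputs))
    if len(stack) != 1:
        raise ValueError("layout did not reduce to a single output")
    return stack[0]


# "How many red spheres are there?" as a hand-written module sequence.
layout = [("find", "red"), ("find", "sphere"), ("and", None), ("count", None)]
print(run_layout(layout))  # prints 1
```

The point of the sketch is only the control flow: the same small set of modules can be recombined into a different network for each question, so the structure of the computation follows the structure of the question.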
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Hu, Ronghang and Andreas, Jacob and Rohrbach, Marcus and Darrell, Trevor and Saenko, Kate
International Conference on Computer Vision (ICCV), 2017


Summary by Marek Rei 10 months ago

