* Supervised semantic parsers
* must first map questions into logical forms, which requires training data with manually labeled logical forms
* all we really care about is the resulting denotation for a given input, so we are free to choose how to represent logical forms
* introduce new semantic representation: dependency-based compositional semantics
* represent logical forms as DCS trees where nodes represent predicates (State, Country, Genus, ...) and edges represent relations
* such a form allows transparency between syntax and semantics, and hence a streamlined framework for program induction
* denotation at root node
* trees mirror the syntactic dependency structure, which facilitates parsing and also enables efficient computation of the denotation of a given tree
* to handle divergence between syntactic and semantic scope in more complicated expressions, nodes low in the tree are tagged with a *mark* relation (E, Q, or C) and then invoked higher up with an *execute* relation to create the desired semantic scope
* discriminative semantic parsing model placing a log-linear distribution over the set of permissible DCS trees given an utterance
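The tree-plus-denotation machinery in the bullets above can be sketched in a few lines. This is a minimal sketch under assumptions: the predicate names and the tiny geography database are made up for illustration, and only basic join relations are modeled (no mark/execute relations).

```python
# Toy database: predicates are sets of tuples (names are hypothetical).
PREDICATES = {
    "state":  {("california",), ("oregon",), ("nevada",)},
    "oregon": {("oregon",)},
    "border": {("california", "oregon"), ("oregon", "california"),
               ("california", "nevada"), ("nevada", "california")},
}

class Node:
    """A DCS tree node: a predicate plus join-labeled child edges.
    An edge labeled (i, j) constrains column i of the parent's tuples
    to equal column j of the child's tuples."""
    def __init__(self, predicate, children=()):
        self.predicate = predicate      # predicate name, keys into PREDICATES
        self.children = list(children)  # list of ((i, j), Node) join edges

    def denotation(self):
        """Bottom-up: keep only parent tuples consistent with every
        child's denotation under the join constraints."""
        tuples = PREDICATES[self.predicate]
        for (i, j), child in self.children:
            child_vals = {t[j] for t in child.denotation()}
            tuples = {t for t in tuples if t[i] in child_vals}
        return tuples

# "states bordering oregon": state --(0,0)--> border --(1,0)--> oregon
query = Node("state", [((0, 0), Node("border", [((1, 0), Node("oregon"))]))])
print(query.denotation())  # {('california',)}
```

The bottom-up evaluation is what makes denotation computation on a tree efficient: each node filters its own predicate once per child rather than enumerating full logical forms.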
Read this because it was cited by Zhang et al. 2017 and the title looked interesting. The setting is machine translation where you have parallel pairs in one domain (Europarl) but need to translate in another (biomed). They quantify how much of the performance loss is due to vocabulary differences between the two domains. First, an oracle is created that uses both domains for training. Second, an OOV oracle is created that removes words their mining approach could not possibly find, to see the upper limit of what the approach could achieve.
Their approach, then, uses non-parallel domain texts to create word similarity matrices for new terms. They first compute a context-based similarity matrix: each word gets a feature vector based on the contexts in which it appears, and similarity is computed between all word pairs. They then create an orthography-based similarity matrix, using the character n-grams within each word as a feature vector and again computing similarity between all word pairs. Summing these two matrices gives the final similarity matrix.

Next, they build a bipartite graph over existing word pairs, where each word in the source language is connected to its translation words with edges weighted by unigram translation probability. This graph is reduced to single pairs (one edge per word pair) with the Hungarian algorithm, and CCA is used on these training pairs to estimate a projection for each language. The projections are then applied to _all_ words to get new word representations. Finally, they explore a few ways to integrate the resulting scores into existing MT systems, and find that they don't get an improvement just by adding them as scores; they also need features indicating when a score is a real score and when it is a mined score.