[link]
The paper written by an international collaboration between UPenn and CUNI presents an original parsing approach that expands the conventional projective treebased method to include the nonprojective dependencies in text. Authors represent words and their relationships as vertices and edges of a complete directed graph respectively, and then employ ChuLiuEdmonds algorithm to find the maximum spanning trees allowing nonprojective parses. Importantly, they demonstrated better accuracy for Czech language and higher efficiency (O(n^2)) compared to Eisner’s projective parsing (O(n^3)). Generality and unexpected asymptotical computational simplicity of the proposed approach attracted numerous researchers and led to the best student paper award on HLT/EMNLP conference in 2005. Conventionally, finding a maximum projective spanning tree (MST) corresponds to finding a maximum dependency tree, which could be solved by ChuLiuEdmonds or Eisner algorithms. To find the solution, authors define a score s(x,y) = sum[w * f(i,j)], where (i,j) is an edge in a dependency tree y corresponding to a sentence {x}, f is a feature vector, and w is the weight vector that needs to be optimized. To extend the procedure to nonprojective trees, authors apply a greedy ChuLiuEdmonds algorithm, where for each vertex it leaves only the incoming edge with the highest score. If the resulting graph has cycles, they are replaced with vertices, and the algorithms iterates until it converges to a tree, which must be the MST. Tarjan’s efficient implementation of the algorithms guarantees O(n^2), which dispels the notion that nonprojective parsing is “harder” than projective parsing that takes O(n^3). To optimize weight vector w, authors employ factored version of Margin Infused Relaxed Algorithm (MIRA), which iteratively updates w for each sample (x_t,y_t) subject to minw_t+1 – w_t s.t. s(l, j) – s(k, k) >= 1 for (l,j) in y_t and (k,j) not in y_t. To evaluate the proposed approach, the paper shows dependency parsing results for Czech, using the Prague Dependency Treebank that has both projective parsing and nonprojective parsing tasks. The method achieves the state of the art performance for the dataset and promises an outstanding generality for numerous other languages and low computation cost. However, the paper vaguely defines the highdimensional feature vectors f(i,j) that must play a crucial role in the optimization of the weight vector w. Moreover, lower accuracy and completeness of the proposed method for English requires a closer look at the optimization procedure. Finally, as the authors mentioned in the paper, the nonprojective parsing does not generalize well to bigger substructures as the search becomes intractable.
Your comment:
