Retrieving Compositional Documents Using Position-Sensitive Word Mover’s Distance

by Martin Trapp, Marcin Skowron, Dietmar Schabus
Abstract:
Retrieving similar compositional documents that consist of ranked sub-documents, such as threads in healthcare web fora containing community-voted comments, has become increasingly important. However, approaches to this task have so far not exploited the semantic relationships between words and therefore do not benefit from the generalization property of semantic word embeddings. In this work, we propose an extension of the Word Mover’s Distance for compositional documents consisting of ranked sub-documents. In particular, we derive a Position-sensitive Word Mover’s Distance, which allows compositional documents to be retrieved based on the semantic properties of their sub-documents. Additionally, we introduce a novel benchmark dataset for this task to make it easier for other researchers to work on this problem. The results obtained on the novel dataset and on the well-known MovieLens dataset indicate that our approach is well suited for retrieving compositional documents. We conclude that incorporating semantic relations between words and sensitivity to position and presentation bias is crucial for effective retrieval of such documents.
Reference:
Martin Trapp, Marcin Skowron, Dietmar Schabus, “Retrieving Compositional Documents Using Position-Sensitive Word Mover’s Distance”, In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR), Amsterdam, The Netherlands, pp. 233-236, 2017.
Bibtex Entry:
@InProceedings{Trapp2017,
  author    = {Trapp, Martin and Skowron, Marcin and Schabus, Dietmar},
  title     = {Retrieving Compositional Documents Using Position-Sensitive Word Mover's Distance},
  booktitle = {Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR)},
  year      = {2017},
  series    = {ICTIR '17},
  pages     = {233--236},
  address   = {Amsterdam, The Netherlands},
  month     = oct,
  abstract  = {Retrieving similar compositional documents that consist of ranked sub-documents, such as threads in healthcare web fora containing community-voted comments, has become increasingly important. However, approaches to this task have so far not exploited the semantic relationships between words and therefore do not benefit from the generalization property of semantic word embeddings. In this work, we propose an extension of the Word Mover's Distance for compositional documents consisting of ranked sub-documents. In particular, we derive a Position-sensitive Word Mover's Distance, which allows compositional documents to be retrieved based on the semantic properties of their sub-documents. Additionally, we introduce a novel benchmark dataset for this task to make it easier for other researchers to work on this problem. The results obtained on the novel dataset and on the well-known MovieLens dataset indicate that our approach is well suited for retrieving compositional documents. We conclude that incorporating semantic relations between words and sensitivity to position and presentation bias is crucial for effective retrieval of such documents.},
  acmid     = {3121084},
  doi       = {10.1145/3121050.3121084},
  isbn      = {978-1-4503-4490-6},
  keywords  = {compositional documents, ranked documents, word embeddings, word mover's distance},
  numpages  = {4},
  url       = {https://dl.acm.org/authorize?N658192},
}