Contents

1 Introduction
1.1 Indexing
1.2 Searching
1.3 Motivation
1.4 Research Questions
1.5 Main Contributions
1.6 Overview of the Thesis
1.7 Origins
2 Background
2.1 Information Retrieval
2.2 Generative Language Modeling for IR
  2.2.1 Query Likelihood
  2.2.2 KL Divergence
  2.2.3 Relation to Probabilistic Approaches
2.3 Query Modeling
  2.3.1 Translation Model
  2.3.2 Relevance Feedback
  2.3.3 Term Dependence Models
2.4 Language Modeling Variations
  2.4.1 Topic Models
  2.4.2 Concept Models
  2.4.3 Cluster-based Language Models
2.5 Linking Free Text to Concepts
  2.5.1 Natural Language Interfaces to Databases
  2.5.2 Ontology Matching
  2.5.3 Ontology Learning, Ontology Population, and Semantic Annotation
2.6 Summary
3 Experimental Methodology
3.1 Relevance
3.2 Evaluation
  3.2.1 Evaluation Measures
  3.2.2 Statistical Significance Testing
3.3 Test Collections
  3.3.1 TREC Robust 2004
  3.3.2 TREC Terabyte 2004–2006
  3.3.3 TREC Relevance Feedback 2008
  3.3.4 TREC Web 2009
  3.3.5 CLEF Domain-Specific 2007–2008
  3.3.6 TREC Genomics 2004–2006
3.4 Parameter Settings
3.5 Summary
4 Query Modeling Using Relevance Feedback
4.1 Estimating the Importance of Feedback Documents
  4.1.1 MLgen: A Generative Model
  4.1.2 Normalized Log-likelihood Ratio
  4.1.3 Models Related to MLgen and NLLR
4.2 Experimental Setup
4.3 Pseudo Relevance Feedback
  4.3.1 Results and Discussion
  4.3.2 Per-topic Results
  4.3.3 Number of Terms in the Query Models
4.4 Explicit Relevance Feedback
  4.4.1 Experimental Results
  4.4.2 Per-topic Results
  4.4.3 Number of Relevant Documents
  4.4.4 Number of Terms in the Query Models
  4.4.5 Upshot
4.5 Summary and Conclusions
5 Query Modeling Using Concepts
5.1 Conceptual Language Models
  5.1.1 Conceptual Query Modeling
  5.1.2 Generative Concept Models
5.2 Experimental Setup
  5.2.1 Parameter Estimation
  5.2.2 Complexity and Implementation
  5.2.3 Baselines
5.3 Results and Discussion
  5.3.1 Baselines
  5.3.2 Conceptual Language Models
5.4 Parameter Sensitivity Analysis
5.5 Summary and Conclusions
6 Linking Queries to Concepts
6.1 The Task
6.2 Approach
  6.2.1 Ranking Concepts
  6.2.2 Learning to Select Concepts
  6.2.3 Features Used
6.3 Experimental Setup
  6.3.1 Data
  6.3.2 Training Data
  6.3.3 Parameters
  6.3.4 Testing and Evaluation
6.4 Results
  6.4.1 Lexical Match
  6.4.2 Retrieval Only
  6.4.3 N-gram based Concept Selection
  6.4.4 Full Query-based Concept Selection
6.5 Discussion
  6.5.1 Inter-annotator Agreement
  6.5.2 Textual Concept Representations
  6.5.3 Robustness
  6.5.4 Feature Types
  6.5.5 Feature Selection
  6.5.6 Error Analysis
6.6 Summary and Conclusions
7 Query Modeling Using Linked Concepts
7.1 Linking Queries to Wikipedia
7.2 Experimental Setup
7.3 Results and Discussion
7.4 Summary and Conclusions
8 Conclusions
8.1 Main Findings
8.2 Implications for Future Work
Bibliography
A Nomenclature
Samenvatting