Conference Publication Details
Mandatory Fields
Sutcliffe R.;White K.;Slattery D.;Gabbay I.;Mulcahy M.
CEUR Workshop Proceedings
Cross-language French-English question answering using the DLT system at CLEF 2006
2006
January
Published
1
()
Optional Fields
Question answering
The basic architecture of our factoid system is standard in nature and comprises query type identification, query analysis and translation, retrieval query formulation, document retrieval, text file parsing, named entity recognition and answer entity selection. Factoid classification into 69 query types is carried out using keywords. Associated with each type is a set of one or more Named Entities (NEs). Xelda is used to tag the French query for part-of-speech, and shallow parsing is then carried out over the tagged query in order to recognise thirteen different kinds of significant phrase. These were determined after a study of the constructions used in French queries together with their English counterparts. Our observations were that (1) in French proper names, usually only the first word starts with a capital letter, with subsequent words uncapitalised, unlike in English; (2) Adjective-Noun combinations, whether capitalised or not, can have the status of compounds in French and hence need special treatment; (3) certain noun-preposition-noun phrases are also of significance. The phrases are then translated into English by the engine WorldLingo and using the Grand Dictionnaire Terminologique, the results being combined. Each phrase has a weight assigned to it by the parser. A Boolean retrieval query is formulated consisting of an AND of all phrases in increasing order of weight. The corpus is indexed by sentence using Lucene. The Boolean query is submitted to the engine and, if unsuccessful, is re-submitted with the first (least significant) term removed. The process continues until the search succeeds. The documents (i.e. sentences) are retrieved and the NEs corresponding to the identified query type are marked. Significant terms from the query are also marked. Each NE is scored based on its distance from query terms and their individual weights. The answer returned is the highest-scoring NE. Temporally Restricted Factoids are treated in the same way as Factoids.
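The retrieval and answer selection steps described above can be illustrated with a minimal Python sketch. It is not the DLT implementation: an in-memory substring search stands in for the Lucene sentence index, a regular expression for four-digit years stands in for named entity recognition, and the scoring formula (phrase weight divided by one plus token distance) is an assumption, since the abstract does not give the exact function. The names Phrase, search_with_relaxation and score_candidates are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    text: str      # translated query phrase
    weight: float  # significance assigned by the parser


def boolean_and_search(sentences, phrases):
    """Toy stand-in for the sentence index: AND of all phrases (substring match)."""
    wanted = [p.text.lower() for p in phrases]
    return [s for s in sentences if all(w in s.lower() for w in wanted)]


def search_with_relaxation(sentences, phrases):
    """Order phrases by increasing weight; on failure drop the first
    (least significant) phrase and retry until the search succeeds."""
    terms = sorted(phrases, key=lambda p: p.weight)
    while terms:
        hits = boolean_and_search(sentences, terms)
        if hits:
            return terms, hits
        terms = terms[1:]
    return [], []


def score_candidates(sentence, terms, ne_pattern):
    """Score each token matching ne_pattern by its distance from the query
    phrases and their weights. The formula is an assumption for illustration."""
    tokens = sentence.split()
    lowered = [t.lower() for t in tokens]
    term_positions = []
    for p in terms:
        words = p.text.lower().split()
        for i in range(len(lowered) - len(words) + 1):
            if lowered[i:i + len(words)] == words:
                term_positions.append((i, p.weight))
                break
    scores = {}
    for i, tok in enumerate(tokens):
        if re.fullmatch(ne_pattern, tok):
            scores[tok] = sum(w / (1 + abs(i - pos)) for pos, w in term_positions)
    return scores


# Invented example data, not taken from the CLEF collection.
sentences = [
    "In 1995 Chirac was elected President of France , "
    "succeeding Mitterrand who won in 1981",
    "The treaty of 1957 was signed in Rome",
]
phrases = [
    Phrase("Chirac", 3.0),
    Phrase("President of France", 2.0),
    Phrase("parliament", 0.5),  # matches no sentence, so it is dropped first
]

terms, hits = search_with_relaxation(sentences, phrases)
scores = score_candidates(hits[0], terms, r"\d{4}")
best = max(scores, key=scores.get)
print([p.text for p in terms], best)
```

In this toy run the full three-phrase query finds nothing, so the lowest-weighted phrase is removed and the second attempt retrieves the first sentence; the year nearest the heavily weighted phrases then receives the highest score.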
Definition questions are classified in three ways: organisation, person or unknown. This year Factoids had to be recognised automatically by an extension of the classifier. An IR query is formulated using the main term in the original question plus a disjunction of phrases depending on the identified type. All matching sentences are returned complete. Results this year were as follows: 32/150 (21%) of Factoids were judged R (right), 14/150 (9%) were X (inexact), 4/40 (10%) of Definitions were R and 2 List results were R (P@N = 0.2). Our ranking in Factoids relative to all thirteen runs was fourth. However, scoring all systems over R and X together and including Definitions, our ranking would be joint second, because we had more X scores than any other system. Last year our score on Factoids was 26/150 (17%), but the difference is probably due to the easier queries this year.
Grant Details