Question Analysis Output Format

This page describes the XML format of the question analysis output.

Overview

Tag

Description

TOPIC_SET

Contains the metadata and a list of topics.

METADATA

Must include the run ID and a description of the system.

TOPIC

Each TOPIC contains one QUESTION_ANALYSIS and is identified by a required ID attribute.

QUESTION_ANALYSIS

Contains an ANSWERTYPE and KEYTERMS extracted from the question.

ANSWERTYPE

By default, one of DEFINITION, BIOGRAPHY, RELATIONSHIP, EVENT, WHY, PERSON, ORGANIZATION, LOCATION, or DATE is expected. You may extend this set with your own answer types if you wish. SCORE is an optional attribute.

KEYTERM

This element stores a (translated) keyword from the question. Synonyms and aliases can also be added as KEYTERM elements. SCORE is an optional attribute, but you are encouraged to provide it (preferably as a value between 0 and 1).

For the definition of Run ID, refer to RunIDFormat.
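
As an illustration of how the elements described above nest, the following is a minimal sketch that builds such a document with Python's standard xml.etree.ElementTree module. The function name build_topic_set and the tuple layout of its topics argument are assumptions made for this example and are not part of the format. The optional SCORE attribute on ANSWERTYPE is left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_topic_set(run_id, description, topics):
    """Build a TOPIC_SET element (hypothetical helper, not part of the format).

    `topics` is a list of (topic_id, answer_type, language, keyterms) tuples,
    where `keyterms` is a list of (text, score) pairs and score may be None.
    """
    root = ET.Element("TOPIC_SET")

    # METADATA carries the run ID and the system description.
    meta = ET.SubElement(root, "METADATA")
    ET.SubElement(meta, "RUNID").text = run_id
    ET.SubElement(meta, "DESCRIPTION").text = description

    for topic_id, answer_type, language, keyterms in topics:
        topic = ET.SubElement(root, "TOPIC", ID=topic_id)
        analysis = ET.SubElement(topic, "QUESTION_ANALYSIS")
        ET.SubElement(analysis, "ANSWERTYPE").text = answer_type
        terms = ET.SubElement(analysis, "KEYTERMS", LANGUAGE=language)
        for text, score in keyterms:
            term = ET.SubElement(terms, "KEYTERM")
            term.text = text
            # SCORE is optional, so it is written only when a value is given.
            if score is not None:
                term.set("SCORE", str(score))
    return root


root = build_topic_set(
    "TEAMX-EN-JA-01-T",
    "Example run description.",
    [("ACLIA2-JA-T0001", "DEFINITION", "JA", [("ファタハ", 1.0), ("組織", None)])],
)
xml_text = ET.tostring(root, encoding="unicode")
```

Here xml_text holds the serialized document as a string. Writing it to a file with an explicit encoding such as UTF-8 is left to the caller.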

DTD

<!DOCTYPE TOPIC_SET [
<!ELEMENT TOPIC_SET (METADATA,TOPIC*)>
<!ELEMENT METADATA (RUNID,DESCRIPTION)>
<!ELEMENT RUNID (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ELEMENT TOPIC (QUESTION_ANALYSIS)>
<!ATTLIST TOPIC ID CDATA #REQUIRED>
<!ELEMENT QUESTION_ANALYSIS (ANSWERTYPE,KEYTERMS)>
<!ELEMENT ANSWERTYPE (#PCDATA)>
<!ATTLIST ANSWERTYPE SCORE CDATA #IMPLIED>
<!ELEMENT KEYTERMS (KEYTERM*)>
<!ATTLIST KEYTERMS LANGUAGE (CS|CT|EN|JA) #REQUIRED>
<!ELEMENT KEYTERM (#PCDATA)>
<!ATTLIST KEYTERM SCORE CDATA #IMPLIED>
]>
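
The constraints expressed by the DTD above can be approximated with a few hand-written checks. The following is a rough sketch using only Python's standard library. It is not a real DTD validator: it checks element order, the required ID attribute, and the allowed LANGUAGE values, but it does not reject undeclared attributes or stray text. The function name check_structure is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Values permitted by the LANGUAGE attribute declaration in the DTD.
ALLOWED_LANGUAGES = {"CS", "CT", "EN", "JA"}


def check_structure(root):
    """Return a list of problems; an empty list means none were detected."""
    if root.tag != "TOPIC_SET":
        return ["root element must be TOPIC_SET"]

    problems = []
    children = list(root)

    # TOPIC_SET (METADATA,TOPIC*): METADATA comes first, then any number of TOPICs.
    if children and children[0].tag == "METADATA":
        if [c.tag for c in children[0]] != ["RUNID", "DESCRIPTION"]:
            problems.append("METADATA must contain RUNID followed by DESCRIPTION")
        topics = children[1:]
    else:
        problems.append("TOPIC_SET must start with METADATA")
        topics = children

    for topic in topics:
        if topic.tag != "TOPIC":
            problems.append(f"unexpected element {topic.tag} in TOPIC_SET")
            continue
        topic_id = topic.get("ID")
        if topic_id is None:
            problems.append("TOPIC is missing the required ID attribute")
        label = topic_id or "(no ID)"

        # TOPIC (QUESTION_ANALYSIS): exactly one child.
        if [c.tag for c in topic] != ["QUESTION_ANALYSIS"]:
            problems.append(f"TOPIC {label}: expected exactly one QUESTION_ANALYSIS")
            continue
        analysis = topic[0]

        # QUESTION_ANALYSIS (ANSWERTYPE,KEYTERMS): both children, in this order.
        if [c.tag for c in analysis] != ["ANSWERTYPE", "KEYTERMS"]:
            problems.append(f"TOPIC {label}: expected ANSWERTYPE followed by KEYTERMS")
            continue
        keyterms = analysis[1]
        if keyterms.get("LANGUAGE") not in ALLOWED_LANGUAGES:
            problems.append(f"TOPIC {label}: LANGUAGE must be one of CS, CT, EN, JA")
        if any(c.tag != "KEYTERM" for c in keyterms):
            problems.append(f"TOPIC {label}: KEYTERMS may contain only KEYTERM elements")
    return problems
```

A call such as check_structure(ET.parse("run.xml").getroot()) would return the list of detected problems, where run.xml is a placeholder file name. For strict validation against the DTD, a validating parser would be needed instead.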

Sample XML Format

<TOPIC_SET>
  <METADATA>
    <RUNID>TEAMX-EN-JA-01-T</RUNID>
    <DESCRIPTION>We used Support Vector Machine for answer type classification and NP chunking.</DESCRIPTION>
  </METADATA>
  
  <TOPIC ID="ACLIA2-JA-T0001">
    <QUESTION_ANALYSIS>
      <ANSWERTYPE SCORE="1.0">DEFINITION</ANSWERTYPE>
      <KEYTERMS LANGUAGE="JA">
        <KEYTERM SCORE="1.0">ファタハ</KEYTERM>
        <KEYTERM SCORE="0.1">組織</KEYTERM>
      </KEYTERMS>
    </QUESTION_ANALYSIS>
  </TOPIC>
  
  <TOPIC ID="ACLIA2-JA-T0002">
    <QUESTION_ANALYSIS>
      <ANSWERTYPE SCORE="1.0">DEFINITION</ANSWERTYPE>
      <KEYTERMS LANGUAGE="JA">
        <KEYTERM SCORE="1.0">もやもや病</KEYTERM>
        <KEYTERM SCORE="0.3">病気</KEYTERM>
      </KEYTERMS>
    </QUESTION_ANALYSIS>
  </TOPIC>
</TOPIC_SET>
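
A document in the format shown above can be read back with a few lines of code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name read_analysis and the shape of the returned dictionary are assumptions made for this example, and the embedded SAMPLE string is a shortened copy of the sample with a placeholder description.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample (one topic, placeholder description).
SAMPLE = """<TOPIC_SET>
  <METADATA>
    <RUNID>TEAMX-EN-JA-01-T</RUNID>
    <DESCRIPTION>Example run description.</DESCRIPTION>
  </METADATA>
  <TOPIC ID="ACLIA2-JA-T0001">
    <QUESTION_ANALYSIS>
      <ANSWERTYPE SCORE="1.0">DEFINITION</ANSWERTYPE>
      <KEYTERMS LANGUAGE="JA">
        <KEYTERM SCORE="1.0">ファタハ</KEYTERM>
        <KEYTERM SCORE="0.1">組織</KEYTERM>
      </KEYTERMS>
    </QUESTION_ANALYSIS>
  </TOPIC>
</TOPIC_SET>"""


def read_analysis(xml_text):
    """Map each topic ID to its answer type, key term language, and key terms."""
    root = ET.fromstring(xml_text)
    result = {}
    for topic in root.iter("TOPIC"):
        analysis = topic.find("QUESTION_ANALYSIS")
        keyterms = analysis.find("KEYTERMS")
        terms = []
        for term in keyterms.findall("KEYTERM"):
            # SCORE is optional; a missing score is reported as None.
            score = term.get("SCORE")
            terms.append((term.text, float(score) if score is not None else None))
        result[topic.get("ID")] = {
            "answer_type": analysis.findtext("ANSWERTYPE"),
            "language": keyterms.get("LANGUAGE"),
            "keyterms": terms,
        }
    return result


analysis = read_analysis(SAMPLE)
```

The sketch assumes the input already follows the format. A document with a missing QUESTION_ANALYSIS or KEYTERMS element would raise an AttributeError here.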

QuestionAnalysisFormat (last edited 2010-01-02 19:32:59 by HidekiShima)