CCLQA Output Format

CCLQA participants must submit their results in the XML format described below.

Overview

Tag | Description

TOPIC_SET | Contains the metadata and a list of topics.

METADATA | Must include the run ID and a system description. If the run is related to a question analysis run or an IR4QA run, specify the corresponding run IDs.

TOPIC | Each TOPIC contains exactly one CCLQA_RESULT.

CCLQA_RESULT | Contains a ranked list of one or more ANSWER_CANDIDATE elements.

ANSWER_CANDIDATE | A single system response; the candidates within a CCLQA_RESULT are ordered by RANK. RANK and DOCID are required. SCORE is optional, but we recommend providing it (preferably as a value between 0 and 1).

For the definition of Run ID, refer to RunIDFormat.

We recommend wrapping the content of each ANSWER_CANDIDATE element in a CDATA section, so that characters that are otherwise illegal in XML text (such as < and &) do not have to be escaped.
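
As an illustration of the CDATA recommendation, here is a minimal Python sketch. The helper name to_cdata is hypothetical and not part of any CCLQA tool. The one case that needs care is an answer string that itself contains the sequence ]]>, which would otherwise end the CDATA section early; the sketch splits it across two adjacent sections.

```python
def to_cdata(text):
    """Wrap text in a CDATA section (hypothetical helper, not an official tool).

    A literal ']]>' inside the text is split across two adjacent CDATA
    sections so that it cannot terminate the section prematurely.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Characters such as < and & can then appear unescaped in the answer text.
element = "<ANSWER_CANDIDATE>" + to_cdata("a < b & c") + "</ANSWER_CANDIDATE>"
```

An XML parser reads the wrapped content back as the original string, so no entity escaping is needed in the answer text itself.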

DTD

<!DOCTYPE TOPIC_SET [
<!ELEMENT TOPIC_SET (METADATA,TOPIC*)>
<!ELEMENT METADATA (RUNID,DESCRIPTION,QUESTION_ANALYSIS_RUN?,IR4QA_RUN?)>
<!ELEMENT RUNID (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ELEMENT QUESTION_ANALYSIS_RUN (#PCDATA)>
<!ELEMENT IR4QA_RUN (#PCDATA)>
<!ELEMENT TOPIC (CCLQA_RESULT)>
<!ATTLIST TOPIC ID CDATA #REQUIRED>
<!ELEMENT CCLQA_RESULT (ANSWER_CANDIDATE+)>
<!ELEMENT ANSWER_CANDIDATE (#PCDATA)>
<!ATTLIST ANSWER_CANDIDATE RANK CDATA #REQUIRED>
<!ATTLIST ANSWER_CANDIDATE DOCID CDATA #REQUIRED>
<!ATTLIST ANSWER_CANDIDATE SCORE CDATA #IMPLIED>
]>
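
Before submitting, it may help to check a run file against the main constraints of the DTD above. The following Python sketch uses only the standard library, which cannot perform real DTD validation, so it re-implements a few of the rules by hand: the root element, the required METADATA children, and the required attributes. The function name check_run and the message wording are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def check_run(xml_text):
    """Return a list of problems; an empty list means these basic checks passed.

    This is a partial, hand-written check, not a full DTD validation.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "TOPIC_SET":
        return ["root element is %s, expected TOPIC_SET" % root.tag]

    problems = []
    children = list(root)
    if not children or children[0].tag != "METADATA":
        problems.append("first child of TOPIC_SET must be METADATA")
    else:
        # RUNID and DESCRIPTION are mandatory; the two *_RUN elements are optional.
        for tag in ("RUNID", "DESCRIPTION"):
            if children[0].find(tag) is None:
                problems.append("METADATA is missing %s" % tag)

    for topic in root.findall("TOPIC"):
        topic_id = topic.get("ID")
        if topic_id is None:
            problems.append("TOPIC without ID attribute")
        candidates = topic.findall("CCLQA_RESULT/ANSWER_CANDIDATE")
        if not candidates:
            problems.append("TOPIC %s: needs at least one ANSWER_CANDIDATE" % topic_id)
        for cand in candidates:
            # RANK and DOCID are #REQUIRED; SCORE is #IMPLIED (optional).
            for attr in ("RANK", "DOCID"):
                if cand.get(attr) is None:
                    problems.append(
                        "TOPIC %s: ANSWER_CANDIDATE missing %s" % (topic_id, attr))
    return problems
```

For strict validation against the DTD itself, a validating parser would be needed; this sketch only catches the most common structural mistakes.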

Sample XML Format

<TOPIC_SET>

  <METADATA>
    <RUNID>TEAMX-EN-JA-01-T</RUNID>
    <DESCRIPTION>We used IR4QA results from team X. Answer extraction is based on a probabilistic model.</DESCRIPTION>
    <QUESTION_ANALYSIS_RUN>TEAMX-EN-JA-01-T</QUESTION_ANALYSIS_RUN>
    <IR4QA_RUN>TEAMX-EN-JA-02-T</IR4QA_RUN>
  </METADATA>

  <TOPIC ID="ACLIA2-JA-T0001">
    <CCLQA_RESULT>
      <ANSWER_CANDIDATE RANK="1" DOCID="JA-050912009" SCORE="1.00"><![CDATA[アラファト議長の最大支持基盤]]></ANSWER_CANDIDATE>
      <ANSWER_CANDIDATE RANK="2" DOCID="JA-030828004" SCORE="0.92"><![CDATA[PLOの主流派である]]></ANSWER_CANDIDATE>
    </CCLQA_RESULT>
  </TOPIC>

  <TOPIC ID="ACLIA2-JA-T0002">
    <CCLQA_RESULT>
      <ANSWER_CANDIDATE RANK="1" DOCID="JA-040512181" SCORE="1.00"><![CDATA[日本人に多発する脳血管疾患]]></ANSWER_CANDIDATE>
      <ANSWER_CANDIDATE RANK="2" DOCID="JA-020512181" SCORE="0.60"><![CDATA[原因不明で後遺症が残ることもある]]></ANSWER_CANDIDATE>
    </CCLQA_RESULT>
  </TOPIC>

</TOPIC_SET>
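
A file with the layout of the sample above could be generated along the following lines. This is a sketch under stated assumptions, not an official writer: the function build_run, its argument layout, and the choice to derive RANK by sorting candidates on descending score are all made up for illustration. The optional QUESTION_ANALYSIS_RUN and IR4QA_RUN elements are omitted for brevity.

```python
from xml.sax.saxutils import escape, quoteattr


def cdata(text):
    # Split any ']]>' so that it cannot close the CDATA section early.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_run(run_id, description, results):
    """Build the XML text of a run (hypothetical helper).

    results maps a topic ID to a list of (doc_id, score, answer_text) tuples.
    Assumption: RANK is assigned by sorting on score, highest first.
    """
    lines = [
        "<TOPIC_SET>",
        "  <METADATA>",
        "    <RUNID>%s</RUNID>" % escape(run_id),
        "    <DESCRIPTION>%s</DESCRIPTION>" % escape(description),
        "  </METADATA>",
    ]
    for topic_id, candidates in results.items():
        lines.append("  <TOPIC ID=%s>" % quoteattr(topic_id))
        lines.append("    <CCLQA_RESULT>")
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        for rank, (doc_id, score, text) in enumerate(ranked, start=1):
            lines.append(
                '      <ANSWER_CANDIDATE RANK="%d" DOCID=%s SCORE="%.2f">%s</ANSWER_CANDIDATE>'
                % (rank, quoteattr(doc_id), score, cdata(text)))
        lines.append("    </CCLQA_RESULT>")
        lines.append("  </TOPIC>")
    lines.append("</TOPIC_SET>")
    return "\n".join(lines) + "\n"


xml_text = build_run(
    "TEAMX-EN-JA-01-T",
    "Example run.",
    {"ACLIA2-JA-T0001": [("JA-030828004", 0.92, "PLOの主流派である"),
                         ("JA-050912009", 1.00, "アラファト議長の最大支持基盤")]},
)
```

When saving the result, write it with an explicit encoding, for example open(path, "w", encoding="utf-8"), so that non-ASCII answer text is preserved.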

CCLQAFormat (last edited 2010-01-02 19:33:01 by HidekiShima)