An aside: Converting text files to XML with XSLT 2.0

■ First, a "super" simple example.
<?xml version="1.0" encoding="UTF-8" ?>
The design of the Darwin Information Typing Architecture (DITA) is based on deriving multiple information types, or topic types, from a common, generic topic. This language reference describes the elements that comprise the topic DTD and its initial, information-typed descendents: concept, reference, task, and glossentry. It also describes the DITA map DTD and its current specialization (bookmap), as well as various topic and map based DITA domains.
This specification describes specific details of each element in the OASIS DITA language. The separate DITA Architectural Specification includes detailed information about DITA specialization, when to use each topic type, how topics and maps interact, details of complex behaviors such as conref and conditional processing, and many other best practices for working with DITA.
The elements that make up the DITA design represent a set of different authoring concerns, each of which is grouped into its own chapter. Major sections include:
<xsl:output encoding="UTF-8" indent="yes"/>
<xsl:template match="document">
    <xsl:variable name="inputText"
        <xsl:analyze-string select="$inputText" regex="\n">
                    <xsl:value-of select="."/>
<?xml version="1.0" encoding="UTF-8"?>
   <p>The design of the Darwin Information Typing Architecture (DITA) is based on deriving multiple information types, or topic types, from a common, generic topic. This language reference describes the elements that comprise the topic DTD and its initial, information-typed descendents: concept, reference, task, and glossentry. It also describes the DITA map DTD and its current specialization (bookmap), as well as various topic and map based DITA domains.</p>
   <p>This specification describes specific details of each element in the OASIS DITA language. The separate DITA Architectural Specification includes detailed information about DITA specialization, when to use each topic type, how topics and maps interact, details of complex behaviors such as conref and conditional processing, and many other best practices for working with DITA.</p>
   <p>The elements that make up the DITA design represent a set of different authoring concerns, each of which is grouped into its own chapter. Major sections include:</p>
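The text came out neatly wrapped in <p> tags. <xsl:analyze-string select="$inputText" regex="\n"> splits the text at each U+000A, and <xsl:non-matching-substring> takes each piece of text that is not U+000A, wraps it in <p>...</p>, and outputs it. Simple, isn't it?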
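For reference, the fragments above can be assembled into a complete stylesheet along the following lines. This is only a sketch under assumptions: it takes the input text from the string value of the <document> element (how $inputText is actually populated is not shown above), the <body> wrapper element is a hypothetical name added so that the result has a single root, and the check that skips whitespace-only lines is an addition of this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <xsl:template match="document">
        <!-- Assumption: the text to split is the string value of document. -->
        <xsl:variable name="inputText" select="string(.)"/>
        <!-- body is a hypothetical wrapper so the output has one root element. -->
        <body>
            <!-- Each U+000A is a "match"; everything between matches is
                 handed to xsl:non-matching-substring, one line at a time. -->
            <xsl:analyze-string select="$inputText" regex="\n">
                <xsl:non-matching-substring>
                    <!-- Addition in this sketch: skip whitespace-only lines. -->
                    <xsl:if test="normalize-space(.) != ''">
                        <p><xsl:value-of select="."/></p>
                    </xsl:if>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </body>
    </xsl:template>
</xsl:stylesheet>
```

Because there is no xsl:matching-substring child, the newline characters themselves are simply discarded. Running this requires an XSLT 2.0 processor, since xsl:analyze-string does not exist in XSLT 1.0.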
Next, let's process a CSV file delimited by ",".
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:template match="data">
    <xsl:variable name="inputText"
        <xsl:analyze-string select="$inputText" regex="\n">
                <xsl:analyze-string select="." regex="([^,]*),([^,]*),([^,]*)">
                                <xsl:value-of select="regex-group(1)"/>
                                <xsl:value-of select="regex-group(2)"/>
                                <xsl:value-of select="regex-group(3)"/>
<?xml version="1.0" encoding="UTF-8"?>
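The regular expression syntax can be confusing if you are not used to it, but I did not expect converting text to XML to be this easy. I think this feature greatly expands what XSLT 2.0 can do in text processing.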
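To make the regular expression part concrete, the CSV fragments above can be put together roughly as follows. This is a sketch under assumptions: the input text is taken from the string value of the <data> element, and the output element names table, row, and col are hypothetical, chosen only for illustration. In the regex, [^,]* matches a run of zero or more characters other than a comma, and each parenthesized group is then available through regex-group(n).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <xsl:template match="data">
        <!-- Assumption: the CSV text is the string value of data. -->
        <xsl:variable name="inputText" select="string(.)"/>
        <!-- table, row and col are hypothetical output element names. -->
        <table>
            <!-- Outer pass: split the text into lines at U+000A. -->
            <xsl:analyze-string select="$inputText" regex="\n">
                <xsl:non-matching-substring>
                    <!-- Inner pass: capture three comma-separated fields. -->
                    <xsl:analyze-string select="."
                        regex="([^,]*),([^,]*),([^,]*)">
                        <xsl:matching-substring>
                            <row>
                                <col><xsl:value-of select="regex-group(1)"/></col>
                                <col><xsl:value-of select="regex-group(2)"/></col>
                                <col><xsl:value-of select="regex-group(3)"/></col>
                            </row>
                        </xsl:matching-substring>
                    </xsl:analyze-string>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </table>
    </xsl:template>
</xsl:stylesheet>
```

Lines containing fewer than two commas do not match the inner regex and produce no row. Lines that do not have exactly three fields, and quoted fields that contain commas, are not handled correctly by this sketch. Also note that the regex attribute is an attribute value template, so any curly braces in a pattern (for example a quantifier such as {2}) would have to be doubled.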