
Parsing XML using PHP: A Good Example

The following example illustrates how to use an external entity reference handler to include and parse other documents, how processing instructions (PIs) can be handled, and one way of determining “trust” for PIs that contain code.
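
Before the full example, here is a minimal sketch of PI handling in isolation. It only records each processing instruction instead of executing it. The inline XML string, the `render` target, and the `$seen` array are illustrative assumptions, not part of the example files used later.

```php
<?php
// Collect every processing instruction as a [target, data] pair.
$seen = [];

$parser = xml_parser_create();
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$seen) {
        // Nothing is evaluated here; the PI is only recorded.
        $seen[] = [$target, $data];
    }
);

// Illustrative document with two PIs: one "php" target, one made-up "render" target.
$xml = '<doc><?php echo "hi"; ?><?render mode="fast"?></doc>';

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

foreach ($seen as [$target, $data]) {
    printf("target=%s data=%s\n", $target, $data);
}
```

Recording PIs rather than evaluating them is the cautious default. The full example goes one step further and calls eval() only after a trust check has passed.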

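Consider the following two XML files. The first is xmltest.xml, the document that gets parsed; the second is xmltest2.xml, which the first one pulls in through the systemEntity external entity.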

<?xml version='1.0'?>
<!DOCTYPE chapter SYSTEM "/just/a/test.dtd" [
<!ENTITY plainEntity "FOO entity">
<!ENTITY systemEntity SYSTEM "xmltest2.xml">
]>
<chapter>
 <title>Title &plainEntity;</title>
 <para>
  <informaltable>
   <tgroup cols="3">
    <tbody>
     <row><entry>a1</entry><entry morerows="1">b1</entry><entry>c1</entry></row>
     <row><entry>a2</entry><entry>c2</entry></row>
     <row><entry>a3</entry><entry>b3</entry><entry>c3</entry></row>
    </tbody>
   </tgroup>
  </informaltable>
 </para>
 &systemEntity;
 <section id="about">
  <title>About this Document</title>
  <para>
   <!-- this is a comment -->
   <?php echo 'Hi!  This is PHP version ' . phpversion(); ?>
  </para>
 </section>
</chapter>

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY testEnt "test entity">
]>
<foo>
 <element attrib="value"/>
 &testEnt;
 <?php echo "This is some more PHP code being executed."; ?>
</foo>

The following code shows how to parse the XML files above using PHP:

<?php
$file = "xmltest.xml";

// Build an array key for a parser. In PHP 8 a parser is an XMLParser
// object, which cannot be used as an array key directly; older
// versions use a resource, which needs an integer cast.
function parserKey($parser)
{
    return is_object($parser) ? spl_object_id($parser) : (int) $parser;
}

function trustedFile($file)
{
    // only trust local files owned by ourselves
    // (preg_match replaces eregi, which was removed in PHP 7)
    if (!preg_match("@^([a-z]+)://@i", $file)
        && fileowner($file) == getmyuid()) {
        return true;
    }
    return false;
}

function startElement($parser, $name, $attribs)
{
    echo "&lt;<font color=\"#0000cc\">$name</font>";
    if (count($attribs)) {
        foreach ($attribs as $k => $v) {
            echo " <font color=\"#009900\">$k</font>=\"<font color=\"#990000\">$v</font>\"";
        }
    }
    echo "&gt;";
}

function endElement($parser, $name)
{
    echo "&lt;/<font color=\"#0000cc\">$name</font>&gt;";
}

function characterData($parser, $data)
{
    echo "<b>$data</b>";
}

function PIHandler($parser, $target, $data)
{
    switch (strtolower($target)) {
        case "php":
            global $parser_file;
            // If the parsed document is "trusted", we say it is safe
            // to execute PHP code inside it.  If not, display the code
            // instead.
            if (trustedFile($parser_file[parserKey($parser)])) {
                eval($data);
            } else {
                printf("Untrusted PHP code: <i>%s</i>",
                       htmlspecialchars($data));
            }
            break;
    }
}

function defaultHandler($parser, $data)
{
    // entity references are highlighted; everything else is shown small
    if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
        printf('<font color="#aa00aa">%s</font>',
               htmlspecialchars($data));
    } else {
        printf('<font size="-1">%s</font>',
               htmlspecialchars($data));
    }
}

function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
                                  $publicId) {
    if ($systemId) {
        // new_xml_parser() returns false on failure, so check the
        // result before unpacking it
        $result = new_xml_parser($systemId);
        if (!$result) {
            printf("Could not open entity %s at %s\n", $openEntityNames,
                   $systemId);
            return false;
        }
        list($parser, $fp) = $result;
        while ($data = fread($fp, 4096)) {
            if (!xml_parse($parser, $data, feof($fp))) {
                printf("XML error: %s at line %d while parsing entity %s\n",
                       xml_error_string(xml_get_error_code($parser)),
                       xml_get_current_line_number($parser), $openEntityNames);
                xml_parser_free($parser);
                return false;
            }
        }
        xml_parser_free($parser);
        return true;
    }
    return false;
}

function new_xml_parser($file)
{
    global $parser_file;

    $xml_parser = xml_parser_create();
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    xml_set_processing_instruction_handler($xml_parser, "PIHandler");
    xml_set_default_handler($xml_parser, "defaultHandler");
    xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");

    if (!($fp = @fopen($file, "r"))) {
        return false;
    }
    if (!is_array($parser_file)) {
        settype($parser_file, "array");
    }
    // remember which file each parser is reading, for the trust check
    $parser_file[parserKey($xml_parser)] = $file;
    return array($xml_parser, $fp);
}

$result = new_xml_parser($file);
if (!$result) {
    die("could not open XML input");
}
list($xml_parser, $fp) = $result;

echo "<pre>";
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d\n",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
echo "</pre>";
echo "parse complete\n";
xml_parser_free($xml_parser);

?>
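
The trust check in trustedFile() combines two tests: the path must not start with a URL scheme, and the file must be owned by the user running the script. The sketch below separates the two so each can be tried on its own. The function names hasUrlScheme and isTrustedLocalFile are hypothetical, and the extra is_file() guard is an assumption added here, not part of the listing above.

```php
<?php
// Hypothetical helper: true when $path starts with a URL scheme such as "http://".
function hasUrlScheme(string $path): bool
{
    return preg_match('@^[a-z]+://@i', $path) === 1;
}

// Hypothetical variant of trustedFile(): accept only existing local
// files owned by the user running the script.
function isTrustedLocalFile(string $path): bool
{
    // Assumption: remote paths and missing files are never trusted.
    if (hasUrlScheme($path) || !is_file($path)) {
        return false;
    }
    return fileowner($path) === getmyuid();
}

var_dump(hasUrlScheme('http://example.com/a.xml'));   // scheme present
var_dump(hasUrlScheme('xmltest.xml'));                // plain relative path
var_dump(isTrustedLocalFile('ftp://example.com/a.xml'));
```

File ownership is a coarse signal. It says who can write the file, not whether its contents are safe, so treat a check like this as one layer of defence rather than a guarantee before passing anything to eval().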

I hope this will help. Your comments are welcome.

  • admin

    Great Example dude!

    I tried it on my own and it worked smoothly. Still, I thought I should write a class for this, so here it goes:

    <?php
    class Simple_Parser
    {
        public $parser;
        public $error_code;
        public $error_string;
        public $current_line;
        public $current_column;
        public $data = array();   // node currently being filled
        public $datas = array();  // stack of parent nodes

        function parse($data)
        {
            $this->parser = xml_parser_create('UTF-8');
            xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
            // handlers are passed as callables bound to this object
            xml_set_element_handler($this->parser, array($this, 'tag_open'), array($this, 'tag_close'));
            xml_set_character_data_handler($this->parser, array($this, 'cdata'));
            // the whole document is passed at once, so mark it as final
            if (!xml_parse($this->parser, $data, true))
            {
                $this->data = array();
                $this->error_code = xml_get_error_code($this->parser);
                $this->error_string = xml_error_string($this->error_code);
                $this->current_line = xml_get_current_line_number($this->parser);
                $this->current_column = xml_get_current_column_number($this->parser);
            }
            else
            {
                $this->data = $this->data['child'];
            }
            xml_parser_free($this->parser);
        }

        function tag_open($parser, $tag, $attribs)
        {
            // add a new child node, push the current node, then descend
            $this->data['child'][$tag][] = array('data' => '', 'attribs' => $attribs, 'child' => array());
            $this->datas[] =& $this->data;
            $this->data =& $this->data['child'][$tag][count($this->data['child'][$tag])-1];
        }

        function cdata($parser, $cdata)
        {
            $this->data['data'] .= $cdata;
        }

        function tag_close($parser, $tag)
        {
            // return to the parent node
            $this->data =& $this->datas[count($this->datas)-1];
            array_pop($this->datas);
        }
    }

    $xml_parser = new Simple_Parser;
    // illustrative input: the argument must be a well-formed XML string
    $xml_parser->parse('<root>test</root>');

    ?>

  • Rajesh Shah

    Seems like XML parsing isn’t as tough as I initially thought.

    Thanks for the article, it helped me a lot in understanding XML parsing with PHP.