I needed to build some XML in which certain deeply nested elements required attributes.
Drupal comes with a format_xml_elements() function, which is fantastic for module developers who can’t know whether some end-user’s PHP environment will have XML extensions installed. In fact, Drupal’s built-in RSS formatter uses this function, and for just this reason.
So, I investigated the API page: http://api.drupal.org/api/drupal/includes--common.inc/function/format_xml_elements/7. Right away I saw how to use this function to format either an element like <tag>content</tag> or an element like <tag attribute="value">content</tag>.
Well, nice. Keep in mind: My goal was formatting an XML tree with multiple levels of nested elements, some including attributes.
These two kinds of element are specified differently when using format_xml_elements().
I had elements like:
<mykey>myvalue</mykey>,
which I could format like
format_xml_elements(
  array(
    'mykey' => 'myvalue',
  )
);
and I had elements like
<mytag myattr="someval" />,
which I could format like
format_xml_elements(
  array(
    // The long-form definition is one item in the list of elements.
    array(
      'key' => 'mytag',
      'value' => '',
      'attributes' => array(
        'myattr' => 'someval',
      ),
    ),
  )
);
and I had one more element like
<thistag attr="valuehere">somecontent</thistag>
which I could format like
format_xml_elements(
  array(
    array(
      'key' => 'thistag',
      'value' => 'somecontent',
      'attributes' => array(
        'attr' => 'valuehere',
      ),
    ),
  )
);
Pretty quickly, I also tried multiple elements at once, like
<item>stuff</item>
<item>morestuff</item>
<item>yourstuff</item>
<anotheritem att="specialval">mystuff</anotheritem>,
which I could format like
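format_xml_elements(
  array(
    // Repeated element names need the long form: a PHP array can hold the key 'item' only once.
    array('key' => 'item', 'value' => 'stuff'),
    array('key' => 'item', 'value' => 'morestuff'),
    array('key' => 'item', 'value' => 'yourstuff'),
    array(
      'key' => 'anotheritem',
      'value' => 'mystuff',
      'attributes' => array(
        'att' => 'specialval',
      ),
    ),
  )
);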
OK, great. Learned that, fast. Formatting one element, or even a bunch of sibling elements all at once, turns out to be very straightforward.
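One caveat about sibling elements that share a name, shown in plain PHP (no Drupal needed): an array literal can hold each string key only once, so the short form array('item' => 'stuff', 'item' => 'morestuff') silently keeps only the last value. Putting each element in the long form as its own list item avoids the collision, because every item gets its own automatic numeric key. The variable names here are just for illustration.

```php
<?php
// Short form: repeated string keys collide, and PHP keeps only the last one.
$short = array(
  'item' => 'stuff',
  'item' => 'morestuff',
  'item' => 'yourstuff',
);
echo count($short), "\n";   // 1
echo $short['item'], "\n";  // yourstuff

// Long form: each element is a separate list item with its own numeric key.
$long = array(
  array('key' => 'item', 'value' => 'stuff'),
  array('key' => 'item', 'value' => 'morestuff'),
  array('key' => 'item', 'value' => 'yourstuff'),
);
echo count($long), "\n";                     // 3
echo implode(',', array_keys($long)), "\n";  // 0,1,2
```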
Very quickly I updated my module from concatenating a whole mess of strings together to build XML, to using calls to format_xml_elements() on each item which I had previously been wrapping in ugly tag-string concatenations.
It still wasn’t very satisfying, though. This gave me a collection of elements, but I still hadn’t nested them properly. Naturally, I tried passing in nested arrays to see whether an XML tree would come out.
To get
<element>
  <subelement>nicevalue</subelement>
</element>
I tried
format_xml_elements(
  array(
    'element' => array(
      'subelement' => 'nicevalue',
    ),
  )
);
Great! It worked. format_xml_elements() clearly works recursively; in fact, you can see that in the function’s source code on the API page linked above. No surprise.
However, when I tried creating child elements with attributes, using the longer-format specification, I ran into big trouble.
To get
<mykey>
  <mysubkey attribs="subvalue" />
</mykey>
I tried
format_xml_elements(
  array(
    'mykey' => array(
      'key' => 'mysubkey',
      'value' => '',
      'attributes' => array(
        'attribs' => 'subvalue',
      ),
    ),
  )
);
Well. That got all ‘splodey.
I wound up with a whole mess of unintended elements, like <key>, <value>, <attributes> and <attribs>, and no attributes on any of them. OK, maybe recursion only works with attribute-free elements… I don’t know… seems plausible…
I really needed child elements with attributes on them! And I couldn’t get them from a single format_xml_elements() call with one big, deep array as its argument.
To see how core itself uses format_xml_elements(), I found the two Drupal-core functions that call it. Both are RSS-formatting functions, and the element arrays they pass in are just not as deep or attribute-laden as mine, so those real-life usage examples were no help.
So, what next? How about manually passing the output of one format_xml_elements() call into another format_xml_elements() call, doing my own nesting instead of letting the function recurse on itself? A-like so:
To get
<mykey>
  <mysubkey attrib="goodvalue" />
</mykey>
I tried
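// Build the inner element first, as its own formatted string.
$subkey = format_xml_elements(
  array(
    array(
      'key' => 'mysubkey',
      'value' => '',
      'attributes' => array(
        'attrib' => 'goodvalue',
      ),
    ),
  )
);
// Then use that string as the value of the parent element.
return format_xml_elements(
  array(
    'mykey' => $subkey,
  )
);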
That actually worked. However, the special characters in the $subkey string wound up escaped, so I got
<mykey>
  &lt;mysubkey attrib=&quot;goodvalue&quot; /&gt;
</mykey>.
Not cool!
So I changed to
return htmlspecialchars_decode(
  format_xml_elements(
    array(
      'mykey' => $subkey,
    )
  )
);
in order to un-escape this string back to normal XML.
Whoa. Whatever.
After all that, I wound up with a working module which included many calls to format_xml_elements() and many htmlspecialchars_decode() calls too. Ugly but working. Sigh.
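A caveat on that workaround, sketched in plain PHP: htmlspecialchars_decode() can’t tell markup that was escaped by accident from text that was escaped on purpose, and it removes one level of escaping from the whole string. The nested element survives, because its content was escaped twice, but a sibling element whose text was escaped only once comes out as raw "<" and breaks the XML. This simulation assumes text is escaped with htmlspecialchars(..., ENT_QUOTES, 'UTF-8'), which is what Drupal 7’s check_plain() does; escape_text() is a made-up stand-in, and the whitespace that format_xml_elements() adds is left out.

```php
<?php
// Hypothetical stand-in for check_plain(): escape text for XML/HTML output.
function escape_text($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Inner element built earlier as a string; its content is escaped once.
$subkey = '<mysubkey>' . escape_text('Fish & Chips') . '</mysubkey>';

// Outer level: the inner string gets escaped a second time,
// while a sibling's plain text is escaped only once.
$outer = '<mykey>' . escape_text($subkey) . '</mykey>'
       . '<note>' . escape_text('a < b') . '</note>';

// Decoding the whole output strips one level of escaping everywhere.
$decoded = htmlspecialchars_decode($outer, ENT_QUOTES);
echo $decoded, "\n";
// <mykey><mysubkey>Fish &amp; Chips</mysubkey></mykey><note>a < b</note>
```

The nested part is fine, but the sibling’s "a < b" is no longer well-formed XML.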
This was just fine until I got ready for a code review by the volunteer reviewers of drupal.org project applications. I *really* wanted to polish up this module so that format_xml_elements() appeared only once in my XML-builder function. Plus, I thought the very presence of htmlspecialchars_decode() just stunk. I was determined to find out, once and for all, whether format_xml_elements() could be used on a large array in a single call. I would either learn to use it the right way, or write my own recursive function to wrap around format_xml_elements(), so that I could make one explicit function call on my big array instead of spamming my module full of format_xml_elements() and htmlspecialchars_decode() calls.
Over about four days, I asked about format_xml_elements() in #drupal-support and #drupal-contribute on Freenode IRC. No bites at all. It was starting to seem comical that nobody was familiar with this function. Then someone joined #drupal-support today, announcing their arrival with “Hello Drupalers ! Anyone got a good problem for me?”
I mean, that’s just asking for it, isn’t it?
So I said, “Yes, I do”! At first, FrobinRobin agreed with me that format_xml_elements() appeared to turn multidimensional arrays into simple XML elements recursively, but that it wouldn’t set attributes on elements deeper than the zeroth level, and that calling format_xml_elements() multiple times explicitly was the easiest way to handle an array like mine. Luckily for me, FrobinRobin took it as a personal challenge to investigate further (after all, they asked for a “good problem”, didn’t they?)
Anticlimax: FrobinRobin decided that explicit numeric array keys were required, and that with them, format_xml_elements() could fully recurse an array describing an XML result of any complexity, with or without attributes on arbitrary elements. With this in mind, they set about building an array that would make format_xml_elements() produce the output I needed. After several tries, they arrived at one.
Now: FrobinRobin succeeded at building an array that produced the right XML output. However, I removed the explicit numeric keys, because PHP automatically assigns sequential integer keys to array members added without one. In other words, spelling out the numeric keys was a useful way to highlight what was going on in the function (its source code checks whether each key is numeric), but it isn’t necessary. In fact, setting them explicitly would make building the array harder, especially when the arrays are constructed programmatically and are very large. Who wants to write extra code not only to set each numeric key, but also to keep track of the current index position?
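To make that concrete in plain PHP (no Drupal required): an array written with an explicit 0 => key is identical to one written without keys, and appending with $list[] = ... picks up the next integer key automatically, so there’s no index counter to maintain. The element names below are placeholders.

```php
<?php
$element = array(
  'key' => 'mysubkey',
  'value' => '',
  'attributes' => array('attribs' => 'subvalue'),
);

// Spelling out the numeric index...
$explicit = array('mykey' => array(0 => $element));
// ...gives exactly the same array as letting PHP assign it.
$implicit = array('mykey' => array($element));
var_dump($explicit === $implicit); // bool(true)

// Building children in a loop: $children[] appends under the next integer key.
$children = array();
foreach (array('first', 'second', 'third') as $name) {
  $children[] = array(
    'key' => $name,
    'value' => '',
    'attributes' => array('n' => $name),
  );
}
echo implode(',', array_keys($children)), "\n"; // 0,1,2
```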
So, if spelling out the numeric key-index didn’t actually have to do with the success of the array-to-XML conversion, what did?
Basically, the crux was that you have to read the function’s documentation extremely carefully to understand that the “long form” element specification (the kind that includes attributes) requires more arrays than it appears to!
Specifically, while
format_xml_elements(
  array(
    'mykey' => array(
      'key' => 'mysubkey',
      'value' => '',
      'attributes' => array(
        'attribs' => 'subvalue',
      ),
    ),
  )
);
yielded
<mykey>
  <key>mysubkey</key>
  <value></value>
  <attributes>
    <attribs>subvalue</attribs>
  </attributes>
</mykey>,
it was necessary to put the subkey element inside one more array. So
format_xml_elements(
  array(
    'mykey' => array(
      // One more array: the child is a list item (numeric key 0), not the parent's value itself.
      array(
        'key' => 'mysubkey',
        'value' => '',
        'attributes' => array(
          'attribs' => 'subvalue',
        ),
      ),
    ),
  )
);
yields
<mykey>
  <mysubkey attribs="subvalue" />
</mykey>.
Just what the doctor ordered.
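For anyone who wants to see why that extra array matters, here is a simplified, standalone sketch modeled on the Drupal 7 source of format_xml_elements() as I understand it. It is not the core function: sketch_format_xml_elements() and sketch_escape() are made-up names so it runs without Drupal, and attribute handling is reduced to the basics. The detail that matters is the branch on is_numeric($key): a string key is rendered as a short-form <key>value</key> pair (recursing into array values), and only an item stored under a numeric key is read as a long-form element with 'key', 'value' and 'attributes'.

```php
<?php
// Hypothetical stand-in for Drupal 7's check_plain().
function sketch_escape($text) {
  return htmlspecialchars((string) $text, ENT_QUOTES, 'UTF-8');
}

// Simplified sketch modeled on format_xml_elements(); not the core code.
function sketch_format_xml_elements(array $array) {
  $output = '';
  foreach ($array as $key => $value) {
    if (is_numeric($key)) {
      // Numeric key: a long-form element described by 'key', 'value', 'attributes'.
      if (empty($value['key'])) {
        continue;
      }
      $output .= ' <' . $value['key'];
      if (isset($value['attributes']) && is_array($value['attributes'])) {
        foreach ($value['attributes'] as $name => $attr) {
          $output .= ' ' . $name . '="' . sketch_escape($attr) . '"';
        }
      }
      $content = isset($value['value']) ? $value['value'] : '';
      if (is_array($content) || (string) $content !== '') {
        $inner = is_array($content) ? sketch_format_xml_elements($content) : sketch_escape($content);
        $output .= '>' . $inner . '</' . $value['key'] . ">\n";
      }
      else {
        // Empty value: self-closing tag.
        $output .= " />\n";
      }
    }
    else {
      // String key: short form <key>value</key>, recursing into array values.
      $inner = is_array($value) ? sketch_format_xml_elements($value) : sketch_escape($value);
      $output .= ' <' . $key . '>' . $inner . '</' . $key . ">\n";
    }
  }
  return $output;
}

$subkey = array(
  'key' => 'mysubkey',
  'value' => '',
  'attributes' => array('attribs' => 'subvalue'),
);

// Child assigned directly: its keys are strings, so each one becomes an element.
$wrong = sketch_format_xml_elements(array('mykey' => $subkey));
// Child wrapped in one more array: it sits under key 0 and is read as long form.
$right = sketch_format_xml_elements(array('mykey' => array($subkey)));

echo $wrong, $right;
```

With the child assigned directly, its 'key', 'value' and 'attributes' entries are treated as element names, which matches the unwanted output shown earlier; wrapped in one more array, the child lands under key 0 and gets the long-form treatment.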
It’s not easy to tell from the documentation that when an element (an array member) is the child (the value) of another element (its parent), the child must itself be wrapped in an array that contains its complete definition.
For a simple, attribute-free element, that means it can be described as a key/value pair, and that’s the end of it: array('nameofelement' => 'valueofelement'). But an attribute-bearing element is described not as a key/value pair but as an array with named keys for the element name, the element value, and the element attributes (which are the members of yet another associative array). So the element definition is an array sitting inside the list array, with yet another array for all the attributes. A-like so:
array(
  array(
    'key' => 'nameofelement',
    'value' => 'valueofelement',
    'attributes' => array(
      'nameofattribute' => 'valueofattribute',
    ),
  ),
)
It’s subtle. At the top level, the outer array is simply the argument to format_xml_elements(), the list of elements being formatted, so it doesn’t feel like part of the element definition at all. That makes it very easy to forget, overlook, or fail to grok that the same wrapping is required when an attribute-bearing element is assigned as the value of a parent element.
When a parent element’s value is an array of children, each child is either “a key/value pair” or “an associative array” (see the function’s page on api.drupal.org). It’s easy to see that a key/value pair has to be wrapped in an array(), since there is nowhere else to put it. What’s easy to miss is that an associative-array element, even though it is already an array, must still be wrapped in an array() of its own, just like the key/value pair of a simpler, attribute-free element.
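Since the rule is easy to forget, one option (along the lines of the wrapper function considered earlier) is a tiny helper that always returns a complete long-form definition, so the only thing left to remember is that every element goes into a list. This is a hypothetical sketch: xml_el() is not a Drupal function. Here the parent is also written in long form, with its 'value' set to the list of children; going by the function’s source, that should produce the same markup as the hand-written 'mykey' => array(array(...)) version above, but treat that as an assumption to verify.

```php
<?php
// Hypothetical helper, not part of Drupal: returns one long-form element definition.
function xml_el($name, $value = '', array $attributes = array()) {
  return array(
    'key' => $name,
    'value' => $value,
    'attributes' => $attributes,
  );
}

// Every element, at every level, sits inside a plain list with automatic numeric keys.
$tree = array(
  xml_el('mykey', array(
    xml_el('mysubkey', '', array('attribs' => 'subvalue')),
  )),
);

// In a Drupal 7 site this would then be passed as: format_xml_elements($tree);
print_r($tree);
```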