Invalid XML chars
Benjamin Leclerc

Posted: May 17, 2011
Hi,

Toodledo sometimes sends invalid XML.
The XML is invalid because it contains characters that XML does not allow (such as 0x1A).

These characters can get into a note through copy and paste (from Word, for example).

The W3C specifies which characters are valid.
I have patched my app to strip invalid characters, but it would be easier if Toodledo did this upfront.

http://www.w3.org/TR/xml11/#charsets

Thank you
fahad

Posted: May 17, 2011
Yes, this has been happening for over two years and has been reported before.

Toodledo, please fix!


+1
Jake

Toodledo Founder
Posted: May 17, 2011
I thought we had fixed this. To be more precise: a while ago we implemented code to scrub out invalid characters, but we may have missed a character or an entry point.

Do you know of a way to replicate the problem? Do you have a Word file that you can copy and paste from to produce an invalid character in Toodledo? If so, please share it with us so that we can fix the problem. I just ran my unit tests for this, and they all passed: the known invalid characters were correctly removed.

These are the characters that we allow. All others are removed.

function isValidXmlChar($char) { // wrapper name is illustrative; $char is an integer code point
    return ($char == 0x9) || //tab (9)
        ($char == 0xA) || //newline (10)
        ($char == 0xD) || //carriage return (13)
        (($char >= 0x20) && ($char <= 0xD7FF)) || //space and printable characters (32-55295)
        (($char >= 0xE000) && ($char <= 0xFFFD)) || //(57344-65533)
        (($char >= 0x10000) && ($char <= 0x10FFFF)); //(65536-1114111)
}
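
A minimal sketch of applying the allowed ranges above to a whole string at once, rather than testing one code point at a time. It assumes the input is already valid UTF-8: with the `/u` modifier, `preg_replace()` returns `null` on malformed UTF-8, and this sketch passes that `null` through to the caller. The function name `strip_invalid_xml_chars` is hypothetical and is not Toodledo's actual code.

```php
<?php
// Hypothetical helper (not Toodledo's actual code): deletes every code point
// outside the allowed list above, i.e. the XML 1.0 Char production.
// The /u modifier makes PCRE work on UTF-8 code points; if $str is not valid
// UTF-8, preg_replace() fails and this function returns null.
function strip_invalid_xml_chars(string $str): ?string
{
    $disallowed = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    return preg_replace($disallowed, '', $str);
}

// Usage: 0x1A (the character from the original report) is removed,
// while the tab and the accented "é" are kept.
var_dump(strip_invalid_xml_chars("Caf\u{E9}\x1A\tnote"));
```
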
fahad

Posted: May 17, 2011
I don't remember which other characters we've run into recently, but I think the safest way to remove any remaining invalid UTF-8 characters in PHP is the following:

$str = iconv("UTF-8","UTF-8//IGNORE",$str);

This removes byte sequences that are not valid UTF-8. The characters you mentioned did give us headaches a year ago, but I recall that was fixed. We still occasionally hear from users that sync isn't working, and it usually turns out to be some odd UTF-8 character that produces malformed XML output. Beyond that, the following page lists the valid characters and the ranges of characters to avoid:

http://www.w3.org/TR/xml/#charsets
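
The `iconv()` call above targets byte sequences that are not valid UTF-8. Note that 0x1A, the character from the original report, is a valid one-byte UTF-8 character, so a UTF-8 clean-up step on its own leaves it in place; it still has to be removed with the XML character ranges listed earlier in the thread. The handling of the `//IGNORE` suffix can also depend on the iconv implementation PHP is linked against. As a hedged alternative that relies only on PCRE, the sketch below keeps the byte sequences that form well-formed UTF-8 and drops everything else. The name `drop_malformed_utf8` is hypothetical.

```php
<?php
// Hypothetical helper: keeps only byte sequences that form well-formed UTF-8
// (no overlong forms, no surrogates, nothing above U+10FFFF) and drops every
// other byte. The pattern matches raw bytes, so it is used WITHOUT /u.
function drop_malformed_utf8(string $str): string
{
    $wellFormed = '/[\x00-\x7F]'                  // 1 byte: ASCII, including 0x1A
        . '|[\xC2-\xDF][\x80-\xBF]'               // 2 bytes
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'           // 3 bytes, no overlong forms
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'    // 3 bytes, general case
        . '|\xED[\x80-\x9F][\x80-\xBF]'           // 3 bytes, no surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'        // 4 bytes, no overlong forms
        . '|[\xF1-\xF3][\x80-\xBF]{3}'            // 4 bytes, general case
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}/';      // 4 bytes, up to U+10FFFF
    // Collect every well-formed sequence; any byte at which no well-formed
    // sequence starts is skipped by preg_match_all().
    preg_match_all($wellFormed, $str, $matches);
    return implode('', $matches[0]);
}

var_dump(drop_malformed_utf8("ok\xFFok"));             // string(4) "okok"
var_dump(drop_malformed_utf8("a\x1Ab") === "a\x1Ab");  // bool(true): 0x1A survives
```
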

Thanks


This message was edited May 17, 2011.
Benjamin Leclerc

Posted: May 18, 2011
Unfortunately, I cannot reproduce it.

One of the users reported that error to me.
The only thing I noticed was that character 0x1A was present in the XML from Toodledo.

I then added a function to my app that removes invalid XML characters (using the same conditions you listed above), and that fixed the problem.

So as far as I can tell, Toodledo can still send invalid characters somewhere in the text of a notebook.
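
Since the problem could not be reproduced on demand, a client could log exactly which disallowed characters arrive, and where, before scrubbing them; that gives the server side something concrete to trace. A minimal sketch, assuming the response body is valid UTF-8; the name `find_invalid_xml_chars` and the report format are hypothetical.

```php
<?php
// Hypothetical diagnostic helper: list every character that XML 1.0 does not
// allow, with its byte offset and raw bytes in hex, so the offending note can
// be pinpointed (and reported upstream) before the characters are scrubbed.
// Assumes $xml is valid UTF-8; returns null if preg_match_all() fails on it.
function find_invalid_xml_chars(string $xml): ?array
{
    $disallowed = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    if (preg_match_all($disallowed, $xml, $m, PREG_OFFSET_CAPTURE) === false) {
        return null; // e.g. the input is not valid UTF-8
    }
    $found = [];
    foreach ($m[0] as [$char, $offset]) {
        $found[] = ['offset' => $offset, 'bytes' => bin2hex($char)];
    }
    return $found;
}

// Usage: a note containing 0x1A, as in the original report.
$xml = "<note>Pasted\x1Afrom Word</note>";
foreach (find_invalid_xml_chars($xml) as $hit) {
    // prints: invalid character 0x1A at byte offset 12
    printf("invalid character 0x%s at byte offset %d\n", strtoupper($hit['bytes']), $hit['offset']);
}
```
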


This message was edited May 18, 2011.
Jake

Toodledo Founder
Posted: May 20, 2011
We were able to find one place where invalid XML characters were not being scrubbed out of notebook entries, and that has been fixed. That may take care of the problem going forward.