XmlLite handles whitespace inconsistently

  • Thread starter: MarekKnápek

MarekKnápek

Guest
Hi, I have found what I believe is a bug in the XmlLite library. Suppose I have an XML file with the following structure:


<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<abc>
	<abc>
		<abc>
			<abc>
				<abc>
				</abc>
			</abc>
		</abc>
	</abc>
</abc>



It contains a root element named abc, which contains a single element named abc, which contains a single element named abc, and so on. Each element begins on a separate line, each nesting level is indented by a single tab character, and each element ends on its own line. I want to parse this document using the XmlLite library. According to the documentation, the library uses a "pull" model in which the user (me) calls a library (XmlLite.dll) function (IXmlReader::Read) repeatedly to "pull" the next node from the document and examine its content. It works fine for small documents. The library returns the following results:


XmlNodeType_XmlDeclaration
XmlNodeType_Whitespace
XmlNodeType_Element
XmlNodeType_Whitespace
XmlNodeType_Element
XmlNodeType_Whitespace
XmlNodeType_Element
XmlNodeType_Whitespace
XmlNodeType_Element
XmlNodeType_Whitespace
XmlNodeType_Element
XmlNodeType_Whitespace
XmlNodeType_EndElement
XmlNodeType_Whitespace
XmlNodeType_EndElement
XmlNodeType_Whitespace
XmlNodeType_EndElement
XmlNodeType_Whitespace
XmlNodeType_EndElement
XmlNodeType_Whitespace
XmlNodeType_EndElement
XmlNodeType_Whitespace



This is fine; it is a small document with a nesting level of 5. But when I have a larger document with the same structure and much deeper nesting (over 5000 levels), an inconsistency appears. The library suddenly starts, seemingly at random, to return XmlNodeType_Text nodes containing only whitespace instead of XmlNodeType_Whitespace. So I believe this is a bug in XmlLite.
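
Whatever the cause turns out to be, a caller can sidestep the inconsistency by classifying node values itself instead of relying on the reported node type. The following is only a minimal sketch: the helper name `is_xml_whitespace` is hypothetical (it is not part of XmlLite), and it assumes the node value has already been obtained from the reader as a wide-character string.

```cpp
#include <string_view>

// Hypothetical helper, not part of XmlLite.
// Returns true when `value` consists solely of the four characters that
// XML 1.0 defines as white space: space, tab, carriage return, line feed.
// An empty value is treated as whitespace-only.
bool is_xml_whitespace(std::wstring_view value)
{
    for(wchar_t ch : value)
    {
        if(ch != L' ' && ch != L'\t' && ch != L'\r' && ch != L'\n')
        {
            return false;
        }
    }
    return true;
}
```

With this in place, a node reported as XmlNodeType_Text could be handled like XmlNodeType_Whitespace whenever `is_xml_whitespace` returns true for its value, for example a value built from the pointer and length that IXmlReader::GetValue returns.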


My environment is 32-bit Windows 7 with XmlLite.dll version 1.03.1001.0.


I have a small C++ program that can generate an XML document with any nesting depth and then parse it. To use it, I first generate the XML file with `ConsoleApplication1.exe write temp.xml 5`, then parse it with `ConsoleApplication1.exe read temp.xml > out.txt`. When I inspect the contents of `out.txt`, they are the same as what I posted above. But when I change 5 to 5000 in the first invocation, parse the file again and inspect the output, I find unexpected XmlNodeType_Text entries in it. To me, that is a bug in XmlLite.


Am I right? Where should I report such a bug?


Thanks, Marek.






ConsoleApplication1 source code:


#include <cstring>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

#include <windows.h>
#include <shlwapi.h>
#include <xmllite.h>


static constexpr char const crlf[] = "\x0D\x0A";
static constexpr char const xml_header[] = R"---(<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>)---";
static constexpr char const tab = '\t';
static constexpr char const elem_begin[] = "<abc>";
static constexpr char const elem_end[] = "</abc>";


// Usage: ConsoleApplication1.exe write <file> <depth>
//        ConsoleApplication1.exe read <file>
int main(int argc, char const* argv[])
{
    if(argc < 3)
    {
        return 1;
    }
    if(std::strcmp(argv[1], "write") == 0)
    {
        if(argc < 4)
        {
            return 1;
        }
        std::ofstream ofs(argv[2], std::ios_base::out | std::ios_base::binary | std::ios_base::trunc);
        int depth = std::stoi(argv[3]);
        ofs.write(xml_header, std::size(xml_header) - 1);
        ofs.write(crlf, std::size(crlf) - 1);
        // Opening tags: the element at level i is indented by i tabs.
        for(int i = 0; i < depth; ++i)
        {
            for(int j = 0; j != i; ++j)
            {
                ofs.write(&tab, 1);
            }
            ofs.write(elem_begin, std::size(elem_begin) - 1);
            ofs.write(crlf, std::size(crlf) - 1);
        }
        // Closing tags, from the innermost level back out to the root.
        for(int i = depth - 1; i >= 0; --i)
        {
            for(int j = 0; j != i; ++j)
            {
                ofs.write(&tab, 1);
            }
            ofs.write(elem_end, std::size(elem_end) - 1);
            ofs.write(crlf, std::size(crlf) - 1);
        }
        return 0;
    }
    else if(std::strcmp(argv[1], "read") == 0)
    {
        // Both DLLs are loaded at run time and the functions are looked up by name.
        using create_stream_t = HRESULT(__stdcall*)(LPCSTR, DWORD, IStream**);
        using create_reader_t = HRESULT(__stdcall*)(REFIID, void**, IMalloc*);
        HMODULE shlwapi = LoadLibraryA("shlwapi.dll");
        HMODULE xmllite = LoadLibraryA("xmllite.dll");
        if(!shlwapi || !xmllite)
        {
            return 1;
        }
        auto create_stream = reinterpret_cast<create_stream_t>(GetProcAddress(shlwapi, "SHCreateStreamOnFileA"));
        auto create_reader = reinterpret_cast<create_reader_t>(GetProcAddress(xmllite, "CreateXmlReader"));
        if(!create_stream || !create_reader)
        {
            return 1;
        }
        IStream* is = nullptr;
        if(FAILED(create_stream(argv[2], STGM_READ, &is)))
        {
            return 1;
        }
        IXmlReader* reader = nullptr;
        if(FAILED(create_reader(__uuidof(IXmlReader), reinterpret_cast<void**>(&reader), nullptr)))
        {
            is->Release();
            return 1;
        }
        // Allow nesting up to 64K levels deep.
        reader->SetProperty(XmlReaderProperty_MaxElementDepth, 64 * 1024);
        reader->SetInput(is);
        // Pull nodes one by one and print the type of each.
        XmlNodeType node_type;
        while(reader->Read(&node_type) == S_OK)
        {
            switch(node_type)
            {
                case XmlNodeType_Element:
                    std::cout << "XmlNodeType_Element\n";
                    break;
                case XmlNodeType_Text:
                    std::cout << "XmlNodeType_Text\n";
                    break;
                case XmlNodeType_Whitespace:
                    std::cout << "XmlNodeType_Whitespace\n";
                    break;
                case XmlNodeType_EndElement:
                    std::cout << "XmlNodeType_EndElement\n";
                    break;
                case XmlNodeType_XmlDeclaration:
                    std::cout << "XmlNodeType_XmlDeclaration\n";
                    break;
                default:
                    std::cout << "default\n";
                    break;
            }
        }
        reader->Release();
        is->Release();
        FreeLibrary(xmllite);
        FreeLibrary(shlwapi);
        return 0;
    }
    return 1;
}
