Cannot import a new project due to an XML parsing error


Hello Rachana,

Using OBI 3.9.1, I am trying to create a new project by importing a DTB recorded with Plextalk. The NCC.HTML is there, along with tons of MP3 and SMIL files.
The book is named 'Les grands hommes et leur mère'.

An error message is displayed on the screen (I will attach the screenshot just below). It is in French:
Window title: L'importation a échoué ("Import failed" in English)
Message: Impossible de créer un projet à partir de l'importation: ParseXmlFromfile. ("Unable to create a project from the import: ParseXmlFromfile.")

Looking at the NCC.HTML, I cannot see what could be wrong in the XML.

I tried importing another book (so another NCC.HTML) into OBI 3.9.1 and all is fine, so the problem lies within the first book itself (the first NCC.HTML).

I should add that I tried importing this DTB with Plextalk, and it imports perfectly... ;-(

What is wrong in this particular NCC.HTML, or in one of the SMIL files? Could we have a more explicit error message (at least the name of the file
where the problem is, or the XML tag that is causing it)?
So that you can see everything, I will put the whole DTB in a Dropbox folder for you. I am also attaching a new file here containing only the error message and the NCC.HTML.

Thanks for your quick help on this.

I noticed that the reader did not record a 'Title' section (usually the first one in our DTBs) but started directly with the 'Legal statement'
(Mention légale in French). I do not see why that would be a problem, but who knows?

Hello Rachana,

1) I finally found why OBI 3.9.1 was refusing to import our DTB: it has nothing to do with the missing Title section.
If you look at the NCC.HTML I provided in the Dropbox, you will see this on line 128:

Louis XIII : ou comment se déb&rasser d'une mère sans la tuer ; page 77

Notice that at approximately position 111 there is an & character in déb&rasser. That means nothing in French, of course; the word is débarrasser. So I imagine it is a typo made by our reader when he recorded the section.
I updated line 128 like this:

Louis XIII : ou comment se débarrasser d'une mère sans la tuer ; page 77

and OBI 3.9.1 is now very happy to import our DTB.
Now my main question is: why does an ampersand (&) cause such a big problem in OBI that it refuses to import the whole
book, when Plextalk goes ahead and imports it?
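
For readers wondering about the question above: in XML, a bare & must start an entity reference such as &amp;amp;, so a strict XML parser stops at the first one it cannot resolve, whereas a lenient HTML parser usually keeps it as plain text. The sketch below illustrates that difference with Python's standard library. It is not OBI's or Plextalk's actual code, the `<h1>` wrapper is a hypothetical stand-in for the real markup on line 128, and the idea that Plextalk parses leniently is only an assumption.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical markup: the real NCC line 128 is not shown in this thread.
BROKEN = "<h1>Louis XIII : ou comment se déb&rasser d'une mère sans la tuer ; page 77</h1>"
FIXED = BROKEN.replace("déb&rasser", "débarrasser")


def is_well_formed(fragment: str) -> bool:
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Lenient HTML parser that simply collects the text it sees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def lenient_text(fragment: str) -> str:
    """Parse the fragment as tolerant HTML and return its text content."""
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    return "".join(collector.chunks)


print(is_well_formed(BROKEN))   # strict XML rejects the bare ampersand
print(is_well_formed(FIXED))    # the corrected line parses
print(lenient_text(BROKEN))     # the lenient parser keeps the text as-is
```

If the ampersand had been intentional, writing it as &amp;amp; in the NCC.HTML would also have made the line well-formed.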

2) Additionally, my other request for a better error message in OBI remains... ;-) I should be told that the
error is in the NCC.HTML, ideally with the line where the problem is. I should mention that when I ran the DAISY 2.02 Regenerator on the Plextalk files, it found the error by itself and explicitly told me that
it is on line 128 at position 111... ;-)
Is there a chance of having this implemented in OBI?
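
As an illustration of the kind of message requested here, XML parsers generally expose the location of a well-formedness error. The sketch below uses Python's `xml.etree.ElementTree`, whose `ParseError` carries a `position` attribute holding the line and column. The function name `describe_xml_error` and the message wording are hypothetical, and this is not how OBI itself is implemented.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def describe_xml_error(path: str) -> Optional[str]:
    """Return a readable error message, or None if the file parses cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # position is (line, column); line is 1-based, column is 0-based.
        line, column = err.position
        return f"{path}: XML error at line {line}, position {column + 1} ({err})"
    return None
```

Calling `describe_xml_error("NCC.HTML")` on a fileset before importing it would name the file and the line to look at. A real NCC file may also declare a DOCTYPE and use HTML entities such as &amp;nbsp;, which this minimal sketch does not try to resolve.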
Thanks for your help

Hi Eric,

It looks like you are using quite an old version of Plextalk. It creates files with the old Windows-1252 encoding. This was also the cause of the problem with accents that you reported some weeks ago.
Obi expects UTF-8 encoding, which is the most widely used standard at this time. We have added a checkbox in the project preferences to import using the old Windows-1252 encoding. It will solve one issue, but it is not possible to fix all the problems of an old fileset during import. There are tools available to fix old files, such as the Regenerator that you used; Pipeline 1 also has scripts for fixing errors in DAISY books. It would be good to use these tools to fix issues in files created by old tools before importing them into Obi.
Hopefully this will be a temporary phase: once you migrate your production system to new technology, you will not face these issues.
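
To make the encoding point concrete, here is a minimal sketch, assuming the old files really are Windows-1252 (`cp1252` in Python). It shows why accented text written in that encoding cannot be read as UTF-8, and how the bytes can be re-encoded. The helper name `cp1252_to_utf8` is hypothetical, and this is not a substitute for the Regenerator or Pipeline 1.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8 (hypothetical helper)."""
    return raw.decode("cp1252").encode("utf-8")


title = "Les grands hommes et leur mère"
legacy = title.encode("cp1252")  # bytes as an older tool might write them

# In Windows-1252, 'è' is the single byte 0xE8. In UTF-8 that byte must be
# followed by continuation bytes, so decoding the legacy bytes as UTF-8 fails.
try:
    legacy.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

converted = cp1252_to_utf8(legacy)
print(decodes_as_utf8)
print(converted.decode("utf-8"))
```

When converting a real file this way, any encoding declaration inside the file (the XML declaration or an HTML meta tag) would need to be updated to match.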

With regards