[OfficeUtils] Special character makes broken word file
Question
officeutils
Forge asset by Bruno Gonçalves
Application Type
Service

Hi there,

Great to see the component being updated and kept in sync with the ODC version. Great work!

We have functionality where people copy and paste data from different communication channels into text fields, which are stored in OS just fine. For background: this is for auditing purposes, where WhatsApp messages are copied and pasted into a customer's case. Sometimes certain symbols or smileys are converted to a square (□) in the text field, which works fine in OS. However, when the text is exported to a Word file, no error is raised, but the file can't be opened (Word just reports an error while opening, try .......).

I also tried the deprecated Office Template component ( https://www.outsystems.com/forge/component-overview/235/office-template-o11 ), which raises an error in this case ('□', hexadecimal value 0x02, is an invalid character), so the file isn't generated at all.

I've searched a lot and couldn't find a good post about which characters are or aren't allowed when generating a Word file, so I hope you're familiar with this, or have encountered the same issue and can share your workaround.

Kind regards,
Evert

Bruno Gonçalves

Glad that someone acknowledges the efforts :) Thanks Evert!

I never encountered this issue, but after some quick research I have an idea of what it could be:

Since docx files are basically a collection of XML documents, the XML character restrictions have to be accounted for. Most likely, the library underlying Office Template (Open XML SDK) checks for invalid XML characters, while the one underlying OfficeUtils (NPOI) does not.

The list of allowed XML 1.0 characters is:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

Thus an immediate solution could be to automatically remove the chars that fall outside these ranges before passing the strings to OfficeUtils.
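To make that concrete, here is a minimal sketch in Python of stripping everything outside the XML 1.0 `Char` ranges listed above. It is only an illustration of the idea, not OfficeUtils code; the function name `strip_invalid_xml_chars` and the `replacement` parameter are hypothetical.

```python
import re

# Character class built from the XML 1.0 Char production quoted above:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The leading ^ negates it, so the pattern matches every character that is NOT allowed.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Return text with every character outside the XML 1.0 ranges replaced.

    Hypothetical helper for illustration; by default the character is removed.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    pasted = "Order confirmed\x02 thanks"
    print(repr(strip_invalid_xml_chars(pasted)))
```

Tabs, line feeds, and carriage returns are kept, and so are characters above U+FFFF such as most emoji, because they fall inside the last allowed range. Only control characters like 0x02, plus U+FFFE and U+FFFF, are dropped.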

A quite detailed discussion on this topic:

https://stackoverflow.com/questions/730133/what-are-invalid-characters-in-xml

I still want to do some testing to confirm this.

Best,

Bruno

2025-11-03 12-56-18
Evert van der Zalm
 
MVP

Bruno,

I know how much work it is, so it's really appreciated (as is the quick reaction).

Thanks for explaining the difference between the components; I didn't have time to check the underlying logic. Removing the characters is indeed what we're doing now. Since we gather data from several sources into the Word file, it is a little time-consuming, but for now it works.

Regarding the (final) solution: as a user of the component, it would be great to have an option when passing the data:

1) All special characters are ignored (and thus removed or converted)

2) Raise an error if a special character is found (with information on which character it was and where in the text it occurred)

This way I, as a user, can always decide which to choose: for end users I can let it generate and it's fine; for maintenance I can let the error be raised so they can check.

Another thought: just add to the description which characters are not allowed (or, from what I read on Stack Overflow, need to be encoded) and how to check for these in your own logic, so that I as a user know what to do.

If you need support, send me a PM.

Kind regards,
Evert


Bruno Gonçalves

Hi Evert,

I agree! It makes total sense for the component itself to handle this situation, for the sake of simplicity of use.

And your proposed solutions are also aligned with my initial thoughts, which were:

  • As the component's default behaviour, validate the characters and, if any forbidden char is detected, provide informative feedback (i.e. error description, placeholder, and char).
  • Add a boolean (RemoveInvalidXMLChars) to the interface, so that the user can choose to have any invalid XML characters removed/replaced automatically.
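
As a rough illustration of these two behaviours (validate by default, optionally remove), here is a small Python sketch. It is not how OfficeUtils implements it; `prepare_value`, `InvalidXmlCharError`, and the `auto_remove` flag are hypothetical names standing in for the boolean described above.

```python
class InvalidXmlCharError(ValueError):
    """Raised when a value contains a character not allowed in XML 1.0."""


def is_valid_xml_char(ch: str) -> bool:
    """Check one character against the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def prepare_value(placeholder: str, text: str, auto_remove: bool = False) -> str:
    """Validate text destined for a template placeholder.

    Default: raise an error naming the placeholder, the character, and its position.
    With auto_remove=True: silently drop the invalid characters instead.
    """
    invalid = [(i, ch) for i, ch in enumerate(text) if not is_valid_xml_char(ch)]
    if not invalid:
        return text
    if auto_remove:
        return "".join(ch for ch in text if is_valid_xml_char(ch))
    position, ch = invalid[0]
    raise InvalidXmlCharError(
        f"Placeholder '{placeholder}': invalid XML character "
        f"U+{ord(ch):04X} at position {position}"
    )


if __name__ == "__main__":
    print(prepare_value("Notes", "ok\x02!", auto_remove=True))
    try:
        prepare_value("Notes", "ok\x02!")
    except InvalidXmlCharError as err:
        print(err)
```

The error path reports only the first offending character to keep the message short; collecting all of them would be an easy variation.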

I will definitely include this functionality in one of the upcoming releases. I'm just not sure whether it will be in the next one, as it is already getting too "big" :)

Thank you for bringing this up! I will keep you posted...

Best,

Bruno


Bruno Gonçalves
Solution

Hi Evert,

Just to let you know that I have just released a new version of OfficeUtils (5.3.0) that provides the validation of invalid XML characters, and an option to replace them automatically.

I got the impression that you might be using OfficeUtils in ODC, and in this case the new version is currently waiting for approval, which should not take more than a couple of days based on my experience.

You can set the option "AutoRemoveInvalidXMLChars" through the "Word_Export_SetOptions" helper. I still have to document this new helper action in the OfficeUtils demo app.

Best,

Bruno

2025-11-03 12-56-18
Evert van der Zalm
 
MVP

Just to respond here as well.


Let me start by saying you really did it quickly, so big thanks for this contribution! It really takes time and I know it's all done in free time, so thanks for this community contribution. The chosen solution also sounds great.

I don't have time on the project yet to upgrade the component, but it will definitely be on the backlog. I will keep this thread updated!

Kind regards,
Evert

Bruno Gonçalves

You're very welcome, Evert! Looking forward to hearing the news from you.

Best,

Bruno
