Convert Binary Data from upload file to text
Question
Application Type
Reactive

Hi,

I have an upload file input accepting word document. Upon upload, the file is converted into binary data.

I want to convert that binary data into text, matching what the Word document contains, so that I can find and replace some text.

I've used BinaryDataToText, but the output is unreadable text.

Any help is greatly appreciated!

2020-09-15 13-07-23
Kilian Hekhuis
 
MVP

Hi Esrom,

I think you are confusing a few things. A Word document (a file with a .docx extension) is a binary file that contains the text of the document, but also all the formatting, fonts used, styling, tracked changes, any images, and so on. The actual text you see when you open the document is not directly available as plain text.

For example, the content of a document with the following text:

is stored in a .docx file, which is actually a ZIP archive that looks like this when opened:

In the word folder, there are these files:

The text you are looking for is inside the document.xml file, and looks like this:

When you use BinaryDataToText, you are effectively telling the app "treat this binary data as if it were text". But that doesn't do you much good: as you experienced, you get garbage, because what you are interpreting as text is the ZIP archive that is the .docx file.

What you need is a piece of software that knows how to manipulate .docx files. Luckily, there's such a Forge component, MSWordUtils, that allows you to do just this. I'd advise you to install it, study the documentation, and then you can probably do what you intended to.
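To illustrate the point about the ZIP archive outside of OutSystems, here is a minimal Python sketch using only the standard library. It is not how MSWordUtils works internally; it only shows that the raw bytes start with the ZIP signature rather than readable text, and that the text sits in `word/document.xml`. The helper `make_sample_docx` is hypothetical and builds a stripped-down stand-in for a real .docx so the example is self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(docx_bytes: bytes) -> str:
    """Return the text of a .docx, one line per paragraph (simplified)."""
    # A .docx is a ZIP archive, so open the bytes as one
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Text lives in <w:t> elements inside each <w:p> paragraph;
        # Word may split one sentence over several runs, so join them
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def make_sample_docx(text: str) -> bytes:
    """Hypothetical helper: build a minimal stand-in for a .docx file."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


sample = make_sample_docx("Hello world")
print(sample[:2])                 # the ZIP signature, not document text
print(extract_docx_text(sample))  # the text stored in document.xml
```

A real document has more parts (styles, relationships, headers, tables, text boxes), so this sketch is only meant to show where the text lives, not to replace a component that understands the full format.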



Esrom Galang

Thanks, Kilian. I just realized that now. I have that WordUtils component installed but haven't explored its documentation yet.

Kilian Hekhuis
 
MVP

Then I'd say go read it, and if you have any questions, post in the component's subforum, so that the component owners are notified and can answer you quickly. Good luck!

P.S. If you found my answer above valuable, please mark it as Solution, thanks!

Esrom Galang

Its search and replace will do the trick for finding and replacing text. However, I still need to scan the document and get all the text enclosed in <>. Do you know of any extension that I can use for this?

Kilian Hekhuis
 
MVP

I would assume that MSWordUtils can assist you with that as well, but I'd advise you to ask on the subforum I linked to. If you can extract all the text, you can also use regular expressions (from the Platform's Text Extension) or simply use IndexOf to search for the right characters.
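As a rough illustration of both suggestions, here is a Python sketch (not OutSystems code) that collects everything enclosed in <> from text that has already been extracted. The function names and the sample sentence are made up for the example, and `str.find` stands in for IndexOf.

```python
import re

# Matches "<...>" and captures what is between the brackets.
# Excluding "<" and ">" from the capture keeps each match to one placeholder.
PLACEHOLDER = re.compile(r"<([^<>]+)>")


def find_placeholders_regex(text: str) -> list:
    """Regular-expression approach: return every non-empty <...> content."""
    return PLACEHOLDER.findall(text)


def find_placeholders_indexof(text: str) -> list:
    """IndexOf-style approach: walk the string looking for '<' then '>'."""
    results = []
    start = text.find("<")
    while start != -1:
        end = text.find(">", start + 1)
        if end == -1:
            break  # unmatched "<" with no closing ">"
        results.append(text[start + 1:end])
        start = text.find("<", end + 1)
    return results


sample = "Dear <FirstName> <LastName>, your order <OrderId> has shipped."
print(find_placeholders_regex(sample))
print(find_placeholders_indexof(sample))
```

The two functions agree on simple input like the sample. They can differ on edge cases such as nested brackets or an empty `<>`, so it is worth deciding up front how those should be treated.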
