Convert Binary Data from upload file to text
Question
Application Type
Reactive

Hi,

I have a file upload input that accepts Word documents. Upon upload, the file is converted into binary data.

I want to convert that binary data into text, the same text the Word document contains, in order to find and replace some of it.

I've used BinaryDataToText, but the output is some weird text.

Any help is greatly appreciated!


Hi Esrom,

I think you are confusing a few things. A Word document, a file with a .docx extension, is a binary file. It contains the text of the document, but also all the formatting, fonts used, styling, revision information, any images, and so on. The actual text you see when you open the document is not easily available.

For example, the content of a document is stored in a .docx file, which is actually a ZIP archive. When you open that archive, you find a word folder containing several files. The text you are looking for is inside the document.xml file in that folder, wrapped in XML markup.

When you use BinaryDataToText, you are effectively telling the app "treat this binary data as if it were text". But that doesn't do you much good: as you experienced, you get garbage, because what you are interpreting as text is the ZIP archive that is the .docx file.
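To make this concrete outside of OutSystems, here is a minimal Python sketch, not something you would run inside the platform. It assumes the uploaded bytes are a .docx file. The helper names make_minimal_docx and extract_docx_text are hypothetical, and the sample file is a stripped-down stand-in built in memory rather than a complete Word document. It shows that the raw bytes begin with the ZIP signature, and that the readable text only appears once you open the archive and parse word/document.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_minimal_docx(text: str) -> bytes:
    """Hypothetical helper: build a tiny stand-in for an uploaded .docx.

    Only word/document.xml is written, so this is enough for the demo
    but not a complete file that Word itself would open.
    """
    document_xml = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f"<w:t>{escape(text)}</w:t>"
        "</w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def extract_docx_text(binary_data: bytes) -> str:
    """Hypothetical helper: return the plain text, one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(binary_data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text can be split over several <w:t> runs.
        runs = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


binary_data = make_minimal_docx("Hello Esrom")

# The first two bytes are the ZIP signature, not document text.
print(binary_data[:2])

# List the files stored inside the archive.
with zipfile.ZipFile(io.BytesIO(binary_data)) as archive:
    print(archive.namelist())

# The readable text comes from parsing word/document.xml.
print(extract_docx_text(binary_data))
```

Because a single paragraph can be split over several runs, a phrase you want to find and replace may not sit in one XML element, which is one more reason to rely on a component that understands the format.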

What you need is a piece of software that knows how to manipulate .docx files. Luckily, there is a Forge component for this, MSWordUtils, that allows you to do just that. I'd advise you to install it and study its documentation; you can then probably do what you intended.