Corrupt docx files after transfer to SQL

Our OutSystems app is an online job application site in the cloud (.NET/Oracle). As part of the application process, an individual can upload their resume/CV as well as several other types of documents. After someone submits their application, an action creates a record for them in our internal HR software, which is reached through a linked database connection (via an extension). It then copies over the documents: the UploadDoc.Content binary saved when they upload the file is assigned to the external entity's binary field, along with the other document attributes, and the CreateOrUpdate action is then used to create the record. The HR software runs on MS SQL Server.

This works great for pdf, tiff, jpg, etc., but when we try to open .docx files, Word tells us the document is corrupt. If we allow Word to recover the contents, it recovers the document just the way it was uploaded. I don't get the corrupt error when I download directly from the cloud, only when I open the document that was transferred over from our app.

Any idea why?

The file corruption may be happening on all your files without you noticing. XML-based formats such as .docx are much more sensitive to "trailing garbage" at the end of the file, whereas formats such as jpg and pdf are typically read in a way that ignores anything beyond the end of the actual document. Trailing garbage is sometimes added to the binary stream because the response stream is not properly closed after the file is transmitted to the client.

In a regular ASP application, you would prevent the "trailing garbage" from being added to the downloaded binary stream by calling "Response.End" directly after the "Response.BinaryWrite" command.

I don't know exactly how you are downloading the file from the final destination in order to open it, but you may want to check the file size at each stage of the process to see whether it changes at all.
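One way to run that check, as a minimal sketch: save a copy of the same document at each stage (for example the original, the download from OutSystems, and the download from the HR system) and compare sizes, hashes, and the first offset where two copies differ. The function names below are made up for illustration.

```python
import hashlib
from pathlib import Path


def describe(path):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()


def first_difference(a, b):
    """Return the offset of the first differing byte, or None if identical.

    If one input is a prefix of the other, the offset returned is the
    length of the shorter input (where the extra or missing bytes begin).
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

Calling `describe` on each saved copy shows at which stage the size or hash changes, and `first_difference` on the raw bytes of two copies shows whether the change is at the very end of the file or somewhere in the middle.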

Hope this helps

I just compared the size of the file from the app in the cloud to the size of the file from the download in the HR software.  For the one I am looking at, the corrupt file is one byte smaller (120,735 vs 120,734).

I'm really not sure what to do.
Hi Cory,

If you just upload it to a local entity using a Create action and then download it on a web screen, do you still have the same problem?
I'm trying to figure out in which part of the process you are losing the byte, so try to do it in small steps.

If that also fails, try putting a download node directly in the screen action that handles the upload, to see whether the problem is in the upload.

João Rosado
The problem isn't in the upload.  If I download the document from within our OutSystems app via a download node, it downloads fine and has the right file size.  It definitely seems to be in the action that transfers the document from the OutSystems cloud (oracle) to our internal HR System (MS SQL).  I'm looking for a few other examples to see if it is always a single byte.
Here is what the action looks like (doclib is in the HR system):

Now this is interesting - when I do a dbms_lob.getlength(attachment) in oracle for the BLOB and compare it to the datalength(attachment) in MS SQL, they are the same size.  The file sizes I referenced above are for the documents after I download them from OS or HR.  

So it appears as though the data is transferring correctly.
It is something in the HR system.  When I take a file that is 120,735 bytes (not corrupt) and upload it into the HR system (from within the HR system, not from OutSystems), it stores the file as 120,736 bytes, but when I download it from the HR system the downloaded file is 120,735 bytes.

I'm trying to walk through the screen that does it in HR (I'm not a .net programmer) but there is a vb function that does the following before assigning the data to the database field:

Public Function DOC_GetByteArrayFromFileField( _
  ByVal FileField As System.Web.UI.HtmlControls.HtmlInputFile) _
  As Byte()
  ' Returns a byte array from the passed file field controls file
  Dim intFileLength As Integer, bytData() As Byte
  Dim objStream As System.IO.Stream
  If DOC_FileFieldSelected(FileField) Then
    intFileLength = FileField.PostedFile.ContentLength
    ReDim bytData(intFileLength)
    objStream = FileField.PostedFile.InputStream
    objStream.Read(bytData, 0, intFileLength)
    Return bytData
  End If
End Function
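
One thing that stands out in the function above, offered as a possible explanation rather than a confirmed diagnosis: in VB.NET, `ReDim bytData(n)` sets the upper bound of the array to n, so the array holds n + 1 elements. That would match a 120,735-byte upload being stored as 120,736 bytes. If the HR download code compensates by dropping the last byte (an assumption, since that code is not shown here), then a blob written straight into the table at its exact length would come back one byte short. The Python sketch below only simulates that behaviour; all function names are hypothetical.

```python
def hr_upload(file_bytes):
    """Mimic ReDim bytData(n): allocate n + 1 bytes, then copy n bytes in."""
    buffer = bytearray(len(file_bytes) + 1)  # upper bound n means n + 1 slots
    buffer[:len(file_bytes)] = file_bytes
    return bytes(buffer)


def hr_download(stored_blob):
    """ASSUMPTION: the download side trims the final byte of the stored blob."""
    return stored_blob[:-1]


def direct_insert(file_bytes):
    """Mimic writing the binary straight into the table, byte for byte."""
    return bytes(file_bytes)


original = bytes(range(256)) * 4  # stand-in for a 1,024-byte document

via_hr = hr_download(hr_upload(original))
via_transfer = hr_download(direct_insert(original))

print(len(hr_upload(original)) - len(original))  # extra bytes stored by the HR upload
print(via_hr == original)                        # round trip through the HR screens
print(len(original) - len(via_transfer))         # bytes lost by the direct transfer
```

Under these assumptions the HR round trip returns the original bytes, while the directly transferred blob loses its real last byte on download, which is the pattern in the sizes reported above.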

I'm going to try to reach out to the company that developed the HR system and see if they can offer any insight, but this is probably a little advanced for their support team and it isn't easy to get to someone in development.  
Curious, do you notice any difference in this process between the older Word docs (.doc) (ie.2007) vs. the new .docx format files?
Next step, I looked to see what Mime-Type I was assigning when downloading it from OS - I always assigned "application/octet-stream".  The HR system assigns a Mime-Type based on the extension of the file (it stores the extension in a field called file_type).  For .docx it assigns "application/vnd.openxmlformats-officedocument.wordprocessingml.document".  I did a little manipulation and changed the document file_type in the HR database to "1.docx" so the system would assign the default of "application/octet-stream" and it comes up in Word just fine.  

I just went and modified my download from OS to assign the correct mime-type and it opens just fine - go figure.
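
For anyone following along, the extension-based lookup described above might look roughly like this. It is a sketch with a hypothetical table and function name, not the HR system's actual code, and how `file_type` is really stored (with or without a leading dot) is an assumption.

```python
# Hypothetical lookup table; a real system would likely hold more entries.
CONTENT_TYPES = {
    "pdf": "application/pdf",
    "doc": "application/msword",
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "jpg": "image/jpeg",
    "tiff": "image/tiff",
}

DEFAULT_CONTENT_TYPE = "application/octet-stream"


def content_type_for(file_type):
    """Map a stored file_type value to a MIME type, with a generic fallback."""
    key = file_type.strip().lower().lstrip(".")
    return CONTENT_TYPES.get(key, DEFAULT_CONTENT_TYPE)
```

With a table like this, a `file_type` of "docx" resolves to the Word MIME type, while an unrecognised value such as "1.docx" falls through to "application/octet-stream", which mirrors the experiment described above.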


Gerry wrote:
Curious, do you notice any difference in this process between the older Word docs (.doc) (ie.2007) vs. the new .docx format files?
 Older ones open just fine.
Hi Cory,

This usually happens when the file extension and the mime-type don't match. In a previous experience I would download a .docx with application/msword mime-type. When opening in Word it would complain about it. Here's a link with mime-types for word documents:
I think the usual trick is to send the content as an octet stream and let Word figure it out on its own.