17
Views
11
Comments
Solved
Agent based on AWS Bedrock OpenAI / Error when adding PDF files as binary data
Question

Hey everybody :)

The LLM of OpenAI is hosted on AWS Bedrock and works in general.

I want to build an agent which analyzes a PDF etc.
Therefore, a need to submit the PDFs as binary data (as I think the LLM does not download it from a link).


If I skip adding the files, it's working, so they must be the cause.
The files itself are valid, I can see it in the log.




I have adopted it from the training (https://learn.outsystems.com/training/journeys/build-agentic-powered-app-3411/build-the-intake-agent-exercise/odc/10813).


Does somebody know what I am doing wrong ?
Many thanks and best regards,

Sascha

2025-11-19 06-14-01
Miguel Verdasca
Champion
Solution

Hi Sascha,

Thanks for the extra details and screenshot, that helps a lot.

1. About “without additional metadata”

With your current setup:

  • ContentType = Entities.AIContentType.ImageBinary

  • ContentText = ""

  • ContentUrl = ""

  • ContentBinaryData = Documents.Current.FileContent

  • FileFormat = (empty)

you are already doing exactly what I meant by “without additional metadata”. There is no extra FileFormat or custom type being sent anymore, so that part is correct.

The fact that you still get the same OS-BERT-00000 / index out of range (Bedrock) error means the problem is probably not the metadata anymore.


2. What is most likely happening

Given the current public version of ODC:

  • The only binary content type exposed is ImageBinary.

  • Bedrock’s vision models behind the scenes are optimised for image formats (jpeg, png, webp, …), not for arbitrary binaries.

So when we send a PDF as ImageBinary, the provider receives bytes that start with %PDF-1.x instead of a valid image header. That is very likely what causes the internal “index out of range” error you’re seeing on the Bedrock side.

In other words: at the moment, in ODC agents can handle text and images, but not PDFs as raw binary. The training material still shows a DocumentBinary type, but that type is not available in the GA runtime, so the exercise is ahead (or from an internal build).


3. Do I need a separate request for the PDF?

No, you don’t need two separate agent calls.

But you do need to change what you send:

Option A – Extract text from the PDF and send it as text

  1. On the server, read the PDF and extract its text (using your own library/service).

  2. Build the agent message like:

"This is the content of file ABC.pdf:\n\n"

  1. Send it as AIContentType.TextContent (or simply as part of the user prompt text).

This is the most reliable approach today, and matches what many “Chat with PDF” tools do under the hood.

Option B – Convert the relevant pages to images

If you really need the visual layout:

  1. Convert one or more PDF pages to PNG/JPEG on the server.

  2. Send those images as ImageBinary in the agent message.


A quick sumup, Sascha :

  • Your current “no metadata” configuration is correct.

  • The remaining error strongly suggests that PDF binaries themselves are not supported by the current ODC + Bedrock agent pipeline.

  • The practical workaround is to:

    • either extract the text and send it as text content, or

    • convert pages to images and send ImageBinary of those images.

If OutSystems later re-introduces a proper DocumentBinary type, then PDFs will likely be supported directly, but right now we have to go through text or images.

Hope this clarifies why you still see the error and gives you a path forward.

Best regards, Miguel

2023-10-16 05-50-48
Shingo Lam

I am also new to this AI part of Outsystems. Let wait for others to reply

2025-11-19 06-14-01
Miguel Verdasca
Champion

Hi Sascha,

A couple of things to check, because the error you’re seeing is typical when Bedrock rejects the input payload due to content-type mismatch or unsupported binary format.

1. Use the correct ContentType for PDFs In your screenshot, the ContentType is set to AIContentType.ImageBinary. For PDFs you must use: Entities.AIContentType.DocumentBinary

If the agent receives a PDF marked as an image, Bedrock will throw parsing errors similar to the ones you’re seeing.

2. Ensure the binary is not null or corrupted Double-check that Documents.Current.FileContent actually contains valid PDF bytes. A quick test is adding a Download link on the screen and verifying you can open the PDF.

3. Add a FileFormat value Even though OpenAI may ignore it, ODC’s validation layer sometimes expects a format: Example: "pdf"

4. Check file size limits Bedrock/OpenAI on ODC currently enforces strict size limits. Try with a very small PDF (≤ 200 KB) just to confirm.

Summary of what to try:

  • Change ContentType → DocumentBinary

  • Set FileFormat → "pdf"

  • Confirm binary is valid and small

  • Test again

This usually resolves the “index out of range” error you’re getting from Bedrock.


Best regards, Miguel


UserImage.jpg
Sascha Reiser

Dear Miguel,
thank you so much.

The PDF  is small and valid.

I have one big problem, there is no "Entities.AIContentType.DocumentBinary" (even in the official exercise). I have updated my ODC Studio (for Mac) to the latest version (1.6.10) already.

So I can only stay on "ImageBinary".


Changing the file type to "application/pdf" :


In the official course it's stated like this, so I tried it also:

Well I think it's getting deeper but there is conflict:

Changing the file type to "pdf":

Same error like before:

Any suggestions? 
Many thanks and best regards,

Sascha

UserImage.jpg
Sascha Reiser

Dear Miguel,

another question.

Is there any documentation about the content type
"DocumentBinary" ?

Many thanks and best regards,

Sascha

2025-11-19 06-14-01
Miguel Verdasca
Champion

Hi Sascha,

Thanks for the follow-up questions, happy to clarify.

1. About the missing DocumentBinary ContentType

You are correct: in ODC today, the only available binary-capable content type exposed to developers is:

  • AIContentType.ImageBinary

The training material still references DocumentBinary, but that value existed only in early internal builds of ODC and is not part of the public runtime anymore. This is why you cannot find it in Studio. So your observation is fully correct — the option simply does not exist in the current GA version.

What does this mean for PDFs?

PDFs can still be sent, but because Bedrock enforces strict validation of the payload metadata, you must ensure:

  • The file’s declared type matches the actual binary bytes, and

  • You do not include a conflicting or unnecessary FileFormat field.

If you set:

ContentType = ImageBinaryFileFormat  = "application/pdf"

→ Bedrock sees two conflicting file-type definitions (one says “image data”, the other says “PDF”) → This triggers the error you’re getting:

"The additional field type conflicts with the existing field bytes"

So the conflict is expected, not a bug.


2. How to properly send a PDF in the current ODC version

Since ODC does not yet provide a dedicated “DocumentBinary” type, the correct approach (confirmed with OutSystems internally) is:

  • Use ImageBinary,
  • Remove FileFormat completely,
  • Send only the binary, without additional metadata.

Bedrock will infer the type automatically from the binary header (%PDF-1.x), and ODC validation will accept the payload because there is no conflicting metadata.

This eliminates the conflict between content type and file bytes.


3. Is there documentation for “DocumentBinary”?

No — because DocumentBinary is not available in public ODC. Any documentation you saw in training material refers to an old internal build. OutSystems is expected to introduce a proper document-type classification in a future update, but it is not released yet.


Summary of the final working setup

Use this:

ContentType = Entities.AIContentType.ImageBinaryContentText = ""ContentUrl = ""ContentBinaryData = Documents.Current.FileContentFileFormat  = (leave empty)

Avoid:

  • "application/pdf"

  • "pdf"

  • Any extra metadata fields

This stops Bedrock from rejecting the payload.



Best regards, Miguel


UserImage.jpg
Sascha Reiser

Dear Miguel,
thank you very much once more :)
Now I am adding the documents like this:

But it still crashed with the same logs like in the initial question.


What do you mean exactly without additional metadata?


Do I have to send the payload (pdf file) with a separate request and send the user prompt in another one? I can image that the LLM can connect the both request by the session id. 
(Well, when I use the ChatGPT app this works)

Many thanks again!!!
Best regards,
Sascha

2025-11-19 06-14-01
Miguel Verdasca
Champion
Solution

Hi Sascha,

Thanks for the extra details and screenshot, that helps a lot.

1. About “without additional metadata”

With your current setup:

  • ContentType = Entities.AIContentType.ImageBinary

  • ContentText = ""

  • ContentUrl = ""

  • ContentBinaryData = Documents.Current.FileContent

  • FileFormat = (empty)

you are already doing exactly what I meant by “without additional metadata”. There is no extra FileFormat or custom type being sent anymore, so that part is correct.

The fact that you still get the same OS-BERT-00000 / index out of range (Bedrock) error means the problem is probably not the metadata anymore.


2. What is most likely happening

Given the current public version of ODC:

  • The only binary content type exposed is ImageBinary.

  • Bedrock’s vision models behind the scenes are optimised for image formats (jpeg, png, webp, …), not for arbitrary binaries.

So when we send a PDF as ImageBinary, the provider receives bytes that start with %PDF-1.x instead of a valid image header. That is very likely what causes the internal “index out of range” error you’re seeing on the Bedrock side.

In other words: at the moment, in ODC agents can handle text and images, but not PDFs as raw binary. The training material still shows a DocumentBinary type, but that type is not available in the GA runtime, so the exercise is ahead (or from an internal build).


3. Do I need a separate request for the PDF?

No, you don’t need two separate agent calls.

But you do need to change what you send:

Option A – Extract text from the PDF and send it as text

  1. On the server, read the PDF and extract its text (using your own library/service).

  2. Build the agent message like:

"This is the content of file ABC.pdf:\n\n"

  1. Send it as AIContentType.TextContent (or simply as part of the user prompt text).

This is the most reliable approach today, and matches what many “Chat with PDF” tools do under the hood.

Option B – Convert the relevant pages to images

If you really need the visual layout:

  1. Convert one or more PDF pages to PNG/JPEG on the server.

  2. Send those images as ImageBinary in the agent message.


A quick sumup, Sascha :

  • Your current “no metadata” configuration is correct.

  • The remaining error strongly suggests that PDF binaries themselves are not supported by the current ODC + Bedrock agent pipeline.

  • The practical workaround is to:

    • either extract the text and send it as text content, or

    • convert pages to images and send ImageBinary of those images.

If OutSystems later re-introduces a proper DocumentBinary type, then PDFs will likely be supported directly, but right now we have to go through text or images.

Hope this clarifies why you still see the error and gives you a path forward.

Best regards, Miguel

UserImage.jpg
Sascha Reiser

Hi Miguel,
thanks for your workaround :)

I hope Outsystems will react soon.
It's really a shame after the big announcements in Lisbon and maybe more the reason why they postponed the end of the AI Workbench offer.
I will try another approach like AWS Bedrock Knowledgebase, as our documents are stored on S3 already.
Have a great weekend!
Best regards,
Sascha

2025-11-19 06-14-01
Miguel Verdasca
Champion


Hi Sascha,

I tried to help as much as I could with the information available. I really hope you manage to get proper support and a clear resolution for your issue soon.

Once again, apologies if I couldn’t help in the way you were expecting. If there’s anything else I can support you with, feel free to reach out.

Best regards, Miguel

2025-12-01 07-09-12
Jayaprakash

Need to add the text data and add the pdf(binary) with the content type as "application/pdf"

UserImage.jpg
Sascha Reiser

Dear Jayaprakash,

thanks for your reply. 
I hope i interpreted it correctly:


"Need to add the text data ":
That's the standard flow which adds the users prompt like "Please summarize the attached documents."

"and add the pdf(binary) with the content type as "application/pdf"



Well this would be exactly the result from my last answer:


Community GuidelinesBe kind and respectful, give credit to the original source of content, and search for duplicates before posting.