Programmatically fetch HTML page from website
Application Type
Reactive, Service

I want to fetch an HTML page from a website and store it in an entity.
The catch is that the webpage relies heavily on JavaScript to build its content, so just getting the page source does not do the trick.
How can I fetch a webpage as seen in my browser, i.e. with all JavaScript executed, and then get the result (just the HTML)?


2020-09-15 13-07-23
Kilian Hekhuis
 
MVP

Hi Erwin,

There's no use storing the HTML only, as without the CSS it won't render correctly. I think your best bet is to use an HTML to PDF converter like Ultimate PDF and store that.

2026-01-03 13-44-38
Erwin van Rijsewijk
Champion

Hi Kilian, it is not a requirement to render it back correctly. I'm only interested in some data in certain tags.
Think about this imaginary use case:

  • Open a website with weather predictions.
  • Fetch the HTML as Chrome would see it.
  • Store the HTML.
  • Do some nice regular expression stuff to grab the temperature of today and do something smart with that.
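
The last step of this imaginary use case could look roughly like the sketch below. It assumes the rendered HTML is already available as a string. The markup, the `temp` class name, and the function name `extract_temperature` are hypothetical; a real weather site will use different tags, so the pattern would need adjusting.

```python
import re
from typing import Optional

# Hypothetical markup: a real weather site will use different tags and classes.
SAMPLE_HTML = '<div class="today"><span class="temp">21.5&deg;C</span></div>'

# Captures the (possibly negative, possibly decimal) number that directly
# follows the opening <span class="temp"> tag. The tag and class are assumptions.
TEMP_PATTERN = re.compile(r'<span class="temp">\s*(-?\d+(?:\.\d+)?)')


def extract_temperature(html: str) -> Optional[float]:
    """Return the first temperature found in the HTML, or None if absent."""
    match = TEMP_PATTERN.search(html)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(extract_temperature(SAMPLE_HTML))
```

Regular expressions are brittle against markup changes, so for anything beyond an experiment an HTML parser is usually the safer choice.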


2020-09-15 13-07-23
Kilian Hekhuis
 
MVP

Ah, right. In that case, I'd not store the HTML, but process it directly to grab the data you need. But either way, you need to crawl the browser document, which is a problem, as that lives only client side. Server side, no JavaScript will be run, so you can only get the actual HTML document via HTTP. So I think you have a problem there with the JavaScript, if you are trying to fetch data server-side.
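
One possible way around this, outside of what the platform offers by itself, is to let a headless browser do the rendering on a machine you control and return the resulting DOM. The sketch below assumes Chrome is installed there and reachable as `google-chrome` (the binary name differs per system); the function names are made up for illustration. It uses Chrome's `--headless` and `--dump-dom` command-line flags, which print the DOM after scripts have run.

```python
import subprocess
from typing import List


def build_dump_dom_command(url: str, chrome_binary: str = "google-chrome") -> List[str]:
    """Build the command line that asks headless Chrome to print the rendered DOM.

    The default binary name is an assumption; adjust it for your installation.
    """
    return [chrome_binary, "--headless", "--dump-dom", url]


def fetch_rendered_html(url: str, chrome_binary: str = "google-chrome",
                        timeout: int = 60) -> str:
    """Run headless Chrome and return the serialized DOM as a string.

    Raises subprocess.CalledProcessError if Chrome exits with an error and
    subprocess.TimeoutExpired if rendering takes longer than `timeout` seconds.
    """
    result = subprocess.run(
        build_dump_dom_command(url, chrome_binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Pages that load their data late may still come back incomplete, and a cookie banner would be part of the dumped DOM as well, so treat this as a starting point for experimenting rather than a finished solution.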

2026-01-03 13-44-38
Erwin van Rijsewijk
Champion

Kilian, I've just tested UltimatePDF and it can print the page I want to save, but the website's cookie banner shows up in it... so I have to find out how, and which, cookies to send in the request :) Maybe this is going to work ;-)
And if it works, I have to find out how to get the data I want out of the PDF. Nice to experiment with; it's not a real customer test case (yet).


2022-08-03 04-32-50
Ravi Punjwani

Hi Erwin,

Have you read this article on web scraping?

https://www.outsystems.com/blog/posts/web-scraping-tutorial/


I think this will help you figure out your query.

2026-01-03 13-44-38
Erwin van Rijsewijk
Champion

Hi Ravi, I will definitely dive into this article to see if this is the solution!
