Programmatically fetch HTML page from website

Erwin van Rijsewijk

Champion

Question

Reactive

HTML

Service

Application Type

Reactive, Service

I want to fetch an HTML page from a website and store it in an entity.
The catch is that the page uses a lot of JavaScript to build its content, so just getting the source does not do the trick.
How can I fetch a webpage as seen in my browser, that is, with all JavaScript executed, and then get the result (just the HTML)?

14 Dec 2022

Kilian Hekhuis

MVP

Hi Erwin,

There's no use storing the HTML only, as without the CSS it won't render correctly. I think your best bet is to use an HTML to PDF converter like Ultimate PDF and store that.

14 Dec 2022

Erwin van Rijsewijk

Champion

Hi Kilian, rendering it back correctly is not a requirement. I'm only interested in some data in certain tags.
Think about this imaginary use case:

1. Open a website with weather predictions.
2. Fetch the HTML as Chrome would see it.
3. Store the HTML.
4. Do some nice regular expression stuff to grab today's temperature and do something smart with that.
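
The last step of this use case could look roughly like the following. This is a minimal Python sketch, assuming the rendered HTML is already available as a string. The `temp-today` class name and the sample markup are hypothetical and would have to be adapted to the real weather site.

```python
import re
from typing import Optional

# Hypothetical markup: assumes the site renders today's temperature as
# <span class="temp-today">7.5°C</span>. Adjust the pattern to the real page.
TEMPERATURE_PATTERN = re.compile(
    r'<span[^>]*class="[^"]*\btemp-today\b[^"]*"[^>]*>\s*(-?\d+(?:\.\d+)?)\s*°C',
    re.IGNORECASE,
)


def extract_temperature(html: str) -> Optional[float]:
    """Return today's temperature in °C, or None if the pattern is not found."""
    match = TEMPERATURE_PATTERN.search(html)
    return float(match.group(1)) if match else None


# Made-up sample input, only for illustration.
sample_html = '<div><span class="temp temp-today">7.5°C</span></div>'
print(extract_temperature(sample_html))
```

A regular expression is fine for a single, stable tag like this; for anything more structural, an HTML parser is usually less brittle.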

14 Dec 2022

Kilian Hekhuis

MVP

Replying to Erwin van Rijsewijk's comment on 14 Dec 2022 09:24:23

Ah, right. In that case, I'd not store the HTML, but process it directly to grab the data you need. But either way, you need to crawl the browser document, which is a problem, as that lives only client side. Server side, no JavaScript will be run, so you can only get the actual HTML document via HTTP. So I think you have a problem there with the JavaScript, if you are trying to fetch data server-side.
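
One common way around this is to let a headless browser run the JavaScript and hand back the resulting DOM. The sketch below is not an OutSystems feature; it assumes Chrome is installed on the machine doing the fetching and that its executable is reachable as `google-chrome` (the binary name, the function names, and the timeout are assumptions for illustration). It relies on Chrome's `--headless` and `--dump-dom` command-line flags.

```python
import subprocess
from typing import List


def build_dump_dom_command(url: str, chrome_binary: str = "google-chrome") -> List[str]:
    """Build the command line that asks headless Chrome to print the rendered DOM."""
    # --headless runs Chrome without a window; --dump-dom prints the DOM
    # serialized after the page has loaded and its scripts have run.
    return [chrome_binary, "--headless", "--dump-dom", url]


def fetch_rendered_html(
    url: str, chrome_binary: str = "google-chrome", timeout_s: int = 60
) -> str:
    """Run headless Chrome and return the rendered HTML as text."""
    result = subprocess.run(
        build_dump_dom_command(url, chrome_binary),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise CalledProcessError if Chrome exits with an error
    )
    return result.stdout
```

Calling `fetch_rendered_html("https://example.com")` would return the HTML after script execution. Content that a page loads long after the initial page load may still be missing, so the result is worth checking against what the browser shows.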

14 Dec 2022

Erwin van Rijsewijk

Champion

Replying to Kilian Hekhuis' comment on 14 Dec 2022 10:28:39

Kilian, I've just tested UltimatePDF and it can print the page I want to save, but the website's cookie banner shows up in it... so I have to find out which cookies to send in the request, and how :) Maybe this is going to work ;-)
And if it works, I have to find out how to get the data I want out of the PDF. Nice to experiment with; it's not a real customer test case (yet).
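
For reference, sending cookies with a request comes down to adding a `Cookie` header. The sketch below shows this with plain HTTP in Python; it does not show how UltimatePDF accepts cookies, which is not covered in this thread. The cookie name `cookie_consent`, its value, and the URL are hypothetical; the real ones can be found in the browser's developer tools after accepting the banner.

```python
import urllib.request


def build_request_with_cookies(url: str, cookies: dict) -> urllib.request.Request:
    """Create a GET request that carries the given cookies in a Cookie header."""
    # A Cookie header is a "; "-separated list of name=value pairs.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


# Hypothetical consent cookie and placeholder URL, only for illustration.
request = build_request_with_cookies(
    "https://example.com/weather",
    {"cookie_consent": "accepted"},
)
print(request.get_header("Cookie"))
```

Passing `request` to `urllib.request.urlopen` would send it, but a plain HTTP request like this still returns only the unrendered source, without any JavaScript executed.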

14 Dec 2022

Ravi Punjwani

Hi Erwin,

Have you read this article on web scraping?

https://www.outsystems.com/blog/posts/web-scraping-tutorial/

I think this will help you figure out your query.

14 Dec 2022

Erwin van Rijsewijk

Champion

Hi Ravi, I will definitely dive into this article to see if this is the solution!

14 Dec 2022
