Read table from HTML page

Linguo You

Question

In Python there is a nice function to read tables from HTML pages. Is there anything in OutSystems that can do the same?

import pandas as pd

	tables = pd.read_html("https://apps.sandiego.gov/sdfiredispatch/")

25 Sep 2017

Suraj Borade

Hi Linguo

It's not clear from the example what you want to achieve. Please provide more explanation.

25 Sep 2017

Eduardo Jauch

Hello Linguo,

This is a feature of the pandas library, right?

OutSystems does not provide anything like that out of the box. Maybe you can find something in the Forge, but I don't know of any.

If you know of a C# or Java library (depending on your stack), you can create an extension.

Cheers

Eduardo Jauch

25 Sep 2017

Paulo Ramos

Staff

Hi Linguo,

There's a built-in library that allows you to load HTML content given its URL: reference the action HttpGet from the extension RichMail. If you need a more powerful library (e.g. one that supports the HTTP POST verb), try the component ardoHTTP from the Forge.

Note that you'll still have to parse the HTML content on your own or using a screen scraping library.
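To illustrate what "parsing on your own" could look like, here is a minimal sketch in Python (the language of the original question) that pulls every table out of a raw HTML string using only the standard library. The names TableExtractor and extract_tables and the sample HTML are made up for this example, and the sketch assumes simple, non-nested tables. The same idea would have to be ported to a C# or Java extension to be used from OutSystems.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._cell = None  # drop any unterminated cell
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical sample input, standing in for the content returned by HttpGet.
sample_html = """
<table>
  <tr><th>Call Type</th><th>Street</th></tr>
  <tr><td>Medical</td><td>Main St</td></tr>
</table>
"""
tables = extract_tables(sample_html)
```

Each table comes back as a list of rows, and each row as a list of cell strings, which is roughly the shape you would map to a record list.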

25 Sep 2017

Linguo You

Paulo Ramos wrote:

Hi Linguo,

Note that you'll still have to parse the HTML content on your own or using a screen scraping library.

OHHH Paulo, long time no see, hope you are doing great :) still in Singapore?

Great help as always, HttpGet solved the problem. The only thing is that it does not process the content it gets back.

But I am still able to automatically get the Toto result from the official website and update it in my app.

Thanks all for the tips.

25 Sep 2017

Sekar

Hi Guys,

jQuery has a .load() method that loads HTML content from a URL into an element:

https://api.jquery.com/load/

Sekar

25 Sep 2017

Linguo You

I am able to get the numbers in the blue circle based on their class ID. But I cannot find the red text in the red circle in the HTML content returned by HttpGet. It is the same when I view the page source: the text is not there.

Any idea how to find it?

https://www.singaporepools.com.sg/en/product/sr/Pages/toto_results.aspx?sppl=RHJhd051bWJlcj0zMzAy

25 Sep 2017

Paulo Ramos

Staff

Hi Linguo,

Nice to hear from you. Yes, still enjoying Singapore. :)

HttpGet will get you the raw static HTML. For simple stuff, it may be easy enough to search and parse the information that you need.

Regarding the red text, it's not on the page itself - it's being retrieved by an Ajax request, after the page loads. Using Chrome's inspector (Network tab) I was able to see the request to:

https://www.singaporepools.com.sg/DataFileArchive/Lottery/Output/toto_next_draw_estimate_en.html?v=2017y9m25d17h30m

This returns:


	<div style='vertical-align:top;'>
	<div>
	<div style='float:left; width:120px; font-weight:bold;'>
	Next Jackpot
	</div>
	<span style='color:#EC243D; font-weight:bold'>$2,200,000 est</span>
	</div>
	<div>
	<div style='float:left; width:120px; font-weight:bold;'>
	Next Draw
	</div>
	<div class='toto-draw-date'>Mon, 25 Sep 2017 , 6.30pm</div>
	</div>
	</div>
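A fragment this small can be parsed without a scraping library. As a rough sketch, the Python below collects the non-empty text nodes and pairs them up as label and value. It assumes the fragment keeps alternating label, value, label, value exactly as in the response above. TextCollector and parse_next_draw are names invented for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML fragment, in order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def parse_next_draw(fragment):
    """Pair the text nodes up as {label: value}.

    Assumption: the text alternates label, value, label, value,
    as it does in the response shown above.
    """
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()
    chunks = collector.chunks
    return dict(zip(chunks[0::2], chunks[1::2]))


# The response quoted above, copied verbatim.
fragment = """
<div style='vertical-align:top;'>
<div>
<div style='float:left; width:120px; font-weight:bold;'>
Next Jackpot
</div>
<span style='color:#EC243D; font-weight:bold'>$2,200,000 est</span>
</div>
<div>
<div style='float:left; width:120px; font-weight:bold;'>
Next Draw
</div>
<div class='toto-draw-date'>Mon, 25 Sep 2017 , 6.30pm</div>
</div>
</div>
"""
info = parse_next_draw(fragment)
```

If the site changes the markup, the pairing will silently shift, so it is worth checking that the expected labels are present before using the values.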

In this case, you can probably build a separate request to retrieve this value (note how the URL is built, including the date and time).
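As a sketch of how that URL could be assembled, the Python below rebuilds the v parameter from a date and time. The format (year, month, day, hour, minute, each followed by a letter and without zero padding) is inferred from the single URL above, so treat it as an assumption. Whether the server actually requires a particular value, or only uses it for cache busting, is not known from this thread. build_estimate_url is a hypothetical helper name.

```python
from datetime import datetime

# Base URL taken from the request seen in Chrome's Network tab.
BASE_URL = (
    "https://www.singaporepools.com.sg/DataFileArchive/Lottery/Output/"
    "toto_next_draw_estimate_en.html"
)


def build_estimate_url(moment):
    """Build the request URL for a given date and time.

    Assumption: the 'v' value is <year>y<month>m<day>d<hour>h<minute>m
    with no zero padding, as inferred from the one example observed.
    """
    version = (
        f"{moment.year}y{moment.month}m{moment.day}d"
        f"{moment.hour}h{moment.minute}m"
    )
    return f"{BASE_URL}?v={version}"


# Reproduces the URL observed above for 25 Sep 2017, 17:30.
url = build_estimate_url(datetime(2017, 9, 25, 17, 30))
```

In practice you would pass the current date and time, then fetch the resulting URL with HttpGet the same way as the main page.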

Another option would be using a third-party screen scraping library. These are more complex and powerful, and may be paid (and can be overkill if you can get away without one).

Of course, the best option would be getting these values from an API. The UI can change without notice. :)

25 Sep 2017

Linguo You

Thank you so much.

I am monitoring the result every day in case they change the UI, which hopefully will not happen very often.

25 Sep 2017
