read table from html page


In Python there is a nice function to read tables from HTML pages. Is there anything in OutSystems that can do the same?


import pandas as pd

# read_html parses every <table> on the page into a list of DataFrames
tables = pd.read_html("http://apps.sandiego.gov/sdfiredispatch/")

Hi Linguo

It's not clear from the example what you want to achieve. Please provide more explanation.

Hello Linguo,

This is a feature of the pandas library, right?

OutSystems does not provide anything to do that out of the box. Maybe you can find something in the Forge, but I don't know of any.

If you know of a C# or Java library (depending on your stack), you can create an extension.

Cheers

Eduardo Jauch

Hi Linguo,

There's a built-in library that allows you to load HTML content, given its URL: reference action HttpGet from extension RichMail. If you need a more powerful library (e.g. allowing the POST HTTP verb), try component ardoHTTP from the Forge.
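For comparison with the Python side of the question, the standard-library counterpart of a plain HttpGet is a few lines of `urllib.request`. This is a minimal sketch, not the RichMail API: `http_get` is a hypothetical name, it assumes UTF-8 when the server does not declare a charset, and it returns only the raw HTML without running any JavaScript.

```python
from urllib.request import urlopen


def http_get(url, timeout=10.0):
    """Fetch `url` and return the response body as text (no JavaScript is run)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8 (an assumption)
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```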

Note that you'll still have to parse the HTML content on your own or using a screen scraping library.
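If you end up parsing the HTML yourself (for example inside an extension), a rough stand-in for `pd.read_html` can be written with Python's standard `html.parser`. This is a minimal sketch, not what pandas does internally: `TableParser` and `read_tables` are hypothetical names, it assumes reasonably well-formed markup, ignores `colspan`/`rowspan`, and does not handle nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects each <table> as a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the <tr> being read, or None
        self._cell = None  # text fragments of the <td>/<th> being read, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace, as a browser would when rendering
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()  # tolerate a missing </td> before </tr>
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def read_tables(html):
    """Return every table in `html` as a list of rows of cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Unlike `pd.read_html`, this returns plain lists of strings rather than DataFrames, which maps more easily onto a list of records in an OutSystems extension.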

Hi Guys,

There is a jQuery method, .load(), that loads HTML content from a URL into an element:

http://api.jquery.com/load/

Sekar




Oh, Paulo, long time no see! Hope you are doing great :) Still in Singapore?

Great help as always; HttpGet solved the problem. The only thing is that it does not process the content it returns.

Still, I am able to automatically get the TOTO results from the official website and update them in my app.

Thanks all for the tips.



I am able to get the numbers in the blue circle based on their class. But I cannot find the red text in the red circle in the HTML content returned by HttpGet; it is not there when I view the page source either.

Any idea how to find it?

http://www.singaporepools.com.sg/en/product/sr/Pages/toto_results.aspx?sppl=RHJhd051bWJlcj0zMzAy



Hi Linguo,

Nice to hear from you. Yes, still enjoying Singapore. :)

HttpGet will get you the raw static HTML. For simple stuff, it may be easy enough to search and parse the information that you need.
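For the "search and parse" route, here is a minimal Python sketch, using only the standard `html.parser`, of pulling out the text of elements by CSS class, which is the kind of lookup described above for the winning numbers. `find_text_by_class` is a hypothetical helper, the class names in the test are made up (not the real Singapore Pools markup), and it assumes every non-void element is properly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextParser(HTMLParser):
    """Collects the text of every element whose class attribute contains `css_class`."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.results = []
        self._depth = 0    # > 0 while inside a matching element
        self._buffer = []  # text fragments of the current match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # a child element of the current match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.results.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def find_text_by_class(html, css_class):
    """Return the whitespace-normalized text of each element carrying `css_class`."""
    parser = ClassTextParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```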

Regarding the red text, it's not on the page itself; it is retrieved by an Ajax request after the page loads. Using Chrome's inspector (Network tab), I was able to see the request to:

http://www.singaporepools.com.sg/DataFileArchive/Lottery/Output/toto_next_draw_estimate_en.html?v=2017y9m25d17h30m

This returns:

    <div style='vertical-align:top;'>
        <div>
            <div style='float:left; width:120px; font-weight:bold;'>
                Next Jackpot
            </div>
            <span style='color:#EC243D; font-weight:bold'>$2,200,000 est</span>
        </div>
        <div>
            <div style='float:left; width:120px; font-weight:bold;'>
                Next Draw
            </div>
            <div class='toto-draw-date'>Mon, 25 Sep 2017 , 6.30pm</div>
        </div>
    </div>
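Since the fragment above is small and fixed, a couple of regular expressions are probably enough to read it. A minimal Python sketch, assuming the markup stays exactly as shown (the jackpot is the first `<span>` whose text starts with `$`, and the date sits in the element with class `toto-draw-date`); `parse_estimate` is a hypothetical name, and a regex shortcut like this breaks as soon as the markup changes.

```python
import re


def parse_estimate(fragment):
    """Return (jackpot_in_dollars, draw_date_text); either is None if not found."""
    # Assumption: the jackpot is the first <span> whose text starts with "$"
    jackpot_match = re.search(r"<span[^>]*>\s*\$([\d,]+)", fragment)
    jackpot = int(jackpot_match.group(1).replace(",", "")) if jackpot_match else None

    # The draw date is the text of the element with class toto-draw-date
    date_match = re.search(r"class=[\"']toto-draw-date[\"'][^>]*>([^<]*)<", fragment)
    draw_date = " ".join(date_match.group(1).split()) if date_match else None
    return jackpot, draw_date
```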


In this case, you can probably build a separate request to retrieve this value (note how the URL is built: the v parameter includes the date and time).
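A minimal Python sketch of building that request URL. It assumes the `v` value is just a date and time written as `<year>y<month>m<day>d<hour>h<minute>m` with no zero padding, which is only inferred from the single sample URL; `estimate_url` and `BASE_URL` are hypothetical names, and whether the server requires a particular `v` value (or only uses it to avoid caching) is unknown.

```python
from datetime import datetime

# URL seen in the Network tab, without its query string
BASE_URL = ("http://www.singaporepools.com.sg/DataFileArchive/Lottery/Output/"
            "toto_next_draw_estimate_en.html")


def estimate_url(when):
    """Build the estimate URL with a `v` value shaped like 2017y9m25d17h30m."""
    # Assumption: plain integers with no zero padding, inferred from one sample URL
    v = f"{when.year}y{when.month}m{when.day}d{when.hour}h{when.minute}m"
    return f"{BASE_URL}?v={v}"


# Reproduces the sample URL: ...toto_next_draw_estimate_en.html?v=2017y9m25d17h30m
print(estimate_url(datetime(2017, 9, 25, 17, 30)))
```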

Another option would be to use a third-party screen-scraping library. These are more complex and powerful, may be paid, and can be overkill if you can get away without one.

Of course, the best option would be getting these values from an API. The UI can change without notice. :)


Thank you so much!

I am monitoring the results every day in case they change the UI; hopefully that does not happen very often.
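One way to automate that daily check is to verify that the CSS classes your parsing depends on still appear in the fetched HTML, and raise an alert when one disappears. A minimal Python sketch: `missing_classes` is a hypothetical helper, `toto-draw-date` comes from the fragment in this thread, and the other class names in the test are made up.

```python
import re


def missing_classes(html, expected):
    """Return the expected CSS class names that no longer appear in any class attribute."""
    missing = []
    for name in expected:
        # Match `name` as a whole class token inside a quoted class="..." value
        pattern = (r"class\s*=\s*[\"'][^\"']*(?<![\w-])"
                   + re.escape(name) + r"(?![\w-])")
        if not re.search(pattern, html):
            missing.append(name)
    return missing
```

A non-empty result is a hint that the page layout changed and the extraction logic needs another look.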