Import information from a HTML page to DB and export to Word

I need to know whether it's possible to import information from an HTML page into a DB, and then build a personalized report (Word) with OutSystems.

Best Regards,

Artur
Hi Artur,

First of all, welcome to our community!

From what I see, the operation you want must be done in four steps:
1 - Retrieve the HTML of the page that contains the information you need (the page to scrape)
2 - Parse the HTML in order to have the information in an understandable format
3 - Save the information in the database
4 - Generate a Word document

And the answer is: yes! All of this can be done with OutSystems. Let's go through it step by step.

1 - Retrieve the HTML of the page that contains the information you need (the page to scrape)

In this step, all you have to do is fetch the HTML returned by a URL into an OutSystems Service Studio string. That can easily be done using the HttpGet action in the HttpRequestHandler solution (http://www.outsystems.com/NetworkSolutions/solutiondetail.aspx?ProjectId=63). Simply pass the URL to the action and you're done.
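For anyone who wants to see what this step amounts to outside Service Studio, here is a minimal sketch in Python (not OutSystems code, and not the HttpGet action itself). The function name fetch_html and the 10-second timeout are assumptions made for illustration.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Return the body served at `url` as a text string (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server; fall back to UTF-8 if none is given.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

You would call it as fetch_html("http://example.com/page.html") and pass the resulting string on to the parsing step.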

2 - Parse the HTML in order to have the information in an understandable format

In this step you'll have to parse the HTML string you got in step 1 and store the result in an OutSystems Entity that represents the data you want to retrieve. The right way to do it will vary depending on your needs and on the nature of the HTML you're parsing. For this you can use:
- The OutSystems XML extension (in case the page is XHTML - see it at http://www.outsystems.com/NetworkSolutions/solutiondetail.aspx?ProjectId=55)
- The OutSystems Text extension (has several text processing actions - see http://www.outsystems.com/NetworkSolutions/ProjectDetail.aspx?ProjectId=553)
- OutSystems built-in text functions (search, substring, etc.)
- If you have very specific needs, you can write an OutSystems extension yourself and take advantage of the full power of C# (or Java) to process the HTML text.
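To give an idea of what the parsing step involves, here is a rough sketch in Python using only the standard library. It assumes the data you want sits in a plain HTML table; the class name TableRowParser and the function parse_rows are hypothetical, and a real page will almost certainly need different rules.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard a cell that was never closed


def parse_rows(html_text):
    """Return the table rows found in `html_text` as lists of strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

Each inner list then maps naturally onto one record of the Entity you define for the data.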

3 - Save the information in the database

After parsing the document in step 2, this one is easy. Simply save the Entity record holding the processed data to the database using the corresponding Entity action.
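In OutSystems the Entity actions do this work for you. Purely as an illustration of what happens underneath, here is a small Python sketch that uses SQLite as a stand-in database. The table name product, its columns, and the function save_products are all assumptions made for the example.

```python
import sqlite3


def save_products(conn, rows):
    """Insert (name, price) pairs into a hypothetical `product` table.

    Returns the total number of rows in the table afterwards.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product (name TEXT NOT NULL, price REAL NOT NULL)"
    )
    # The connection used as a context manager commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO product (name, price) VALUES (?, ?)",
            [(name, float(price)) for name, price in rows],
        )
    return conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

For a quick try-out, pass it sqlite3.connect(":memory:") together with the data rows produced by the parsing step (without the header row).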

4 - Generate a Word document

You can do this one by using the OutSystems MailMerge extension (see http://www.outsystems.com/NetworkSolutions/ProjectDetail.aspx?ProjectId=61). With this extension you'll be able to execute the typical MS Word Mail Merge operation using a Word document with the merge fields (the template) and an Excel file with the field values to fill in (you can generate this Excel file by using the Service Studio RecordListToExcel feature). Be aware that this extension only works up to MS Word 2003.
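If the mail merge idea is new to you: it just means filling the named fields of one template with the values of each record. The Python sketch below shows that idea on plain text only. It does not produce Word files and is not how the MailMerge extension is implemented; the function name mail_merge and the field names are made up for the example.

```python
from string import Template


def mail_merge(template_text, records):
    """Fill the $Field placeholders of `template_text` once per record.

    Raises KeyError if a record lacks a field that the template uses.
    """
    template = Template(template_text)
    return [template.substitute(record) for record in records]
```

For example, merging the template "Dear $Name, your total is $Total." with two records gives two filled-in texts, one per record.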

In case the Excel data you need for the mail merge operation is complex (for example, the fields in the Excel file vary), the Service Studio RecordListToExcel feature will not be enough, because the structure it produces is rigid. In this case you can use the DocServices.oml eSpace contained in the DocumentManagement solution, which simplifies the merge operation and generates the necessary Excel file on the fly for you (check it out at http://www.outsystems.com/NetworkSolutions/projectdetail.aspx?ProjectId=81). The drawback is that you'll need to learn how to use the DocServices eSpace, so use it only if necessary. Otherwise, stick to the extension alone.

The bad news is that if you need to generate MS Word 2007 files, you're on your own. The MailMerge extension does not support MS Word 2007. If this is your case, you'll have to write an OutSystems extension to perform the Word document generation yourself.

I hope this information is useful.

Best Regards,

Daniel Lourenço
Hello all

Note that, of the links above, the ones for HTTPRequestHandler and Text no longer work. That is because these two components are now part of Enterprise Manager, which is where the most recent versions of these extensions can be found.

Make sure to get them from there.