70
Views
8
Comments
Fetch more records in API
Application Type
Reactive
Service Studio Version
11.54.54 (Build 63219)

Hello all,

I have a requirement that i need to consume the API. For single call i will be getting 300k records. also i need to store all those in OS tables.

During this process, Lets say after fetching 10000 records, if there is error, i need to recall the API and make sure i shouldn't fetching the existing records. 

Any idea how to implement this flow?


Regards,

Saravana

2022-07-11 14-05-36
Jeroen Barnhoorn

Hi Saravana,

Does your API support some form of pagination? Cause for those amounts of records, I would highly recommend splitting your call into multiple calls. That way you can avoid time-outs and it makes it much easier to perform a retry in case of an error.

Also, you might want to look into these resources:
https://learn.outsystems.com/training/journeys/architecture-patterns-407/core-module-patterns/o11/677
https://success.outsystems.com/documentation/best_practices/architecture/designing_the_architecture_of_your_outsystems_applications/integration_patterns_for_core_services_abstraction/


UserImage.jpg
Gowtham Raj

This really sounds good. But I need one clarification on this.  By splitting into multiple call, we can avoid time-out, right?  Every time I call the API, it would giving me all records from 0 to 300k, right?
lets say for the first call, i stored 1000 records. For next call, i should store the records from 1001 to remaining records or set of records. How can i implement this ? Any suggestions  ?

2024-09-14 05-42-00
Ozan Can Çali
Champion

Hi Saravana,

Can you clarify the part where you said you would be receiving 300K records in a single call, but then you said you want to do a check after fetching 10K records? Do you mean after saving the 10K records into your database?

Also, for what you want, the API you are consuming should have the ability to "filter" certain records before returning them, such as an input list with "IDs already returned". Then you would feed the records that you already received earlier and it would not return their details again. But I don't think the API would have that, seems like a strange functionality.


UserImage.jpg
Saravana Kumar Sivaraj

Hi Ozan,

Thank you for the reply. 

Can you clarify the part where you said you would be receiving 300K records in a single call, but then you said you want to do a check after fetching 10K records? Do you mean after saving the 10K records into your database?  ---- I meant that 10k is not a static number. In very first API hit i may get 50k or 100k records. 

I need to store all the records in DB. Its fine if we are retrying multiple times until last records is fetch.

Hope you understand the flow

 

2024-09-14 05-42-00
Ozan Can Çali
Champion


Hi Saravana,

As others mentioned, I suggest you insert the results of the API call right after you receive them into an "APIOutput" entity etc. Also, when you have a lot of records (in the thousands), inserting them one by one with a loop will cause performance issues and timeouts. So, you can insert them with BulkInsert in a single time, or divide them into smaller sub-lists if the number is really high and insert each sub-list with BulkInsert.

I also want to point out that normally you insert all the results in one flow, and unless you use "CommitTrans", when an error occurs during the saving of a record or while getting the results from the API, an exception is thrown and all the previous inserts are lost (not committed).

Having an exclusive entity just to keep the API results also gives you the advantage to quickly see what is successfully saved into your database after an API call. So when you want to process these records afterwards, you select them from this entity, process them and save the results into other related entities.

It is probably not possible to get from the API only the data that is not yet saved into your database. The API would need to support this, for example with an input parameter where you can feed the records that you already inserted. And then the API should be intelligent enough not to return them again. You can check if the API has something like that, but I doubt it. This means, you cannot really improve the performance of the API, unless it supports pagination (does it have StartIndex and Count input parameters?). But you can improve the inserting/processing time on the OutSystems side.

In this case, each time you call the API, you can check its output against the records in your APIOutput entity. If a record already exists in your entity, you ignore that record and check the next one. If it doesn't exist yet, you add it to a local list. In the end, you bulk-insert this list. It is up to you to make the rules of this check. I would assume that each record has at least one unique field (or combination of multiple fields make a record unique) and you can use this uniqueness to check if you already saved this record before or not.

2019-01-07 16-04-16
Siya
 
MVP

You could fetch the 300,000 records and perform a BulkInsert into a temporary table to avoid timeouts and errors associated with creating entries in a loop. To prevent fetching the same record again from the API, implement a mechanism where you send IDs of records to be excluded in the call. This requires complex logic and API support for exclusion. The simplest approach is to retrieve the data, bulk insert it into a temporary table, and then conduct any necessary post-processing.

UserImage.jpg
Gowtham Raj

Hai Saravana, 
You could able to retrieve the 300k records from API without getting time out from API ?

UserImage.jpg
Afaque Shaikh

Hi,


There are 2 ways to achieve this,

If API supports Pagination the follow below steps:

 1. Pagination: If the API supports pagination, use it to fetch records in chunks (e.g., 10,000 at a time). This will help manage memory and reduce the risk of errors.

 2. Offset and Limit: Keep track of the offset (last record fetched) and limit (number of records to fetch) to ensure you don't fetch existing records.

 3. Error Handling: Implement error handling to catch exceptions. If an error occurs, store the last successful offset and recall the API with the next offset.

 4. Database Transactions: Use database transactions to ensure data consistency. If an error occurs, roll back the transaction to maintain data integrity.


If API does not support Pagination,  you can:

1. Fetch the entire dataset and store it as binary data in a file or database.

2. Use a timer to read the data in chunks, implementing a retry mechanism to handle timeout.

3. Store the last processed record index in a table, so if a timeout occurs, you can resume processing from the next index (last index + 1).

 

This approach allows you to:

 

- Handle large datasets without relying on pagination

- Implement a retry mechanism using the timer and last processed index

- Store the raw data for future reference or processing

 

Here's a high-level example of this approach:

 

1.Fetch_Data 

- Fetch the entire dataset from the API

- Store the raw data as binary in a file or database

2.Process_Data

- Use a timer to read the data in chunks

- Process each chunk, storing the last processed record index in a table

- If a timeout or error occurs, store the last index and wake the timer again.


This approach ensures that you can handle large datasets and recover from errors using the retry mechanism.


Also this link might help you to implement the timer with timeout mechanism,

https://www.outsystems.com/forums/discussion/81684/best-solution-of-timer/

I hope this will help.


Thanks,
Afaque Shaikh

 


Community GuidelinesBe kind and respectful, give credit to the original source of content, and search for duplicates before posting.