Hey there guys,

I want to compare two tables and, based on their discrepancies, update one of them. It is not possible to create one aggregate based on the two tables, because the two tables are in different databases.

So, I created two aggregates, one for each table, and then I compare their records count to see whether I need to proceed with comparing the records of the tables.

Is there an easy way to merge the two tables, avoiding comparing record by record?

Thanks!

Marios



Hello Marios,

What's the context? Depending on why you are trying to do this, we may find an easier way to accomplish what you need.

For now, with the tables in different databases, I would say the only way to do what you are asking is to bring the data in and compare it... Of course, depending on the characteristics of those tables (can both be changed at their source? etc.), the approach can be optimized.

Cheers.

Hey Eduardo,


Since the tables are in different databases (and hence I can't create aggregates or SQL queries that join them), I need to cache the external table. The external table is read-only.

So, I'm planning to create a Timer that runs a Server Action to "merge" the two. I can't simply delete and repopulate the cached table, because I need to keep all the data, even the obsolete records.

I already found a way, but it requires multiple queries and aggregates:

  • Create an aggregate for the external table
  • Cycle through the records and
    • Get the record's primary key
    • Create an aggregate for the cached table based on the value above
    • If there is no record, then create/update the record in the cached table
  • And perhaps do the inverse process.


That is what I'm trying to find a way to optimize.


Thanks!

Hum...

Still not sure if I understood.
You need the data from the external table, but it is read-only, so you will not "send back" any data. Is that right?

If so, are you changing the data in your application, or just reading it?

If you are not changing the data, there are at least two approaches you can take.

1. Delete all the data in the cache table and replace it with the data from the read-only external table. This works well if you have a small number of rows.

2. Optimize the update of the cache entity:
a) Don't use auto-number Ids; reuse the Ids from the original data.
b) Fetch both entities with aggregates, ordered by Id.
c) Do a single loop with one index per list. That way you can iterate both lists at different paces, advancing each index as needed and detecting whether you need to add a new row from the original list or remove one from the cached entity.

If you can change the original table / code, you could add a "changed date time" attribute (or something similar), so you only fetch data that is new or changed since the previous sync. That would really speed things up.
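To make that incremental idea concrete, here is a minimal Python sketch (not OutSystems code; the row structure, the `changed_on` attribute name, and the in-memory `cache` dict are all illustrative stand-ins for the real query and entity):

```python
from datetime import datetime

def incremental_sync(source_rows, cache, last_sync):
    """Copy into `cache` only the source rows changed since `last_sync`.

    source_rows: list of dicts with 'id', 'data' and 'changed_on' (datetime),
    standing in for a query on the external table filtered on the new
    "changed date time" attribute. Returns the new watermark to store
    for the next sync run.
    """
    for row in source_rows:
        if row['changed_on'] > last_sync:    # only new/changed rows
            cache[row['id']] = row['data']   # create or update in the cache
    return max((r['changed_on'] for r in source_rows), default=last_sync)
```

The key design point is persisting the returned watermark between Timer runs, so each run only touches the delta instead of the whole table.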

Cheers.

Eduardo,

Yes, the data from the external table is read-only. It's actually a View.

I've thought of the 2nd approach myself, but I'd have to do the same iteration twice, once for each list, which might take some time. That is what I'm trying to avoid, if possible...


Thanks!



Hi,

In reality, no.

Let's assume, to simplify a bit, that you don't change the data (you only create new records or delete them).
Let's assume you have in the cache the records 1, 2, 3, 4, 5, 8, 9
In the view, you have 1, 2, 4, 5, 8, 9, 10, 12

So, let's run ONE loop with two indexes, v (view) and c (cache), over the list from the view (LV) and the list from the cache (LC), both sorted by Id.

v = 0, c = 0

WHILE v < LV.Length AND c < LC.Length
    IF LV[v].Id = LC[c].Id THEN
        v++, c++
    ELSE IF LV[v].Id > LC[c].Id THEN
        DELETE LC[c]               // cached record no longer in the view
        c++
    ELSE                           // LV[v].Id < LC[c].Id
        INSERT into LC ( LV[v] )   // view record not yet cached
        v++
END WHILE
// whatever remains in LC is obsolete; whatever remains in LV is new
DELETE the remaining LC records, INSERT the remaining LV records

This way you iterate both lists in a single pass.
(Did this from memory, so it may need adjustments.)

Cheers.

Solution

The downside of the above solution is that it will consume a good bit of memory if the tables are large. In that case, I would propose the following (I use this myself to update a 9M+ record entity from an external source):

  1. Have an "IsUpdated" (or "IsActive") Boolean attribute on your cache Entity, and set it to False on every record before starting (use a SQL query to update them all at once);
  2. For Each over the output of the query that fetches the external data;
  3. Query the cache table (by means of an Aggregate) on whatever unique characteristic you have (probably an Id); make sure you use an indexed attribute!
  4. Update the result of the Aggregate with the data from the external source, and set the "IsUpdated" flag to True;
  5. Repeat until done;
  6. Finally, delete all records with "IsUpdated" still set to False (use a SQL query for this one as well).
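A minimal Python sketch of those steps (the dict stands in for the cache Entity, and the bulk UPDATE/DELETE steps become plain loops here; all names are illustrative):

```python
def mark_and_sweep_sync(external_rows, cache):
    """Mirror of the steps above: clear flags, upsert while marking, sweep.

    external_rows: list of dicts with 'id' and 'data' from the external source.
    cache: dict id -> {'data': ..., 'is_updated': bool}, standing in
    for the cache Entity with its "IsUpdated" attribute.
    """
    # Step 1: reset the flag on every cached record (one UPDATE in SQL)
    for rec in cache.values():
        rec['is_updated'] = False
    # Steps 2-5: walk the external data, creating/updating and marking
    for row in external_rows:
        rec = cache.setdefault(row['id'], {})
        rec['data'] = row['data']
        rec['is_updated'] = True
    # Step 6: delete everything left unmarked (one DELETE in SQL)
    for rid in [rid for rid, rec in cache.items() if not rec['is_updated']]:
        del cache[rid]
    return cache
```

The flag turns the "which cached records are obsolete?" question into a single delete at the end, so the external data only has to be iterated once.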



Hey there guys,

I never thought of setting the IsActive flag to False before starting the process, so that I don't have to iterate over two lists.

Thanks!




Nice solution, Kilian :)

@Marios: You're welcome :). Happy coding!

@Eduardo: Thanks :).