The Problem

Data migrations, despite their different scenarios (migration from legacy systems, data consolidation, distributed systems integrations, etc), are common requirements for many IT projects. According to rough industry estimates, about 2 thirds of IT effort is related to some sort of data migration project. Often when planning a new project, data migration is neglected, causing project overruns and poor data quality. Some reasons for these projects to fail (completely or overrun) are:

Data requirements are not (clearly) defined:
- Technical knowledge is missing or insufficient about the source data repository.
- Business knowledge may also be limited (legacy systems).
Incongruences on the source data repository (this shouldn’t be here… how do I deal with it?).
Data size.
Migration poor performance.
System downtime during migration.
Acceptance tests not defined.

Planning the Migration Project

Most data migration projects will roughly go through 2 main phases:

Data analysis (profiling).
Data mapping.

Data analysis(or profiling) is the initial step for understanding the source data repository. In this phase, its contents and structure are studied and documented. Hopefully, data quality and integrity is also understood at this point – although unfortunately this often makes a comeback during the 2^nd phase of the project.

Once the source data is understood, data mapping sets the specifications for migrating the source data repository content into the target system, with all required data transformations.

Data Consistency

As previously stated, data incongruence is a real problem in this kind of projects. This is particularly true when migrating legacy systems. Often, technical knowledge about these systems is incomplete or simply absent. Bugs in the legacy application may have caused data inconsistencies in the data model. Business users may have been using workarounds (hacks) to known bugs or limitations, causing even more inconsistencies.

This is not a specific technology problem, of course – OutSystems platform has proved quite capable of handling these projects in a very efficient way, regardless if we’re using web services, file manipulation, linked servers, or any other approach. The main risk, no matter how top-notch the technical details seem on paper, is having a crude case of garbage in, garbage out on these scenarios.

The goal of automatically migrating 100% of data may prove to be difficult, to say the least. Exceptions due to poor data quality will likely cause the acceptance tests to fail. Correctly handling these exceptions may be impossible in a reasonable period, causing frustration to the development team and the business users testing the outcome of the migration process.

On most projects, achieving something like a 99% migration rate is actually quite good. Often, data inconsistencies are actually non-critical, so it makes sense spending the most effort in the 99% data that can be migrated with (almost) no issues.

Giving Control to the User

For the (critical or not) data that cannot be migrated automatically, we’re left with two options:

Further study, identify and handle the data inconsistencies during the migration, if possible.
Give control to the user– let him/her decide what should be done in an autonomous way.

Option number 2 is basically giving the business user – the one that really knows how the data should be interpreted and handled – the tools for correctly handling the migration process, thus reducing or eliminating the alternative costly iteration with the dev team for specific (minor) cases.

These tools will be strongly different from case to case, but may include, for example:

A tool allowing the user to quickly identify cases needing manual intervention:
- Data inconsistencies;
- Cases where the mapping to the new data repository couldn’t be completed in a satisfying way.
A tool identifying differences between the source and target repositories.
Specific tools for handling these cases by the user – for example:
- Show whatever related data / complementary information is needed for reviewing;
- Apply selected differences from the source system to the target;
- Take specific business actions.

It’s important noting that most times these tools will be used by the business user, so they need to reflect business concepts known to him – thus removing the technical details as much as possible. For some scenarios, it may possible having a similar tool meant to be used for technical users, of course.

Example 1 – Identifying differences between source and target system.

Example 2 - Identifying cases needing manual intervention.

Example 3 - When possible, quantify business impact on each case. Give plenty of related information.

Example 4 - Give enough options for correcting the problem.

Conclusion

Data migration approaches may differ across different projects, and these techniques may not be well suited for all. However, many legacy systems migrations may benefit from it, especially if the data requirements are poorly understood and data quality is a real issue.

Advantages of the give control to the user approach include:

Can be very useful when data quality of the source repository is a real problem.
Gives control to the business user – empowering him with the tools to take the decisions on each case.
Simplifies acceptance tests.
These tools are normally designed to be used in normal work hours, avoiding stressful overtime effort or system downtime.

Among the limitations:

Not suitable for all migration projects.
Doesn’t replace data mapping analysis, of course.
These tools need analysis and some effort to be reliable and covering most cases.

References

[1] https://www.information-management.com/specialreports/20040518/1003611-1.html
[2] https://www.information-management.com/issues/19990601/996-1.html
[3] https://wikibon.org/wiki/v/Data_migration_strategies
[4] https://www.serverwatch.com/hreviews/article.php/10823_3484731_2/Hardware-Today-Server-Migration-Strategies-Made-Easy.htm

07 Mar 2011

Community GuidelinesBe kind and respectful, give credit to the original source of content, and search for duplicates before posting.

See the full guidelines