Data validation framework | System design discussion

Design a data validation framework whose purpose is to check whether data selected from source for loading target table matches on number of records on daily basis.
Hypothesis:
(Count(DB1[Source])=Count(DB2[Target]) per batch.

Assume source and target DB can be anything and number of tables as x

My approach:

  1. Store a config file over cloud storage which contains key:value pairs as eg: taget_table_name:Source_table_name.
    (This way we can store key value pairs for different kind of db as well for scalable solution)

  2. Run a daily batch job and iterate over the config file saved in step 1 and insert data table wise and day wise data in a table. Ex:
    date1,table1,count
    date1,table2,count

  3. Expose APIs over source table to get data in aggregate structure like below:
    date1 : difference, if any
    date2: difference, if any

Alternate approach anyone ?
Any feedback or GitHub links are also welcome.

Comments (1)