Design a data validation framework that checks, on a daily basis, whether the number of records selected from the source matches the number of records loaded into the target table.

Hypothesis:
Count(DB1[Source]) = Count(DB2[Target]) per batch.

Assume the source and target DBs can be anything, and the number of tables is x.

My approach:
1. Store a config file in cloud storage that contains key:value pairs, e.g. target_table_name:source_table_name.
(This way we can also store key:value pairs for different kinds of DBs, which keeps the solution scalable.)

2. Run a daily batch job that iterates over the config file saved in step 1 and inserts the record count per table and per day into a table. Ex:
date1,table1,count
date1,table2,count

3. Expose APIs over the source table to return the data in an aggregate structure like below:
date1: difference, if any
date2: difference, if any

Does anyone have an alternate approach?
Any feedback or GitHub links are also welcome.
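To make the three steps concrete, here is a minimal sketch of the approach in Python. Everything named here is an assumption for illustration: the `TABLE_MAP` contents, the `count_audit` table, and the function names are hypothetical, and `sqlite3` only stands in for whatever source and target DBs are actually used. It also counts whole tables; a real job would probably filter on a batch id or load date to get a per-batch count.

```python
import sqlite3

# Hypothetical config from step 1: target table name -> source table name.
# In practice this would be loaded from the config file in cloud storage.
TABLE_MAP = {"tgt_orders": "src_orders", "tgt_users": "src_users"}


def count_rows(conn, table):
    """Return the row count of one table.

    Table names cannot be bound as SQL parameters, so this assumes the
    names come from trusted config, not from user input.
    """
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def run_daily_check(source_conn, target_conn, audit_conn, table_map, run_date):
    """Step 2: store source and target counts per table for one run date."""
    audit_conn.execute(
        "CREATE TABLE IF NOT EXISTS count_audit ("
        "run_date TEXT, target_table TEXT, "
        "source_count INTEGER, target_count INTEGER)"
    )
    for target_table, source_table in table_map.items():
        audit_conn.execute(
            "INSERT INTO count_audit VALUES (?, ?, ?, ?)",
            (
                run_date,
                target_table,
                count_rows(source_conn, source_table),
                count_rows(target_conn, target_table),
            ),
        )
    audit_conn.commit()


def differences_by_date(audit_conn):
    """Step 3: return {run_date: {target_table: source_count - target_count}}.

    Only tables whose counts differ are included.
    """
    rows = audit_conn.execute(
        "SELECT run_date, target_table, source_count - target_count "
        "FROM count_audit WHERE source_count <> target_count "
        "ORDER BY run_date, target_table"
    )
    report = {}
    for run_date, table, diff in rows:
        report.setdefault(run_date, {})[table] = diff
    return report
```

An API endpoint for step 3 could simply return the result of `differences_by_date` as JSON. Storing both counts instead of only the difference keeps the audit table useful for debugging when a mismatch shows up.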
