I'm having a hard time with this one... I imagine a clever use of itertools or pythonic slicing and zipping would do the job, but can't figure it out. Part of the reason may be that the problem itself is underspecified, but here goes:
You're given a matrix of characters where the first row represents an overlapping 3-gram string of length k, and the following rows can be anything: e.g., for string "ABCDE":
[['A', 'B', 'C', 'B', 'C', 'D', 'C', 'D', 'E'], # <-- string ABCDE broken into ABC BCD CDE then flattened to characters
['F', 'G', 'H', 'X', 'Y', 'Z', '1', '2', '3'], # <-- arbitrary
......
]The goal is to generate the set of all strings of length k that are the ordered permutations (may be incorrect terminology) that follow the groupings of the 3-grams, .e.g.:
{"ABCDE", "ABCD3", "ABC2E", ..., "FGHZ3"}
For this example (if only the first two rows exist), there are 120 valid strings (a formula to figure out exactly how many strings would be generated might be helpful, but I've yet to work it out).
Note that order is important, i.e., "AXHDE" is invalid, as X occurs after H, even though X relates to B and H relates to C.
I've implemented a recursive solution that solves this example quickly, but doesn't scale at all with the length of k or the number of rows.
Here is a list of all valid strings for the example above:
check = { "AB123", "AB12E", "AB1D3", "AB1DE", "ABC23", "ABC2E", "ABCD3", "ABCDE",
"ABCZ3", "ABCZE", "ABH23", "ABH2E", "ABHD3", "ABHDE", "ABHZ3", "ABHZE",
"ABY23", "ABY2E", "ABYD3", "ABYDE", "ABYZ3", "ABYZE", "AG123", "AG12E",
"AG1D3", "AG1DE", "AGC23", "AGC2E", "AGCD3", "AGCDE", "AGCZ3", "AGCZE",
"AGH23", "AGH2E", "AGHD3", "AGHDE", "AGHZ3", "AGHZE", "AGY23", "AGY2E",
"AGYD3", "AGYDE", "AGYZ3", "AGYZE", "AX123", "AX12E", "AX1D3", "AX1DE",
"AXC23", "AXC2E", "AXCD3", "AXCDE", "AXCZ3", "AXCZE", "AXY23", "AXY2E",
"AXYD3", "AXYDE", "AXYZ3", "AXYZE", "FB123", "FB12E", "FB1D3", "FB1DE",
"FBC23", "FBC2E", "FBCD3", "FBCDE", "FBCZ3", "FBCZE", "FBH23", "FBH2E",
"FBHD3", "FBHDE", "FBHZ3", "FBHZE", "FBY23", "FBY2E", "FBYD3", "FBYDE",
"FBYZ3", "FBYZE", "FG123", "FG12E", "FG1D3", "FG1DE", "FGC23", "FGC2E",
"FGCD3", "FGCDE", "FGCZ3", "FGCZE", "FGH23", "FGH2E", "FGHD3", "FGHDE",
"FGHZ3", "FGHZE", "FGY23", "FGY2E", "FGYD3", "FGYDE", "FGYZ3", "FGYZE",
"FX123", "FX12E", "FX1D3", "FX1DE", "FXC23", "FXC2E", "FXCD3", "FXCDE",
"FXCZ3", "FXCZE", "FXY23", "FXY2E", "FXYD3", "FXYDE", "FXYZ3", "FXYZE" }