Hello!
This past weekend, I played in TJCTF with les amateurs. We ended up in 1st place!
We won because we were the only team to solve misc/golf-hardester
. In this write-up, I will be
documenting my thought process while solving golf-hardester
.
Challenge Breakdown
golf-hardester
is a regex golfing challenge where you are given sample texts to match and sample
texts to not match, and you must write a regex within a certain character limit that matches the pattern.
There are 5 levels, each of which has its own pattern that you must write a regex for. To get the flag,
you must solve all 5 levels.
In this write-up, I will mainly be focusing on level 5, as it was by far the most difficult and was the one where most teams got stuck on.
Levels 1-4
Level 1
1. "Warmup" This one's pretty straightforward.
Match on all of these: But none of these:------------------------ ------------------------ampasimenite jasmoneanchorable decisivenessesaconic backoffantistrophic whitebarkabrade physogastrismarroya shaveeapoplex hanoverianayahs weatherstrippedarock naturalizeanconoid lophotrichousanglophile nonprecedentialacalephan tjctfamaze shepherdiaadelphe waymarkanno picnickeramiable bottleholderaliquid poruleachromatin diagramingaircheck solifugeantigenically phrenetically
Maximum allowable length is 2
The pattern is just to match any string that begins with an a
.
^a
Level 2
2. "Tenet" It's a classic chicken-and-egg question: Which came first, the Roman graffiti or the Christopher Nolan movie?
Match on all of these: But none of these:------------------------- ---------------------------------------------satorarepotenetoperarotas chirldecaleminacivicanimelaced kayakstrawtrelareferalertwarts croisstrutannotdaweshakamanasakayakasanamakah khnabadalanamanaladabanhkdimitinarimadamiranitimid parssalertrevertrelassrapfaradaromarotoramoradaraf makahadelykerekyledahakambanakaholanoyonalohakanab crimpgruftgowermairsjessilassoartusstetssutraossal nasabbnamasamasamanbbasanrakisanolikodokilonasikar mfssaartusstatssutraassfmtadesaneledewedelenasedat camusvmelumesemulemvsumacassetscamesalasemacstessa tknakaronanoyonanorakanktcestiecartsamastraceitsec sejabenemarexeramenebajesgnatsnonetananatenonstang esshmshamasamasamahsmhsserotasoperatevetareposator talasaladalwvwladalasalatrakesamenekeyekenemasekar talasadelslemelsledasalatsalemanelelemelelenamelas areposatortenetrotasoperaassesspalesamaselapssessa faradaledaaeveaadeladarafdartsavertrevertrevastrad assamshampsalaspmahsmassacoramoletareveratelomaroc nesoaevilssirissliveaosennertaecartradartraceatren zaiananasamadamasananaiazharamadelareferaledamarah kayakkayakkayakkayakkayaklatonanimotipitominanotal kanltadalanamanaladatlnakkedarenemadewedameneradek batheskaldifritfacksladesremopmasseattisstatssittaessam awhetmeresfatalabdulaboonamulagiamokelsielapslavalspaleislek ervumkemmemuzakalgiaapersdikesswinkrizzowaresalonerotorenolaseraw avenspairtstrohaddiasixthaeriesclavtepeecribssusanulemasexesamelunasus
Maximum allowable length is 50
The pattern here is a bit difficult to see, but the rule is to match any string that matches the following conditions:
- The string must be a palindrome.
- The string must be 25 characters long.
- The 2nd character must be the same as the 6th character.
- The 3rd character must be the same as the 11th character.
- The 4th character must be the same as the 10th character.
- The 7th character must be the same as the 12th character.
To match a palindrome, we can use the regex
module’s
recursion feature.
This regex matches any palindrome:
^(.|(.)(?1)\2)$
Here, we have a base case of a single character. In the recursive case, we capture a character, recursively match the regex, then backreference the captured character to ensure it occurs again at the end of the string.
We can match conditions 2-6 with the following regex:
^.(.)(.)(.).\3.(.).\5\4\6.{13}$
We can then combine the two regexes using a lookahead:
^(?=(.|(.)(?1)\2)$).(.)(.)(.).\3.(.).\5\4\6.{13}$
Level 3
3. "Triumvirate" Ah yes, quinary, my fifth favorite base.
Match on all of these: But none of these:----------------------------------- ----------------------------------0 13 411 21014 1131022 21413124430 32130020302402301 3030320142122423144312 231332302134111413130043 1102200310204303402023233320 1422343112104210130424322431012 11031030411213204331243200410022310 1002104210220442101420210414114303030 1114412243313242103302100312023023123 244403142003144230442020212213114432333240 104310214210310332001243332123330120041301 431140423202034404111403212440313014044442 112210314002204031222432301404334330404131321 20113122400131413033301413011141021024302200400 1140420131230132311231023444210332323242404120224132 102310311320124044204140131114401022024420003234234444 100044123310122023041400411233313020411424304401310144130040302 1021231134414114301242044323030111202241201424204222113011221133 20121332123142340324001223212344234343341022001132401403022101104 42441333222430014043301433413001231421441204321403132040024202002 24231010323432410001133113023033410410400231012014340122214134013042 1441413341001141224132000024031214
Maximum allowable length is 70
The pattern here is to match any base-5 number that is divisible by 3.
The idea here is that we will step through each digit, and keep a running count of the number modulo 3.
Once we reach the end of the string, our running count will be the entire number, modulo 3. Thus, we can
match only strings that end with a running count of 0 (mod 3). For example, consider the base-5 number 124
(39
in decimal). This is how our regex will evaluate it:
count starts at 0[1]23 -> count = (count * 5) + 1 = (0 * 5) + 1 = 1 (mod 3)1[2]3 -> count = (count * 5) + 2 = (1 * 5) + 2 = 1 (mod 3)12[4] -> count = (count * 5) + 4 = (1 * 5) + 4 = 0 (mod 3)string ends with count = 0, so we accept it
The trick here is that, because we are operating mod 3, the count
variable can only have 3 values: 0
, 1
,
or 2
. We can create three separate regexes to model each possible value of count
:
count_0: $|[03](?&count_0)|[14](?&count_1)|2(?&count_2)count_1: [03](?&count_2)|[14](?&count_0)|2(?&count_1)count_2: [03](?&count_1)|[14](?&count_2)|2(?&count_0)
Combining these three regexes, and stripping group names to make it shorter, we get:
^($|[03](?1)|[14]([03]([03](?2)|[14](?3)|2(?1))|[14](?1)|2(?2))|2(?3))
Level 4
4. "Lucky Number" I don't think you need any hints as to which base this is.
Match on all of these: But none of these:------------------------ ------------------------0 1111 101110 1110101 10011100 101100011 110101010 1000110001 1001111000 1010111111 10111000110 11001001101 11011010100 11111011011 100001100010 100011101001 100101110000 100111110111 101001111110 1011010000101 1011110001100 10110110010011 110000110011010 110111110100001 1011101110101000 1111010010101111 1111011110110110 10011011010111101 10000111111000100 11000101011001011 11111000011010010 10110100111011001 11011111011100000 101110001111100111 101001011111101110 111000000111110101 101101101111111100 1001101110100000011 1011110110100001010 1001100110100010001 1010001111
Maximum allowable length is 162
The pattern here is to match any base-2 number that is divisible by 7.
We can use the same technique as in level 3.
^(?<s0>$|0(?&s0)|1(?<s1>0(?<s2>0(?<s4>0(?&s1)|1(?&s2))|1(?<s5>0(?&s3)|1(?&s4)))|1(?<s3>0(?<s6>0(?&s5)|1(?&s6))|1(?&s0))))
Level 5
5. "Tally" Alright, time for a real challenge. This one should actually be difficult!
Match on all of these: But none of these:------------------------ ------------------------arraigning edifiednonordered unreverberatingabadbacdcacbdbdc underpassmesosome interinsertananna pilferedunendued nippinesstromometer gregariniancaucasus deicideintestines nonaristocratici rototillerdeed ozonizinghorseshoer museumshappenchance backbreakerreappear interradiateddeeded antistallingpullup naturalizearraigning equitriangulartestes reparticipatemononymy ppdscintillescent miasmascouscous cabbage
Maximum allowable length is 62
Now, we get to the last level, and the hardest one by far.
Luckily, by the time that I had gotten to this level, the author had already released a hint about what the
pattern to match is:
We need to match any string that has equal number of occurrences of all the unique characters in the string.
Brainstorming
One of the first things that my teammate and I noticed was that because of the transitive property of equality, we only actually need to check that a given character has the same number of occurrences as only one other character in the string. We do not need to test every character against every other character.
For example, in the string aabbcc
:
- Checking
a
againstb
, they both have 2 occurrences. - Checking
b
againstc
, they both have 2 occurrences. - We automatically know that
a
andc
have the same number of occurrences by the transitive property of equality.
This insight allows us to come up with an algorithm that may be possible to implement in regex that satisfies the pattern:
- Step through each character in the string.
- Check if this character and the next character have the same number of occurrences.
- If all characters pass this test, then the string is valid.
Now that we had an idea for an algorithm, I started implementing it in regex. My goal was to first implement a regex that passes the test cases, then to golf it down to fit into the 62 character limit.
Matching Equal Counts (part 1)
Starting simply, I first wanted to make a regex that matched any string that had equal counts of the characters a
and b
,
assuming the string had no other characters other than a
s and b
s.
The key insight that I had here is: if there are equal counts of a
s and b
s, then we can pair up every single a
with a
single b
. Thus, if the first character is an a
, then there must be a b
somewhere later in the string that we can pair
the it up with (and vice versa).
In regex, that would look something like this:
^(a.*b|b.*a)
It is possible for there to be multiple pairs of a
s and b
s in the string. For example, consider the string abba
. This
string has two pairs: ab
and ba
. So, we need to repeat this pairing pattern until the end:
^(a.*b|b.*a)*$
Up until now, we have been ignoring the portion of the string between the paired a
and b
characters with a .*
. However,
we also need to ensure that this substring has an equal number of a
s and b
s. For example, consider the string aabb
.
This string contains an ab
pair within an ab
pair. This is perfect use case for recursion! So, our new regex looks like
this:
^(a(?1)*b|b(?1)*a)*$
This regex matches any string with an equal count of a
s and b
s, assuming the string only contains a
s and b
s.
Matching Equal Counts (part 2)
Now, we can extend this regex to match any string with equal counts of a
s and b
s, regardless of what other characters
it contains.
All that we need to do this is to skip characters that are not a
s or b
s. One way we could do this is with an exclusive
set. In regex, the set [^ab]
will match any character that is not an a
or b
. So, we can modify our regex as such:
^([^ab]*|a(?1)*b|b(?1)*a)*$
However, eventually, we will want to use this regex to test any arbitrary captured characters, not just a
and b
.
Unfortunately, we cannot put backreferenced captured strings into a set, so we will need to use a different approach to
match any character except a
or b
.
The method I came up with to do this is using negative lookaheads. Putting a negative lookahead before a pattern will cause
that pattern not match if the negative lookahead matches. So, the following pattern will match any character
that is not an a
or b
:
(?!a|b).
We can replace our exclusive set with this negative lookahead:
^(((?!a|b).)*|a(?1)*b|b(?1)*a)*$
We now have a regex that matches any string with equal counts of a
s and b
s, while ignoring all other characters!
Extending to Captured Characters
Now, we can extend our regex to match any two characters, not just a
and b
. Our original algorithm idea was to step
through the string and check if consecutive characters have equal counts. For now, we will not worry about stepping through
the string; we will only check if the first character has the same count as the second character.
To start, we just need to capture the first and second characters:
^(.)(.)
However, we run into an issue here. We can’t just append our equal-counts regex after the captures because the captures have already consumed the first two characters of the string. Our equal-counts regex must run on the entire string.
Furthermore, we don’t want our equal-counts regex to consume any characters because eventually we will need to extend our regex to step through the string character-by-character.
The solution I came up with is to first use a positive lookahead to go to the end of the string, then use a negative lookahead to run the equal-counts regex on the entire string. Lookaheads do not consume characters, so this does exactly what we need! This technique looks like this:
(?= .*$ # Goto end of string (?<= ^ # Reset back to beginning of string
# Equal-counts regex: (((?!a|b).)*|a(?1)*b|b(?1)*a)*$ ))
Now, we can append this to the regex that captures the first and second characters, and substitute the equal-counts regex
a
and b
with group backreferences:
^(.)(.)(?= .*$ # Goto end of string (?<= ^ # Reset back to beginning of string
# Equal-counts regex: (((?!\1|\2).)*|\1(?3)*\2|\2(?3)*\1)*$ ))
Iterating the String (part 1)
todo: finish writeup odsajfoiadsjfoiasdjfoiawe
Solve Scripts
Final regex:
^(.)(\1|(.)(?=.*$(?<=^((?!\1|\3).|\1(?4)*?\3|\3(?4)*?\1)*)))*$
Annotated:
# Want to check if the string has equal counts of all the unique characters
# Capture first character; we will test everything against it^(?<a>.)
( \g<a>| # Skip matching a character against itself
# Capture next character to test against the first (?<b>.) (?= # Goto end of string .*$ (?<=
# Test if chars in \g<a> and \g<b> have same counts in the string ^ (?<main_loop> (?!\g<a>|\g<b>). # skip |\g<a>(?&main_loop)*?\g<b> |\g<b>(?&main_loop)*?\g<a> )*
) ))*$
import pwn
r = pwn.remote('tjc.tf', 31132)r.sendlineafter(b'> ', br'^a')r.sendlineafter(b'> ', br'^(?=(.|(.)(?1)\2)$).(.)(.)(.).\3.(.).\5\4\6.{13}$')r.sendlineafter(b'> ', br'^($|[03](?1)|[14]([03]([03](?2)|[14](?3)|2(?1))|[14](?1)|2(?2))|2(?3))')r.sendlineafter(b'> ', br'^(?<s0>$|0(?&s0)|1(?<s1>0(?<s2>0(?<s4>0(?&s1)|1(?&s2))|1(?<s5>0(?&s3)|1(?&s4)))|1(?<s3>0(?<s6>0(?&s5)|1(?&s6))|1(?&s0))))')r.sendlineafter(b'> ', br'^(.)(\1|(.)(?=.*$(?<=^((?!\1|\3).|\1(?4)*?\3|\3(?4)*?\1)*)))*$')r.interactive()