Logo
TJCTF 2025: "golf-hardester"

TJCTF 2025: "golf-hardester"

June 9, 2025
10 min read
index

Hello!

This past weekend, I played in TJCTF with les amateurs. We ended up in 1st place! image

We won because we were the only team to solve misc/golf-hardester. In this write-up, I will be documenting my thought process while solving golf-hardester. image

Challenge Breakdown

golf-hardester is a regex golfing challenge where you are given sample texts to match and sample texts to not match, and you must write a regex within a certain character limit that matches the pattern. There are 5 levels, each of which has its own pattern that you must write a regex for. To get the flag, you must solve all 5 levels.

In this write-up, I will mainly be focusing on level 5, as it was by far the most difficult and was the one where most teams got stuck on.

Levels 1-4

Level 1

1. "Warmup"
This one's pretty straightforward.
Match on all of these: But none of these:
------------------------ ------------------------
ampasimenite jasmone
anchorable decisivenesses
aconic backoff
antistrophic whitebark
abrade physogastrism
arroya shavee
apoplex hanoverian
ayahs weatherstripped
arock naturalize
anconoid lophotrichous
anglophile nonprecedential
acalephan tjctf
amaze shepherdia
adelphe waymark
anno picnicker
amiable bottleholder
aliquid porule
achromatin diagraming
aircheck solifuge
antigenically phrenetically
Maximum allowable length is 2

The pattern is just to match any string that begins with an a.

^a

Level 2

2. "Tenet"
It's a classic chicken-and-egg question:
Which came first, the Roman graffiti or the Christopher Nolan movie?
Match on all of these: But none of these:
------------------------- ---------------------------------------------
satorarepotenetoperarotas chirl
decaleminacivicanimelaced kayak
strawtrelareferalertwarts croisstrutannotdawes
hakamanasakayakasanamakah khnabadalanamanaladabanhk
dimitinarimadamiranitimid parssalertrevertrelassrap
faradaromarotoramoradaraf makahadelykerekyledahakam
banakaholanoyonalohakanab crimpgruftgowermairsjessi
lassoartusstetssutraossal nasabbnamasamasamanbbasan
rakisanolikodokilonasikar mfssaartusstatssutraassfm
tadesaneledewedelenasedat camusvmelumesemulemvsumac
assetscamesalasemacstessa tknakaronanoyonanorakankt
cestiecartsamastraceitsec sejabenemarexeramenebajes
gnatsnonetananatenonstang esshmshamasamasamahsmhsse
rotasoperatevetareposator talasaladalwvwladalasalat
rakesamenekeyekenemasekar talasadelslemelsledasalat
salemanelelemelelenamelas areposatortenetrotasopera
assesspalesamaselapssessa faradaledaaeveaadeladaraf
dartsavertrevertrevastrad assamshampsalaspmahsmassa
coramoletareveratelomaroc nesoaevilssirissliveaosen
nertaecartradartraceatren zaiananasamadamasananaiaz
haramadelareferaledamarah kayakkayakkayakkayakkayak
latonanimotipitominanotal kanltadalanamanaladatlnak
kedarenemadewedameneradek batheskaldifritfacksladesremop
masseattisstatssittaessam awhetmeresfatalabdulaboonamulagiamo
kelsielapslavalspaleislek ervumkemmemuzakalgiaapersdikesswinkrizzo
waresalonerotorenolaseraw avenspairtstrohaddiasixthaeriesclavtepeecribs
susanulemasexesamelunasus
Maximum allowable length is 50

The pattern here is a bit difficult to see, but the rule is to match any string that matches the following conditions:

  1. The string must be a palindrome.
  2. The string must be 25 characters long.
  3. The 2nd character must be the same as the 6th character.
  4. The 3rd character must be the same as the 11th character.
  5. The 4th character must be the same as the 10th character.
  6. The 7th character must be the same as the 12th character.

To match a palindrome, we can use the regex module’s recursion feature. This regex matches any palindrome:

^(.|(.)(?1)\2)$

Here, we have a base case of a single character. In the recursive case, we capture a character, recursively match the regex, then backreference the captured character to ensure it occurs again at the end of the string.

We can match conditions 2-6 with the following regex:

^.(.)(.)(.).\3.(.).\5\4\6.{13}$

We can then combine the two regexes using a lookahead:

^(?=(.|(.)(?1)\2)$).(.)(.)(.).\3.(.).\5\4\6.{13}$

Level 3

3. "Triumvirate"
Ah yes, quinary, my fifth favorite base.
Match on all of these: But none of these:
----------------------------------- ----------------------------------
0 1
3 4
11 210
14 11310
22 214131244
30 3213002030
2402301 303032014212
2423144312 231332302134
111413130043 110220031020430
3402023233320 1422343112104210
130424322431012 110310304112132043
31243200410022310 100210421022044210
1420210414114303030 1114412243313242103
302100312023023123 24440314200314423044
2020212213114432333240 10431021421031033200
1243332123330120041301 43114042320203440411
1403212440313014044442 1122103140022040312224
32301404334330404131321 20113122400131413033301
413011141021024302200400 114042013123013231123102344
4210332323242404120224132 102310311320124044204140131
114401022024420003234234444 1000441233101220230414004112333
13020411424304401310144130040302 10212311344141143012420443230301
11202241201424204222113011221133 20121332123142340324001223212344
234343341022001132401403022101104 42441333222430014043301433413001
231421441204321403132040024202002 242310103234324100011331130230334
10410400231012014340122214134013042 1441413341001141224132000024031214
Maximum allowable length is 70

The pattern here is to match any base-5 number that is divisible by 3.

The idea here is that we will step through each digit, and keep a running count of the number modulo 3. Once we reach the end of the string, our running count will be the entire number, modulo 3. Thus, we can match only strings that end with a running count of 0 (mod 3). For example, consider the base-5 number 124 (39 in decimal). This is how our regex will evaluate it:

count starts at 0
[1]23 -> count = (count * 5) + 1 = (0 * 5) + 1 = 1 (mod 3)
1[2]3 -> count = (count * 5) + 2 = (1 * 5) + 2 = 1 (mod 3)
12[4] -> count = (count * 5) + 4 = (1 * 5) + 4 = 0 (mod 3)
string ends with count = 0, so we accept it

The trick here is that, because we are operating mod 3, the count variable can only have 3 values: 0, 1, or 2. We can create three separate regexes to model each possible value of count:

count_0: $|[03](?&count_0)|[14](?&count_1)|2(?&count_2)
count_1: [03](?&count_2)|[14](?&count_0)|2(?&count_1)
count_2: [03](?&count_1)|[14](?&count_2)|2(?&count_0)

Combining these three regexes, and stripping group names to make it shorter, we get:

^($|[03](?1)|[14]([03]([03](?2)|[14](?3)|2(?1))|[14](?1)|2(?2))|2(?3))

Level 4

4. "Lucky Number"
I don't think you need any hints as to which base this is.
Match on all of these: But none of these:
------------------------ ------------------------
0 1
111 10
1110 11
10101 100
11100 101
100011 110
101010 1000
110001 1001
111000 1010
111111 1011
1000110 1100
1001101 1101
1010100 1111
1011011 10000
1100010 10001
1101001 10010
1110000 10011
1110111 10100
1111110 10110
10000101 10111
10001100 101101
10010011 1100001
10011010 1101111
10100001 10111011
10101000 11110100
10101111 11110111
10110110 100110110
10111101 100001111
11000100 110001010
11001011 111110000
11010010 101101001
11011001 110111110
11100000 1011100011
11100111 1010010111
11101110 1110000001
11110101 1011011011
11111100 1001101110
100000011 1011110110
100001010 1001100110
100010001 1010001111
Maximum allowable length is 162

The pattern here is to match any base-2 number that is divisible by 7.

We can use the same technique as in level 3.

^(?<s0>$|0(?&s0)|1(?<s1>0(?<s2>0(?<s4>0(?&s1)|1(?&s2))|1(?<s5>0(?&s3)|1(?&s4)))|1(?<s3>0(?<s6>0(?&s5)|1(?&s6))|1(?&s0))))

Level 5

5. "Tally"
Alright, time for a real challenge. This one should actually be difficult!
Match on all of these: But none of these:
------------------------ ------------------------
arraigning edified
nonordered unreverberating
abadbacdcacbdbdc underpass
mesosome interinsert
ananna pilfered
unendued nippiness
tromometer gregarinian
caucasus deicide
intestines nonaristocratic
i rototiller
deed ozonizing
horseshoer museums
happenchance backbreaker
reappear interradiated
deeded antistalling
pullup naturalize
arraigning equitriangular
testes reparticipate
mononymy ppd
scintillescent miasmas
couscous cabbage
Maximum allowable length is 62

Now, we get to the last level, and the hardest one by far.

Luckily, by the time that I had gotten to this level, the author had already released a hint about what the pattern to match is: image

We need to match any string that has equal number of occurrences of all the unique characters in the string.

Brainstorming

One of the first things that my teammate and I noticed was that because of the transitive property of equality, we only actually need to check that a given character has the same number of occurrences as only one other character in the string. We do not need to test every character against every other character.

For example, in the string aabbcc:

  1. Checking a against b, they both have 2 occurrences.
  2. Checking b against c, they both have 2 occurrences.
  3. We automatically know that a and c have the same number of occurrences by the transitive property of equality.

This insight allows us to come up with an algorithm that may be possible to implement in regex that satisfies the pattern:

  1. Step through each character in the string.
  2. Check if this character and the next character have the same number of occurrences.
  3. If all characters pass this test, then the string is valid.

Now that we had an idea for an algorithm, I started implementing it in regex. My goal was to first implement a regex that passes the test cases, then to golf it down to fit into the 62 character limit.

Matching Equal Counts (part 1)

Starting simply, I first wanted to make a regex that matched any string that had equal counts of the characters a and b, assuming the string had no other characters other than as and bs.

The key insight that I had here is: if there are equal counts of as and bs, then we can pair up every single a with a single b. Thus, if the first character is an a, then there must be a b somewhere later in the string that we can pair the it up with (and vice versa).

In regex, that would look something like this:

^(a.*b|b.*a)

It is possible for there to be multiple pairs of as and bs in the string. For example, consider the string abba. This string has two pairs: ab and ba. So, we need to repeat this pairing pattern until the end:

^(a.*b|b.*a)*$

Up until now, we have been ignoring the portion of the string between the paired a and b characters with a .*. However, we also need to ensure that this substring has an equal number of as and bs. For example, consider the string aabb. This string contains an ab pair within an ab pair. This is perfect use case for recursion! So, our new regex looks like this:

^(a(?1)*b|b(?1)*a)*$

This regex matches any string with an equal count of as and bs, assuming the string only contains as and bs.

Matching Equal Counts (part 2)

Now, we can extend this regex to match any string with equal counts of as and bs, regardless of what other characters it contains.

All that we need to do this is to skip characters that are not as or bs. One way we could do this is with an exclusive set. In regex, the set [^ab] will match any character that is not an a or b. So, we can modify our regex as such:

^([^ab]*|a(?1)*b|b(?1)*a)*$

However, eventually, we will want to use this regex to test any arbitrary captured characters, not just a and b. Unfortunately, we cannot put backreferenced captured strings into a set, so we will need to use a different approach to match any character except a or b.

The method I came up with to do this is using negative lookaheads. Putting a negative lookahead before a pattern will cause that pattern not match if the negative lookahead matches. So, the following pattern will match any character that is not an a or b:

(?!a|b).

We can replace our exclusive set with this negative lookahead:

^(((?!a|b).)*|a(?1)*b|b(?1)*a)*$

We now have a regex that matches any string with equal counts of as and bs, while ignoring all other characters!

Extending to Captured Characters

Now, we can extend our regex to match any two characters, not just a and b. Our original algorithm idea was to step through the string and check if consecutive characters have equal counts. For now, we will not worry about stepping through the string; we will only check if the first character has the same count as the second character.

To start, we just need to capture the first and second characters:

^(.)(.)

However, we run into an issue here. We can’t just append our equal-counts regex after the captures because the captures have already consumed the first two characters of the string. Our equal-counts regex must run on the entire string.

Furthermore, we don’t want our equal-counts regex to consume any characters because eventually we will need to extend our regex to step through the string character-by-character.

The solution I came up with is to first use a positive lookahead to go to the end of the string, then use a negative lookahead to run the equal-counts regex on the entire string. Lookaheads do not consume characters, so this does exactly what we need! This technique looks like this:

(?=
.*$ # Goto end of string
(?<=
^ # Reset back to beginning of string
# Equal-counts regex:
(((?!a|b).)*|a(?1)*b|b(?1)*a)*$
)
)

Now, we can append this to the regex that captures the first and second characters, and substitute the equal-counts regex a and b with group backreferences:

^(.)(.)
(?=
.*$ # Goto end of string
(?<=
^ # Reset back to beginning of string
# Equal-counts regex:
(((?!\1|\2).)*|\1(?3)*\2|\2(?3)*\1)*$
)
)

Iterating the String (part 1)

todo: finish writeup odsajfoiadsjfoiasdjfoiawe

Solve Scripts

Final regex:

^(.)(\1|(.)(?=.*$(?<=^((?!\1|\3).|\1(?4)*?\3|\3(?4)*?\1)*)))*$

Annotated:

# Want to check if the string has equal counts of all the unique characters
# Capture first character; we will test everything against it
^
(?<a>.)
(
\g<a>| # Skip matching a character against itself
# Capture next character to test against the first
(?<b>.)
(?=
# Goto end of string
.*$
(?<=
# Test if chars in \g<a> and \g<b> have same counts in the string
^
(?<main_loop>
(?!\g<a>|\g<b>). # skip
|\g<a>(?&main_loop)*?\g<b>
|\g<b>(?&main_loop)*?\g<a>
)*
)
)
)*$
import pwn
r = pwn.remote('tjc.tf', 31132)
r.sendlineafter(b'> ', br'^a')
r.sendlineafter(b'> ', br'^(?=(.|(.)(?1)\2)$).(.)(.)(.).\3.(.).\5\4\6.{13}$')
r.sendlineafter(b'> ', br'^($|[03](?1)|[14]([03]([03](?2)|[14](?3)|2(?1))|[14](?1)|2(?2))|2(?3))')
r.sendlineafter(b'> ', br'^(?<s0>$|0(?&s0)|1(?<s1>0(?<s2>0(?<s4>0(?&s1)|1(?&s2))|1(?<s5>0(?&s3)|1(?&s4)))|1(?<s3>0(?<s6>0(?&s5)|1(?&s6))|1(?&s0))))')
r.sendlineafter(b'> ', br'^(.)(\1|(.)(?=.*$(?<=^((?!\1|\3).|\1(?4)*?\3|\3(?4)*?\1)*)))*$')
r.interactive()