I timed the following 8th code on a ROCK64 (which should be about as fast as an RPi 4):
1000000 constant LIMIT
a:new 0 a:push var, a
0 b:new true b:writable var, s
0 b:new true b:writable var, t
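: iterate  \ a n -- a
  \ append the letter for index n (A..Z, cycling) to buffer s
  s @ "" 2 pick n:1- 26 n:mod 65 n:+ s:+ b:append
  \ once s holds 26 bytes, append it to t and start a fresh s
  b:len 26 n:< not if
    t @ swap b:append drop
    0 b:new true b:writable s !
  else
    drop
  then
  \ push n onto the array
  a:push ;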
: app:main
  a @ ' iterate 1 LIMIT loop
  t @ b:rev >s s:len "r LEN: %d\n" s:strfmt .
  dup 26 s:lsub "Front: %s\n" s:strfmt .
  26 s:rsub "Back:  %s\n" s:strfmt .
  LIMIT a:@ nip "UBVal: %d\n" s:strfmt .
  bye ; 
The result was:
root@DietPi:~/Downloads# time /opt/8th/bin/rpi64/8th r3.8th
r LEN: 999986
Front: ZYXWVUTSRQPONMLKJIHGFEDCBA
Back:  ZYXWVUTSRQPONMLKJIHGFEDCBA
UBVal: 1000000
real	0m2,270s
user	0m2,160s
sys	0m0,100s
Memory usage was:
{"fault":0,"isrss":0,"rss":66452,"load15":0.30000,"ixrss":0,"load1":0.56000,"idrss":0,"swap":0,"load5":0.47000}
So, at about 2.3 seconds, it's quite a lot faster than the 40 seconds you have on the chart.
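
For anyone who doesn't read 8th, here is a rough Python sketch of what the program above does, as I read it. It is only meant as a cross-check of the printed values, not as a timing comparison. The function name `run` is made up for this sketch, and I'm assuming the 8th loop runs from 1 to LIMIT inclusive, which is consistent with the output above.

```python
def run(limit):
    """Mirror the 8th benchmark; returns (reversed length, front, back, a[limit])."""
    a = [0]              # the 8th array starts out holding a single 0
    s = bytearray()      # current chunk, flushed once it reaches 26 bytes
    t = bytearray()      # accumulated full chunks
    for n in range(1, limit + 1):        # assumption: upper bound is inclusive
        s.append(65 + (n - 1) % 26)      # 'A'..'Z', cycling
        if len(s) >= 26:
            t += s                       # append the full chunk to t
            s = bytearray()              # start a fresh chunk
        a.append(n)
    r = t[::-1].decode("ascii")          # reversed copy of t, as a string
    return len(r), r[:26], r[-26:], a[limit]


if __name__ == "__main__":
    length, front, back, ubval = run(1000000)
    print("r LEN: %d" % length)
    print("Front: %s" % front)
    print("Back:  %s" % back)
    print("UBVal: %d" % ubval)
```

The length comes out as 999986 rather than 1000000 because only complete 26-byte chunks are copied into t, and 38461 * 26 = 999986; the last 14 letters stay in s.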