Skip to content

gui/rename: fix script error that sometimes caused the script to malfunction when started from the Launcher.#1595

Open
SilasD wants to merge 2 commits into
DFHack:masterfrom
SilasD:rename
Open

gui/rename: fix script error that sometimes caused the script to malfunction when started from the Launcher.#1595
SilasD wants to merge 2 commits into
DFHack:masterfrom
SilasD:rename

Conversation

@SilasD

@SilasD SilasD commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

This error apparently was an interpreter context conflict. The overlay manager runs scripts in core context but the Launcher does not, so if the script was started from the Launcher, there is a race condition with a variable.
I was able to trigger this bug in four or five different sessions, so it's very much worth fixing.

The script abort was caused by attempting to index into a vector[-1].
I fixed this by adding a limit check before the vector access, around line 650 in the new version.
I preemptively added a limit check to one other suspicious vector access, around line 550 in the new version.
I also did very minor code cleanup, mostly changing some integer constants into symbolic constants.
I added LuaLS annotations added to many functions so I could figure out what the code is doing.
The LuaLS DFHack definitions are from a manually-augmented copy of vallode's definitions, which I intend to properly fork and make available Real Soon Now.

The error presented as the gui/rename window not working or locking up, and the overlay manager printing stack traces over and over.
One copy of a stack trace:

C:/Games/Dwarf Fortress/scripts/gui/rename.lua:611: Cannot read field int32_t[].-1: index out of bounds.
stack traceback:
        [C]: in metamethod '__index'
        C:/Games/Dwarf Fortress/scripts/gui/rename.lua:611: in function <C:/Games/Dwarf Fortress/scripts/gui/rename.lua:610>
        (...tail calls...)
        ...rtress\Playing5312\hack\lua\gui\widgets\labels\label.lua:137: in function 'gui.widgets.labels.label.render_text'
        ...Dwarf Fortress\Playing5312\hack\lua\gui\widgets\list.lua:277: in function 'gui.widgets.list.onRenderBody'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:820: in method 'render'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:807: in method 'renderSubviews'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:822: in method 'render'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:807: in method 'renderSubviews'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:822: in method 'render'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:807: in method 'renderSubviews'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:822: in method 'render'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:807: in method 'renderSubviews'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:822: in field 'render'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:1093: in method 'render'
        C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:980: in function <C:\Games\Dwarf Fortress\Playing5312\hack\lua\gui.lua:979>
        [C]: in ?

the faulting code:

    for val, comp in ipairs(df.language_name_component) do
        if val < 0 then goto continue end -- ignore language_name_component NONE element with value -1
        local text = {
            {text=comp:gsub('(%l)(%u)', '%1 %2')}, NEWLINE,
            {gap=2, pen=COLOR_YELLOW, text=function()
                local word = self.target.name.words[val]
                if word < 0 then return end
                return ('%s'):format(language.words[word].forms[self.target.name.parts_of_speech[val]])
            end},
        }
        table.insert(choices, {text=text, data={val=val}})

        ::continue::
    end

Analysis: The code faulted in the anonymous callback function, so we don't have an accurate line number, only the line number of the start of the function definition.
The fault was indexing an int32_t vector by -1. Thus we can rule out:

  • language.words[word], which is a vector of df.language_word.
  • .forms[...], which is a container of type df.language_words_flags.
  • self.target.name.parts_of_speech[val], which is a container of type df.part_of_speech.

The remaining possibility is self.target.name.words[val].
self.target.name.words is of type df.language_name.int32_t[7], and is indexed by enum type df.language_name_component.
val must have been set to df.language_name_component.NONE, which is -1.

It is not clear how this happened. The code attempts to deal with this case with this line of code:

        if val < 0 then goto continue end -- ignore language_name_component NONE element with value -1

However, the faulting code is in a callback. There is apparently a codepath where the callback can be entered while val is -1.
This makes some sense, because the anonymous function won't be recompiled every time the outer function runs, and it also binds to val as an upvalue.
However, I don't see how the callback could run while the outer function is executing the for loop. They should be in the same interpreter context, the core context.
At any rate, I added a limit check:

                if val == df.language_name_component.NONE then return '' end

(Later:) Oh. Scripts started from the Launcher do not run in core context:

> :lua !dfhack.is_interactive()

false

So there very much can be race conditions and the like. This may need further study.

@sizzlins

Copy link
Copy Markdown

good job on the fix, i was looking at this into why this was crashing, and i thought id share what i found because its quite a fascinating c++ edge case

it turns out this isnt actually a lua race condition or an interpreter conflict. in Lua 5.3 (which DFHack uses), variables defined in a for loop are strictly block scoped locals, meaning each iteration gets a fresh, isolated local variable that cant mutate across threads

the real culprit is c++ implementation defined behavior for enums. if an enum doesnt have an explicitly defined underlying type, the compiler gets to choose one. on some platforms/compilers, df.language_name_component is compiled as a signed integer (where NONE is -1), but on the build where this crashed, the compiler chose an unsigned 32 bit integer

when -1 is cast to an unsigned integer, it wraps around to 4294967295. Because 4294967295 < 0 evaluates to false, the if val < 0 check completely failed to skip the element on your build, the closure was then created with val = 4294967295

when the GUI rendered the list, it executed the closure and accessed self.target.name.words[4294967295]. dfhacks lua bindings converted that unsigned number back into a signed int32_t to index the c++ array, wrapping it back around to -1, which triggered the exact error you saw: Cannot read field int32_t[].-1.

your fix to change the logic to if val == df.language_name_component.NONE is the perfect solution because it explicitly compares against the enum value, completely bypassing the < 0 arithmetic check which silently fails on unsigned enum builds.

just wanted to share the root cause for future reference, thanks for fixing this

@SilasD

SilasD commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

that, hmmm.
that bothers me. compilers shouldn't be that stupid to choose unsigned types if there are negative numbers in the enum source code.
bah, it's so hot upstairs I can't think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants