Mechanistic interpretability (often shortened to mech interp or MI) is a subfield of research within explainable artificial intelligence that seeks to fully reverse-engineer neural networks (akin to reverse-engineering a compiled binary of a computer program), with the ultimate goal of understanding the mechanisms underlying their computations.[1][2][3] The field is particularly focused on large language models.