[SOLVED] In LLVM, how to get all the struct types in a program?

Issue

This Content is from Stack Overflow. Question asked by 董英帅

In LLVM, we can use StructType() to get all structs in a Module, but when a program has multiple source files, each file becomes a different Module, so the same struct type may appear multiple times across Modules.
How to get all struct types in a program, not only a single source file?
And is there any method that records where or when a struct first appears?



Solution

That’s a big and fairly fuzzy question. It depends.

LLVM has named and unnamed struct types. You can easily get a list of the named struct types, and if your program uses a single context, you can merge the lists and will have a complete list of the defined named struct types. That set of defined types will be a superset of the used struct types.
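As a rough illustration of the merging step, here is a minimal sketch in plain C++ with no LLVM dependency. It assumes the struct names have already been collected per module (for instance from `Module::getIdentifiedStructTypes()`). The `ModuleStructs` alias and both function names are hypothetical. The suffix stripping rests on the assumption that a trailing `.<digits>` was appended by LLVM to resolve a name collision within one context, which may not hold for every frontend.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input: a module name plus the names of its identified struct
// types, gathered beforehand.
using ModuleStructs = std::pair<std::string, std::vector<std::string>>;

// Assumption: a trailing ".<digits>" is a collision suffix, so
// "struct.foo.1" is treated as the same type as "struct.foo".
std::string stripCollisionSuffix(const std::string &name) {
    std::size_t dot = name.rfind('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == name.size())
        return name;
    for (std::size_t i = dot + 1; i < name.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(name[i])))
            return name;
    return name.substr(0, dot);
}

// Merge the per-module lists into one map from struct name to the first
// module (in input order) that defines it.
std::map<std::string, std::string>
mergeStructLists(const std::vector<ModuleStructs> &modules) {
    std::map<std::string, std::string> firstSeenIn;
    for (const ModuleStructs &m : modules)
        for (const std::string &name : m.second)
            // emplace leaves an existing entry alone, so the first module wins.
            firstSeenIn.emplace(stripCollisionSuffix(name), m.first);
    return firstSeenIn;
}
```

Here "first" only means the order in which the modules were passed in, not anything about the source code.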

Your program may also use unnamed struct types, or some parts of it may not come from an LLVM Module at all.

If you use unnamed struct types, life is more difficult. Two types may both be {i32, i32} and maybe that’s the same type for your purposes, and maybe it’s two independent types that happen to have the same members. Difficult. Finding them is also more difficult: You have to loop over all of the functions and find the values each uses, and the types of those values. A slow loop, and boring to write. (But on the positive side, you won’t have false positives when a struct type is defined but never used. The named struct types above can have that problem.)
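The shape of that loop can be sketched without LLVM. In the toy model below, `ToyType` is a hypothetical stand-in and not LLVM's `Type` class, and all function names are made up for illustration. The sketch assumes you have already gathered the types of all values used by the functions. It then walks each type recursively and identifies unnamed structs by their members, which is only one of the two interpretations mentioned above.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an IR type; this is NOT LLVM's Type class.
struct ToyType {
    enum Kind { Scalar, Struct } kind;
    std::string scalarName;        // used when kind == Scalar, e.g. "i32"
    std::vector<ToyType> members;  // used when kind == Struct
};

// Structural key: two unnamed structs with the same members get the same key.
std::string structuralKey(const ToyType &t) {
    if (t.kind == ToyType::Scalar)
        return t.scalarName;
    std::string key = "{";
    for (std::size_t i = 0; i < t.members.size(); ++i) {
        if (i != 0)
            key += ", ";
        key += structuralKey(t.members[i]);
    }
    return key + "}";
}

// Record t (if it is a struct) and every struct nested inside it.
void collectStructs(const ToyType &t, std::set<std::string> &out) {
    if (t.kind != ToyType::Struct)
        return;
    out.insert(structuralKey(t));
    for (const ToyType &member : t.members)
        collectStructs(member, out);
}

// valueTypes: the types of all values used by the functions being scanned.
std::set<std::string>
unnamedStructsUsed(const std::vector<ToyType> &valueTypes) {
    std::set<std::string> out;
    for (const ToyType &t : valueTypes)
        collectStructs(t, out);
    return out;
}
```

If two structurally identical types should count as different for your purposes, the key would need extra information, such as which source-level type each one came from.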

These problems become more difficult if your language is one where two struct types may or may not be the same type. For example in Java, all code that uses the String type (which is a struct type on the IR level) uses the Object type too. So if you’re making a list of modules that use Object, you have to consider functions that refer to String, but if you’re making a list of modules that use String, you have to disregard functions that refer only to Object.

There are also issues around optimisations. For example, SROA (scalar replacement of aggregates) replaces struct usage with simpler scalars. After SROA, a function that used String in the source may not use it in IR any more. If you want to act on all functions that use a given type, you have to think about the effects of SROA.

This sort of language-dependent complexity is the reason that LLVM provides (almost?) no helper functions for these problems. People’s requirements vary too much. LLVM helpers exist for the problems where one size fits many.

There is no "first occurrence" in LLVM; that’s internal to each compiler frontend. You can use debug info to find out where in the source code a struct type occurs, and perhaps you’ll find a single file and a lowest line number. "Lowest line number" is what you mean by "first", right? It’s not the only possible meaning.
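To make the "single file and lowest line number" idea concrete, here is a small sketch in plain C++. It assumes the source locations have already been extracted from debug info (LLVM describes struct types there with `DICompositeType` metadata, which carries a name, a file and a line). The `Occurrence` record and the function name are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record: one source location attached to a struct type,
// extracted from debug info beforehand.
struct Occurrence {
    std::string structName;
    std::string file;
    unsigned line;
};

// For each struct, keep the lowest line number seen in each file.
// Result: struct name -> (file -> lowest line).
std::map<std::string, std::map<std::string, unsigned>>
lowestLinePerFile(const std::vector<Occurrence> &occurrences) {
    std::map<std::string, std::map<std::string, unsigned>> result;
    for (const Occurrence &o : occurrences) {
        std::map<std::string, unsigned> &perFile = result[o.structName];
        auto it = perFile.find(o.file);
        if (it == perFile.end() || o.line < it->second)
            perFile[o.file] = o.line;
    }
    return result;
}
```

A struct whose inner map has exactly one entry has a single file and a lowest line. With several entries, you still have to decide which file counts as "first", and that decision is specific to your build.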


This question was asked on Stack Overflow by 董英帅 and answered by arnt. It is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.
