Hello Jonas,
You ask good questions.
To answer your main question: it's a bit of both. More precisely, multiple found paths are internally treated as "propagation paths," which are functionally equivalent to multipositions. You can see these paths in the Voice Inspector's List View by expanding the Attenuation row (refer to the screenshots in this documentation).
However, for performance reasons, all these paths/multipositions are ultimately collapsed into a single signal that is filtered only once, as you can see in the Voice Profiler or in the Voice Graph pane of the Voice Inspector. The resulting volume, LPF, HPF, and panning matrix are calculated using a complex algorithm that attempts to account for the perceptual loudness of each path when computing their respective contributions. The panning of the resulting signal is more or less a weighted average of all path directions. This is inherently an approximation, and unfortunately it's not possible to apply custom DSP to these paths individually.
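To make the "weighted average of all path directions" idea concrete, here is a minimal sketch in Python. It is not the actual Wwise algorithm, which is more involved and not public in this form. The function names, the use of linear amplitude as the weight, and the (direction, gain in dB) path representation are all assumptions made for illustration.

```python
import math

def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def merged_direction(paths):
    """Return the loudness-weighted average direction of several paths.

    paths: list of ((x, y, z), gain_db), where (x, y, z) is a unit vector
    pointing from the listener toward the apparent source of that path.
    The weighting by linear amplitude is an assumption for this sketch.
    Returns a unit vector, or None if the weighted directions cancel out.
    """
    sx = sy = sz = 0.0
    for (x, y, z), gain_db in paths:
        w = db_to_linear(gain_db)
        sx += w * x
        sy += w * y
        sz += w * z
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    if norm < 1e-9:
        return None
    return (sx / norm, sy / norm, sz / norm)

# Two equally loud paths, 90 degrees apart: the result points halfway between.
equal = merged_direction([((1.0, 0.0, 0.0), 0.0), ((0.0, 1.0, 0.0), 0.0)])

# Same directions, but the second path is 20 dB quieter: the result
# stays close to the direction of the louder path.
skewed = merged_direction([((1.0, 0.0, 0.0), 0.0), ((0.0, 1.0, 0.0), -20.0)])
```

With this kind of weighting, a quiet diffraction path only nudges the panning, while a path that is as loud as the others pulls the result toward the middle.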
Note that when using object-based panning in Wwise, each “multiposition” results in a new object being spawned.
How is it avoided that the volume jumps when a diffraction path is added and the transmission isn’t fully blocking?
Typically, if transmission loss is 0%, you should avoid adding diffraction on top of it. In other words, why have a portal if the wall is effectively transparent? If the transmission loss is higher, the transmitted path is already quieter than a direct line of sight would be, and that difference should help compensate for the addition of the diffraction path. In theory, a sudden volume change could occur when diffraction is introduced, but this is mitigated by the path merging behavior I mentioned earlier: two paths coming from the same direction will never be louder than one. That’s one of the benefits of the current calculation method.
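As a toy illustration of the property "two paths coming from the same direction will never be louder than one", here is a small Python sketch. The rule below is hypothetical and is not how Wwise computes the merged volume. It only shows one way a merge can behave so that a new path from the same direction adds nothing, while a path from a different direction adds progressively more.

```python
import math

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def merged_gain(paths):
    """Merge several paths into one linear gain (hypothetical toy rule).

    paths: list of (unit_direction, linear_gain).
    Paths are visited from loudest to quietest. Each path contributes its
    power scaled by a "novelty" factor: 0 when it arrives from the same
    direction as an already counted path, 1 when it arrives from the
    opposite direction of all of them.
    """
    ordered = sorted(paths, key=lambda p: p[1], reverse=True)
    counted_directions = []
    power = 0.0
    for direction, gain in ordered:
        novelty = 1.0
        for other in counted_directions:
            cos_angle = max(-1.0, min(1.0, dot(direction, other)))
            novelty = min(novelty, (1.0 - cos_angle) / 2.0)
        power += novelty * gain * gain
        counted_directions.append(direction)
    return math.sqrt(power)

front = (1.0, 0.0, 0.0)
side = (0.0, 1.0, 0.0)

# A transmission path alone, then with a quieter diffraction path
# arriving from exactly the same direction: the gain does not change.
alone = merged_gain([(front, 1.0)])
same_direction = merged_gain([(front, 1.0), (front, 0.5)])

# The same quieter path arriving from the side does add some level.
different_direction = merged_gain([(front, 1.0), (side, 0.5)])
```

Because the contribution of the extra path grows smoothly with its angular separation, a rule of this kind does not produce a jump at the moment a diffraction path appears next to an existing transmission path.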
I'm not entirely sure I understand the issue you're encountering in Unity. It’s possible that it's a side effect of the path merging calculations.
As for the Steam Audio binauralizer, I’m not familiar with it specifically. But yes, it should ideally be implemented as an Object Processor. If it's a mixer plugin, it may not support multiposition, which Wwise Spatial Audio relies on quite heavily.