| Commit message | Author | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
ImageFetch offsets for 2D array coordinates have a different composite size than the coordinates. The rescaling pass was not taking this into account.
Fixes broken shaders when scaling is enabled in Astral Chain, and likely other titles.
|
|
|
|
Thanks to @asLody for optimizing this function. That work drew attention to the fact that this function should be optimized further.
The current table assumes that the host GPU is able to invert for free, so only AND, OR, and XOR are counted in the performance metric.
Performance results:
Instructions
0: 8
1: 30
2: 114
3: 80
4: 24
Latency
0: 8
1: 30
2: 194
3: 24
|
|
|
|
|
|
|
|
Some drivers do not support 64-bit atomics and fall back to atomically modifying U32x2 vectors. This change ensures that U32x2 storage vectors are defined in the SPIR-V shader when 64-bit atomics are used.
Fixes a hang on some devices, notably Intel GPUs, when booting Pokemon Legends Arceus.
|
|
Fixes Transform Feedback on Vulkan AMD drivers.
|
|
Used by Pokemon Legends: Arceus
|
|
Since ConvertLegacyToGeneric has a void return value, there's nothing
that is actually returned by the function.
|
|
Found by static analysis with PVS-Studio. The original check did not actually check for out-of-bounds access and would segfault when one occurred.
|
|
... to common/logging/formatter.h
|
|
|
|
|
|
|
|
|
|
Some drivers have a bug when bitwise converting floating point cbuf values to uint variables. This adds a workaround for these drivers that makes all cbufs uint and converts to floating point as needed.
|
|
|
|
Works around an Nvidia driver bug, where casting the integer attributes to float and back to an integer always returned 0.
|
|
|
|
|
|
GetAttribute expects an F32 result type at the IR level. This fixes the return value of attributes which were not returning an F32.
|
|
|
|
|
|
|
|
|
|
emit_spirv.h is included in video_core, which was propagating further includes that video_core did not depend on.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Plus some code deduplication
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thanks for everything!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Ensures all drivers behave the same way in this case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Nested demote branches complicate combining the condition when it has not been initialized yet. Skip them for the time being.
|
|
This is only needed on select drivers when a fragment shader discards/demotes.
|
|
Some drivers misread data when demotes are interleaved in the program. This moves demote branches to be checked at the end of the program.
Fixes "wireframe" issue in Pokemon SwSh on some drivers
|
|
The IR expects GetAttribute to return an F32 value. This case was returning a U32 instead.
|
|
|
|
Simplifies the code a bit when possible. These instructions should be
no-ops codegen wise.
|
|
Fixes instances where fp16 types are not declared on SPIR-V but they are
used. This shouldn't happen on master, as it's been uncovered by an
additional optimization pass.
|
|
Ensures that exception construction is always explicit.
|
|
|
|
We can use the <exception> header instead of pulling in all of the
exception-style classes.
|
|
This should be LINES_ADJACENCY
|
|
|
|
[[nodiscard]] doesn't do anything on functions with a void return type
and causes superfluous warnings.
|
|
This previously duplicated the case of the PBK case above it.
|
|
Prevents undefined behavior from occurring.
|
|
Fold shaders doing "a * b + c" on integers from the pattern generated by
Nvidia's GL compiler.
On a somewhat complex compute shader it reduces the code size by 16
instructions from 2 matches on Turing GPUs.
On Intel as extracted from KHR_pipeline_executable_properties:
Before the optimization:
```
Instruction Count: 2057
Basic Block Count: 45
Scratch Memory Size: 14752
Spill Count: 232
Fill Count: 261
SEND Count: 610
Cycle Count: 11325
```
After the optimization:
```
Instruction Count: 2046
Basic Block Count: 44
Scratch Memory Size: 13728
Spill Count: 219
Fill Count: 268
SEND Count: 604
Cycle Count: 11367
```
|
|
Simplify the logic a bit.
|
|
|
|
Support ignoring immediate out of bound writes. Writing dynamically out
of bounds is not yet supported (e.g. R0+0x4).
Reading out of bounds yields zero. This is supported by checking the
size from the IR; if the input is immediate, the optimization passes
will drop it.
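The behavior described above can be sketched on the CPU side as follows. This is only an illustration under assumptions: the names `ReadOrZero` and `WriteOrIgnore` and the use of a fixed-size `std::array` as the backing memory are hypothetical and are not taken from the commit, which operates on the shader IR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a read past the end of the memory yields zero.
template <std::size_t N>
constexpr std::uint32_t ReadOrZero(const std::array<std::uint32_t, N>& mem,
                                   std::size_t index) {
    return index < N ? mem[index] : 0u;
}

// Hypothetical helper: a write past the end of the memory is silently dropped.
template <std::size_t N>
constexpr void WriteOrIgnore(std::array<std::uint32_t, N>& mem,
                             std::size_t index, std::uint32_t value) {
    if (index < N) {
        mem[index] = value;
    }
}
```

When the index is a compile-time constant, a compiler (or an optimization pass, as the message notes) can remove the bounds check entirely.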
|
|
|
|
|
|
Adheres to GL_ARB_separate_shader_objects requirements
|
|
|
|
|
|
|
|
Silences the following warnings-turned-errors:
-Wsign-conversion
-Wunused-private-field
-Wbraced-scalar-init
-Wunused-variable
And some other errors
|
|
|
|
|
|
|
|
|
|
Fix regression on Fire Emblem: Three Houses when using native fp16.
|
|
|
|
|
|
|
|
|
|
account for the fact that program.*memory_size is in units of bytes.
|
|
Used by MH:Rise
|
|
Fixes OpenGL.
|
|
|
|
|
|
|
|
|
|
Put all varyings into a single std::bitset with helpers to access it.
Implement passthrough geometry shaders using host's.
|
|
|
|
|
|
|
|
Useful for mobile and Intel Xe devices.
|
|
|
|
|
|
|
|
This ensures the original operand values are not overwritten when being used in the overflow detection.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fixes rendering in Devil May Cry without regressing Ori and the Blind Forest.
|
|
|
|
Fixes shader compilation in Okami HD
|
|
|
|
Atomic operations are considered to have both read and write access. This was not being accounted for.
|
|
WAR for AMD reading zeroes on uniform buffers of size 2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fixes Ori and the blind forest title screen
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
plus some minor refactoring of implementations
|
|
|
|
|
|
|
|
|
|
and wip nv thread shuffle impl
|
|
|
|
|
|
Fix for SULD.D
|
|
|
|
|
|
|
|
along with some more cleanup/oversight fixes
|
|
|
|
|
|
|
|
|
|
and more cleanup
|
|
|
|
|
|
needed for HW:AoC.
|
|
along with some other misc changes and fixes
|
|
|
|
|
|
|
|
|
|
|
|
|
|
and add some more separation in the shader for better debuggability when dumped
|
|
|
|
|
|
and implement misc getters
|
|
|
|
|
|
SSBU now working
|
|
|
|
|
|
|
|
|
|
and missed a diff in emit_glsl relating to var alloc ref counting
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
fixes font rendering issues as these were used to index into the ssbos
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
to fix Loop control flow.
|
|
|
|
|
|
plus some other misc additions/changes
|
|
and many other misc implementations
|
|
|
|
|
|
|
|
|
|
Logic for ordered/unordered ops was wrong.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
based on glasm with some tweaks
|
|
|
|
|
|
|
|
|
|
Also add a setting to enable Nsight Aftermath.
|
|
|
|
The lod query functions exposed by the rendering APIs do not make use of the texture array layer indexing.
|
|
The sign bit on integers of size < 32 was not properly preserved in casts
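As an illustration of what preserving the sign bit means for a narrow integer, here is a minimal sketch of sign extension to 32 bits. The helper name `SignExtend` is hypothetical and this is not the code from the commit; it only shows the expected result of such a cast.

```cpp
#include <cstdint>

// Hypothetical helper: sign-extends the low `bits` bits of `value` to a
// 32-bit signed integer. `bits` must be in the range [1, 32].
constexpr std::int32_t SignExtend(std::uint32_t value, unsigned bits) {
    const std::uint32_t sign_bit = 1u << (bits - 1);
    const std::uint32_t mask = bits == 32 ? 0xFFFFFFFFu : (1u << bits) - 1u;
    const std::uint32_t low = value & mask;
    // Flipping the sign bit and subtracting it propagates the sign upwards.
    return static_cast<std::int32_t>((low ^ sign_bit) - sign_bit);
}
```

For example, the 8-bit pattern 0xFF is -1 once its sign bit is preserved, while 0x7F stays 127.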
|
|
BitCast U32 to S32 before converting to float on drivers with broken
signed operations.
|
|
Fixes DOOM 2016 missing local memory
|
|
|
|
Used by Claybook.
|
|
|
|
|
|
Increases performance significantly on certain titles.
|
|
|
|
|
|
|
|
|
|
"Negative" offsets don't exist. They are shown as such due to a bug in
nvdisasm.
Unaligned offsets have been shown to read the aligned offset. For
example, when reading a U32, if the offset is 6, the offset read will
be 4.
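The rounding in the example above amounts to masking off the low bits of the offset. A minimal sketch, assuming the access size is a power of two; the helper name `AlignedOffset` is hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: rounds a byte offset down to a multiple of the access
// size. `size_bytes` must be a power of two (1, 2, 4, 8, ...).
constexpr std::uint32_t AlignedOffset(std::uint32_t offset, std::uint32_t size_bytes) {
    return offset & ~(size_bytes - 1u);
}
```

With a 4-byte U32 access, `AlignedOffset(6, 4)` gives 4, matching the case described in the message.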
|
|
|
|
|
|
Fixes Ori and the Blind Forest's menu on GLASM. For some reason
(probably high level optimizations) it is not sanitized on SPIR-V for
OpenGL. Vulkan is unaffected by this change.
|
|
Fixes ubsan issue.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Causes regressions on Bowser's Fury.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Add support for null registers. These are used when an instruction has
no usages.
This comes in handy when an instruction is only used for its CC value, with
the caveat of having to invalidate all pseudo-instructions before
defining the instruction itself in the register allocator. This commit
changes this.
Work around a bug in Nvidia's condition-code conditional execution using
branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reorder them to the bottom of the file for readability.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
With this, Luigi's Mansion's sand renders properly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Immediate condition refs were not handled correctly. Just move the
value for now.
|
|
Fixes the identity removal pass.
|
|
|
|
|
|
|
|
|
|
Remove lod clamp from texture instructions with lod, as this is not
needed (nor supported).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fixes members of unnamed union not being accessible, and one function
without a declaration.
|
|
|
|
|
|
Silence unused variable warning
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This commit regresses VertexA shaders; their transformation pass has to
be adapted to the new control flow.
|
|
StorageAtomicExchangeU64 is failing its test, seemingly due to a failure
to store the 64-bit result into the register.
|
|
Use a struct constructor to serialize register allocation arguments to
ensure registers are allocated in the same order regardless of the
compiler used.
The A and B functions can be called in any order when passed as
arguments to "foo":
foo(A(), B())
But the order is guaranteed for curly-braced constructor calls in
classes:
Foo{A(), B()}
Use this to get consistent behavior.
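The ordering guarantee described above can be sketched as follows. The `Allocator` and `Args` types are hypothetical stand-ins, not the types used in the commit; the point is only that initializers inside braces are evaluated left to right.

```cpp
// Hypothetical stand-in for a register allocator: each call returns the next index.
struct Allocator {
    int next = 0;
    constexpr int Alloc() { return next++; }
};

// Aggregate whose members are filled from two allocations.
struct Args {
    int first;
    int second;
};

// Initializers inside braces are evaluated left to right, so `first` always
// receives the earlier allocation regardless of the compiler used.
constexpr Args AllocatePair() {
    Allocator alloc{};
    return Args{alloc.Alloc(), alloc.Alloc()};
}
```

Had the two `Alloc()` calls been passed as arguments to an ordinary function call, the order in which they run would be unspecified, and `first` and `second` could be swapped between compilers.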
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
also fixes ADD and SUB to use U modifier
|
|
|
|
Along with implementations of common instructions along the way
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This is enabled by an extension instead of the capability.
|
|
|
|
|
|
|
|
Work around a bug in Nvidia's OpenGL SPIR-V compiler when using unsigned
texture offsets.
|
|
|
|
Work around more bugs in Nvidia's OpenGL SPIR-V compiler.
|
|
|
|
|
|
Works around a bug in Nvidia's OpenGL SPIR-V compiler where names are
used for name matching.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Our unit tests were hitting this exception.
|
|
|
|
|
|
|
|
Compute shaders spill uniform buffers on storage buffers, increasing the
expected number.
|
|
|
|
|
|
|
|
|
|
|
|
Avoid using std::array to fix Intellisense not properly compiling this
code and disabling itself on all files that include it.
While we are at it, change the code to use u8 instead of size_t for the
number of instructions in an opcode.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Find sibling node containing a nephew searching from the nephew itself
instead of the uncle.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
When we can't track the SSBO origin of a global memory instruction,
leave it as a global memory operation and assume these pointers are in
the NVN storage buffer slots, then apply a linear search in the shader's
runtime.
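A CPU-side sketch of the linear search described above. The commit performs this lookup inside the generated shader at runtime; the code below only illustrates the idea. The `StorageSlot` layout (a base address and a size per slot) and the names `Match` and `FindSlot` are assumptions made for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical description of one storage buffer slot.
struct StorageSlot {
    std::uint64_t base;  // start address of the buffer
    std::uint32_t size;  // size of the buffer in bytes
};

// Result of a successful lookup: which slot matched and the offset inside it.
struct Match {
    std::size_t slot;
    std::uint64_t offset;
};

// Walks the slots in order and returns the first one containing `address`.
template <std::size_t N>
constexpr std::optional<Match> FindSlot(const std::array<StorageSlot, N>& slots,
                                        std::uint64_t address) {
    for (std::size_t i = 0; i < N; ++i) {
        if (address >= slots[i].base && address - slots[i].base < slots[i].size) {
            return Match{i, address - slots[i].base};
        }
    }
    return std::nullopt;  // the address is not inside any known slot
}
```

A global memory access whose address falls inside a slot can then be turned into a storage buffer access at the returned offset; an address outside every slot has no match.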
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Fix two bugs in BFI.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Throw when other instructions are missing CC.
|
|
|
|
|
|
|
|
|
|
Mostly fixing unused *, implicit conversion, braced scalar init,
fpermissive, and some others.
Some Clang errors likely remain in video_core, and std::ranges is still
a pertinent issue in shader_recompiler
shader_recompiler: cmake: Force bracket depth to 1024 on Clang
Increases the maximum fold expression depth
thread_worker: Include condition_variable
Don't use list initializers in control flow
Co-authored-by: ReinUsesLisp <reinuseslisp@airmail.cc>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
It may generate better code on some compilers and it's easier to handle.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This is needed because pseudo-instructions were invalidated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This could potentially leave unvisited blocks, leading to illegal phi
nodes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Also add a missing const on DADD
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Also fix oversight with adding SignedZeroInfNanPreserve execution mode.
|
|
|
|
|
|
|
|
And add a const in FCMP
|
|
|
|
still need to configure some settings for NV denorm flush and intel NaN
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|