| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Games using D3D idioms can join images and samplers when a shader
executes, instead of baking them into a combined sampler image. This is
also possible on Vulkan.
One approach to this solution would be to use separate samplers on
Vulkan and leave this unimplemented on OpenGL, but we can't do this
because there's no consistent way of determining which constant buffer
holds a sampler and which one an image. We could in theory find the
first bit and if it's in the TIC area, it's an image; but this falls
apart when an image or sampler handle use an index of zero.
The used approach is to track for a LOP.OR operation (this is done at an
IR level, not at an ISA level), track again the constant buffers used as
source and store this pair. Then, outside of shader execution, join
the sample and image pair with a bitwise or operation.
This approach won't work on games that truly use separate samplers in a
meaningful way. For example, pooling textures in a 2D array and
determining at runtime what sampler to use.
This invalidates OpenGL's disk shader cache :)
- Used mostly by D3D ports to Switch
|
|
|
|
|
| |
This silences an assertion we were hitting and uses workgroup memory
barriers when the game requests it.
|
|\
| |
| | |
shader/other: Implement BAR.SYNC 0x0
|
| |
| |
| |
| |
| | |
Trivially implement this particular case of BAR. Unless games use OpenCL
or CUDA barriers, we shouldn't hit any other case here.
|
|/
|
|
|
|
|
|
|
|
|
| |
Hardware S2R special registers match gl_Thread*MaskNV. We can trivially
implement these using Nvidia's extension on OpenGL or naively stubbing
them with the ARB instructions to match. This might cause issues if the
host device warp size doesn't match Nvidia's. That said, this is
unlikely on proper shaders.
Refer to the attached url for more documentation about these flags.
https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_thread_group.txt
|
|
|
|
|
| |
This allows us to use native SPIR-V instructions without having to
manually check for NAN.
|
|\
| |
| | |
shader/texture: Support multiple unknown sampler properties
|
| | |
|
|/ |
|
|
|
|
|
|
|
|
| |
Implements a reduction operation. It's an atomic operation that doesn't
return a value.
This commit introduces another primitive because some shading languages
might have a primitive for reduction operations.
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
Partially implement Indexed samplers in general and specific code in GLSL
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
ATOM operates atomically on global memory. For now only add ATOM.ADD
since that's what was found in commercial games.
This asserts for ATOM.ADD.S32 (handling the others as unimplemented),
although ATOM.ADD.U32 shouldn't be any different.
This change forces us to change the default type on SPIR-V storage
buffers from float to uint. We could also alias the buffers, but it's
simpler for now to just use uint. While we are at it, abstract the code
to avoid repetition.
|
| |
|
| |
|
|
|
|
|
|
|
| |
This commit introduces a mechanism by which shader IR code can be
amended and extended. This useful for track algorithms where certain
information can derived from before the track such as indexes to array
samplers.
|
| |
|
|
|
|
| |
Implement using memoryBarrier in GLSL and OpMemoryBarrier on SPIR-V.
|
| |
|
| |
|
|\
| |
| | |
Implement FLO & TXD Instructions on GPU Shaders
|
| | |
|
| | |
|
|/
|
|
|
| |
Instead of specializing shaders to separate texture buffers from 1D
textures, use the locker to deduce them while they are being decoded.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Bindless textures were using u64 to pack the buffer and offset from
where they come from. Drop this in favor of separated entries in the
struct.
Remove the usage of std::set in favor of std::list (it's not std::vector
to avoid reference invalidations) for samplers and images.
|
|
|
|
| |
Allows usages of the constructor to avoid an unnecessary copy.
|
|
|
|
|
|
| |
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as
these require a distinction between U32 and S32. These have to be
implemented with imageCompSwap loop.
|
|
|
|
|
| |
* Implement SULD as float.
* Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
|
|\
| |
| | |
shader_ir/warp: Implement SHFL for Nvidia devices
|
| | |
|
|\ \
| |/
|/| |
shader_ir: Implement shared memory
|
| |
| |
| |
| |
| | |
This instruction writes to a memory buffer shared with threads within
the same work group. It is known as "shared" memory in GLSL.
|
| | |
|
| | |
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
|
|
|
|
|
| |
This commit takes care of implementing the F16 Variants of the
conversion instructions and makes sure conversions are done.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Hardware testing revealed that SSY and PBK push to a different stack,
allowing code like this:
SSY label1;
PBK label2;
SYNC;
label1: PBK;
label2: EXIT;
|
|
|
|
|
|
|
| |
Reflect std::shared_ptr nature of Node on initializers and remove
constant members in nodes.
Add some commentaries.
|
|
Analysis passes do not have a good reason to depend on shader_ir.h to
work on top of nodes. This splits node-related declarations to their own
file and leaves the IR in shader_ir.h
|