SHORT ALU stalled by LDS

EVENTSET
ROCM0 ROCP_SQ_WAIT_INST_LDS
ROCM1 ROCP_SQ_WAVES
ROCM2 ROCP_GRBM_GUI_ACTIVE

METRICS
GPU ALD stalled 100*ROCM0*4/ROCM1/ROCM2

LONG
Formulas:
GPU ALD stalled = 100*ROCP_SQ_WAIT_INST_LDS*4/ROCP_SQ_WAVES/ROCP_GRBM_GUI_ACTIVE
--
The percentage of GPUTime ALU units are stalled by the LDS input queue 
being full or the output queue being not ready. If there are LDS bank 
conflicts, reduce them. Otherwise, try reducing the number of LDS 
accesses if possible. 
Value range: 0% (optimal) to 100% (bad).
