// silently miscompiled the M=1 path even though the WMMA template was never
// instantiated for M=1. Hipcc's optimizer appears to scope some decisions
// at the TU level (likely register file / SGPR ...