Adds libxsmm support for micro-GEMMs #569
Open
zhihao-deng wants to merge 2 commits into
…SMM=ON) Fetch and build libxsmm from source (no system install assumed) and route the small strided tensor-of-tensors GEMMs (ce+e, ce+ce, scale) through its JIT, falling back to the vendor BLAS for shapes with max(M,N,K) > 64. Runtime toggle: TA_LIBXSMM=0.
- Install libxsmm.a and headers into TA's prefix; split TiledArray_LIBXSMM into BUILD/INSTALL interfaces so the exported config has no build-tree leak.
- Guard the 64->32-bit narrowing of lda/ldb/ldc in libxsmm_gemm_le64.
- Make the libxsmm sub-make parallelism configurable (LIBXSMM_BUILD_NJOBS).
- Add a TA_LIBXSMM=1 CTest gate and a direct scale_libxsmm_dgemm numerical test.
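The narrowing guard in the second bullet can be illustrated with a minimal sketch. It assumes, as the bullet implies, that TiledArray carries lda/ldb/ldc as 64-bit integers while the libxsmm call takes 32-bit ones. `narrow_ld`, `narrow_lds` and `NarrowedLds` are hypothetical names, and the commit does not say whether the real guard asserts or falls back, so this version simply reports failure to the caller.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper: narrow one 64-bit leading dimension to 32 bits only if
// the conversion is lossless and the value is a valid BLAS leading dimension (>= 1).
inline std::optional<std::int32_t> narrow_ld(std::int64_t ld) {
  if (ld < 1 || ld > std::numeric_limits<std::int32_t>::max()) return std::nullopt;
  return static_cast<std::int32_t>(ld);
}

// Hypothetical bundle of the three narrowed leading dimensions.
struct NarrowedLds {
  std::int32_t lda, ldb, ldc;
};

// Narrow all three leading dimensions, or return std::nullopt so the caller can
// stay on the 64-bit vendor-BLAS path instead of the 32-bit kernel.
inline std::optional<NarrowedLds> narrow_lds(std::int64_t lda, std::int64_t ldb,
                                             std::int64_t ldc) {
  const auto a = narrow_ld(lda);
  const auto b = narrow_ld(ldb);
  const auto c = narrow_ld(ldc);
  if (!a || !b || !c) return std::nullopt;
  return NarrowedLds{*a, *b, *c};
}
```

Returning an optional rather than asserting keeps the check usable in release builds, where a silent truncation of a leading dimension would otherwise corrupt the GEMM result.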
evaleev reviewed Jun 22, 2026
set(LIBXSMM_BUILD_BYPRODUCTS "${_LIBXSMM_INSTALL_DIR}/lib/libxsmm.a")
message(STATUS "custom target libxsmm is expected to build these byproducts: ${LIBXSMM_BUILD_BYPRODUCTS}")

ExternalProject_Add(libxsmm
Member
Does libxsmm have a CMake harness so that we can use FetchContent?
Contributor (Author)
Found that they support CMake now. Will adapt later.
Member
Actually, don't adapt yet; it may not be usable as a subproject. It's safer to use ExternalProject_Add (FetchContent is the better choice only for projects that we control directly).
Adds an optional libxsmm fast path for the small strided ToT micro-GEMMs, enabled with -DTA_LIBXSMM=ON (default OFF). When on, TiledArray fetches and builds libxsmm from source itself and routes these GEMM families through libxsmm's JIT, falling back to the vendor BLAS for any shape with max(M,N,K) > 64:
- ce+ce: arena_strided_dgemm_ce_ce_{right,left}
- ce+e: arena_strided_dgemm_ce_e
- tot_x_t and t_x_tot regimes (Tensor scale path)

A runtime switch, TA_LIBXSMM=0 (off/false/no), routes every micro-GEMM back through the vendor BLAS without rebuilding.

The ta_test suites arena_strided_dgemm_suite and arena_einsum_unit_suite pass both with libxsmm ON (JIT fires) and with TA_LIBXSMM=0; the results are numerically equivalent to the vendor-BLAS path.
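The dispatch rule described above (runtime TA_LIBXSMM override plus the max(M,N,K) > 64 cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions, not TiledArray's code: `libxsmm_enabled_from`, `libxsmm_enabled`, `choose_backend`, `GemmBackend` and `kLibxsmmMaxDim` are hypothetical names. The sketch also assumes that off/false/no are matched case-insensitively, that the variable is read once per process, and that the build-time -DTA_LIBXSMM=ON switch has already compiled the libxsmm path in.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical cutoff mirroring the PR text: shapes with max(M, N, K) <= 64
// go to libxsmm, anything larger goes to the vendor BLAS.
constexpr std::int64_t kLibxsmmMaxDim = 64;

// Interpret a TA_LIBXSMM value. "0", "off", "false" and "no" (case-insensitive,
// an assumption) disable the fast path; unset or any other value keeps it on.
inline bool libxsmm_enabled_from(const char* value) {
  if (value == nullptr) return true;
  std::string v(value);
  std::transform(v.begin(), v.end(), v.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return !(v == "0" || v == "off" || v == "false" || v == "no");
}

// Read the environment once per process (assumption) and cache the result.
inline bool libxsmm_enabled() {
  static const bool enabled = libxsmm_enabled_from(std::getenv("TA_LIBXSMM"));
  return enabled;
}

enum class GemmBackend { Libxsmm, VendorBlas };

// Pick the backend for one micro-GEMM of shape M x N x K.
inline GemmBackend choose_backend(bool enabled, std::int64_t m, std::int64_t n,
                                  std::int64_t k) {
  if (!enabled) return GemmBackend::VendorBlas;
  return std::max({m, n, k}) <= kLibxsmmMaxDim ? GemmBackend::Libxsmm
                                               : GemmBackend::VendorBlas;
}
```

A call site would use `choose_backend(libxsmm_enabled(), M, N, K)`. Caching the variable in a function-local static keeps the per-call cost to a branch, at the price of ignoring changes to TA_LIBXSMM made after the first GEMM; the PR only promises that the switch needs no rebuild.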