December 2021 - vickieGPT’s blog

December 18, 2021 (updated December 22, 2021)

A weird arm64 issue with operator<< in gcc-11

I'm adding logging to a compiler project and ran into a problem when using fmt::format from the fmt library.

The code ran fine with apple-clang 13, but with gcc-11 the following line fails:

if ((x.second)->is_list_type()) {
    LOG(INFO) << fmt::format("{} : [{}]", x.first,
            ((ClassValueType *)((ListValueType *)x.second)->elementType)->className);
}

LogStream is something like:

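#include <sstream>

class LogStream {
public:
    // The stream is heap-allocated and never freed here; unless LogWriter
    // releases it, every LogStream leaks its buffer.
    LogStream() { sstream_ = new std::stringstream(); }
    ~LogStream() = default;

    // Forward any streamable value to the underlying stringstream.
    template <typename T> LogStream &operator<<(const T &val) noexcept {
        (*sstream_) << val;
        return *this;
    }

    friend class LogWriter;

private:
    std::stringstream *sstream_;
};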

With gcc-11 on arm64, operator<< fails while reading memory from the string produced by fmt::format, possibly because GCC's authors did not anticipate that the pointer passed in does not fit the ldur-style load emitted for the stream-out. On an x86 macOS machine, GCC emits an _M_is_leaked() check on the same line, and on Windows, MSVC reports a memory leak for the doubly linked pointer on that line.

The compiled code is:

There is a trick to maintaining a compiler whose error output is the same on every platform.

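December 18, 2021 (updated March 24, 2022)

Using the Lustre file system

Recently I have been helping a senior labmate run experiments, which are also the experiments for my graduation thesis, on Lustre. I also reread the old PLFS and PMFS papers. The runs use an AMD supercomputing cluster that scales up to 512 nodes.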

[scb5090@ln131%bscc-a6 ~]$ lfs quota -h -u scb5090 /public1
Disk quotas for usr scb5090 (uid 6171):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /public1  19.13G    450G    500G       -   50865       0       0       -
uid 6171 is using default file quota setting
[scb5090@ln131%bscc-a6 ~]$ lfs quota -h -u scb5090 /public2
lfs quota: cannot resolve path '/public2': No such file or directory (2)
[scb5090@ln131%bscc-a6 ~]$ lfs quota -h -u scb5090 /public3
Disk quotas for usr scb5090 (uid 6171):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /public3      0k      0k      0k       -       0       0       0       -
uid 6171 is using default block quota setting
uid 6171 is using default file quota setting
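
If these quota numbers need to be checked from a job script, the table above can be parsed with a few lines of Python. This is only a minimal sketch: parse_lfs_quota is a hypothetical helper (not part of Lustre), and it assumes the single-row, nine-column layout shown in the output above.

```python
def parse_lfs_quota(output):
    """Parse the table printed by `lfs quota -h -u <user> <path>`.

    Assumption: the layout matches the output shown above, i.e. a header
    line starting with "Filesystem" followed by one data row of nine columns.
    Returns None when no such table is found (for example, a missing path).
    """
    keys = ["filesystem", "used", "block_quota", "block_limit", "block_grace",
            "files", "file_quota", "file_limit", "file_grace"]
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.split()[:1] == ["Filesystem"] and i + 1 < len(lines):
            row = lines[i + 1].split()
            if len(row) != len(keys):
                return None
            return dict(zip(keys, row))
    return None


# Sample text copied from the /public1 query above.
sample = """Disk quotas for usr scb5090 (uid 6171):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /public1  19.13G    450G    500G       -   50865       0       0       -
uid 6171 is using default file quota setting"""

quota = parse_lfs_quota(sample)
print(quota["used"], "of", quota["block_limit"])
```

The values are kept as strings because the -h flag prints human-readable sizes such as 19.13G.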

Roughly, the goal is to work on the scalability of an HDF5-reading program on a parallel file system.

December 16, 2021 (updated February 6, 2022)

The measurement of TSO on M1 Max

There is already a TSO Enabler for M1, and it works perfectly fine on M1 Max when injected into the kernel with kmutil trigger-panic-medic --volume-root /Volumes/Data/Library/Extensions/TSOEnabler.kext, just like on a Hackintosh.

TSOEnabler: A kernel extension that enables total store ordering on Apple silicon for specific Arm applications. https://t.co/h1wpexhxlQ

(the MSR if you want to start with reversing this is to look at references to S3_0_c15_c9_0)
— Longhorn (@never_released) July 30, 2020

So I want to check the famous OOTA (out-of-thin-air) problem with TSO both on and off.

Line 24 will be stuck without TSO and will work fine with TSO.

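December 15, 2021 (updated November 15, 2023)

SC23 Attendance

This time I went to coach my junior teammates in the competition, acting as the 4090 delivery guy and paying the shipping out of my own pocket, and along the way I asked what the people doing CXL research are up to. The reality is that many groups already have samples, but the samples have a lot of bugs.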

IA3 https://hpc.pnl.gov/IA3

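A lead from Micro, whom I have seen many times, talked about how to optimize gapbs with the PIM paradigm, using the gapbs workload. It felt a lot like a PNNL promotional session.

Offload operators to PIM.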

General view for SCC ShanghaiTech

Basically everything was broken; computers break rather easily if you don't pay.

Saudi sweet dates

SW/HW codesign for ML

SW/HW codesign has made great progress, especially in offloading operators for ML. The insight is that inference can be approximated down to even fp4, and that all the operators are read-only.

HPC Checkpoint Restore

MPI integration: they want to offload through the DPU while staying compliant with checkpointing of both data and Linux control flow. This also adds debuggability to MPI.

Parallel IO

SK Hynix Booth

CXL Booth

Chip inspection

December 7, 2021 (updated October 31, 2022)

ISC22 code challenge

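I expect this code challenge to be fiercely competitive, since HPC powerhouses such as GaTech and ETHz are taking part. It feels like NV wants to squeeze some optimizations out of us students, much as 罡兴投资 got everyone interested in SIMD optimization for NICs. This post briefly records the points that could be optimized.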

December 7, 2021 (updated February 12, 2022)

ICON spack compile

# Copyright 2013-2021 Lawrence Livermore National Security, LLC and other
# Spack Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)

from spack import *
import os

class Icon(AutotoolsPackage):
    """ISC22 ICON package."""

    homepage = "https://hpcadvisorycouncil.atlassian.net/wiki/spaces/HPCWORKS/pages/2792161313/Getting+started+with+ICON+for+ISC22+SCC"
    url      = "http://localhost:31415/icon-2.6.4.tar.gz"

    maintainers = ['spedoske','victoryang00']

    version('2.6.4', sha256='1860028836d0894224ce301c3d0cb27a899548823267b08bf5c97ae696c3758d')

    depends_on('libxml2')
    depends_on('eccodes')
    depends_on('intel-oneapi-mkl')
    #if self.compiler.name == 'nvhpc':
    #depends_on('intel-oneapi-mpi')
    depends_on('netcdf-fortran')
    depends_on('netcdf-c')
    depends_on('hdf5+mpi+fortran+hl+szip')
    depends_on('zlib')
    depends_on('libszip')
    depends_on('mpi')
    depends_on('cuda')
    #depends_on('nvhpc')
    depends_on('serialbox')

    def configure_args(self):
        args = [#'--enable-art',
                '--enable-coupling',
                '--enable-serialization',
                '--enable-emvorado',
                '--enable-grib2',
                '--disable-yaxt',
                '--enable-sct',
                '--enable-ecrad',
                '--enable-rte-rrtmgp',
                '--enable-mpi',
                '--enable-gpu',
                '--disable-explicit-fpp'
                ]

        # if self.compiler.name == 'nvhpc':
        #    args.append('--enable-gpu')
        # else:
        #    args.append('--disable-gpu')

        #from pprint import pprint
        #pprint(vars(self.spec['nvhpc']),depth=4)

        #print(os.environ)

        os.environ['CC'] =  os.environ['MPICC']
        os.environ['CXX'] = os.environ['MPICXX']
        os.environ['F77'] = os.environ['MPIF77']
        os.environ['F90'] = os.environ['MPIF90']
        os.environ['FC'] =  os.environ['MPIF90']


        os.environ['MPICH_CC'] =  os.environ['MPICC']
        os.environ['MPICH_CXX'] = os.environ['MPICXX']
        os.environ['MPICH_F77'] = os.environ['MPIF77']
        os.environ['MPICH_F90'] = os.environ['MPIF90']
        os.environ['MPICH_FC'] =  os.environ['MPIF90']

        #nvhpc_bin = '/home/coffee/spack/opt/spack/linux-ubuntu20.04-zen2/oneapi-2021.4.0/nvhpc-21.9-ymnvxvr4e2q7475djr4lig7cbucg54uv/Linux_x86_64/21.9/compilers/bin'

        #os.environ['OMPI_CC'] =  os.path.join(nvhpc_bin,'nvc')
        #os.environ['OMPI_CXX'] = os.path.join(nvhpc_bin,'nvc++')
        #os.environ['OMPI_F77'] = os.path.join(nvhpc_bin,'nvfortran')
        #os.environ['OMPI_FC'] =  os.path.join(nvhpc_bin,'nvfortran')


        CPPFLAGS=["CPPFLAGS="]
        FCFLAGS=["FCFLAGS="]
        LDFLAGS=["LDFLAGS="]
        LIBS=["LIBS="]

        # zlib
        LDFLAGS.append('-L{}'.format(os.path.join(self.spec['zlib'].prefix,'lib')))
        LIBS.append('-lz')

        # hdf5
        CPPFLAGS.append('-I{}'.format(os.path.join(self.spec['hdf5'].prefix,'include')))
        FCFLAGS.append('-I{}'.format(os.path.join(self.spec['hdf5'].prefix,'include')))
        LDFLAGS.append('-L{}'.format(os.path.join(self.spec['hdf5'].prefix,'lib')))
        LIBS.append('-lhdf5hl_fortran -lhdf5_fortran -lhdf5')


        # netcdf-fortran
        FCFLAGS.append('-I{}'.format(os.path.join(self.spec['netcdf-fortran'].prefix,'include')))
        LDFLAGS.append('-L{}'.format(os.path.join(self.spec['netcdf-fortran'].prefix,'lib')))
        LIBS.append('-lnetcdff')

        # netcdf-c
        CPPFLAGS.append('-I{}'.format(os.path.join(self.spec['netcdf-c'].prefix,'include')))
        LDFLAGS.append('-L{}'.format(os.path.join(self.spec['netcdf-c'].prefix,'lib')))
        LIBS.append('-lnetcdf')

        # blas and lapack
        CPPFLAGS.append('-I{}'.format(os.path.join(self.spec['intel-oneapi-mkl'].prefix,'include')))
        LIBS.append('-lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -lmpi -lxml2')

        # xml2
        CPPFLAGS.append('-I{}'.format(os.path.join(self.spec['libxml2'].prefix,'include/libxml2/')))

        # eccodes c/fc
        CPPFLAGS.append('-I{}'.format(os.path.join(self.spec['eccodes'].prefix,'include')))
        LDFLAGS.append('-L{}'.format(os.path.join(self.spec['eccodes'].prefix,'lib')))
        FCFLAGS.append('-I{}'.format(os.path.join(self.spec['eccodes'].prefix,'include')))
        LIBS.append('-leccodes -leccodes_f90')

        # serialbox
        LDFLAGS.append('-L{}'.format(os.path.join(self.spec['serialbox'].prefix,'lib')))
        FCFLAGS.append('-I{}'.format(os.path.join(self.spec['serialbox'].prefix,'include')))
        LIBS.append('-lSerialboxFortran')


        # LDFLAGS.append('-L{}'.format(os.path.join(self.spec['cuda'].prefix,'lib64')))
        # LIBS.append('-lcudart')

        # gpu
        #LDFLAGS.append('-L/usr/lib')
        #LIBS.append('-lstdc++ -lstdc++ -L/usr/lib')

        # gpu
        if self.compiler.name == 'nvhpc':
            LDFLAGS.append('-L{}'.format(os.path.join(self.spec['cuda'].prefix,'lib64')))
            LDFLAGS.append('-L/lib/x86_64-linux-gnu/ -lstdc++')
            LDFLAGS.append('-I{}'.format(os.path.join(self.spec['openmpi'].prefix,'lib')))
            FCFLAGS.append('-I{}'.format(os.path.join(self.spec['openmpi'].prefix,'lib')))
            LIBS.append('-lcudart')

        args.append(' '.join(CPPFLAGS))
        args.append(' '.join(FCFLAGS))
        args.append(' '.join(LDFLAGS))
        args.append(' '.join(LIBS))
        args.append('SB2PP='+self.spec['serialbox'].prefix+'/python/pp_ser/pp_ser.py')
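        if self.compiler.name == 'nvhpc':
            # depends_on() is a class-level directive and does not add a
            # dependency when called inside a method; declare it in the class
            # body instead, e.g. depends_on('nvhpc', when='%nvhpc').
            #args.append('MPI_LAUNCH='+self.spec['openmpi'].prefix+'/bin/mpirun')
            args.append("MPI_LAUNCH=/usr/bin/mpirun")
            args.append('NVCC=nvcc -std=c++11  -ccbin {} -allow-unsupported-compiler'.format(self.compiler.cxx))
        else:
            # self.spec['intel-oneapi-mpi'] resolves only when intel-oneapi-mpi
            # is the concretized provider of the 'mpi' dependency declared above.
            args.append('MPI_LAUNCH='+self.spec['intel-oneapi-mpi'].prefix+'/mpi/2021.4.0/bin/mpirun')
            args.append('NVCC=nvcc -std=c++11  -ccbin {} -allow-unsupported-compiler'.format(self.compiler.cc))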
        return args

    def setup_run_environment(self, env):
        env.set('PYTHONPATH', join_path(self.spec['serialbox'].prefix,'python'))
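
The configure_args method above repeats the same -I/-L bookkeeping for every dependency. One way this could be factored out is sketched below in plain Python with no Spack imports. FlagSet and the /opt/... prefixes are hypothetical names used only for illustration; in the real package the prefixes would come from self.spec[...].prefix.

```python
class FlagSet:
    """Collects the configure variables CPPFLAGS, FCFLAGS, LDFLAGS and LIBS."""

    def __init__(self):
        self.flags = {"CPPFLAGS": [], "FCFLAGS": [], "LDFLAGS": [], "LIBS": []}

    def add_dep(self, prefix, libs="", cpp=False, fc=False, ld=True):
        """Register one dependency installed under `prefix`.

        cpp/fc add an -I<prefix>/include entry to CPPFLAGS/FCFLAGS,
        ld adds -L<prefix>/lib to LDFLAGS, and libs is appended to LIBS.
        """
        include_flag = "-I{}/include".format(prefix)
        if cpp:
            self.flags["CPPFLAGS"].append(include_flag)
        if fc:
            self.flags["FCFLAGS"].append(include_flag)
        if ld:
            self.flags["LDFLAGS"].append("-L{}/lib".format(prefix))
        if libs:
            self.flags["LIBS"].append(libs)

    def as_args(self):
        """Return 'NAME=value ...' strings, one per configure variable."""
        return ["{}={}".format(name, " ".join(values))
                for name, values in self.flags.items()]


# Hypothetical install prefixes, standing in for self.spec['zlib'].prefix etc.
flags = FlagSet()
flags.add_dep("/opt/zlib", libs="-lz")
flags.add_dep("/opt/hdf5", libs="-lhdf5hl_fortran -lhdf5_fortran -lhdf5",
              cpp=True, fc=True)
args = flags.as_args()
for arg in args:
    print(arg)
```

Keeping the four lists in one object makes it harder to forget a flag for a dependency, and the final join happens in a single place.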

December 6, 2021 (updated November 15, 2023)