Nvidia Drops Support for CUDA on macOS, Hacker News

1. CUDA Toolkit Major Components

This section provides an overview of the major components of the CUDA Toolkit and points to their locations after installation.

Compiler

The CUDA-C and CUDA-C++ compiler, nvcc, is found in the bin/ directory. It is built on top of the NVVM optimizer, which is itself built on top of the LLVM compiler infrastructure. Developers who want to target NVVM directly can do so using the Compiler SDK, which is available in the nvvm/ directory.

Please note that the following files are compiler-internal and subject to change without any prior notice.

  • any file in include/crt and bin/crt
  • include/common_functions.h, include/device_double_functions.h, include/device_functions.h, include/host_config.h, include/host_defines.h, and include/math_functions.h
  • nvvm/bin/cicc
  • bin/cudafe++, bin/bin2c, and bin/fatbinary

Tools

The following development tools are available in the bin/ directory, except for Nsight Visual Studio Edition (VSE), which is installed as a plug-in to Microsoft Visual Studio, and Nsight Compute and Nsight Systems, which are available in a separate directory.

  • IDEs: nsight (Linux, Mac), Nsight VSE (Windows)
  • Debuggers: cuda-memcheck, cuda-gdb (Linux), Nsight VSE (Windows)
  • Profilers: Nsight Systems, Nsight Compute, nvprof, nvvp, Nsight VSE (Windows)
  • Utilities: cuobjdump, nvdisasm

Libraries

The scientific and utility libraries listed below are available in the lib/ directory (DLLs on Windows are in bin/), and their interfaces are available in the include/ directory.

  • cublas (BLAS)
  • cublas_device (BLAS Kernel Interface)
  • cuda_occupancy (Kernel Occupancy Calculation [header file implementation])
  • cudadevrt (CUDA Device Runtime)
  • cudart (CUDA Runtime)
  • cufft (Fast Fourier Transform [FFT])
  • cupti (CUDA Profiling Tools Interface)
  • curand (Random Number Generation)
  • cusolver (Dense and Sparse Direct Linear Solvers and Eigen Solvers)
  • cusparse (Sparse Matrix)
  • libcu++ (CUDA Standard C++ Library)
  • nvJPEG (JPEG encoding/decoding)
  • npp (NVIDIA Performance Primitives [image and signal processing])
  • nvblas (“Drop-in” BLAS)
  • nvcuvid (CUDA Video Decoder [Windows, Linux])
  • nvgraph (CUDA nvGRAPH [accelerated graph analytics])
  • nvml (NVIDIA Management Library)
  • nvrtc (CUDA Runtime Compilation)
  • nvtx (NVIDIA Tools Extension)
  • thrust (Parallel Algorithm Library [header file implementation])

CUDA Samples

Code samples that illustrate how to use various CUDA and library APIs are available in the samples/ directory on Linux and Mac, and are installed to C:\ProgramData\NVIDIA Corporation\CUDA Samples on Windows. On Linux and Mac, the samples/ directory is read-only, so the samples must be copied to another location if they are to be modified. Further instructions can be found in the Getting Started Guides for Linux and Mac.
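
Copying the samples can be scripted. The sketch below is a minimal Python example and assumes a POSIX system. The helper name `copy_samples` is hypothetical, and the source path in the usage comment is an assumed default Linux location, so check where your installation actually placed `samples/`. `shutil.copytree` copies permission bits along with the files, so the sketch adds the owner-write bit afterwards to make the copy editable.

```python
import os
import shutil
import stat


def copy_samples(src: str, dst: str) -> None:
    """Copy a read-only samples tree to dst and make the copy user-writable.

    shutil.copytree preserves the permission bits of the source, so without
    the chmod pass below the copied files would stay read-only as well.
    """
    shutil.copytree(src, dst)
    os.chmod(dst, os.stat(dst).st_mode | stat.S_IWUSR)
    for root, dirs, files in os.walk(dst):
        for name in dirs + files:
            path = os.path.join(root, name)
            # Add owner-write permission; all other bits are kept as copied.
            os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)


# Usage (the source path is an assumed default Linux install location):
# copy_samples("/usr/local/cuda/samples", os.path.expanduser("~/cuda-samples"))
```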

Documentation

The most current version of these release notes can be found online at http://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html. In addition, the version.txt file in the root directory of the toolkit contains the version and build number of the installed toolkit.
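
A script can read the installed version from `version.txt`. This is only a sketch. It assumes the file contains a line of the form `CUDA Version <major>.<minor>.<build>` (for example `CUDA Version 10.2.89`), and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Tuple

# Assumed format of the version line, e.g. "CUDA Version 10.2.89".
_VERSION_RE = re.compile(r"CUDA Version (\d+)\.(\d+)\.(\d+)")


def parse_version_text(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, build) from the contents of version.txt."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError("no 'CUDA Version X.Y.Z' line found")
    major, minor, build = (int(group) for group in match.groups())
    return major, minor, build


def read_toolkit_version(cuda_root: str) -> Tuple[int, int, int]:
    """Read version.txt from the root directory of a toolkit installation."""
    return parse_version_text(Path(cuda_root, "version.txt").read_text())
```

For example, `read_toolkit_version("/usr/local/cuda")` would read the file from an install at that (assumed) location.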

Documentation can be found in PDF form in the doc/pdf/ directory, or in HTML form at doc/html/index.html and online at http://docs.nvidia.com/cuda/index.html.

CUDA Driver

Running a CUDA application requires a system with at least one CUDA-capable GPU and a driver that is compatible with the CUDA Toolkit. See Table 1. For more information on the various CUDA-capable GPU products, visit https://developer.nvidia.com/cuda-gpus.

Each release of the CUDA Toolkit requires a minimum version of the CUDA driver. The CUDA driver is backward compatible, meaning that applications compiled against a particular version of CUDA will continue to work on subsequent (later) driver releases.

More information on compatibility can be found at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#cuda-runtime-and-driver-api-version.
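
The backward-compatibility rule can be expressed as a simple comparison. In this sketch the function names are hypothetical. Versions use the integer encoding returned by `cudaRuntimeGetVersion()` and `cudaDriverGetVersion()`, which is 1000 × major + 10 × minor, so CUDA 10.2 is `10020`. The sketch models only the rule stated above, and it compares CUDA versions, not the driver package numbers listed in Table 1.

```python
from typing import Tuple


def encode_cuda_version(major: int, minor: int) -> int:
    """Encode a version the way cudaRuntimeGetVersion/cudaDriverGetVersion report it."""
    return 1000 * major + 10 * minor


def decode_cuda_version(value: int) -> Tuple[int, int]:
    """Split an encoded version such as 10020 back into (10, 2)."""
    return value // 1000, (value % 1000) // 10


def driver_can_run(built_with: int, driver_version: int) -> bool:
    """Backward compatibility: a driver runs applications built against
    its own CUDA version or an older one."""
    return driver_version >= built_with
```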

For convenience, the NVIDIA driver is installed as part of the CUDA Toolkit installation. Note that this driver is for development purposes and is not recommended for use in production with Tesla GPUs.

For running CUDA applications in production with Tesla GPUs, it is recommended to download the latest driver for Tesla GPUs from the NVIDIA driver downloads site at http://www.nvidia.com/drivers.

During the installation of the CUDA Toolkit, the installation of the NVIDIA driver may be skipped on Windows (when using the interactive or silent installation) or on Linux (by using meta packages).

For more information on customizing the install process on Windows, see http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#install-cuda-software.

For meta packages on Linux, see https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-metas.

CUDA-GDB Sources

CUDA-GDB sources are available as follows:

  • For CUDA Toolkit 7.0 and newer, in the installation directory extras/. The directory is created by default during the toolkit installation unless the .rpm or .deb package installer is used. In this case, the cuda-gdb-src package must be manually installed.
  • For CUDA Toolkit 6.5, 6.0, and 5.5, at https://github.com/NVIDIA/cuda-gdb.
  • For CUDA Toolkit 5.0 and earlier, at ftp://download.nvidia.com/CUDAOpen64/.
  • Upon request by sending an e-mail to mailto:[email protected].

2. CUDA 10.2 Release Notes

2.1. General CUDA

  • Added support for CUDA Virtual Memory Management APIs.
  • CUDA 10.2 now includes libcu++, a parallel standard C++ library for GPUs.
  • The following new operating systems are supported by CUDA. See the System Requirements section in the NVIDIA CUDA Installation Guide for Linux for a full list of supported operating systems. Note that support for RHEL 6.x is deprecated and will be dropped in the next release of CUDA.
    • Fedora
    • Red Hat Enterprise Linux (RHEL) 7.x and 8.x
    • OpenSUSE 15.x
    • SUSE SLES 12.4 and SLES 15.x
    • Ubuntu 16.04.6 LTS and Ubuntu 18.04.3 LTS
  • CUDA 10.2 (Toolkit and NVIDIA driver) is the last release to support macOS for developing and running CUDA applications. Support for macOS will not be available starting with the next release of CUDA.
  • CUDA Graphs APIs now support updates to node parameters in instantiated graphs.
  • CUDA 10.2 includes new interop APIs (NVSci* libraries for buffer allocation, synchronization, and streaming APIs). These are beta, and the APIs may change in future CUDA releases.
  • The 1D linear texture size limit supported for Maxwell (i.e., GM(x)) GPUs in CUDA is now 2^28 (up from 2^…).
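
For the Virtual Memory Management APIs, the size passed to `cuMemCreate` has to be a multiple of the allocation granularity reported by `cuMemGetAllocationGranularity`. The rounding arithmetic is shown below in Python. The 2 MiB granularity is a placeholder assumption; the real value must be queried for the device and allocation properties in use.

```python
MIB = 1024 * 1024

# Placeholder value for illustration only; the real granularity must be
# queried with cuMemGetAllocationGranularity for the device in question.
ASSUMED_GRANULARITY = 2 * MIB


def round_up_to_granularity(size: int, granularity: int) -> int:
    """Round size (in bytes) up to the next multiple of granularity."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    return ((size + granularity - 1) // granularity) * granularity
```

With the assumed 2 MiB granularity, a 5 MiB request would be padded to 6 MiB before being passed to `cuMemCreate`.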

2.3. CUDA Libraries

This release of the CUDA toolkit is packaged with libraries that deliver new and extended functionality, bug fixes, and performance improvements for single and multi-GPU environments.

Also in this release, the soname of the libraries has been modified to not include the minor toolkit version number. For example, the cuFFT library soname has changed from libcufft.so.10.1 to libcufft.so.10. This is done to facilitate future library updates that do not include API-breaking changes without the need to relink.
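
The soname rule above can be sketched as a mapping from a versioned file name to its major-only soname. This is an illustration only and not a tool shipped with the toolkit. `soname_for` is a hypothetical helper, and it assumes names of the form `lib<name>.so.<major>[.<minor>...]`, as in the cuFFT example.

```python
import re

# Assumed naming scheme: lib<name>.so.<major>[.<minor>[.<patch>...]]
_VERSIONED = re.compile(r"^(lib[\w+-]+\.so)\.(\d+)(?:\.\d+)*$")


def soname_for(versioned_name: str) -> str:
    """Map a versioned shared-library file name to its major-only soname."""
    match = _VERSIONED.match(versioned_name)
    if match is None:
        raise ValueError(f"unrecognized library name: {versioned_name!r}")
    base, major = match.groups()
    return f"{base}.{major}"
```

To see the soname actually recorded in a library file, run `readelf -d` on that file and look for its `SONAME` entry.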

2.3.1. cuBLAS Library

  • Improved the performance on some large and other GEMM sizes (mostly M*N…100) due to increased internal workspace size.

2.3.2. cuSOLVER Library

  • cusolverMgGetrf and cusolverMgGetrs are added to the cusolverMg library to support multi-GPU LU.
  • A new Tensor Cores Accelerated Iterative Refinement Solver (TCAIRS) is introduced. This is a linear solver for AX = B and is similar to the LAPACK Xgesv functions (or Xgetrf followed by Xgetrs), but differs in that it uses reduced precision internally for acceleration and then refines the solution to achieve the corresponding accuracy. This solver supports real and complex data types as well as single and multiple right-hand-side (RHS) systems of equations. Depending on the problem size and data types used, the observed speedup could reach over 5X. There are two types of APIs to access this solver:
    • The basic user-friendly LAPACK-style APIs:
      • cusolverDn<P1><P2>gesv_bufferSize
      • cusolverDn<P1><P2>gesv
      Where P1 is the final solution precision and P2 is the lowest precision used in the solver. For example, cusolverDnZKgesv_bufferSize will use Tensor Core accelerated half-precision compute, while the final solution will be accurate to double precision.
  • Added the following experimental expert APIs: cusolverDnIRSXgesv, cusolverDnIRSXgesv_bufferSize, and related APIs.
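
The idea behind TCAIRS can be illustrated on the CPU: factor the matrix in reduced precision, then refine the answer with higher-precision residuals. The sketch below is not the cuSOLVER API and does not use Tensor Cores. It is plain Python in which float32 (simulated by rounding through `struct`) plays the role of the low precision P2 and float64 the role of the final precision P1. All function names are hypothetical.

```python
import struct
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_low(a: Matrix) -> Tuple[Matrix, List[int]]:
    """LU with partial pivoting (PA = LU); every stored result is rounded to float32."""
    n = len(a)
    lu = [[to_f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = to_f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_f32(lu[i][j] - lu[i][k] * lu[k][j])
    return lu, perm


def lu_solve_low(lu: Matrix, perm: List[int], b: Vector) -> Vector:
    """Solve A x = b with the float32 factors: L y = P b, then U x = y."""
    n = len(b)
    y = [to_f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # forward substitution, unit lower triangle
        for j in range(i):
            y[i] = to_f32(y[i] - lu[i][j] * y[j])
    for i in reversed(range(n)):  # back substitution, upper triangle
        for j in range(i + 1, n):
            y[i] = to_f32(y[i] - lu[i][j] * y[j])
        y[i] = to_f32(y[i] / lu[i][i])
    return y


def refine_solve(a: Matrix, b: Vector, max_iters: int = 10, rtol: float = 1e-14) -> Vector:
    """Factor once in low precision, then correct x using float64 residuals."""
    lu, perm = lu_factor_low(a)
    x = lu_solve_low(lu, perm, b)
    scale = max(abs(v) for v in b) or 1.0
    for _ in range(max_iters):
        # Residual r = b - A x, computed in full float64 precision.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        if max(abs(v) for v in r) <= rtol * scale:
            break
        # Correction solved with the cheap low-precision factors.
        d = lu_solve_low(lu, perm, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

Each refinement step reuses the low-precision factors. The expensive O(n³) factorization is therefore done once in the cheap precision, and only the O(n²) residual is computed in float64. This split is what the mixed-precision approach relies on.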

2.3.3. cuFFT Library

  • Improved the performance and scalability for the following use cases:
    • multi-GPU non-power of 2 transforms
    • R2C and Z2D odd sized transforms
    • 2D transforms with small sizes and large batch counts
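
For R2C and C2R/Z2D transforms, a real input of length N has a Hermitian-symmetric spectrum (X[N-k] is the complex conjugate of X[k]). As a result, only N/2 + 1 complex outputs (integer division) are stored, and for odd N there is no separate Nyquist term. The naive-DFT sketch below uses plain Python with hypothetical names and makes no cuFFT calls. It shows the output length and how the full spectrum is recovered from the stored half.

```python
import cmath
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive O(N^2) forward DFT using the exp(-2*pi*i*t*k/N) sign convention."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def r2c_output_length(n: int) -> int:
    """Number of complex outputs kept by a real-to-complex transform of length n."""
    return n // 2 + 1


def r2c(x: List[float]) -> List[complex]:
    """Keep only the non-redundant half of the spectrum of a real signal."""
    return dft(x)[: r2c_output_length(len(x))]


def c2r_expand(half: List[complex], n: int) -> List[complex]:
    """Rebuild the full length-n spectrum using X[n-k] = conj(X[k])."""
    if len(half) != r2c_output_length(n):
        raise ValueError("half spectrum has the wrong length for n")
    full = list(half) + [0j] * (n - len(half))
    for k in range(len(half), n):
        full[k] = half[n - k].conjugate()
    return full
```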

2.3.4. CUDA Math Library

  • Added two absolute value APIs for the half-precision __half and __half2 data types: habs, habs2.
  • Improved performance and accuracy in the following math functions: tanhf, round, roundf, erf, erff, sinf, cosf, sincosf, tanf, sinpif, cospif, sincospif, j0f, j1f, y0f, y1f.
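
At the bit level, the absolute value of an IEEE 754 half-precision number is obtained by clearing its sign bit (bit 15). The Python sketch below models that definition using the standard `struct` module's binary16 format code `e`. It is a model of the arithmetic, not NVIDIA's implementation, and the function names are hypothetical. In device code the corresponding intrinsics are `__habs()` and `__habs2()` from `cuda_fp16.h`.

```python
import struct
from typing import Tuple


def half_bits(x: float) -> int:
    """Round x to IEEE 754 binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_from_bits(bits: int) -> float:
    """Interpret a 16-bit pattern as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def habs_model(x: float) -> float:
    """|x| for a half-precision value: clear the sign bit, keep everything else."""
    return half_from_bits(half_bits(x) & 0x7FFF)


def habs2_model(pair: Tuple[float, float]) -> Tuple[float, float]:
    """Two-lane variant: apply the same operation to each half of the pair."""
    return habs_model(pair[0]), habs_model(pair[1])
```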

2.4. Deprecated Features

The following features are deprecated in the current release of the CUDA software. The features still work in the current release, but their documentation may have been removed, and they will become officially unsupported in a future release. We recommend that developers employ alternative solutions to these features in their software.

General CUDA

  • Support for RHEL 6.x is deprecated with CUDA 10.1. It will be dropped in the next release of CUDA. Customers are encouraged to adopt RHEL 7.x to use new versions of CUDA.
  • Microsoft Visual Studio versions 2011, 2012 and 2013 are now deprecated as host compilers for nvcc. Support for these compilers may be removed in a future release of CUDA.
  • Support for the following compute capabilities is deprecated in CUDA 10.2. Note that support for these compute capabilities may be removed in a future release of CUDA.
    • sm_… (Kepler)
    • sm_… (Kepler)
    • sm_… (Maxwell)
    Support for Kepler SM_30 architecture based products will be dropped starting with the next release of CUDA.
    For more information on GPU products and compute capability, see this page.
  • For WMMA operations with floating point accumulators, the satf (saturate-to-finite value) mode parameter is deprecated. Using it can lead to unexpected results. See http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma-description for details.
  • Support for Linux cluster packages is deprecated and may be dropped in a future release of CUDA.
  • CUDA 10.2 (Toolkit and NVIDIA driver) is the last release to support macOS for developing and running CUDA applications. Support for macOS will not be available starting with the next release of CUDA.

CUDA Libraries General

  • The nvGRAPH library is deprecated. The library will no longer be shipped in future releases of the CUDA toolkit.
  • Support for Kepler and Maxwell architectures (compute capability sm_35 through sm_52) is deprecated.
  • NPP Compression Primitives (JPEG Encode/Decode) are being deprecated and will be removed in the next release. Users of these functions are encouraged to use the nvJPEG library.
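
The library deprecation range can be checked against a device's compute capability. In a CUDA program, the (major, minor) pair would come from the `major` and `minor` fields of the `cudaDeviceProp` structure filled in by `cudaGetDeviceProperties()`. The helpers below are hypothetical and encode only two statements from this section: the libraries deprecate sm_35 through sm_52, and SM_30 support ends with the next release.

```python
from typing import Tuple

ComputeCapability = Tuple[int, int]


def sm_name(major: int, minor: int) -> str:
    """Format a compute capability such as (3, 5) as an architecture name like sm_35."""
    return f"sm_{major}{minor}"


def library_support_deprecated(cc: ComputeCapability) -> bool:
    """True if cc lies in the sm_35..sm_52 range deprecated by the CUDA libraries."""
    return (3, 5) <= cc <= (5, 2)


def dropped_next_release(cc: ComputeCapability) -> bool:
    """True for Kepler SM_30, whose support ends with the next CUDA release."""
    return cc == (3, 0)
```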

CUDA Libraries – NPP

  • NPP Compression Primitives (JPEG Encode/Decode) are being deprecated and will be removed in the next release. Users of these functions are encouraged to use the nvJPEG library.

CUDA Libraries – nvJPEG

  • The following APIs are deprecated in this release:
    • nvjpegStatus_t NVJPEGAPI nvjpegDecodePhaseOne
    • nvjpegStatus_t NVJPEGAPI nvjpegDecodePhaseTwo
    • nvjpegStatus_t NVJPEGAPI nvjpegDecodePhaseThree
    • nvjpegStatus_t NVJPEGAPI nvjpegDecodeBatchedPhaseOne
    • nvjpegStatus_t NVJPEGAPI nvjpegDecodeBatchedPhaseTwo
    • nvjpegStatus_t NVJPEGAPI nvjpegDecodeBatchedPhaseThree
    The functionality provided by these APIs will be offered by new functions in the next release.

CUDA Libraries – cuSOLVER

  • The 32-bit API of the cusolverMg multi-GPU library will be removed in the next release. Instead, a 64-bit API will be adopted for the following:
    • Symmetric eigensolver routine cusolverMgSyevd()
    • LU factorization and solver routines cusolverMgGetrf and cusolverMgGetrs
  • The expert interface cusolverDnIRSXgesv of the TCAIRS solver and its helper functions, which are proposed as experimental expert APIs in this release, will undergo minor changes. The basic user-friendly API will remain the same. In summary:
    • The expert API will remove the input_data_type from its API, since it is part of the structure Params.
    • The cusolverDnIRSInfosXXXX helper functions will no longer need the Params in their arguments.
    • All instances of cudaDataType will be replaced by cusolverPrecType_t.

2.5. Resolved Issues

2.5.1. CUDA Compilers

  • Fixed an issue where ptxas could, in some cases, optimize arithmetic shifts incorrectly.
  • Fixed a crash during compilation when using -DCONSTEXPR=constexpr with the --expt-relaxed-constexpr option.
  • Added documentation for the --time flag for nvcc. This flag can be used to measure the time taken by nvcc and its internal sub-components. See nvcc --help for details.
  • Fixed an issue where ptxas crashed when assembling a PTX file with DWARF debug info generated by clang.
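
A build script can pass the `--time` option through to nvcc. This sketch assumes that the option's argument is the file to which nvcc appends its comma-separated timing table (confirm with `nvcc --help`). The file names are placeholders, the helper names are hypothetical, and the sketch does not assume any particular column layout in the CSV output.

```python
import shutil
import subprocess
from typing import List


def nvcc_timed_command(source: str, output: str, timing_csv: str) -> List[str]:
    """Build an nvcc command line that records per-phase timings in timing_csv."""
    return ["nvcc", "--time", timing_csv, "-o", output, source]


def compile_with_timing(source: str, output: str, timing_csv: str) -> bool:
    """Run nvcc if it is on PATH; return False when nvcc is not available."""
    if shutil.which("nvcc") is None:
        return False
    subprocess.run(nvcc_timed_command(source, output, timing_csv), check=True)
    return True
```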

2.5.2. CUDA Libraries

The following issues have been resolved across the CUDA Libraries.

CUDA Libraries – NPP

  • Fixed a race condition within NPP if the user provided a custom-created CUDA stream (default or non-blocking).
  • Fixed an issue where incorrect values were returned by NPP Histogram helper functions. These values are used for defining the size of side buffers used by NPP Histogram main functions.
  • NPP does not support non-blocking streams on Windows for devices working in WDDM mode.

CUDA Libraries – nvJPEG

  • Fixed an issue with the NVJPEG_BACKEND_HYBRID backend when restart markers are enabled.

CUDA Libraries – cuSOLVER

  • Resolved a conflict of symbols between liblapack_static.a and libf2c.
  • Resolved missing GKlib/string.o in libmetis_static.a.

CUDA Libraries – cuBLAS

  • Resolved an issue where CUDA Graph capture with cuBLAS routines on multiple concurrent streams could cause hangs or data corruption in some cases.
  • Resolved an issue where strided batched GEMM routines could cause misaligned read errors.

CUDA Libraries – cuFFT

  • Added missing documentation for the following functions:
    • cufftXtExecDescriptorR2C
    • cufftXtExecDescriptorC2R
    • cufftXtExecDescriptorZ2D
    • cufftXtExecDescriptorD2Z
  • Resolved an issue where multi-GPU supported functionality omitted the in-place restriction for all FFT plan types.
  • Refer to this page for the temporary restriction on C2R/Z2D plans.

CUDA Libraries – cuRAND

  • Starting with CUDA 10, the ordering of random numbers returned by the MTGP32 and MRG32k3a generators is no longer the same as in previous releases, despite being guaranteed by the documentation for the CURAND_ORDERING_PSEUDO_DEFAULT setting. This issue will be addressed in the next release by providing a new non-default option that returns the same ordering as previous releases; the default option will continue to provide the best performance.
