# ########################################################################
# Copyright 2013 Advanced Micro Devices, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ########################################################################

clFFT Readme

Version:       1.10
Release Date:  April 2013

ChangeLog:

____________
Current Version:

    * This release tested using the 9.012 runtime driver and the 2.8 APPSDK

____________
Version 1.8.291:

Fixed:
    * Memory leaks affecting use cases where 'clfftEnqueueTransform' is used in a loop

____________
Version 1.8.269 (beta):

New:
    * clFFT now supports real-to-complex and complex-to-real transforms; refer to documentation for details
    * This release tested using the 12.4 Catalyst software suite

Known Issues:
    * Some degradation in performance of real transforms due to known runtime/driver issues
    * Failures in real transforms have been seen on 7xxx series GPUs with certain problem sizes involving powers of 3 and 5

____________
Version 1.6.244:

Fixed:
    * Failures observed in v1.6.236 in backward transforms of certain power of 2 (involving radix 4 and radix 8) problem sizes.

____________
Version 1.6.236:

New:
    * Performance of the FFT library has been improved for Radix-2 1D and 2D transforms
    * Support for R4XXX GPUs is deprecated and no longer tested
    * Preview: Support for AMD Radeon™ HD7000 series GPUs
    * This release tested using the 8.92 runtime driver and the 2.6 APP SDK

____________
Version 1.4:

New:
    * clFFT now supports transform lengths whose factors consist exclusively of powers of 2, 3, and 5
    * clFFT supports double precision data types
    * clFFT executes on OpenCL 1.0 compliant devices
    * This release tested using the 8.872 runtime driver and the 2.5 APP SDK
    * A helper bash script appmlEnv.sh has been added to the root installation directory to assist in properly setting up a terminal environment to execute clFFT samples

Fixed:
    * If the library is required to allocate a temporary buffer, and the user does not specify a temporary buffer on the Enqueue call, the library will allocate a temporary buffer internally and the lifetime of that temporary buffer is managed by the lifetime of the FFT plan; deleting the plan will release the buffer.
    * Test failures on CPU device for 32-bit systems (Windows/Linux)

Known Issues:
    * Failures have been seen on graphics cards using R4550 (RV710) GPUs.

____________
Version 1.2:

New:
    * Reduced the number of internal LDS bank conflicts for our 1D FFT transforms, increasing performance.
    * Padded reads/writes to global memory, decreasing bank conflicts and increasing performance on 2D transforms.
    * This release tested using the 8.841 runtime driver and the 2.4 APP SDK

Fixed:
    * Failures have been seen attempting to queue work on the second GPU device on a multi GPU 5970 card on Linux.

Known Issues:
    * It is recommended that users query for and explicitly create an intermediate buffer if clFFT requires one. If the library creates the intermediate buffer internally, a race condition may occur on freeing the buffer on lower end hardware.
    * Failures have been seen on graphics cards using R4550 (RV710) GPUs.
    * Test failures on CPU device for 32-bit systems (Windows/Linux)
    * It is recommended that Windows users uninstall previous versions of clFFT before installing newer versions. Otherwise, Add/Remove Programs only removes the latest version. Linux users can delete the install directory.

____________
Version 1.0:

    * Initial release, available on all platforms

Known Issues:
    * Failures have been seen attempting to queue work on the second GPU device on a multi GPU 5970 card on Linux.

_____________________
Building the Samples:

To install the Linux versions of clFFT, uncompress the initial download and then execute the install script. For example:

tar -xf clFFT-${version}.tar.gz
    - This installs three files into the local directory, one being an executable bash script.

sudo mkdir /opt/clFFT-${version}
    - This pre-creates the install directory with proper permissions in /opt if it is to be installed there (this is the default).

./install-clFFT-${version}.sh
    - This prints an EULA and uncompresses files into the chosen install directory.

cd ${installDir}/bin64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${OpenCLLibDir}:${clfftLibDir}
    - Export library dependencies to resolve all external linkages to the client program. The user can create a bash script to help automate this procedure.

./Client -h
    - Understand the command line options that are available to the user through the sample client.

./Client -iv
    - Watch for the version strings to print out; watch for 'Client Test *****PASS*****' to print out.

The sample program does not ship with native build files. Instead, a CMake file is shipped, and users generate a native build file for their system. For example:

cd ${installDir}
mkdir samplesBin/
    - This creates a sister directory to the samples directory that will house the native makefiles and the generated files from the build.

cd samplesBin/
ccmake ../samples/
    - ccmake is a curses-based cmake program. It takes a parameter that specifies the location of the source code to compile.
    - Hit 'c' to configure for the platform; ensure that the dependencies to external libraries are satisfied, including paths to 'ATI Stream SDK' and 'Boost'.
    - After dependencies are satisfied, hit 'c' again to finalize configure step, then hit 'g' to generate makefile and exit ccmake.

make help
    - Look at the available options for make.

make
    - Build the sample client program.

./clfft.Sample -iv
    - Watch for the version strings to print out; watch for 'Client Test *****PASS*****' to print out.
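
The install steps above suggest creating a bash script to automate the LD_LIBRARY_PATH export. The following is a minimal sketch of such a script, not the appmlEnv.sh helper shipped with clFFT. The function name join_lib_path and both default directory values are hypothetical placeholders; set OpenCLLibDir and clfftLibDir to the locations on your own system before sourcing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketch (not part of clFFT): sets up LD_LIBRARY_PATH
# for the sample client. Source this file; do not execute it, or the
# export will not reach your terminal session.

# join_lib_path CURRENT DIR...
# Prints CURRENT with each DIR appended, skipping empty arguments and
# directories that are already listed. This avoids the stray leading ':'
# that a plain "$LD_LIBRARY_PATH:dir" produces when the variable is unset.
join_lib_path() {
    local result="$1"
    shift
    local dir
    for dir in "$@"; do
        [ -z "$dir" ] && continue
        case ":${result}:" in
            *":${dir}:"*) ;;                              # already present
            *) result="${result:+${result}:}${dir}" ;;    # append
        esac
    done
    printf '%s\n' "$result"
}

# ASSUMPTION: both defaults below are placeholders, not documented paths.
OpenCLLibDir="${OpenCLLibDir:-/path/to/opencl/lib}"
clfftLibDir="${clfftLibDir:-/path/to/clFFT/lib}"

LD_LIBRARY_PATH="$(join_lib_path "${LD_LIBRARY_PATH:-}" "$OpenCLLibDir" "$clfftLibDir")"
export LD_LIBRARY_PATH
```

Saved under a name of your choosing (for example clfft-env.sh, a hypothetical name), it could be loaded with `source clfft-env.sh` before running ./Client -iv. Sourcing it more than once does not add duplicate entries.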