KITE - Khronos Initiative for Training and Education

The Khronos Initiative for Training and Education is an initiative to increase communication and cooperation between INDUSTRY and EDUCATORS.

KITE’s goal is to accelerate development of educational materials for Khronos APIs.

OpenCL KITE

Resources

Commercial and Open Source Implementations

PGCL - PGI OpenCL Compiler for Multi-Core ARM

PGCL™ is an OpenCL™ framework for compiling and running OpenCL 1.1 embedded profile applications on the ST-Ericsson NovaThor™ U8500 and follow-on platforms using a single ARM core as the OpenCL host and multiple ARM cores as an OpenCL computing device.

View PGCL - PGI OpenCL Compiler for Multi-Core ARM

Intel SDK for OpenCL Applications

OpenCL Development Environments for Intel Core

View Intel SDK for OpenCL Applications

Intel SDK for OpenCL Applications XE 2013

OpenCL Development Environments for Xeon Processors

View Intel SDK for OpenCL Applications XE 2013

Mali OpenCL SDK

The Mali OpenCL SDK provides developers with a framework and a series of samples for developing OpenCL 1.1 applications on ARM Mali-based platforms such as the Mali-T600 family of GPUs. The samples cover a wide range of use cases that utilize the Mali GPU to achieve a significant improvement in performance when compared to running on the CPU alone.

View Mali OpenCL SDK

Beignet: OpenCL Implementation for Ivy Bridge on Linux

Beignet is an open source implementation of the OpenCL specification, a generic compute-oriented API. This code base contains the code to run OpenCL programs on Intel GPUs: it defines and implements the OpenCL host functions required to initialize the device, create the command queues, the kernels and the programs, and run them on the GPU.

View Beignet: OpenCL Implementation for Ivy Bridge on Linux

Portable OpenCL (pocl)

Portable OpenCL (pocl) is an MIT-licensed open source implementation of the OpenCL standard which can be easily adapted for new targets and devices, both for homogeneous CPUs and heterogeneous GPUs/accelerators.

pocl uses Clang as an OpenCL C frontend and LLVM for the kernel compiler implementation, and as a portability layer. Thus, if your desired target has an LLVM backend, it should be able to get OpenCL support easily by using pocl.

The goal is to accomplish improved performance portability using a kernel compiler that can generate multi-work-item work-group functions that exploit various types of parallel hardware resources: VLIW, superscalar, SIMD, SIMT, multicore, multithread ...

In addition to providing a portable open source implementation of OpenCL, another goal of the project is to serve as a research platform for issues in parallel programming on heterogeneous platforms.

View Portable OpenCL (pocl)

JavaCL - Java OpenCL bindings and utilities

JavaCL wraps OpenCL in a nice Java API with goodies, under the BSD license.

View JavaCL - Java OpenCL bindings and utilities

OpenCL FFT

The Discrete Fourier Transform (DFT) is a Fourier analysis function that computes the frequency domain representation of a given input. The DFT has an enormous number of applications, including spectral analysis, data compression, multiplication of polynomials and computation of the remaining fatigue life of materials.

CMSoft presents an easy-to-use, powerful tool that computes the DFT with the Fast Fourier Transform algorithm in OpenCL, in both single and double precision. Source code containing examples is available.
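For readers new to the transform, the relationship between the DFT and the Fast Fourier Transform can be sketched in plain Python. This is only a reference illustration on the CPU, not the CMSoft tool or its API; the function names `dft` and `fft` are made up for this example, and the FFT shown is the textbook recursive radix-2 variant, which assumes a power-of-two input length.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT (illustrative, power-of-two length only)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


signal = [1, 2, 3, 4, 0, 0, 0, 0]
direct = dft(signal)
fast = fft(signal)
```

Both functions compute the same spectrum; the FFT simply reuses the half-size transforms, which is what makes it attractive for a GPU implementation.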

View OpenCL FFT

Java Bindings to OpenCL

JOCL enables applications running on the JVM to use OpenCL for massively parallel, high performance computing tasks, executed on heterogeneous hardware (GPUs, CPUs, FPGAs etc) in a platform independent manner.

View Java Bindings to OpenCL

OpenCL Support Vector Machine

Support Vector Machine (SVM) is a statistical learning tool considered to be the state-of-the art classifier for many applications today, including medical research and text categorization.

CMSoft brings an OpenCL accelerated SVM implementation that can be used for general-purpose classification. Source code is provided showing classification of the MNIST handwritten database.

View OpenCL Support Vector Machine

OpenCL Marching Cubes

Marching Cubes is an algorithm used in a very wide range of applications, including:

  • Medical visualizations such as CT and MRI scan images;
  • Special 3D effects and 3D modelling of metaballs or metasurfaces;
  • Analysis of oil reservoirs in the oil and gas industry;
  • Reconstitution of surfaces whose data has been acquired through seismic methods.

CMSoft brings a versatile and useful tool, Marching Cubes, adapted to GPU acceleration using OpenCL. Sample source code is available. OpenCLTemplate Marching Cubes is another resource made available to users who want to have easy access to GPU accelerated mathematical tools.

View OpenCL Marching Cubes

ODE system solving with OpenCL

Differential equations are crucial to all exact sciences, such as engineering, physics, chemistry and even economics. There are packages that use GPUs to compute solutions to problems such as solving linear systems and computing FFTs. This work covers an easy-to-use ordinary differential equation system solver for scientific applications and games. Examples include calculating trajectories and collisions of particles in game engines, electron-proton interactions, gravitational calculations, dynamic modeling of deformable bodies and many more.
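As a rough idea of what solving an ODE system involves, here is a minimal plain-Python sketch using the classical fourth-order Runge-Kutta method. The listing above does not say which integration scheme the OpenCL solver uses, so RK4 is an assumption chosen purely for illustration, and `rk4_step`, `solve` and `oscillator` are hypothetical names.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the system y' = f(t, y) by one step of size h (classical RK4)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def solve(f, y0, t0, t1, steps):
    """Integrate from t0 to t1 with a fixed number of equal steps."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def oscillator(t, y):
    """Harmonic oscillator x'' = -x written as a first-order system [x, v]."""
    x, v = y
    return [v, -x]


# Start at x = 1, v = 0; the exact solution is x(t) = cos(t), v(t) = -sin(t).
x_end, v_end = solve(oscillator, [1.0, 0.0], 0.0, 1.0, 100)
```

Each component of the state vector is updated independently once the slopes are known, which is the part that maps naturally onto parallel hardware.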

View ODE system solving with OpenCL

CLyther - an OOP extension to OpenCL language definition

CLyther is a Python tool similar to Cython. CLyther is a Python language extension that makes writing OpenCL code as easy as Python itself. CLyther currently only supports a subset of the Python language definition but adds many new features to OpenCL. CLyther exposes both the OpenCL C library and the OpenCL language to Python.

CLyther Features:

  • Fast prototyping of OpenCL code.
  • Create OpenCL code using the Python language definition.
  • Strong OOP programming in OpenCL code.
  • Passing functions as arguments to kernel functions.
  • Python emulation mode of OpenCL code.
  • Fancy indexing of arrays
  • Device memory management
  • Dynamic compilation at runtime

View CLyther - an OOP extension to OpenCL language definition

Ruby-OpenCL

Ruby-OpenCL is a Ruby binding of OpenCL providing classes for OpenCL programming on the host. It currently works with ATI Stream SDK 2.0 beta.

View Ruby-OpenCL

OpenCL for PLT Scheme

A complete binding of OpenCL for PLT Scheme.

View OpenCL for PLT Scheme

PyOpenCL

PyOpenCL is a complete, object-oriented language binding of OpenCL to Python. It has full documentation available and is licensed under the liberal MIT license.

View PyOpenCL

The Open Toolkit library

Cross-platform OpenGL, OpenGL ES, OpenAL and OpenCL bindings for .Net/Mono. Compatible with Windows, Linux and Mac OS X and usable by all .Net languages (C#, VB.Net, C++/CLI, ...)

View The Open Toolkit library

Frameworks & Libraries

CLOGS

CLOGS is a library for higher-level operations on top of the OpenCL C++ API. It is designed to integrate with other OpenCL code, including synchronization using OpenCL events. Currently only two operations are supported: radix sorting and exclusive scan. Radix sort supports all the unsigned integral types as keys, and all the built-in scalar and vector types suitable for storage in buffers as values. Scan supports all the integral types. It also supports vector types, which allows for limited multi-scan capabilities.
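The two operations named above can be pinned down with a small sequential reference in plain Python. This is not the CLOGS API; `exclusive_scan` and `radix_sort_pairs` are hypothetical names, and the 4-bit digit width is an arbitrary choice for the sketch. It only shows what the results should look like for unsigned integer keys with attached values.

```python
def exclusive_scan(values):
    """out[i] is the sum of values[0..i-1]; out[0] is 0."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def radix_sort_pairs(keys, values, key_bits=32, radix_bits=4):
    """Stable LSD radix sort of (key, value) pairs by unsigned integer key.

    Keys are assumed to be non-negative and smaller than 2**key_bits.
    """
    buckets = 1 << radix_bits
    mask = buckets - 1
    pairs = list(zip(keys, values))
    for shift in range(0, key_bits, radix_bits):
        # Histogram of the current digit.
        counts = [0] * buckets
        for k, _ in pairs:
            counts[(k >> shift) & mask] += 1
        # The exclusive scan turns counts into each bucket's start offset.
        offsets = exclusive_scan(counts)
        out = [None] * len(pairs)
        for k, v in pairs:
            digit = (k >> shift) & mask
            out[offsets[digit]] = (k, v)
            offsets[digit] += 1
        pairs = out
    return [k for k, _ in pairs], [v for _, v in pairs]


scanned = exclusive_scan([3, 1, 4, 1, 5])
sorted_keys, sorted_values = radix_sort_pairs([170, 45, 75, 90, 2], list("abcde"))
```

Note that the sort itself relies on a scan of the digit histogram, which is one reason the two primitives tend to ship together.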

View CLOGS

OpenCLIPP

OpenCL Integrated Performance Primitives - A library of optimized OpenCL image processing functions

View OpenCLIPP

OpenCL.NET

Cross-platform .NET interface to OpenCL 1.2 with full API coverage.

View OpenCL.NET

CLFORTRAN

Pure Fortran interface to OpenCL and fully compatible with C API.

View CLFORTRAN

GoCL

GLib/GObject wrapper for OpenCL

View GoCL

Altera SDK for OpenCL

An OpenCL implementation for Altera FPGA devices.

View Altera SDK for OpenCL

OpenCL binding for Erlang

View OpenCL binding for Erlang

clFFT Library

clMATH is a software library containing FFT and BLAS functions written in OpenCL. clFFT library is a software library containing FFT functions written in OpenCL

View clFFT Library

clBLAS library

clMATH is a software library containing FFT and BLAS functions written in OpenCL. clBLAS library is a software library containing BLAS functions written in OpenCL

View clBLAS library

Bolt C++ Template Library

Bolt provides C++ developers with an STL-compatible library of high-level constructs for accelerating data-parallel applications. Code written using STL or other C++ template libraries (example: TBB) can be converted to Bolt in minutes. The Bolt library contains accelerated kernel code for many useful functions like Sort, Scan, Reduce and others, so you won’t need to learn OpenCL™ or C++ AMP APIs to get the benefits of heterogeneous acceleration.

View Bolt C++ Template Library

Intel IPP

The preview release package is currently focused on delivering Intel® Graphics support for advanced image processing and computer vision functions.

View Intel IPP

cf4ocl - C Framework for OpenCL

The C Framework for OpenCL (cf4ocl) is a pure C99 set of libraries and utilities for speeding-up the development and benchmarking of OpenCL programs.

View cf4ocl - C Framework for OpenCL

OpenCLIntegration

An OpenCL wrapper class and a SCons build library to simplify integration of OpenCL code in C++, C, Fortran and Perl

View OpenCLIntegration

ProjCL

Map projection and geodesic library

View ProjCL

AMUSE

AMUSE is a framework for large-scale simulations of dense stellar systems.

View AMUSE

VirtualCL Cluster Platform

Virtual OpenCL (VCL) is a cluster platform that allows unmodified OpenCL applications to transparently utilize many OpenCL devices in a cluster, as if all the devices are on the local computer.

View VirtualCL Cluster Platform

amgcl - generic algebraic multigrid (AMG) hierarchy builder

amgcl is a simple and generic algebraic multigrid (AMG) hierarchy builder (and a work in progress). The constructed hierarchy may be used as a standalone solver or as a preconditioner with some iterative solver. Several iterative solvers are provided, and it is also possible to use generic solvers from other libraries, e.g. ViennaCL.

View amgcl - generic algebraic multigrid (AMG) hierarchy builder

AMD - Accelerated Parallel Processing (APP) SDK

AMD APP technology is a set of advanced hardware and software technologies that enable AMD graphics processing cores (GPU), working in concert with the system’s x86 cores (CPU), to execute heterogeneously to accelerate many applications beyond just graphics.

View AMD - Accelerated Parallel Processing (APP) SDK

RaijinCL

RaijinCL is a library for matrix operations for OpenCL.

View RaijinCL

SimpleOpenCL

SimpleOpenCL is a library written in ANSI C that was born out of the needs of scientific research test development. It originated while developing different OpenCL codes for Linux and Apple test machines, with single-device performance and portability goals in mind. The main goal of SimpleOpenCL has been to reduce the code needed to run experiments on the GPU, but it also supports managing CPU devices. As this is an open source project, we welcome any contribution, from code corrections and functionality suggestions/contributions to documentation or even contribution system proposals.

View SimpleOpenCL

OpenCL data parallel primitives library

clpp is an OpenCL Data Parallel Primitives Library. It is a library of data-parallel algorithm primitives such as parallel-prefix-sum (“scan”), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.
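To show how such primitives act as building blocks, the sketch below builds stream compaction on top of an exclusive scan and adds a pairwise tree reduction. It is a sequential plain-Python reference, not the clpp API; all function names are hypothetical.

```python
def exclusive_scan(values):
    """out[i] is the sum of values[0..i-1]; out[0] is 0."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def compact(items, keep):
    """Stream compaction: keep matching items, preserving their order."""
    flags = [1 if keep(x) else 0 for x in items]
    # Scanning the flags gives every kept item its output position.
    positions = exclusive_scan(flags)
    total = positions[-1] + flags[-1] if flags else 0
    out = [None] * total
    for x, flag, pos in zip(items, flags, positions):
        if flag:
            out[pos] = x  # independent scatter, one write per kept item
    return out


def tree_reduce(values, op):
    """Pairwise reduction; each pass halves the number of partial results."""
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty sequence")
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
    return level[0]


positives = compact([5, -2, 7, 0, -9, 3], lambda x: x > 0)
total = tree_reduce([1, 2, 3, 4, 5], lambda a, b: a + b)
```

In a data-parallel setting, every list comprehension and scatter above corresponds to work that can be done by independent work-items; only the scan and the reduction need cooperation between them.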

View OpenCL data parallel primitives library

AccelerEyes ArrayFire math library

ArrayFire is a GPU software acceleration library. GPU acceleration is awesome, but writing fast CUDA or OpenCL code is hard. With ArrayFire, you get access to hundreds of functions already optimized for speed by top GPU computing experts. It is easy to integrate into your C, C++, or Fortran application. You can use ArrayFire with any CUDA code on NVIDIA GPUs or any OpenCL code on AMD, Intel, or other devices.

View AccelerEyes ArrayFire math library

MAGMA linear algebra library

MAGMA linear algebra library

View MAGMA linear algebra library

Accelerated Parallel Processing Math Libraries (APPML)

AMD Accelerated Parallel Processing Math Libraries are software libraries containing FFT and BLAS functions written in OpenCL and designed to run on AMD GPUs. The libraries support running on CPU devices to facilitate debugging and multicore programming.

View Accelerated Parallel Processing Math Libraries (APPML)

VexCL

VexCL is a vector expression template library for OpenCL. It has been created to ease OpenCL development with C++. VexCL strives to reduce the amount of boilerplate code needed to develop OpenCL applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, and sparse matrix-vector multiplication. Multi-device and even multi-platform computations are supported.

View VexCL

SnuCL OpenCL framework ( freely available )

SnuCL is an OpenCL framework and freely available, open-source software developed at Seoul National University. It naturally extends the original OpenCL semantics to the heterogeneous cluster environment. The target cluster consists of a single host node and multiple compute nodes. They are connected by an interconnection network, such as Gigabit and InfiniBand switches. The host node contains multiple CPU cores and each compute node consists of multiple CPU cores and multiple GPUs. For such clusters, SnuCL provides an illusion of a single heterogeneous system for the programmer. A GPU or a set of CPU cores becomes an OpenCL compute device. SnuCL allows the application to utilize compute devices in a compute node as if they were in the host node.

View SnuCL OpenCL framework ( freely available )

libCL

libCL is an open-source parallel algorithm library written in C++ and OpenCL, released under the Apache 2.0 license. Based on a thin layer of wrapper classes for OpenCL and OpenGL are implementations of parallel algorithms ranging from simple primitives such as sorting, searching and algebra to complex systems of algorithms for computational research and visualization. libCL emerged out of OpenCL Studio, and as such integrates well with the development environment and its rich prototyping and visualization capabilities.

View libCL

OpenCL/GL Framework

Interoperation between OpenCL and OpenGL allows programmers to efficiently perform complex manipulation of data directly in the GPU memory. CMSoft brings to developers the new GLRender tool in OpenCLTemplate that automates the creation of an OpenGL scene coupled with a derived OpenCL context.

It is possible to create and display buffer objects in the OpenGL scene as well as manipulate them using OpenCL interoperation with very little effort using a pre-configured OpenCL environment. Two source codes are provided to demonstrate the framework capabilities: one that draws a Mandelbrot fractal set and another to show the capabilities of the 3D mouse in a CL/GL shared environment.

View OpenCL/GL Framework

ViennaCL - Linear Algebra and Iterative Solvers using OpenCL

The Vienna Computing Library (ViennaCL) is a scientific computing library written in C++ and based on OpenCL. It allows simple, high-level access to the vast computing resources available on parallel architectures such as GPUs and is primarily focused on common linear algebra operations (BLAS level 1 and 2) and the solution of large systems of equations by means of iterative methods. In ViennaCL 1.0.x, the following iterative solvers are implemented:

* Conjugate Gradient (CG)
* Stabilized BiConjugate Gradient (BiCGStab)
* Generalized Minimum Residual (GMRES)

An optional ILU preconditioner can be used, which is in ViennaCL 1.0.x precomputed on the CPU and may thus not lead to overall performance gains. Under the hood, ViennaCL uses OpenCL for accessing and executing code on compute devices. Therefore, ViennaCL is not tailored to products from a particular vendor and can be used on many different platforms.

View ViennaCL - Linear Algebra and Iterative Solvers using OpenCL

OpenCL .Net

OpenCL.net doubles as simple bindings to the flat OpenCL 1.0 API, usable from all .net languages, and a higher level framework that simplifies some aspects of OpenCL programming.

Some examples included to jump start development.

MIT open source licensed for maximum flexibility.

View OpenCL .Net

Tutorials, Technical Whitepapers and How to Guides

OpenCL Optimization Guide

The Intel SDK for OpenCL Applications - Optimization Guide describes the optimization guidelines of OpenCL applications targeting the 3rd Generation Intel Core Processor across both Intel Processors (CPU) and Intel Processor Graphics (GPU)

View OpenCL Optimization Guide

Case Study: Processing Kinect data with OpenCL

Interactive technologies have become extremely important in a world where busy users demand intuitive devices which demand little to no learning time. However, it is necessary to process large amounts of data in real time in order to implement such intelligent systems.

CMSoft brings a tutorial on how to create a C# framework to capture Microsoft Kinect sensor data and transfer it to an OpenCL GPU Device, thus enabling the development of software that can potentially process Kinect data hundreds of times faster when compared to pure CPU processing.

View Case Study: Processing Kinect data with OpenCL

VirtualCL Cluster Platform

Virtual OpenCL (VCL) is a cluster platform that allows unmodified OpenCL applications to transparently utilize many OpenCL devices in a cluster, as if all the devices are on the local computer.

View VirtualCL Cluster Platform

Software Occlusion Culling

This article details an algorithm and associated sample code for software occlusion culling. The technique divides scene objects into occluders and occludees, and culls occludees based on a depth comparison with the occluders, software rasterized to the depth buffer. The sample code is optimized with SSE* and multi-threading, and uses frustum culling to achieve an 8X performance speedup compared to a non-culled display of the sample scene.
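The depth comparison at the heart of the technique can be sketched in a few lines of plain Python. This is a heavily simplified illustration and not the article's sample code: occluders are assumed to be axis-aligned screen-space rectangles with a single constant depth, occludees are tested through a screen-space bounding rectangle and their nearest depth, and there is no triangle rasterization, SSE or multi-threading. All names are hypothetical.

```python
def rasterize_occluders(width, height, occluders):
    """Build a depth buffer holding the nearest occluder depth per pixel.

    Each occluder is (x0, y0, x1, y1, z) with x1/y1 exclusive; smaller z is nearer.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    for x0, y0, x1, y1, z in occluders:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                depth[y][x] = min(depth[y][x], z)
    return depth


def is_occluded(depth, rect):
    """True if every on-screen pixel of the occludee lies behind the depth buffer.

    rect is (x0, y0, x1, y1, z_near). A rectangle entirely off screen is
    reported as occluded, which here stands in for frustum culling.
    """
    x0, y0, x1, y1, z_near = rect
    height, width = len(depth), len(depth[0])
    for y in range(max(y0, 0), min(y1, height)):
        for x in range(max(x0, 0), min(x1, width)):
            if z_near <= depth[y][x]:
                return False  # at least one pixel could be visible
    return True


buffer = rasterize_occluders(10, 10, [(2, 2, 8, 8, 1.0)])
hidden = is_occluded(buffer, (3, 3, 6, 6, 2.0))     # fully behind the occluder
partial = is_occluded(buffer, (0, 0, 4, 4, 2.0))    # sticks out past its edge
in_front = is_occluded(buffer, (3, 3, 6, 6, 0.5))   # nearer than the occluder
```

The test is conservative: an occludee is culled only when no pixel of its bounding rectangle could pass the depth test.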

View Software Occlusion Culling

OpenCL “Hello World” Tutorial

OpenCL™ is a young technology, and, while a specification has been published, there are currently few documents that provide a basic introduction with examples. This article helps make OpenCL™ easier to understand and implement.

View OpenCL “Hello World” Tutorial

Case Study: heat transfer simulation using CLGL interop

Heat transfer and, more generally, parabolic partial differential equations are a very important class of problems in physics and mathematics. Analytic solutions aren’t available for real-world problems where initial and boundary conditions can be arbitrary.

CMSoft presents video tutorials and source code with detailed information about how to create the CLGL shared context, use OpenCL to accelerate the simulation of heat transfer and share an OpenGL texture to present the results to the user.

View Case Study: heat transfer simulation using CLGL interop

Leveraging GPGPU and OpenCL Technologies for Natural User Interfaces

Natural User Interfaces (NUIs) are vastly more complex than traditional graphical user interfaces and require large computational power to provide an immersive experience for the user. We will examine NUI improvements and design challenges over traditional Graphical User Interfaces and the associated computational complexities encountered during implementation. In order to maximize the hardware’s capability for a NUI, making use of available Graphics Processing Unit (GPU) cycles to complement the Central Processing Unit (CPU) is crucial. This process of using GPUs as General Purpose Graphics Processing Units (GPGPU) has been traditionally limited to desktop computing platforms, but as portable devices are becoming more powerful, encompassing multiple-core CPU and GPU elements, implementation becomes crucial for efficient use of the hardware’s capability. When effectively utilizing both the GPU and CPU cycles a smooth, fluid experience can be maintained, as well as optimization for best case power consumption. This can be very challenging due to platform constraints, differing CPU / GPU architectures, implementation complexity and cost of integration. We will examine a few aspects of a NUI that could benefit from GPGPU computing and the associated implementation benefits.

View Leveraging GPGPU and OpenCL Technologies for Natural User Interfaces

OpenCL Tutorial Series

This tutorial series from Rob Farber and The Code Project focuses on bringing knowledgeable C and C++ programmers quickly up to speed so they can work with OpenCL to write efficient portable parallel programs.

View OpenCL Tutorial Series

OpenCL accelerated extraction and classification of Haar features with color

Computer vision has become pervasive in our modern society, with applications ranging from robotic vision, measurement of position, face identification and recognition, automatic detection of failures in industry and many more. Haar features are commonly used to describe objects.

CMSoft brings a study on how to use OpenCL to accelerate the extraction of color Haar-like wavelet features from color images. This process involves OpenCL acceleration of the computation of the image integral, generation of regions of sliding window in the target picture and preparing data structures to receive all features.

A complete color Haar feature extraction software, including source code, is available.

View OpenCL accelerated extraction and classification of Haar features with color

Anjuta Project Wizards for AMD, NVidia and Intel OpenCL SDK

To encourage OpenCL development, I created some wizards that start up an OpenCL application project using the SDK from NVidia, AMD or Intel. I’ve used Anjuta DevStudio on Linux. There is a lack of open source IDEs and tools for developing GPU applications; these wizards help us create OpenCL applications based on templates and thus flatten the learning curve.

View Anjuta Project Wizards for AMD, NVidia and Intel OpenCL SDK

OpenCL quickstart tutorials

Quick start OpenCL tutorial with lots of examples.

View OpenCL quickstart tutorials

CMSoft Image2D Tutorial

Image2D variables play a very important role in OpenCL because they use the texture caching and samplers of the GPU and thus are very suitable for storing large arrays and/or data that has to be accessed very often.

The CMSoft Image2D tutorial covers how to manipulate Image2Ds and how to reinterpret a vector as an Image2D variable, with useful C99 source code to retrieve vector data from an Image2D.

View CMSoft Image2D Tutorial

OpenCL Getting Started Tutorial

Part 1

This tutorial series is aimed at developers trying to learn OpenCL from the bottom up, with a focus on practicality (i.e. I’m still learning, I’m sharing what I’ve found to work).

Part 1.5

This part is a reworking of my first tutorial using the OpenCL C++ Bindings.

Part 2

This installment introduces OpenCL context sharing with OpenGL.

View OpenCL Getting Started Tutorial

OpenCL / GL Interop Tutorial

Using OpenCL to manipulate OpenGL objects has important advantages: the GPU is usually faster and data transfer from Host memory to Device memory is kept to a minimum.

The CMSoft OpenCL/GL interop tutorial shows a detailed implementation of a circular wave interference simulation using CL/GL interop, including commented source code available for download.

View OpenCL / GL Interop Tutorial

OpenCL Tutorial

OpenCL is a great open standard which is going to become the future of parallel processing.

This tutorial shows basics of setting up OpenCL with a pre-made initializer OpenCLTemplate and covers from very basic vector-sum topics to a real-life OpenCL sample application.

This tutorial presents OpenCL C99 sample code.

View OpenCL Tutorial

GPGPU Programming (OpenCL)

This site’s primary focus is GPGPU programming. It contains an overview describing the basic concepts, code examples (OpenCL, CUDA), links to useful OpenCL and CUDA resources, links to GPU-based products, and news aggregation of GPGPU tools.

View GPGPU Programming (OpenCL)

OpenCL Tutorial - Introduction - Fundamentals

  1. Episode 1 - OpenCL Tutorial - Introduction to OpenCL
  2. Episode 2 - OpenCL Fundamentals
  3. Episode 3 - Building an OpenCL Project
  4. Episode 4 - Memory Access and Layout
  5. Episode 5 - Questions and Answers to Episode Four
  6. Episode 6 - Shared Memory Kernel Optimization

View OpenCL Tutorial - Introduction - Fundamentals

Introduction to OpenCL tutorial

Introductory tutorial showing how to write the ‘Hello World’ in OpenCL.

View Introduction to OpenCL tutorial

Presentations & Videos

Intel Launches SDK for OpenCL Applications at SIGGRAPH 2012

SIGGRAPH 2012 was the launch pad for computer graphics and animation development, including the beta release of the Intel® SDK for OpenCL. We caught up with Ryan Tabrah from the Intel

View Intel Launches SDK for OpenCL Applications at SIGGRAPH 2012

Webinar: Getting Started with Intel® SDK for OpenCL* Applications

Join Arnon Peleg, Intel® SDK for OpenCL* Applications Product Marketing Manager, to learn how OpenCL and the Intel® SDK for OpenCL Applications can help you utilize the full resources of Intel® processors and Intel® HD Graphics

View Webinar: Getting Started with Intel® SDK for OpenCL* Applications

Example Code

cf4ocl - C Framework for OpenCL

The C Framework for OpenCL (cf4ocl) is a pure C99 set of libraries and utilities for speeding-up the development and benchmarking of OpenCL programs.

View cf4ocl - C Framework for OpenCL

Intel® SDK for OpenCL* Applications Samples

The samples provide source code examples, accompanied with whitepapers to help you get started with Intel® SDK for OpenCL* Applications

View Intel® SDK for OpenCL* Applications Samples

Mali OpenCL SDK

The Mali OpenCL SDK provides developers with a framework and a series of samples for developing OpenCL 1.1 applications on ARM Mali-based platforms such as the Mali-T600 family of GPUs. The samples cover a wide range of use cases that utilize the Mali GPU to achieve a significant improvement in performance when compared to running on the CPU alone.

View Mali OpenCL SDK

Sample Code to Showcase load balancing compute across both CPU and Graphics

The NBody sample by Intel features a load balancing approach to compute an NBody simulation across both the CPU and Intel HD Graphics. This sample illustrates how to maximize the efficiency of the processor by using both the CPU and Intel HD Graphics simultaneously on a platform. The end result is not only the sum of the performance of both devices, but also a large improvement in application power performance on platforms such as Ultrabook. Source code is available and is accompanied by a graphical visualization of the job distribution between the devices.

View Sample Code to Showcase load balancing compute across both CPU and Graphics

Software Occlusion Culling

This article details an algorithm and associated sample code for software occlusion culling. The technique divides scene objects into occluders and occludees, and culls occludees based on a depth comparison with the occluders, software rasterized to the depth buffer. The sample code is optimized with SSE* and multi-threading, and uses frustum culling to achieve an 8X performance speedup compared to a non-culled display of the sample scene.

View Software Occlusion Culling

OpenCL accelerated extraction and classification of Haar features with color

Computer vision has become pervasive in our modern society, with applications ranging from robotic vision, measurement of position, face identification and recognition, automatic detection of failures in industry and many more. Haar features are commonly used to describe objects.

CMSoft brings a study on how to use OpenCL to accelerate the extraction of color Haar-like wavelet features from color images. This process involves OpenCL acceleration of the computation of the image integral, generation of regions of sliding window in the target picture and preparing data structures to receive all features.

A complete color Haar feature extraction software, including source code, is available.

View OpenCL accelerated extraction and classification of Haar features with color

floatCL

Two simple OpenCL projects that use the GPU to find the largest element in an array of floats in O(1) and to sort a list in O(n). They will be useful for beginners who want to understand the OpenCL API and write code for SIMD execution.

View floatCL

OpenCL Color Tracking

Tracking a set of colors in a video is a first approximation and initial guess for many applications. In fact, determining what parts of an image belong to skin, for example, is very important to track faces or hands. CMSoft’s color tracking case study presents a technique that is robust to motion-blur and that can perform real-time tracking thanks to OpenCL acceleration. Source code is provided showing how to implement a flashlight mouse, i.e., how to use the webcam and a flashlight to perform mouse movement and clicking.

View OpenCL Color Tracking

Semaphors using Atomics

The CMSoft C99 Atomics tutorial covers important aspects of OpenCL atomic functions and shows a practical, easy-to-use semaphore implementation based on the atomic exchange function (source code available).

Atomic operations are a very important aspect of parallel processing and synchronization. Among other uses, they are really useful for managing shared resources and creating semaphores.
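The idea of a semaphore built on atomic exchange can be emulated on the host in plain Python. This is not the CMSoft tutorial code: in an OpenCL kernel the exchange would be the built-in `atomic_xchg` on an integer in global or local memory, whereas here a hypothetical `AtomicInt` class guarded by a lock stands in for the hardware atomic, and Python threads stand in for work-items.

```python
import threading
import time


class AtomicInt:
    """Host-side stand-in for an integer that supports atomic exchange."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()

    def exchange(self, new):
        """Store new and return the previous value as one indivisible step."""
        with self._guard:
            old, self._value = self._value, new
            return old


def acquire(sem):
    # Writing 1 and reading back 0 means this thread took the free semaphore.
    while sem.exchange(1) == 1:
        time.sleep(0)  # yield so the current holder can make progress


def release(sem):
    sem.exchange(0)


def run(workers=4, iterations=200):
    """Increment a shared counter from several threads under the semaphore."""
    sem = AtomicInt(0)
    shared = {"count": 0}

    def work():
        for _ in range(iterations):
            acquire(sem)
            shared["count"] += 1  # critical section
            release(sem)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


final_count = run()
```

Because only one thread at a time can observe the value 0 coming back from the exchange, the counter ends up equal to the total number of increments.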

View Semaphors using Atomics

Real time filtering with OpenCL

This OpenCL example code shows how to filter images without using OpenCL extensions, thus making it suitable for use with any GPU that supports OpenCL while still being much faster than non-GPU accelerated implementations.

Additionally, the source code shows how to execute real time webcam image filtering with a 7x7 filter.

View Real time filtering with OpenCL

OpenCL low poly collision detection

As part of the CMSoft OpenCL tutorial, this source code example shows an implementation of a low-polygon collision detection algorithm suitable for engineering assembly analyses.

OpenGL and Lab3D are used to display the 3D models and OpenCL C99 source code is presented and made available for download.

View OpenCL low poly collision detection

Utilities & Projects

AMD CodeXL

AMD CodeXL is a comprehensive tool suite that enables developers to harness the benefits of AMD CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL™ kernel analysis capabilities, enhancing accessibility for software developers to enter the era of heterogeneous computing

View AMD CodeXL

OpenCL CodeBench

OpenCL code generator and Eclipse editor plugin

View OpenCL CodeBench

The Rodinia Benchmark Suite, version 2.2

The Rodinia Benchmark Suite is designed for heterogeneous computing infrastructures with OpenCL, OpenMP and CUDA implementations.

View The Rodinia Benchmark Suite, version 2.2

ClusterChimps OpenCL Compiler Tools

OCLTools is a powerful, yet compact, suite of tools that provides developers with more alternatives to kernel compilation. OCLTools enables you to eliminate costly kernel compilation time from the runtime of your application.

View ClusterChimps OpenCL Compiler Tools

OCL-MLA

OCL-MLA is exactly what its name implies: a mid-level set of abstractions to make OpenCL development easier.

View OCL-MLA

The Static Kernel Analyzer (SKA)

The static kernel analyzer (SKA) combines a static, linear pipeline simulator (similar to the IBM spu_timing tool) with architectural heuristics to model in-order instruction issue and pipeline behavior. Pipeline simulations take as input the intermediate representation (IR) from Low-Level Virtual Machine (LLVM). LLVM and LLVM IR are adopted standards in the HPC community, allowing SKA to support a wide range of source inputs. SKA also has support for the upcoming OpenCL Standard Portable Intermediate Representation (SPIR), which is based on LLVM IR version 3.1.

View The Static Kernel Analyzer (SKA)

OpenCL Installable Client Driver (ICD) Loader

The OpenCL ICD extension (cl_khr_icd) allows multiple implementations of OpenCL to co-exist on the same system. The OpenCL ICD Loader Library allows applications to choose a platform from the list of installed platforms and dispatches OpenCL API calls to the underlying implementation.

Source code for the ICD loader library is available in the Khronos Registry. Consult LICENSE.txt in the tarball for full terms and conditions.

View OpenCL Installable Client Driver (ICD) Loader

Acelera ACEMD

ACEMD is production bio-molecular dynamics (MD) software specially optimized to run on NVIDIA graphics processing units (GPUs). ACEMD is the world’s fastest MD engine for a single workstation. ACEMD reads CHARMM/NAMD and AMBER input files through a simple yet powerful configuration interface. ACEMD is the computational engine behind GPUGRID, one of the largest distributed computing projects worldwide. Thousands of MD simulations are run daily, which makes ACEMD an extremely reliable code. ACEMD is compatible with CUDA and OpenCL, the new standard framework for parallel and high-performance computing over different architectures.

View Acelera ACEMD

Computing Language Utility (CLU)

The Computing Language Utility (CLU) is a lightweight API designed to help programmers explore, learn, and rapidly prototype programs with OpenCL. This API reduces the complexity associated with initializing OpenCL devices, contexts, kernels, parameters, and so on, while preserving the ability to drop down to the lower-level OpenCL API at will when programmers want to get their hands dirty. The CLU release includes an open source implementation along with documentation and samples that demonstrate how to use CLU in real applications. It has been tested on Windows 7 with Visual Studio.

View Computing Language Utility (CLU)

OpenCL accelerated extraction and classification of Haar features with color

Computer vision has become pervasive in our modern society, with applications ranging from robotic vision, measurement of position, face identification and recognition, automatic detection of failures in industry and many more. Haar features are commonly used to describe objects.

CMSoft presents a study on how to use OpenCL to accelerate the extraction of color Haar-like wavelet features from color images. This process involves OpenCL acceleration of the integral image computation, generation of sliding-window regions in the target picture, and preparation of data structures to receive all features.
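
The role of the integral image can be illustrated with a short CPU sketch in plain C. This is not CMSoft's code: the function names (`sketch_integral_image`, `sketch_rect_sum`, `sketch_haar_two_rect`), the 8-bit single-channel layout and the choice of a left-minus-right two-rectangle feature are assumptions made for illustration. For a color image, the same steps would presumably be applied to each channel.

```c
#include <stdint.h>

/* Build an integral image with one extra zero row and zero column, so
   ii[y * (width + 1) + x] is the sum of all pixels above and to the left
   of (x, y). `ii` must hold (width + 1) * (height + 1) entries. */
static void sketch_integral_image(const uint8_t *img, int width, int height,
                                  uint32_t *ii)
{
    int stride = width + 1;
    for (int x = 0; x <= width; ++x) {
        ii[x] = 0;
    }
    for (int y = 1; y <= height; ++y) {
        uint32_t row_sum = 0;
        ii[y * stride] = 0;
        for (int x = 1; x <= width; ++x) {
            row_sum += img[(y - 1) * width + (x - 1)];
            ii[y * stride + x] = ii[(y - 1) * stride + x] + row_sum;
        }
    }
}

/* Sum of the pixels in [x0, x1) x [y0, y1), using four table lookups. */
static uint32_t sketch_rect_sum(const uint32_t *ii, int width,
                                int x0, int y0, int x1, int y1)
{
    int stride = width + 1;
    return ii[y1 * stride + x1] - ii[y0 * stride + x1]
         - ii[y1 * stride + x0] + ii[y0 * stride + x0];
}

/* Two-rectangle Haar-like feature for a window at (x, y) of size w x h
   (w even): sum of the left half minus sum of the right half. */
static int64_t sketch_haar_two_rect(const uint32_t *ii, int width,
                                    int x, int y, int w, int h)
{
    int mid = x + w / 2;
    int64_t left  = sketch_rect_sum(ii, width, x,   y, mid,   y + h);
    int64_t right = sketch_rect_sum(ii, width, mid, y, x + w, y + h);
    return left - right;
}
```

Once the table is built, every rectangle sum costs four lookups regardless of its size, which keeps the per-window cost low when a sliding window visits many positions.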

A complete color Haar feature extraction software, including source code, is available.

View OpenCL accelerated extraction and classification of Haar features with color

OpenCL Compiler Tools (OCLTools)

OCLTools is a powerful, yet compact, suite of tools that provides developers with more alternatives to kernel compilation. OCLTools enables you to eliminate costly kernel compilation time from the runtime of your application. With OCLTools, developers can embed the source code of their kernels (clear text or encrypted) directly into their program binaries, eliminating the need to distribute kernel source code in the open while still maintaining the flexibility of runtime compilation. You can also embed precompiled kernels into your OpenCL binaries, effectively eliminating the additional kernel compilation overhead from the run time of your application.

OCLTools comes with an offline OpenCL compiler (oclcc), ELF file generator (oclelf), encryption tool (oclcrypt), and utility library to help streamline the OpenCL kernel compilation process. “The ClusterChimps Guide to Offline OpenCL Compiling and Linking” not only serves as a reference for the tools but also walks you step by step through each use case, with example code showing you how it’s done.

View OpenCL Compiler Tools (OCLTools)

3D Picture Viewer and Converter

A very promising trend in the field of photography is the possibility of shooting stereoscopic pairs of pictures for vivid, realistic 3D visualization by using 3D cameras. However, few people have the special equipment to visualize these pictures or software to easily manipulate and convert them to more popular formats.

CMSoft Stereoscopic Picture Editor and Converter is a tool designed to view 3D photographs using OpenGL to render the stereo pair in an animated form called “wiggle stereo”, with zoom and crop capabilities. Advanced users can also create their own custom filters in C language using OpenCL technology.

OpenCL source code used in the filters and sample .MPO images are available.

View 3D Picture Viewer and Converter

OpenCL Environment

A series of utilities aimed at making OpenCL easier to use. Includes clCompiler, which generates both binary outputs and precompiled headers that can be used in conjunction with clEnvironment. clQuery allows you to print all known information about an OpenCL data type. clPid, clYUV, and clImgFilter are all examples of how to use the utilities to create a compile-time kernel, make it a dependency in your makefiles, and then use clEnvironment to call your kernel.

View OpenCL Environment

Lab3D - 3D Laboratory

Lab3D is a 3D Laboratory project that uses OpenCL and OpenGL to display and manipulate 3D models created from regular 3D files and mathematical equations.

Lab3D's features include loading OBJ and DXF (AutoCAD) models, creating dynamic and static 3D models, stereoscopic visualization, and WiiMote interaction.

Buffer objects are used to accelerate animations in systems that support them.

View Lab3D - 3D Laboratory

OpenCL Studio

OpenCL Studio is a development environment for high performance computing and visualization using OpenCL and OpenGL. The editor hides much of the complexity of the underlying APIs while providing flexibility via an interactive scripting language. Integrated source code editors for OpenCL, GLSL, and Lua, as well as a toolbox of 2D user interface widgets and an extensible plug-in architecture provide a powerful development framework for a wide range of high performance computing applications.

View OpenCL Studio

OpenCL Kernel Compiler

A compiler for OpenCL Kernel files designed to be used during OpenCL application development.

The use of this tool alleviates the need for building compiler diagnostic message retrieval code into applications that use OpenCL. It allows developers to spot compilation errors during source builds instead of at run-time.

View OpenCL Kernel Compiler