jCUDA gives Java programmers access to CUDA, so Java-based applications can exploit the full power of GPU hardware. With jCUDA you can create cross-platform CUDA solutions that run on any operating system supported by CUDA without changing your code.
What do I need?
- Recent NVIDIA hardware with a CUDA driver
- Java SDK
- Eclipse (optional)
- jCUDA library
What are the magic steps?
1. Download and install all the requirements
Check whether you have a CUDA-enabled graphics card. If you don't have one, you need to buy one. Then install the CUDA driver for your graphics card, the CUDA Toolkit (it provides the nvcc compiler) and the Java SDK.
Download the jCUDA library and add it to your project in your favorite IDE (if you have one).
2. Write your Java application
You write your Java application as usual and access the GPU through the jCUDA library; the only new thing to learn is its API. This example shows how to detect the available GPUs and how to execute your CUDA program on them.
Note that the compiled CUDA program is stored in a folder called resources, and be careful with the number of bytes occupied by pointers. The size of a pointer usually matches the bit architecture of the target machine: a build for a 32-bit architecture uses 32-bit (4-byte) pointers, and a build for a 64-bit architecture uses 64-bit (8-byte) pointers.
// This snippet runs inside a method such as main(). It needs java.io.File
// and the jCUDA classes (CUDA, CUdevice, CUdevprop, CUdeviceptr, CUstream) imported.
CUDA cuda = new CUDA(true);
int count = cuda.getDeviceCount();
if (count == 0) {
    System.out.println("No GPU devices found");
    return;
}
System.out.println("Total number of devices: " + count);
for (int i = 0; i < count; i++) {
    CUdevice dev = cuda.getDevice(i);
    String name = cuda.getDeviceName(dev);
    System.out.println("Name: " + name);
    int[] version = cuda.getDeviceComputeCapability(dev);
    System.out.println("Version: " + String.format("%d.%d", version[0], version[1]));
    CUdevprop prop = cuda.getDeviceProperties(dev);
    // The CUDA driver API defines clockRate in kilohertz
    System.out.println("Clock rate: " + prop.clockRate + " kHz");
    System.out.println("Max threads per block: " + prop.maxThreadsPerBlock);
}
/** Select the first device */
cuda.getDevice(0);
/** Create a context (required) */
cuda.createContext();
/** Load the compiled module */
File cubinFile = new File("resources", "sub_module.cubin");
cuda.loadModule(cubinFile.getAbsolutePath());
/** Get the kernel function we want to run */
cuda.getModuleFunction("subtract");
/** Prepare the host data: 64 ints holding 0..63 */
int[] originalData = new int[64];
for (int i = 0; i < originalData.length; i++) {
    originalData[i] = i;
}
/** Allocate the same number of bytes on the device */
int memSize = originalData.length * (Integer.SIZE / 8);
CUdeviceptr devicePtr = cuda.allocate(memSize);
/** Copy the original array to the device */
cuda.copy(devicePtr, originalData, memSize);
/**
 * Set up the function parameters.
 * The first parameter is the pointer to device memory.
 * On 64-bit platforms each pointer takes 8 bytes,
 * on 32-bit platforms only 4.
 */
int offset = 0;
int delta = 13;
cuda.setParameter(offset, devicePtr);
/** Size of the pointer: change to 4 if you are using a 32-bit platform */
offset += 8;
cuda.setParameter(offset, delta);
/**
 * An int scalar takes 4 bytes on both 32-bit
 * and 64-bit platforms.
 */
offset += Integer.SIZE / 8;
/** Total size of all parameters: 12 bytes on 64-bit, 8 bytes on 32-bit */
cuda.setParameterSize(offset);
/** Now configure the execution: one block with one thread per element */
cuda.setFunctionBlockShape(originalData.length, 1, 1);
cuda.launch();
/** Wait for all operations to complete */
cuda.synchronizeStream(new CUstream(0));
/** Copy the results back */
cuda.copy(originalData, devicePtr, memSize);
/** Verify the results */
boolean correct = true;
for (int i = 0; i < originalData.length; i++) {
    if (originalData[i] != (i + delta) * (i + delta)) {
        System.out.println("Error at " + i + ":" + originalData[i]);
        correct = false;
    }
}
if (correct) {
    System.out.println("Test passed");
} else {
    System.out.println("Test failed");
}
/** Release resources */
cuda.free(devicePtr);
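The hard-coded parameter offsets in the listing above can also be computed instead of edited by hand. The following is a minimal sketch, not part of jCUDA: the class `ParameterLayout` and its methods are hypothetical helpers. Reading the `sun.arch.data.model` system property is an assumption that holds on Sun/Oracle/OpenJDK virtual machines but is not guaranteed elsewhere, and the sketch also assumes that the bitness of the JVM matches the bitness of the CUDA build. Each parameter is aligned to its own size, which is the usual convention for this style of parameter passing.

```java
/** Hypothetical helper (not part of jCUDA) that computes kernel parameter offsets. */
final class ParameterLayout {

    /** Size in bytes of a CUDA int: 4 on both 32-bit and 64-bit platforms. */
    static final int INT_SIZE = Integer.SIZE / 8;

    /**
     * Guesses the pointer size from the JVM data model.
     * Assumption: "sun.arch.data.model" is set (HotSpot-style JVMs); otherwise 8 is used.
     */
    static int pointerSize() {
        String model = System.getProperty("sun.arch.data.model", "64");
        return "32".equals(model) ? 4 : 8;
    }

    /** Rounds offset up to the next multiple of alignment (a power of two). */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    /** Offset of the int parameter that follows a single pointer parameter. */
    static int intOffsetAfterPointer(int pointerSize) {
        return alignUp(pointerSize, INT_SIZE);
    }

    /** Total parameter size for the signature (int *src, int amount). */
    static int totalSize(int pointerSize) {
        return intOffsetAfterPointer(pointerSize) + INT_SIZE;
    }

    public static void main(String[] args) {
        int ptr = pointerSize();
        System.out.println("Pointer size: " + ptr + " bytes");
        System.out.println("Offset of 'amount': " + intOffsetAfterPointer(ptr));
        System.out.println("Total parameter size: " + totalSize(ptr));
    }
}
```

With such a helper, the offset of the second parameter and the value passed to `setParameterSize` no longer depend on a comment reminding you to change a constant: the total comes out as 12 bytes with 8-byte pointers and 8 bytes with 4-byte pointers.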
3. Write your CUDA code
The following example is written in CUDA C, which is very similar to the C language. Despite its name, the subtract kernel adds amount to each element and then squares the element, so each result is equal to (src + amount) * (src + amount).
extern "C" __global__ void subtract(int *src, int amount) {
    /* Each thread handles the element selected by its thread index */
    src[threadIdx.x] += amount;
    src[threadIdx.x] *= src[threadIdx.x];
}
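For readers who do not know CUDA, the same computation can be written as an ordinary Java loop. This is only a CPU reference sketch; the class name `KernelReference` and the method `expected` are hypothetical and do not belong to jCUDA. Each loop iteration does what one GPU thread does for its element.

```java
/** CPU reference for the kernel: adds amount to every element, then squares it. */
final class KernelReference {

    /** Returns a new array holding (src[i] + amount) squared for each element of src. */
    static int[] expected(int[] src, int amount) {
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            int sum = src[i] + amount; // first kernel statement: src[i] += amount
            out[i] = sum * sum;        // second kernel statement: src[i] *= src[i]
        }
        return out;
    }

    public static void main(String[] args) {
        int[] result = expected(new int[] {0, 1, 2}, 13);
        System.out.println(java.util.Arrays.toString(result)); // prints [169, 196, 225]
    }
}
```

A reference like this is also a convenient way to check the values copied back from the GPU.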
4. Compile your CUDA code
Before running the application, you must compile your CUDA code. The compiler first translates it to PTX, an intermediate assembly language, and then assembles the PTX into the cubin file format.
The simplified compilation process is: sub_module.cu -> PTX -> sub_module.cubin.
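If you are using Windows, you can execute the following command (replace $(SDK) with your CUDA SDK directory). The -cubin option produces the cubin file that the Java code loads from the resources folder.
nvcc.exe -cubin -I $(SDK)/common/inc resources/sub_module.cu -o resources/sub_module.cubin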
If you prefer to use Ant, use this script together with your favorite IDE.
<?xml version="1.0" encoding="UTF-8"?>
<project name="GPGPU" basedir="." default="all">
  <target name="all">
    <exec executable="nvcc.exe">
      <arg value="resources/sub_module.cu" />
      <arg value="--cubin" />
      <arg value="-o" />
      <arg value="resources/sub_module.cubin" />
    </exec>
  </target>
</project>
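The compilation can also be started from Java itself, for example at application startup. The following is a minimal sketch under stated assumptions: `NvccCommand` is a hypothetical helper class, not part of jCUDA, and it assumes that `nvcc` is on the PATH and that the file names match the ones used in this post.

```java
/** Hypothetical helper (not part of jCUDA) that compiles a .cu file by running nvcc. */
final class NvccCommand {

    /** Builds the command line; assumes nvcc can be found on the PATH. */
    static String[] build(String cuFile, String cubinFile) {
        return new String[] {"nvcc", "-cubin", cuFile, "-o", cubinFile};
    }

    /** Runs nvcc and returns its exit code (0 means success). */
    static int compile(String cuFile, String cubinFile)
            throws java.io.IOException, InterruptedException {
        Process process = new ProcessBuilder(build(cuFile, cubinFile))
                .inheritIO() // show nvcc's messages on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call compile(...) to actually run nvcc.
        String[] cmd = build("resources/sub_module.cu", "resources/sub_module.cubin");
        System.out.println(String.join(" ", cmd));
    }
}
```

Calling `NvccCommand.compile(...)` before loading the module and checking for a zero exit code would keep the cubin file in sync with the CUDA source.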
5. Compile and run your Java application
Don't forget that GPUs are connected to the main processor by a relatively slow connection, the bus. For this reason, moving data to and from the GPU is expensive compared to performing calculations directly on the GPU. This can create critical bottlenecks, for instance when you perform a calculation on the CPU, continue the calculation on the GPU and then use the GPU output on the CPU once more. The overhead introduced by data transfers over the bus can outweigh the benefits of fast GPU computation.
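The trade-off can be put into numbers with a simple cost model. This is only a back-of-the-envelope sketch: the class `TransferCost` is hypothetical, and the data size, bus bandwidth and timings used in `main` are made-up inputs rather than measurements. The model charges the GPU path for copying the data in both directions plus the kernel time and compares the total with the time the CPU would need.

```java
/** Back-of-the-envelope cost model; every number fed into it is a hypothetical input. */
final class TransferCost {

    /** Seconds needed to move the data to the GPU and back over the bus. */
    static double roundTripSeconds(long bytes, double busBytesPerSecond) {
        return 2.0 * bytes / busBytesPerSecond;
    }

    /** Total GPU time: both transfers plus the kernel itself. */
    static double gpuTotalSeconds(long bytes, double busBytesPerSecond, double kernelSeconds) {
        return roundTripSeconds(bytes, busBytesPerSecond) + kernelSeconds;
    }

    /** True when the GPU path, including the transfers, is faster than the CPU. */
    static boolean worthOffloading(long bytes, double busBytesPerSecond,
                                   double kernelSeconds, double cpuSeconds) {
        return gpuTotalSeconds(bytes, busBytesPerSecond, kernelSeconds) < cpuSeconds;
    }

    public static void main(String[] args) {
        long bytes = 4_000_000_000L; // hypothetical data size
        double bus = 4.0e9;          // hypothetical bus bandwidth in bytes per second
        System.out.println(gpuTotalSeconds(bytes, bus, 0.5));      // prints 2.5
        System.out.println(worthOffloading(bytes, bus, 0.5, 2.0)); // prints false
    }
}
```

In this made-up case the kernel is four times faster than the CPU (0.5 s against 2.0 s), yet the two transfers add 2.0 s, so staying on the CPU wins.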



