OpenCL

Docker (Mac)

Running it under Docker on a Mac requires a VirtualBox setting that exposes SSE4.1 and SSE4.2 to the boot2docker VM:
% VBoxManage setextradata "boot2docker-vm" VBoxInternal/CPUM/SSE4.1 1
% VBoxManage setextradata "boot2docker-vm" VBoxInternal/CPUM/SSE4.2 1

Linux

Driver
sudo sh NVIDIA-Linux-x86_64-340.96.run

CUDA Toolkit 5.5
I think the driver is included in this package as well.
https://developer.nvidia.com/cuda-toolkit-55-archive

sudo dpkg -i cuda-repo-ubuntu1210_5.5-0_amd64.deb
sudo apt-get update
sudo apt-get install cuda

Setting global_work_size to 8 worked; 16 or larger did not.

Tips

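Apparently, if a kernel runs for too long, the OS automatically restarts the driver (in the thread below this surfaces as CL_INVALID_COMMAND_QUEUE from clFinish).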
https://devtalk.nvidia.com/default/topic/501409/cl_invalid_command_queue-error-on-clfinish-command-a-lot-of-operations-in-each-kernel-driver-crash/

Benchmarks

-O2 plus32
done in 0.137091 ms.

-O2 M128i
done in 0.110865 ms.
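Comparing the two timings above directly (the notes do not describe what the plus32 and M128i variants implement):

```latex
\frac{0.137091\ \text{ms}}{0.110865\ \text{ms}} \approx 1.237,
\qquad
1 - \frac{0.110865}{0.137091} \approx 0.191
```

So the M128i run took about 19% less time than plus32.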

GeForce GTX 260 (Linux)

% LD_LIBRARY_PATH=/home/nickle/dnetc516-linux-amd64-cuda22/lib ./dnetc
...
dnetc v2.9107-516-CTR-09122713 for CUDA 2.2 on Linux (Linux 3.5.0-21-generic).
...
[Mar 13 09:47:47 UTC] Automatic processor type detection found
                      a GeForce GTX 260 (27 MPs) processor.
[Mar 13 09:47:47 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[Mar 13 09:47:47 UTC] RC5-72: Loaded D4:ABB7FE31:00000000:64*2^32
[Mar 13 09:47:47 UTC] RC5-72: 24 packets (1511.00 stats units) remain in
                      buff-in.r72
[Mar 13 09:47:47 UTC] RC5-72: 0 packets are in buff-out.r72
[Mar 13 09:47:47 UTC] 1 cruncher has been started.
.....10%.....20%.....30%.....40%.....50%.....60%.....70%.....80%.....90%....100
[Mar 13 10:07:28 UTC] RC5-72: Completed D4:ABB7FE31:00000000 (64.00 stats units)
                      0.00:19:41.01 - [232,746,782 keys/s]
[Mar 13 10:07:28 UTC] RC5-72: Loaded D4:ABB7FE71:00000000:64*2^32
[Mar 13 10:07:28 UTC] RC5-72: Summary: 1 packet (64.00 stats units)
                      0.00:19:41.01 - [232.75 Mkeys/s]
[Mar 13 10:07:28 UTC] RC5-72: 23 packets (1447.00 stats units) remain in
                      buff-in.r72
                      Projected ideal time to completion: 0.07:14:06.00
[Mar 13 10:07:28 UTC] RC5-72: 1 packet (64.00 stats units) is in
                      buff-out.r72

Intel HD Graphics 4000 (Mac)

...
dnetc v2.9111-520-CTR-14081118 for OpenCL on Mac OS X (Darwin 13.4.0).
...
[Mar 13 09:50:15 UTC] Automatic processor detection found 1 processor.
[Mar 13 09:50:16 UTC] Automatic processor type detection did not
                      recognize the processor (tag: "HD Graphics 4000 ")
[Mar 13 09:50:16 UTC] RC5-72: Running micro-bench to select fastest core...
[Mar 13 09:50:37 UTC] RC5-72: using core #2 (CL 2-pipe).
...
[Mar 13 11:54:19 UTC] RC5-72: Completed D4:9B497480:00000000 (64.00 stats units)
                      0.02:03:42.25 - [37,034,289 keys/s]
[Mar 13 11:54:19 UTC] RC5-72: Loaded D4:9B4974C0:00000000:64*2^32
[Mar 13 11:54:19 UTC] RC5-72: Summary: 1 packet (64.00 stats units)
                      0.02:03:42.25 - [37.03 Mkeys/s]
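Both Summary lines cover one 64-unit packet, so the two runs compare directly. Note that they use different dnetc builds (CUDA 2.2 vs OpenCL), so this compares the whole setup rather than the GPUs alone:

```latex
\frac{232.75\ \text{Mkeys/s}}{37.03\ \text{Mkeys/s}} \approx 6.29,
\qquad
\frac{7422.25\ \text{s}}{1181.01\ \text{s}} \approx 6.28
```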