* OpenCL

** Docker (Mac)
To run this under Docker on a Mac, VirtualBox must be configured to enable SSE4.1 and SSE4.2 in the VM:
 % VBoxManage setextradata "boot2docker-vm" VBoxInternal/CPUM/SSE4.1 1
 % VBoxManage setextradata "boot2docker-vm" VBoxInternal/CPUM/SSE4.2 1

** Linux
Driver
 sudo sh NVIDIA-Linux-x86_64-340.96.run

CUDA Toolkit 5.5
The driver seems to be included in this package as well.
https://developer.nvidia.com/cuda-toolkit-55-archive
 sudo dpkg -i cuda-repo-ubuntu1210_5.5-0_amd64.deb
 sudo apt-get update
 sudo apt-get install cuda

It worked with global_work_size set to 8. It did not work with 16 or higher.
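
The note above does not record why the larger sizes failed, so the following is only something to check, not a diagnosis. In OpenCL 1.x, clEnqueueNDRangeKernel rejects a launch when global_work_size is not a multiple of local_work_size, or when local_work_size exceeds the device limit (CL_DEVICE_MAX_WORK_GROUP_SIZE). A minimal sketch of the arithmetic for choosing a valid pair, with hypothetical helper names and no actual OpenCL calls:

```python
def round_up_global_size(n_items, local_size):
    """Smallest multiple of local_size that is >= n_items."""
    if n_items <= 0 or local_size <= 0:
        raise ValueError("sizes must be positive")
    return ((n_items + local_size - 1) // local_size) * local_size


def pick_work_sizes(n_items, preferred_local, device_max_local):
    """Return (global_size, local_size) for a 1-D launch.

    device_max_local stands in for the limit a real host program would
    query from the device; here it is just a parameter (assumption).
    """
    local_size = min(preferred_local, device_max_local)
    return round_up_global_size(n_items, local_size), local_size
```

If the global size is rounded up this way, the kernel has to ignore the extra work-items, for example by returning early when get_global_id(0) is not less than the real item count.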

** Tips
Apparently, if a kernel runs for too long, the OS automatically restarts the driver.
https://devtalk.nvidia.com/default/topic/501409/cl_invalid_command_queue-error-on-clfinish-command-a-lot-of-operations-in-each-kernel-driver-crash/
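
A common way to stay under this kind of watchdog limit is to split one long job into many short kernel launches. The sketch below shows only the range-splitting logic. The function name and the chunk size are hypothetical, and the actual enqueue-and-wait step is left out.

```python
def split_range(total, chunk):
    """Yield (offset, count) pairs that cover range(total) in order."""
    if total < 0 or chunk <= 0:
        raise ValueError("total must be >= 0 and chunk must be > 0")
    offset = 0
    while offset < total:
        count = min(chunk, total - offset)
        # A real host program would launch one kernel for this slice
        # and wait for it to finish before moving on.
        yield offset, count
        offset += count
```

Each (offset, count) pair would map to one launch. The chunk size has to be tuned per device so that a single launch finishes well before the driver is reset.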

** Benchmark

-O2 plus32
    done in 0.137091 ms.

-O2 M128i
    done in 0.110865 ms.

*** GeForce GTX 260 (Linux)

% LD_LIBRARY_PATH=/home/nickle/dnetc516-linux-amd64-cuda22/lib ./dnetc
...
dnetc v2.9107-516-CTR-09122713 for CUDA 2.2 on Linux (Linux 3.5.0-21-generic).
...
[Mar 13 09:47:47 UTC] Automatic processor type detection found
                      a GeForce GTX 260 (27 MPs) processor.
[Mar 13 09:47:47 UTC] RC5-72: using core #0 (CUDA 1-pipe 64-thd).
[Mar 13 09:47:47 UTC] RC5-72: Loaded D4:ABB7FE31:00000000:64*2^32
[Mar 13 09:47:47 UTC] RC5-72: 24 packets (1511.00 stats units) remain in
                      buff-in.r72
[Mar 13 09:47:47 UTC] RC5-72: 0 packets are in buff-out.r72
[Mar 13 09:47:47 UTC] 1 cruncher has been started.
.....10%.....20%.....30%.....40%.....50%.....60%.....70%.....80%.....90%....100
[Mar 13 10:07:28 UTC] RC5-72: Completed D4:ABB7FE31:00000000 (64.00 stats units)
                      0.00:19:41.01 - [232,746,782 keys/s]
[Mar 13 10:07:28 UTC] RC5-72: Loaded D4:ABB7FE71:00000000:64*2^32
[Mar 13 10:07:28 UTC] RC5-72: Summary: 1 packet (64.00 stats units)
                      0.00:19:41.01 - [232.75 Mkeys/s]
[Mar 13 10:07:28 UTC] RC5-72: 23 packets (1447.00 stats units) remain in
                      buff-in.r72
                      Projected ideal time to completion: 0.07:14:06.00
[Mar 13 10:07:28 UTC] RC5-72: 1 packet (64.00 stats units) is in
                      buff-out.r72

*** Intel HD Graphics 4000 (Mac)
...
dnetc v2.9111-520-CTR-14081118 for OpenCL on Mac OS X (Darwin 13.4.0).
...
[Mar 13 09:50:15 UTC] Automatic processor detection found 1 processor.
[Mar 13 09:50:16 UTC] Automatic processor type detection did not
                      recognize the processor (tag: "HD Graphics 4000 ")
[Mar 13 09:50:16 UTC] RC5-72: Running micro-bench to select fastest core...
[Mar 13 09:50:37 UTC] RC5-72: using core #2 (CL 2-pipe).
...
[Mar 13 11:54:19 UTC] RC5-72: Completed D4:9B497480:00000000 (64.00 stats units)
                      0.02:03:42.25 - [37,034,289 keys/s]
[Mar 13 11:54:19 UTC] RC5-72: Loaded D4:9B4974C0:00000000:64*2^32
[Mar 13 11:54:19 UTC] RC5-72: Summary: 1 packet (64.00 stats units)
                      0.02:03:42.25 - [37.03 Mkeys/s]
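
The key rates in the two logs can be cross-checked from the packet size and the elapsed time. The sketch below assumes that `64*2^32` is the number of keys in one packet and that the elapsed time is printed as days.hours:minutes:seconds. Both are readings of the log output, not documented facts, and the function names are made up for this example.

```python
KEYS_PER_PACKET = 64 * 2**32  # assumed meaning of "64*2^32" in the logs


def parse_elapsed(text):
    """Convert 'D.HH:MM:SS.ss' (assumed log format) to seconds."""
    days, clock = text.split(".", 1)
    hours, minutes, seconds = clock.split(":")
    return (int(days) * 86400 + int(hours) * 3600
            + int(minutes) * 60 + float(seconds))


def keys_per_second(elapsed_text):
    """Average key rate for one packet finished in elapsed_text."""
    return KEYS_PER_PACKET / parse_elapsed(elapsed_text)


gtx260 = keys_per_second("0.00:19:41.01")  # GeForce GTX 260 log
hd4000 = keys_per_second("0.02:03:42.25")  # Intel HD Graphics 4000 log
print(f"GTX 260: {gtx260 / 1e6:.2f} Mkeys/s")
print(f"HD 4000: {hd4000 / 1e6:.2f} Mkeys/s")
print(f"ratio:   {gtx260 / hd4000:.2f}x")
```

Under these assumptions the computed rates agree with the logged figures to within rounding of the elapsed time, and the GTX 260 comes out roughly six times faster than the HD Graphics 4000.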