We finally got the most wanted Jetson K1 board in the house! In this post we show how to turn a just unboxed tiny board into fully-functional CUDA development node.

alt text

From now on our training center also offers CUDA developer certification on Jetson K1.

1. Connect to board remotely over SSH

SSH is enabled by default, so simply connect ethernet and power cables, power on and search for the board in your local network, e.g. using fping. I use another Ubuntu laptop to login to Jetson K1: $ ssh ubuntu@ So no display is really needed, unless you really want it. X server does not work out of the box though (binary driver is missing, see step 3). Note the display port is HDMI-only (no D-Sub). In case you’d want to connect keyboard/mouse/usb stick - note there is only one USB port available.

Some earlier Jetson K1 shipments reportedly had Ubuntu 13 onboard, ours (received 12 June) has fresh Ubuntu 14.04 LTS (April 2014, Long-Time Support release): $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 14.04 LTS Release: 14.04 Codename: trusty

2. Turn on software repositories

By default you will not be able to install almost any extra packages. In order to solve this, go to sources list and uncomment everything:

$ sudo vi /etc/apt/sources.list
$ sudo apt-get update
$ sudo apt-get install mc
$ mc

So now we have midnight commander, sweet! alt text

3. Install NVIDIA GPU compute/graphics driver

First thing to note in mc is ~/NVIDIA-INSTALLER

$ sudo ./installer.sh
Extracting the BSP...
Installing NVIDIA binaries
Using rootfs directory of: /
Extracting the NVIDIA user space components to /
Extracting the NVIDIA gst test applications to /
Extracting the configuration files for the supplied root filesystem to /
Creating a symbolic link nvgstplayer pointing to nvgstplayer-0.10
Adding symlink libcuda.so --> libcuda.so.1.1
Extracting the firmwares and kernel modules to /
Installing the board *.dtb files into //boot
Removing installation files...
Configuring the desktop, please reboot to take effect

4. Install CUDA

Download and install CUDA for L4T (Linux for Tegra) from here: (requires Registered Developer login/password):

$ sudo dpkg -i ./cuda-repo-l4t-r19.2_6.0-42_armhf.deb
$ sudo apt-get update
\$ sudo apt-get install cuda-toolkit-6.0

CUDA 6.0 toolkit will be installed to /usr/local/cuda-6.0, but still not connected to the command line. So, we need to:

  • Create file /etc/profile.d/cuda.sh (as admin) with: export PATH=\$PATH:/usr/local/cuda/bin
  • Create file /etc/ld.so.conf.d/cuda.conf
  • Reboot

Now nvcc is in command line!

\$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Sat_Mar_15_02:05:29_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1

5. Compile and run deviceQuery sample

$ cd /usr/local/cuda
$ sudo chmod o+w samples/ -R
$ cd samples/1_Utilities/deviceQuery
$ make
$ ../../bin/armv7l/linux/release/gnueabihf/deviceQuery

Here’s how deviceQuery output compares to laptop graphics and Tesla card:

  GK20A GeForce GTX 680M Tesla K20c
CUDA Capability Major/Minor version number 3.2 3.0 3.5
Total amount of global memory 1746 Mbytes 2048 Mbytes 4800 Mbytes
Multiprocessors 1 7 13
CUDA Cores/MP 192 192 192
GPU Clock rate 852 MHz (0.85 GHz) 758 MHz (0.76 GHz) 706 MHz (0.71 GHz)
Memory Clock rate 924 Mhz 1800 Mhz 2600 Mhz
Memory Bus Width 64-bit 256-bit 320-bit
L2 Cache Size 131072 bytes 524288 bytes 1310720 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536) 1D=(65536) 1D=(65536)
  2D=(65536, 65536) 2D=(65536, 65536) 2D=(65536, 65536)
  3D=(4096, 4096, 4096) 3D=(4096, 4096, 4096) 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) 1D=(16384), 2048 1D=(16384), 2048 1D=(16384), 2048
Maximum Layered 2D Texture Size, (num) 2D=(16384, 16384), 2D=(16384, 16384), 2D=(16384, 16384),
layers 2048 2048 2048
Total amount of constant memory 65536 bytes 65536 bytes 65536 bytes
Total amount of shared memory per block 49152 bytes 49152 bytes 49152 bytes
Total number of registers available per 32768 65536 65536
Warp size 32 32 32
Maximum number of threads per 2048 2048 2048
Maximum number of threads per block 1024 1024 1024
Max dimension size of a thread block (1024, 1024, 64) (1024, 1024, 64) (1024, 1024, 64)
Max dimension size of a grid size (x,y,z) (2147483647, 65535, (2147483647, 65535, (2147483647,65535,
  65535) 65535) 65535)
Maximum memory pitch 2147483647 bytes 2147483647 bytes 2147483647 bytes
Texture alignment 512 bytes 512 bytes 512 bytes
Concurrent copy and kernel execution Yes with 1 copy Yes with 1 copy Yes with 2 copy
  engine(s) engine(s) engine(s)
Run time limit on kernels No No No
Integrated GPU sharing Host Memory Yes No No
Support host page-locked memory mapping Yes Yes Yes
Alignment requirement for Surfaces Yes Yes Yes
Device has ECC support Disabled Disabled Enabled
Device supports Unified Addressing (UVA) Yes Yes Yes

In our next post we will show how typical data processing applications perform on Jetson K1 GPU.