BLIS for the Web:

HPC in a Web browser

Marat Dukhan

Presentation at BLIS Retreat 2014

Why Web applications

Distribution

Standalone application
  1. Go to the Web site
  2. Find download link
  3. Download the application
  4. Run the installer
  5. Enter admin password
  6. Wait
  7. Wait
  8. Wait
  9. Run the application
Web application
  1. Go to the Web site

Why Web applications

Portability

Standalone application

Web application

JavaScript: Introduction

JavaScript: Types

JavaScript is dynamically typed

var a = 10;
typeof(a); /* -> number */
a += "text";
typeof(a); /* -> string */

JavaScript has no integer type

var a = 10;
typeof(a); /* -> number */
var b = 10.0;
typeof(b); /* -> number */
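
A minimal sketch of what the missing integer type means in practice. Every number is an IEEE 754 double, so integers are exact only up to 2^53, and integer semantics have to be requested explicitly, for example with the `|0` idiom. The helper name `toInt32` is made up for this illustration.

```javascript
// Every JavaScript number is an IEEE 754 double-precision value.
var big = Math.pow(2, 53);             // 9007199254740992
var bigPlusOne = big + 1;              // not representable, rounds back to 2^53

// Hypothetical helper: truncate a number to a signed 32-bit integer.
function toInt32(x) {
    return x | 0;
}

var half = 7 / 2;                      // 3.5: division never truncates
var truncated = toInt32(7 / 2);        // 3
var wrapped = toInt32(2147483647 + 1); // wraps around to -2147483648
```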

JavaScript: Arrays

All arrays are hash tables

var a = Array(2);
a[0] = 1; /* OK */
a[1] = 2; /* OK */
a["hello"] = "world"; /* OK! */
var b = a[10]; /* OK!! */
b; /* -> undefined!!! */
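
A short sketch of the consequences of the hash-table behaviour shown above: string keys do not count as elements, and assigning past the end leaves holes. The variable names are chosen only for this illustration.

```javascript
var a = Array(2);
a[0] = 1;
a[1] = 2;
a["hello"] = "world";                // stored as an ordinary property, not an element
var lengthWithStringKey = a.length;  // still 2

a[10] = 3;                           // assigning past the end makes the array sparse
var lengthAfterGap = a.length;       // 11
var hasHole = !(5 in a);             // true: index 5 was never assigned

// Index keys come first in ascending order, then string keys.
var keys = Object.keys(a);           // ["0", "1", "10", "hello"]
```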

Elements are dynamically typed

var a = Array();
a[0] = 42;
a[0] = "GATech"; /* OK */

JavaScript: Long operations

No blocking operations supported

callLongRunningOperation(normalArgs, function () {
    /* Callback code */
});

Event handlers must finish fast
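
One common way to keep handlers short is to split a long computation into slices and schedule each slice as its own event. The following is a minimal sketch of that idea, not a prescribed pattern: `sumInChunks` and its `schedule` parameter are hypothetical names introduced here.

```javascript
// Hypothetical helper: sums `data` in slices of `chunkSize` elements,
// handing control back via `schedule` between slices.
function sumInChunks(data, chunkSize, schedule, done) {
    var total = 0;
    var start = 0;
    function step() {
        var end = Math.min(start + chunkSize, data.length);
        for (var i = start; i < end; ++i) {
            total += data[i];
        }
        start = end;
        if (start < data.length) {
            schedule(step);   // let other events run before the next slice
        } else {
            done(total);      // deliver the result through a callback
        }
    }
    step();
}

// Schedule each slice as a separate timer event.
var data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
sumInChunks(data, 3, function (f) { setTimeout(f, 0); }, function (total) {
    console.log("sum = " + total);
});
```

Passing the scheduler as an argument keeps the helper independent of any particular timer API.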

Naive DGEMM in JavaScript

function DGEMM(a, b, c, size) {
    for (var i = 0; i < size; ++i) {
        for (var j = 0; j < size; ++j) {
            var sum = c[i*size+j];
            for (var k = 0; k < size; ++k) {
                sum += a[i*size+k] * b[k*size+j];
            }
            c[i*size+j] = sum;
        }
    }
};
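
A small usage sketch for the kernel above, assuming row-major storage and the update C += A * B. The triple loop is repeated under the hypothetical name `naiveDgemm` only so that the example runs on its own.

```javascript
// Same triple loop as the DGEMM slide: C += A * B for
// row-major size x size matrices stored in flat arrays.
function naiveDgemm(a, b, c, size) {
    for (var i = 0; i < size; ++i) {
        for (var j = 0; j < size; ++j) {
            var sum = c[i*size+j];
            for (var k = 0; k < size; ++k) {
                sum += a[i*size+k] * b[k*size+j];
            }
            c[i*size+j] = sum;
        }
    }
}

var a = [1, 2,
         3, 4];
var b = [5, 6,
         7, 8];
var c = [0, 0,
         0, 0];   // accumulator, starts at zero
naiveDgemm(a, b, c, 2);
// c now holds [19, 22, 43, 50]
```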

Naive ZGEMM in JavaScript

function ZGEMM(a, b, c, size) {
    for (var i = 0; i < size; ++i) {
        for (var j = 0; j < size; ++j) {
            var sumReal = c[2*(i*size+j)];
            var sumImag = c[2*(i*size+j)+1];
            for (var k = 0; k < size; ++k) {
                var aReal = a[2*(i*size+k)];
                var aImag = a[2*(i*size+k)+1];
                var bReal = b[2*(k*size+j)];
                var bImag = b[2*(k*size+j)+1];
                sumReal += aReal * bReal - aImag * bImag;
                sumImag += aReal * bImag + aImag * bReal;
            }
            c[2*(i*size+j)] = sumReal;
            c[2*(i*size+j)+1] = sumImag;
        }
    }
};
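
The ZGEMM kernel assumes interleaved complex storage: element n keeps its real part at index 2n and its imaginary part at index 2n+1. The sketch below isolates the inner multiply-accumulate step under that assumption; `complexMultiplyAdd` is a hypothetical helper, not part of the slides.

```javascript
// Hypothetical helper: c[ci] += a[ai] * b[bi] for interleaved complex arrays.
function complexMultiplyAdd(c, ci, a, ai, b, bi) {
    var aReal = a[2*ai], aImag = a[2*ai+1];
    var bReal = b[2*bi], bImag = b[2*bi+1];
    c[2*ci]   += aReal * bReal - aImag * bImag;
    c[2*ci+1] += aReal * bImag + aImag * bReal;
}

var a = [1, 2];   // 1 + 2i
var b = [3, 4];   // 3 + 4i
var c = [0, 0];   // accumulator
complexMultiplyAdd(c, 0, a, 0, b, 0);
// c now holds [-5, 10], i.e. (1 + 2i)(3 + 4i) = -5 + 10i
```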

WebGL: Introduction

WebGL: Typed Arrays

Typed arrays provide a way of compact storage for numeric types

/* 1KB memory chunk (typed-array constructors must be called with new) */
var ab = new ArrayBuffer(1024);
/* View of the memory chunk as 256 32-bit integers */
var i32Array = new Int32Array(ab);
i32Array[7] = 10;
/* View of the memory chunk as 256 single-precision floats */
var f32Array = new Float32Array(ab);
f32Array[8] = 3.14159265359;
/* View of the memory chunk as 128 double-precision floats */
var f64Array = new Float64Array(ab);
f64Array[0] = 2.7182818284590452353;
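
The views above do not copy data: they all alias the same bytes. A minimal sketch that makes the aliasing visible by writing through one view and reading through another (variable names are illustrative):

```javascript
var buffer = new ArrayBuffer(1024);
var asInt32 = new Int32Array(buffer);
var asFloat32 = new Float32Array(buffer);
var asFloat64 = new Float64Array(buffer);

// Element counts follow from the element size: 1024/4, 1024/4, 1024/8.
var counts = [asInt32.length, asFloat32.length, asFloat64.length];

// Write a float, read back its bit pattern through the integer view.
asFloat32[0] = 1.0;
var bits = asInt32[0];               // 0x3F800000, the IEEE 754 encoding of 1.0f

// Values are rounded to the element type on store.
asFloat32[1] = 3.14159265359;
var storedSingle = asFloat32[1];     // nearest single-precision value
var isRounded = storedSingle === Math.fround(3.14159265359);
```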

Emscripten: Introduction

Emscripten is a clang-based compiler that compiles C and C++ to Asm.js, a subset of JavaScript.
Emscripten is widely used to port desktop software to run in a browser.

Emscripten: Asm.js

Emscripten compiles to Asm.js

/* Note: despite its name, this routine sums the elements of x. */
double dotProduct(double* x, int len) {
    double s = 0.0;
    for (int i = 0; i < len; ++i) {
        s += *x++;
    }
    return s;
}

Asm.js is, in fact, statically typed

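function dotProduct(x, len) {
    x = x|0;       /* x is a 32-bit integer: a byte offset into the heap */
    len = len|0;   /* len is a 32-bit integer */
    var i = 0, s = 0.0;
    for (i = 0; (i|0) < (len|0); i = (i + 1)|0) {
        /* HEAPF64 is the Float64Array view of the heap; x>>3 converts bytes to an element index */
        s = s + (+HEAPF64[x>>3]);
        x = (x + 8)|0;
    }
    return +s;     /* the return value is a double */
}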
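
For context, a function like the one above lives inside a module that receives the heap as an ArrayBuffer. The following is a hand-written sketch of such a module, not Emscripten output: the names `SumModule` and `sum` are made up, and the 64 KB heap size is an assumption chosen to be a power of two.

```javascript
// Hand-written asm.js-style module; runs as ordinary JavaScript
// even where asm.js is not specially optimized.
function SumModule(stdlib, foreign, heap) {
    "use asm";
    var HEAPF64 = new stdlib.Float64Array(heap);

    // Sums `len` doubles starting at byte offset `x` in the heap.
    function sum(x, len) {
        x = x|0;
        len = len|0;
        var i = 0, s = 0.0;
        for (i = 0; (i|0) < (len|0); i = (i + 1)|0) {
            s = s + (+HEAPF64[x>>3]);
            x = (x + 8)|0;
        }
        return +s;
    }

    return { sum: sum };
}

var heap = new ArrayBuffer(0x10000);          // 64 KB heap (assumed size)
var input = new Float64Array(heap, 0, 4);     // four doubles at byte offset 0
input.set([1.5, 2.5, 3.0, 4.0]);

var module = SumModule({ Float64Array: Float64Array }, {}, heap);
var total = module.sum(0, 4);                 // 11
```

The caller passes a byte offset rather than an array, which mirrors how a C pointer is represented after compilation.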

Linear Algebra with Asm.js

Portable Native Client: Intro

Portable Native Client enables a Web browser to run Web apps precompiled to LLVM-based bitcode.

PNaCl is independent of JavaScript, and is free of its limitations.

Portable Native Client vs Asm.js

                            Portable Native Client (PNaCl)   Emscripten/Asm.js
SIMD                        Yes (SP only)                    Experimental
Shared memory parallelism   Yes (pthreads)                   In discussion
64-bit integers             Supported natively               Emulated with 32-bit ops
Can run in                  Chromium                         All browsers (fast in Firefox)
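
To illustrate what emulating 64-bit integers with 32-bit operations involves, here is a sketch of a 64-bit addition on values split into 32-bit halves. It is an illustration of the general technique only, not the code Emscripten actually generates, and `add64` is a hypothetical name.

```javascript
// Hypothetical helper: adds two unsigned 64-bit integers, each passed
// as a low and a high 32-bit half; returns the halves of the sum.
function add64(aLo, aHi, bLo, bHi) {
    var lo = (aLo + bLo) >>> 0;               // low 32 bits of the sum
    var carry = (lo < (aLo >>> 0)) ? 1 : 0;   // low half wrapped: carry out
    var hi = (aHi + bHi + carry) >>> 0;       // high 32 bits, modulo 2^32
    return { lo: lo, hi: hi };
}

// 0x00000000FFFFFFFF + 1 = 0x0000000100000000
var r = add64(0xFFFFFFFF, 0x0, 0x1, 0x0);
// r.lo is 0 and r.hi is 1
```

One native 64-bit addition becomes two additions, a comparison and several coercions, which hints at where the emulation overhead comes from.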

Linear Algebra with PNaCl

Web Workers: Introduction

Web Workers enable multi-core execution in JavaScript

Web Workers can do long-running computations, but cannot access or modify the content of the Web page

Web Workers: Communication

Web Workers are not threads but processes: message passing is the only means of communication

Objects can be copied or transferred to a Web Worker, but not shared
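
The difference between copying and transferring can be sketched without starting a worker, assuming a runtime that provides the global `structuredClone` function (recent browsers and Node.js 17 or later), which applies the same cloning rules as `postMessage`. Variable names are illustrative.

```javascript
// Copy: the clone is independent of the original.
var original = new Float64Array([1, 2, 3]);
var copy = structuredClone(original);
copy[0] = 42;
var originalFirst = original[0];       // still 1

// Transfer: ownership of the memory moves and the sender's buffer is detached.
var buffer = new ArrayBuffer(1024);
var moved = structuredClone(buffer, { transfer: [buffer] });
var senderBytes = buffer.byteLength;   // 0: no longer usable by the sender
var receiverBytes = moved.byteLength;  // 1024

// With an actual worker the corresponding calls would be:
//   worker.postMessage(original);          // copy
//   worker.postMessage(buffer, [buffer]);  // transfer
```

Transferring avoids the cost of copying large arrays, at the price of the sender losing access to the data.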

Furious.js: Motivation

Heterogeneity of APIs for Web computing

Furious.js: Architecture

Furious.js: API

The API is intentionally similar to NumPy


furious.init(function(context) {
    var A = context.array([[1, 2],
                           [2, 4]]);
    var L = context.cholesky(A, "L");
    var y = context.ones([2]);
    var z = context.solveTridiagonal(L, y, "L");
    var x = context.solveTridiagonal(L, z, "L", "N");
    var m = x.max();
    context.get(m, function(mValue) {
        show(mValue);
    });
});

Acknowledgements

This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, X-Stack program under Award Number DE-FC02-10ER26006/DE-SC0004915; as well as grants from the U.S. National Science Foundation (NSF) Award Number 1339745 and CAREER Award Number 0953100.


Special thanks to AMD, which hired me for a summer internship to start the Furious.js project and made it open-source.

Resources

BLIS ports for Portable Native Client and Emscripten are available from the BLIS repository as the pnacl and emscripten configurations

BLIS for Portable Native Client is also available from NaClPorts as the blis package

GEMM benchmarks used in this presentation are released on GitHub.com/Maratyszcza/blis-bench

Furious.js is currently developed on GitHub.com/HPCGarage/Furious.js

Furious.js demos can be located here and there