Super Fast OpenGL 3.2 Gaussian Blur

DeVsh
This method needs documenting, as it is so awesome...

It takes advantage of:
1) Separability of the Filter
2) Hardware Bi-Linear Interpolation
3) The newer GLSL texture fetching functions such as textureOffset() (added in GLSL 1.30, so available in OpenGL 3.2, whose shading language is GLSL 1.50)

Suppose we want an NxN-tap blur (e.g. N = 9 blurs 9x9 pixels together).

1: Separability of the filter
Due to some neat maths which I will not go into here (in short, the 2D Gaussian factors as G(x,y) = G(x)*G(y)), convolving the image along each orthogonal axis in turn with the same 1D Gaussian gives exactly the same result as convolving it once with the full 2D Gaussian. The same holds in any number of dimensions.

This means we first blur in the X direction, and then blur the result of that in the Y direction.

This way we get exactly the same result as an NxN blur while performing only 2xN reads per pixel instead of NxN.
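A minimal CPU sketch of the separability claim, assuming a greyscale image stored as a Python list of rows and clamp-to-edge addressing (as with GL_CLAMP_TO_EDGE). The helper names (gaussian_kernel, blur_2d_direct, blur_1d, blur_separable) are hypothetical; the point is only that blurring in X and then in Y matches the brute-force 2D convolution.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalised 1D Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]


def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def blur_2d_direct(img, k):
    """Brute force: (2r+1)^2 reads per output pixel."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    return [[sum(k[j + r] * k[i + r] * img[clamp(y + j, 0, h - 1)][clamp(x + i, 0, w - 1)]
                 for j in range(-r, r + 1) for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def blur_1d(img, k, horizontal):
    """One pass along X (horizontal=True) or Y: 2r+1 reads per output pixel."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    if horizontal:
        return [[sum(k[i + r] * img[y][clamp(x + i, 0, w - 1)] for i in range(-r, r + 1))
                 for x in range(w)] for y in range(h)]
    return [[sum(k[i + r] * img[clamp(y + i, 0, h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def blur_separable(img, k):
    """Blur in X, then blur that result in Y."""
    return blur_1d(blur_1d(img, k, horizontal=True), k, horizontal=False)


img = [[float((x * 3 + y * 5) % 7) for x in range(8)] for y in range(6)]
k = gaussian_kernel(radius=4, sigma=2.0)          # 9 taps, i.e. the 9x9 example
direct, separable = blur_2d_direct(img, k), blur_separable(img, k)
print(max(abs(direct[y][x] - separable[y][x]) for y in range(6) for x in range(8)))
```

For the 9x9 case that is 81 reads per pixel for the direct version against 18 for the two passes.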

2: Hardware Bi-Linear Interpolation

Bilinear interpolation is essentially free on the GPU: every time you read a texture with linear filtering enabled, the hardware interpolates between the nearest 4 texels to approximate the value in between them.

If we read on the line between 2 adjacent texels (horizontally or vertically, not diagonally), then only the values of those two are interpolated.
The interpolation is:
out = A*(1-c) + B*c
where c is the fraction of the way from A to B, between 0.0 and 1.0.
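To make the formula concrete, here is a small Python emulation of linear filtering along one axis. It assumes texel coordinates in which texel i's centre sits exactly at i (the half-texel shift of normalised GL coordinates is left out) and clamp-to-edge addressing; sample_linear is a hypothetical helper, not a GL call.

```python
import math


def sample_linear(texels, x):
    """Emulate GL_LINEAR filtering along one axis at texel-space position x."""
    i = math.floor(x)
    c = x - i                                   # fraction of the way from A to B
    n = len(texels)
    a = texels[max(0, min(n - 1, i))]           # A: texel at or left of x
    b = texels[max(0, min(n - 1, i + 1))]       # B: next texel along the line
    return a * (1.0 - c) + b * c                # out = A*(1-c) + B*c


row = [10.0, 20.0, 30.0]
print(sample_linear(row, 0.25))   # 12.5: a quarter of the way from 10 to 20
print(sample_linear(row, 1.0))    # 20.0: exactly on a texel centre
```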

Now look at how the Gaussian blur weights two neighbouring texels A and B:
A*gaussWeight(positionA) + B*gaussWeight(positionB)

If positionB = positionA+1 (the next texel after A), then c can be set to
c = gaussWeight(positionA+1) / (gaussWeight(positionA) + gaussWeight(positionA+1))

and if we then multiply "out" by the sum of the two weights, we get exactly the same result (up to the precision of the hardware filter):

A*gaussWeight(positionA)+B*gaussWeight(positionB) = out*(gaussWeight(positionA)+gaussWeight(positionB))
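A quick numerical check of that identity, as a Python sketch. gauss and merge_pair are hypothetical helper names, and the weights are left unnormalised because the identity holds for any pair of positive weights.

```python
import math


def gauss(x, sigma):
    """Unnormalised Gaussian weight, i.e. gaussWeight(x)."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def merge_pair(pos_a, sigma):
    """Replace the taps at pos_a and pos_a+1 by one bilinear tap.

    Returns (sample_position, weight_to_multiply_out_by)."""
    wa, wb = gauss(pos_a, sigma), gauss(pos_a + 1, sigma)
    c = wb / (wa + wb)
    return pos_a + c, wa + wb


A, B, pos_a, sigma = 0.3, 0.9, 3, 4.0
pos, w = merge_pair(pos_a, sigma)
c = pos - pos_a
two_taps = A * gauss(pos_a, sigma) + B * gauss(pos_a + 1, sigma)
one_tap = (A * (1 - c) + B * c) * w       # "out" times the summed weight
print(two_taps, one_tap)                  # equal up to float rounding
```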

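Assuming bilinear filtering is free (or almost free), each pass now needs one read for the centre tap plus one read per pair of neighbouring taps. For an even radius r (N = 2r+1) that is r+1 reads per pass, or N+1 reads in total for an NxN filter.
With a radius of 8 pixels (a 17x17 filter) that is 18 texture reads instead of 289 (or 34 with separability alone).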
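A sketch of a complete 1D pass built from bilinear pairs, in Python, assuming texel centres at integer positions and clamp-to-edge addressing. All names are hypothetical. It pairs taps (1,2), (3,4), ... on each side of the centre, checks the result against the plain discrete convolution, and counts the reads.

```python
import math


def sample_linear(texels, x):
    """GL_LINEAR along one axis; texel i's centre is at x == i, edges clamp."""
    i = math.floor(x)
    c = x - i
    n = len(texels)
    return (texels[max(0, min(n - 1, i))] * (1.0 - c)
            + texels[max(0, min(n - 1, i + 1))] * c)


def linear_taps(radius, sigma):
    """Offsets and weights for one pass: the centre tap alone, then pairs
    (1,2), (3,4), ... on each side merged into single bilinear reads."""
    g = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(radius + 1)]
    total = g[0] + 2.0 * sum(g[1:])
    g = [v / total for v in g]                      # normalise over all 2r+1 taps
    offsets, weights = [0.0], [g[0]]
    for k in range(1, radius + 1, 2):
        wa = g[k]
        wb = g[k + 1] if k + 1 <= radius else 0.0   # odd radius: last tap unpaired
        for sign in (1.0, -1.0):
            offsets.append(sign * (k + wb / (wa + wb)))
            weights.append(wa + wb)
    return offsets, weights


def blur_pass_linear(row, radius, sigma):
    """Bilinear-pair pass: len(offsets) reads per output texel."""
    offsets, weights = linear_taps(radius, sigma)
    return [sum(w * sample_linear(row, x + o) for o, w in zip(offsets, weights))
            for x in range(len(row))]


def blur_pass_discrete(row, radius, sigma):
    """Reference pass: 2*radius+1 reads per output texel."""
    g = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(g)
    n = len(row)
    return [sum(g[i + radius] / total * row[max(0, min(n - 1, x + i))]
                for i in range(-radius, radius + 1)) for x in range(n)]


offsets, _ = linear_taps(radius=8, sigma=4.0)
print(len(offsets), "reads per pass,", 2 * len(offsets), "for the 17x17 blur")  # 9, 18
```

With an odd radius the outermost tap has no partner, which is why the N+1 count above is stated for an even radius.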

3: GLSL 1.40 fetches

Observe that c = gaussWeight(positionA+1)/(gaussWeight(positionA)+gaussWeight(positionA+1))
is always a little below 0.5, because B is further from the centre than A, and for wide filters (sigma large compared with the radius) it stays roughly within 0.45 < c < 0.5.
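How close c stays to 0.5 depends on sigma relative to the radius, so it is worth checking for your own kernel. This Python sketch (pair_fractions is a hypothetical helper) lists c for each pair of a radius-8 pass. With sigma = 8 every c lands between 0.47 and 0.5; with sigma = 4 only the innermost pair stays above 0.45 and the outermost falls below 0.4.

```python
import math


def pair_fractions(radius, sigma):
    """c = w(k+1) / (w(k) + w(k+1)) for the pairs (1,2), (3,4), ... up to radius."""
    def g(x):
        return math.exp(-(x * x) / (2.0 * sigma * sigma))
    return [g(k + 1) / (g(k) + g(k + 1)) for k in range(1, radius, 2)]


for sigma in (8.0, 4.0):
    print(sigma, [round(c, 3) for c in pair_fractions(8, sigma)])
```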

Since the hardware performs bilinear interpolation with limited precision anyway, we can always read at a fixed ~0.47 of the way from the first texel of each pair without introducing much error.

That means that, on each side of the centre, we read at intervals of exactly 2 texels.
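The cost of fixing c can be measured. This Python sketch (fixed_pair_taps is a hypothetical name) places each pair's sample at k + 0.47, confirms the samples are exactly 2 texels apart, and reports the largest difference between the weight a texel should get and the weight it actually gets with the fixed fraction, relative to a total kernel weight of 1. It assumes an even radius and only shows the positive side; the negative side is the mirror image.

```python
import math


def fixed_pair_taps(radius, sigma, c=0.47):
    """Sample positions on the positive side when every pair uses the same c."""
    g = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(radius + 1)]
    total = g[0] + 2.0 * sum(g[1:])
    g = [v / total for v in g]                   # kernel weights sum to 1
    positions, max_err = [], 0.0
    for k in range(1, radius, 2):                # pairs (1,2), (3,4), ...
        wa, wb = g[k], g[k + 1]
        positions.append(k + c)                  # 1.47, 3.47, 5.47, ...
        # With a fixed c the texels get (wa+wb)*(1-c) and (wa+wb)*c instead
        # of wa and wb; both are off by the same amount.
        max_err = max(max_err, abs((wa + wb) * c - wb))
    return positions, max_err


positions, err = fixed_pair_taps(radius=8, sigma=8.0)
print([round(p, 2) for p in positions])
print([round(b - a, 6) for a, b in zip(positions, positions[1:])])
print(err)
```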

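This brings the textureOffset() function into play. It adds a constant integer offset, in texels, to the texture coordinate (the offset must be a constant expression in the range GL_MIN_PROGRAM_TEXEL_OFFSET to GL_MAX_PROGRAM_TEXEL_OFFSET, which is guaranteed to cover at least -8 to 7), and this enables the GPU to fetch texture samples faster (optimizing cache usage and the bilinear filter).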
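As a sketch of how the fixed spacing maps onto textureOffset(), the Python function below emits GLSL source for one pass. Everything in it is an assumption for illustration: the names uSource, vTexCoord and fragColor, the #version 150 header, a full-screen pass whose vTexCoord lands on texel centres, and an even radius of at most 8. The fractional 0.47 is folded into two shifted base coordinates (one per side), so every textureOffset() call gets a constant integer offset (±1, ±3, ±5, ±7 for radius 8), which stays inside the guaranteed -8 to 7 range.

```python
import math


def make_blur_shader(radius=8, sigma=8.0, c=0.47, horizontal=True):
    """Emit GLSL 1.50 source for one separable pass (hypothetical names)."""
    assert radius % 2 == 0 and 0 < radius <= 8   # largest offset is radius-1 <= 7
    g = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(radius + 1)]
    total = g[0] + 2.0 * sum(g[1:])
    g = [v / total for v in g]                   # normalised kernel weights
    axis = "vec2(texel.x, 0.0)" if horizontal else "vec2(0.0, texel.y)"
    lines = [
        "#version 150",
        "uniform sampler2D uSource;",
        "in vec2 vTexCoord;",
        "out vec4 fragColor;",
        "void main() {",
        "    vec2 texel = 1.0 / vec2(textureSize(uSource, 0));",
        f"    vec2 dir = {axis};",
        # the fixed fraction c lives in the base coordinates ...
        f"    vec2 basePos = vTexCoord + {c:.4f} * dir;",
        f"    vec2 baseNeg = vTexCoord - {c:.4f} * dir;",
        f"    vec4 sum = texture(uSource, vTexCoord) * {g[0]:.8f};",
    ]
    for k in range(1, radius, 2):                # ... so offsets are whole texels
        w = g[k] + g[k + 1]
        op = f"ivec2({k}, 0)" if horizontal else f"ivec2(0, {k})"
        on = f"ivec2(-{k}, 0)" if horizontal else f"ivec2(0, -{k})"
        lines.append(
            f"    sum += (textureOffset(uSource, basePos, {op}) + "
            f"textureOffset(uSource, baseNeg, {on})) * {w:.8f};")
    lines += ["    fragColor = sum;", "}"]
    return "\n".join(lines)


print(make_blur_shader())
```

Generating the source keeps the weights and offsets as literal constants, which is what textureOffset() requires; the same shader could just as well be written out by hand.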