Intuitive Guide to Convolution

Like making engineering students squirm? Have them explain convolution and (if you're barbarous) the convolution theorem. They'll mutter something about sliding windows as they try to escape through one.

Convolution is usually introduced with its formal definition:

\displaystyle{ (f * g )(t) = \int_{-\infty}^\infty f(\tau) g(t - \tau) d\tau }

Yikes. Let's startwithoutcalculus:卷积是一种奇特的乘法。

Part 1: Hospital Analogy

Imagine you manage a hospital treating patients with a single disease. You have:

  • A treatment plan:[3]Every patient gets 3 units of the cure on their first day.
  • A list of patients:[1 2 3 4 5]你一周的病人数(周一1人,周二2人,等等)。

Question: How much medicine do you use each day? Well, that's just a quick multiplication:

Plan * Patients = Daily Usage [3] * [1 2 3 4 5] = [3 6 9 12 15]

Multiplying the plan by the patient list gives the usage for the upcoming days:[3 6 9 12 15]. Everyday multiplication (3 x 4) means using the plan with a single day of patients:[3] * [4] = [12].

Intuition For Convolution

Let's say the disease mutates and requires a multi-day treatment. You create a new plan:Plan: [3 2 1]

That means 3 units of the cure on the first day, 2 on the second, and 1 on the third. Ok. Given the same patient schedule[1 2 3 4 5], what's our medicine usage each day?

Uh... shoot. It's not a quick multiplication:

  • On Monday, 1 patient comes in. It's her first day, so she gets 3 units.
  • On Tuesday, the Monday gal gets 2 units (her second day), but two new patients arrive, who get 3 each (2 * 3 = 6). The total is 2 + (2 * 3) = 8 units.
  • On Wednesday, it's trickier: The Monday gal finishes (1 unit, her last day), the Tuesday people get 2 units (2 * 2), and there are 3 new Wednesday people... argh.

The patients are overlapping and it's hard to track. How can we organize this calculation?

An idea: imagineflippingthe patient list, so the first patient is on the right:

Start of line 5 4 3 2 1

Next, imagine we have 3 separate rooms where we apply the proper dose:

Rooms 3 2 1

第一天,你走进第一个房间,拿了3个单位的药。第二天,你走进2号房间,拿到了2个单元。最后一天,你走进3号房间,拿了一个单元。之后就没有房间了,你的治疗也结束了。

To calculate the total medicine usage, line up the patients and walk them through the rooms:

Monday ---------------------------- Rooms 3 2 1 Patients 5 4 3 2 1 Usage 3

On Monday (our first day), we have a single patient in the first room. She gets 3 units, for a total usage of 3. Makes sense, right?

On Tuesday, everyone takes a step forward:

Tuesday ---------------------------- Rooms 3 2 1 Patients -> 5 4 3 2 1 Usage 6 2 = 8

The first patient is now in the second room, and there's 2 new patients in the first room. We multiply each room's dose by the patient count, then combine.

Every day we just walk the list forward:

Wednesday ---------------------------- Rooms 3 2 1 Patients -> 5 4 3 2 1 Usage 9 4 1 = 14 Thursday ----------------------------- Rooms 3 2 1 Patients -> 5 4 3 2 1 Usage 12 6 2 = 20 Friday ----------------------------- Rooms 3 2 1 Patients -> 5 4 3 2 1 Usage 15 8 3 = 26

Whoa! It's intricate, but we figured it out, right? We can find the usage for any day by reversing the list, sliding it to the desired day, and combining the doses.

The total day-by-day usage looks like this (don't forget Sat and Sun, since some patients began on Friday):

Plan * Patient List = Total Daily Usage [3 2 1] * [1 2 3 4 5] = [3 8 14 20 26 14 5] M T W T F M T W T F S S

This calculation is theconvolutionof the plan and patient list. It's a fancy multiplication between a list of input numbers and a "program".

Interactive Demo

Here's alive demo. Try changingF(the plan) orG(the patient list). The convolution $c(t)$ matches our manual calculation above.

(我们定义函数$f(x)$和$g(x)$来填充每个列表,并从1开始调整列表索引。)

You can do a quick convolution withWolfram Alpha:

ListConvolve[{3, 2, 1}, {1, 2, 3, 4, 5}, {1, -1}, 0] {3, 8, 14, 20, 26, 14, 5}

(The extra{1, -1}, 0aligns the lists and pads with zero.)

Application: COVID Ventilator Usage

I started this article 5 years ago (intuition takes a while...), but unfortunately the analogy is relevant today.

Let's use convolution to estimate ventilator usage for incoming patients.

  • Set $f(x)$ as the percent of patients needing ventilators. For example,[.05 .03 .01]这意味着第一周有5%的病人需要呼吸机,第二周3%,第三周1%。
  • Set $g(x)$ as the weekly incoming patients, in thousands.
  • 卷积$c(t) = f * g$,表示每周需要多少台呼吸机(以千为单位)。c(5)美元是5周后需要多少呼吸机。

Let's try it out:

  • F = [.05, .03, .01]is the ventilator use percentage by week
  • G = [10, 20, 30, 20, 10, 10, 10], is the incoming hospitalized patients. It starts at 10k per week, rises to 30k, then decays to 10k.

With these numbers, we expect a max ventilator use of 2.2k in 2 weeks:

Convolution_Demo

The convolution drops to 0 after 9 weeks because the patient list has run out. In this example, we're interested in the peak value the convolution hits, not the long-term total.

其他涉及的计划可能包括药物剂量、疫苗预约(今天一个,一个月后另一个)、再感染以及其他复杂的相互作用。

The hospital analogy is the mental model I wish I had when learning. Now that we've tried it with actual numbers, let's add the Math Juice and convert the analogy to calculus.

Part 2: The Calculus Definition

So, what happened in our example? We had a list of patients and a plan. If the plan were simple (single day[3]), regular multiplication would have worked. Because the plan was complex, we had to "convolve" it.

Time for some Fun Facts™:

  • Convolution is written $f * g$, with an asterisk. Yes, an asterisk usually indicates multiplciation, but in advanced calculus class, it indicates a convolution. Regular multiplication is just implied ($fg$).

  • The result of a convolution is a newfunctionthat gives the total usage for any day ("What was the total usage on day $t=3$?"). We can graph the convolution over time to see the day-by-day totals.

Now the big aha:Convolution reverses one of the lists!Here's why.

Let's call our treatment plan $f(x)$. In our example, we used[3 2 1].

患者列表(输入)为$g(x)$。However, we need toreversethis list as we slide it, so the earliest patient (Monday) enters the hospital first (first in, first out). This means we need to use $g(-x)$, the horizontal reflection of $g(x)$.[1 2 3 4 5]becomes[5 4 3 2 1].

Now that we have the reversed list, pick a day to compute ($t = 0, 1, 2...$). To slide our patient list by this much, we use: $g(-x + t)$. That is, we reverse the list ($-x$) and jump to the correct day ($+t$).

We have our scenario:

  • $f(x)$ is the plan to use
  • $g(-x + t)$ is the list of inputs (flipped and slid to the right day).

To get the total usage on day $t$, we multiply each patient with the plan, and sum the results (an integral). To account for any possible length, we go from -infinity to +infinity.

Now we can describe convolution formally using calculus:

convolution

(Likecolorized math? There's more.)

Phew! That's quite few symbols. Some notes:

  • We use a dummy variable $\tau$ (tau) for the intermediate computation. Imagine $\tau$ as knocking on each room ($\tau={0, 1, 2, 3...}$), finding the dosage [$f(\tau)$], the number of patients [$g(t - \tau)$], multiplying them, and totaling things in the integral. Yowza. The so-called "dummy" variable $\tau$ is likeiin aforloop: it's temporary, but does the work. (By analogy, $t$ is a global variable has a fixed value during the loop: it's the day we're calculating the usage for, such ast = Day 5).
  • In the official definition, you'll see $g(t - \tau)$ instead of $g(- \tau+ t)$. The second version shows the flip ($-\tau$) and slide ($+t$). Writing $g(t - \tau)$ makes it seem like we're interested in the difference between the variables, which confused me.
  • The treatment plan (program to run) is called thekernel: you convolve a kernel with an input.

Not too bad, right? The equation is a formal description of the analogy.

Part 3: Mathematical Properties of Convolution

We can't discover a new math operation without taking it for a spin. Let's see how it behaves.

Convolution is commutative: f * g = g * f

在计算过程中,我们翻转了病人名单,并保持计划不变。我们能改变计划吗?

You bet. Imagine the patients are immobile, and stay in their rooms:[1 2 3 4 5]. To deliver the medicine, we have 3 medical carts that go to each room and deliver the dose. Each day, they slide forward one position.

Carts -> 1 2 3 1 2 3 4 5 Patients

As before, though our plan is written[3 2 1](3 units on the first day), we flip the order of the carts to[1 2 3]. That way, a patient gets 3 units on their first day, as we expect. Checking with Wolfram Alpha, thecalculation是相同的。

ListConvolve[{1, 2, 3, 4, 5}, {3, 2, 1}, {1, -1}, 0] {3, 8, 14, 20, 26, 14, 5}

Cool! It looks like convolution is commutative:

\displaystyle{f * g = g * f}

and we can decide to flip either $f$ or $g$ when calculating the integral. Surprising, right?

The integral of the convolution

When all treatments are finished, what was thetotalmedicine usage? This is theintegral of the convolution. (A few minutes ago, that phrase would have you jumping out of a window.)

But it's a simple calculation. Our plan gives each patientsum([3 2 1]) = 6units of medicine. And we havesum([1 2 3 4 5]) = 15patients. The total usage is just6 x 15 = 90units.

Wow, that was easy: the usage for theentireconvolution is just the product of the subtotals!

\displaystyle{\int (f * g) = \int f \cdot \int g }

I hope this clicks intuitively. Note that this trick works for convolution, but not integrals in general. For example:

\displaystyle{ \int (x \cdot x) \ne \int x \cdot \int x }

If we separate $x \cdot x$ into two integrals we get:

  • $ \int (x \cdot x) = \int x^2 = \frac{1}{3} x^3 $
  • $\int x \cdot \int x = \frac{1}{2}x^2 \cdot \frac{1}{2}x^2 = \frac{1}{4}x^4$

它们是不一样的。(如果我们能像这样分割积分的话,微积分就简单多了)It's strange, but $\int (f * g)$ is probably easier to solve than $\int (fg)$.

Impulse Response

如果我们把一个病人送进医院会怎么样?卷积就是那天的计划。

Plan * Patients = Convolution [3 2 1] * [1] = [3 2 1]

In other words, convolving with[1]gives us the original plan.

In calculus terms, a spike of[1](and 0 otherwise) is theDirac Delta Function. In terms of convolutions, this function acts like the number 1 and returns the original function:

\displaystyle{f(t) * \delta(t) = f(t)}

我们可以将脉冲函数延后T,这也会使卷积函数延后。Imagine our single patient shows up a week late ($\delta(t - T)$), so our medicine usage gets delayed for a week too:

\displaystyle{f(t) * \delta(t - T) = f(t - T)}

Part 4: Convolution Theorem & The Fourier Transform

TheFourier Transform(written with a fancy $\mathscr{F}$) converts a function $f(t)$ into a list of cyclical ingredients $F(s)$:

\displaystyle{f(t) \xrightarrow{\mathscr{F}} F(s)}

作为一个操作符,可以写成$\mathscr{F}\lbrace F \rbrace = F$。

In our analogy, we convolved the plan and patient list with a fancy multiplication. Since the Fourier Transform gives us lists of ingredients, could we get the same result by mixing theingredient lists?

Yep, we can:Fancy multiplication in the regular world isregularmultiplication in the fancy world.

In math terms, "Convolution in the time domain is multiplication in the frequency (Fourier) domain."

Mathematically, this is written:

\displaystyle{f * g \xrightarrow{\mathscr{F}} FG}

or

\displaystyle{\mathscr{F}\lbrace f * g \rbrace = FG}

其中$f(x)$和$g(x)$是要卷积的函数,与转换$f(s)$和$g(s)$。

We canprove this theoremwith advanced calculus, that uses theorems I don't quite understand, but let's think through the meaning.

Because $F(s)$ is the Fourier Transform of $f(t)$, we can ask for a specific frequency ($s = 2\text{Hz}$) and get thecombined interactionof every data point with that frequency. Let's suppose:

\displaystyle{F(2) = 3 + i}

That means after every data point has been multiplied against the 2Hz cycle, the result is $3 + i$. But we could have kept each interaction separate:

\displaystyle{F(2) = 3 + i = c_0 + c_1 + c_2 + c_3 ... + c_t}

其中$c_t$是数据点$t$对2Hz频率的贡献。同样,我们可以将$G(s)$扩展为与2Hz成分的交互列表。Let's suppose $G(2) = 7 - i$:

\displaystyle{G(2) = 7 - i = d_0 + d_1 + d_2 + d_3 ... + d_t}

The Convolution Theorem is really saying:

\displaystyle{f * g \xrightarrow{\mathscr{F}} FG = (c_0 + c_1 + c_2 + ...)(d_0 + d_1 + d_2 + ...)}

常规域的卷积涉及到很多交叉乘法。In the fancy frequency domain, westillhave a bunch of interactions, but $F(s)$ and $G(s)$ have consolidated them. We can just multiply $F(2)G(2) = (3 + i)(7-i)$ to find the 2Hz ingredient in the convolved result.

By analogy, suppose you want to calculate:

\displaystyle{(1 + 2 + 3 + 4)(5 + 6 + 7 + 8) = ?}

It's a lot of work to cross-multiply every term: $(1 \cdot 5) + (1\cdot 6) + (1\cdot 7) + ...$

It's better to consolidate the groups into $(1 + 2 + 3 + 4) = 10$ and $(5 + 6 + 7 + 8) = 26$, andthenmultiply to get $10 \cdot 26 = 260$.

这种细微差别给我带来了很多困惑。看起来$FG$是一个单独的乘法,而$f * g$包含一堆中间项。我忘了$F$已经完成了将一堆条目合并成一个条目的工作。

Now, we aren'tquitedone.

\displaystyle{f * g \xrightarrow{\mathscr{F}} FG \xrightarrow{\mathscr{F^{-1}}} \ ?}

We can convert $f * g$ in the time domain into $FG$ in the frequency domain, but we probably need it back in the time domain for a usable result:

\displaystyle{f * g = \mathscr{F}^{-1} \lbrace FG \rbrace}

You have a riddle in English ($f * g$), you translate it to French ($FG$), get your smart French friend to work out that calculation, then convert it back to English ($\mathscr{F}^{-1}$).

And in reverse...

The convolution theorem works this way too:

\displaystyle{fg \xrightarrow{\mathscr{F}} F * G}

Regular multiplication in the regular world is fancy multiplication in the fancy world.

Cool, eh? Instead of multiplying two functions like some cave dweller, put on your monocle, convolve the Fourier Transforms, and and convert to the time domain:

\displaystyle{fg = \mathscr{F}^{-1} \lbrace F*G \rbrace}

I'm not saying this is fun, just that it's possible. If your French friend has a gnarly calculation they're struggling with, it might look like arithmetic to you.

Mini proof

Remember how we said the integral of a convolution was a multiplication of the individual integrals?

\displaystyle{\int (f * g) = \int f \cdot \int g }

傅里叶变换是一个非常具体的积分,对吧?

\displaystyle{\int \rightarrow \mathscr{F}}

So (handwaving), it seems we could swap the general-purpose integral $\int$ for $\mathscr{F}$ and get

\displaystyle{\mathscr{F} (f * g) = \mathscr{F} (f) \cdot \mathscr{F} (g) }

也就是卷积定理。I need a deeper intuition for theproof, but this helps things click.

Part 5: Applications

The trick with convolution is finding a useful "program" (kernel) to apply to your input. Here's a few examples.

Moving averages

Let's say you want a moving average between neighboring items in a list. That is half of each element, added together:

\displaystyle{\frac{a + b}{2} = \frac{a}{2} + \frac{b}{2}}

This is a "multiplication program" of[0.5 0.5]convolved with our list:

ListConvolve[{1, 4, 9, 16, 25}, {0.5, 0.5}, {1, -1}, 0] {0.5, 2.5, 6.5, 12.5, 20.5, 12.5}

我们只需一个操作就可以实现移动平均线。Neat!

A 3-element moving average would be[.33 .33 .33], a weighted average could be[.5 .25 .25].

Derivatives

The derivative finds the difference between neighboring values. Here's the plan:[1 -1]

ListConvolve[{1, 2, 3, 4, 5}, {1, -1}, {1, -1}, 0] {1, 1, 1, 1, 1, -5} // -5 since we ran out of entries ListConvolve[{1, 4, 9, 16, 25}, {1, -1}, {1, -1}, 0] {1, 3, 5, 7, 9, -25} // discrete derivative is 2x + 1

With a simple kernel, we can find a useful math property on a discrete list. And to get a second derivative, just apply the derivative convolution twice:

F * [1 -1] * [1 -1]

As a shortcut, we can precompute the final convolutions ([1 -1] * [1 -1]) and get:

ListConvolve[{1, -1}, {1,-1}, {1, -1}, 0] {1, -2, 1}

Now we have asinglekernel[1, -2, 1]that gets the second derivative of a list:

ListConvolve[{1, 4, 9, 16, 25}, {1, -2, 1}, {1, -1}, 0] {1, 2, 2, 2, 2, -34, 25}

Excluding the boundary items, we get the expected second derivative:

\displaystyle{x^2 \xrightarrow{d/dx} 2x \xrightarrow{d/dx} 2}

Blurring / unblurring images

An image blur is essentially a convolution of your image with some "blurring kernel":

\displaystyle{\text{blurred} = \text{image} * \text{blur}}

The blur of our 2D image requires a2D average:

Kernel__image_processing__-_Wikipedia

Can we undo the blur? Yep! With our friend the Convolution Theorem, we can do:

\displaystyle{\text{blurred} = \text{image} * \text{blur}}

\displaystyle{\mathscr{F} \lbrace \text{blurred} \rbrace = \mathscr{F} \lbrace \text{image} * \text{blur} \rbrace}

\displaystyle{\mathscr{F} \lbrace \text{blurred} \rbrace = \mathscr{F} \lbrace \text{image} \rbrace \mathscr{F} \lbrace \text{blur} \rbrace}

\displaystyle{\frac{ \mathscr{F} \lbrace \text{blurred} \rbrace }{\mathscr{F} \lbrace \text{blur} \rbrace} = \mathscr{F} \lbrace \text{image} \rbrace }

\displaystyle{\mathscr{F}^{-1} \lbrace \frac{ \mathscr{F} \lbrace \text{blurred} \rbrace }{\mathscr{F} \lbrace \text{blur} \rbrace} \rbrace = \text{image}}

Whoa! We can recover the original image by dividing out the blur. Convolution is a simple multiplication in the frequency domain, anddeconvolutionis a simple division in the frequency domain.

image-blur

A short while back, the concept of "deblurring by dividing Fourier Transforms" was gibberish to me. While it can be daunting mathematically, it's getting simpler conceptually.

More reading:

Algorithm Trick: Multiplication

What is a number? A list of digits:

1234 = 1000 + 200 + 30 + 4 = [1000 200 30 4] 5678 = 5000 + 600 + 70 + 8 = [5000 600 70 8]

那么什么是普通的小学乘法呢?一个卷积digit-by-digit !We sweep one list of digits by the other, multiplying and adding as we go:

Source

We can perform the calculation by convolving the lists of digits (wolfram alpha):

ListConvolve[{1000, 200, 30, 4}, {8, 70, 600, 5000}, {1, -1}, 0] {8000, 71600, 614240, 5122132, 1018280, 152400, 20000} sum {8000, 71600, 614240, 5122132, 1018280, 152400, 20000} 7006652

Note that we pre-flip one of the lists (it gets swapped in the convolution later), and the intermediate calculations are a bit different. But, combining the subtotals gives the expected result.

Faster Convolutions

Why convolve instead of doing a regular digit-by-digit multiplication? Well, the convolution theorem lets us substitute convolution with Fourier Transforms:

\displaystyle{f * g = \mathscr{F}^{-1} \lbrace FG \rbrace}

The convolution ($f * g$) has complexity $O(n^2)$. We have $n$ positions to process, with $n$ intermediate multiplications at each position.

The right side involves:

  • 两个傅里叶变换,通常是O(n^2)$。However, the Fast Fourier Transform (adivide-and-conquer approach) makes them $O(n\log(n))$.
  • Pointwise multiplication of the final result of the transforms ($\sum a_n \cdot b_n$), which is $O(n)$
  • An inverse transform, which is $O(n\log(n))$

And the total complexity is: $O(n\log(n)) + O(n\log(n)) + O(n) + O(n\log(n)) = O(n\log(n))$

Regular multiplication in the fancy domain isfasterthan a fancy multiplication in the regular domain. Our French friend is no slouch. (More)

Convolutional Neural Nets (CNN)

Machine learning is about discovering the math functions that transform input data into a desired result (a prediction, classification, etc.).

Starting with an input signal, we could convolve it with a bunch of kernels:

\displaystyle{\text{input} * k_1 * k_2 * k_3 ... = \text{result}}

Given that convolution can do complex math (moving averages, blurs, derivatives...), it seemssomecombination of kernels should turn our input into something useful, right?

Convolutional Neural Nets (CNNs) process an input with layers of kernels, optimizing their weights (plans) to reach a goal. Imagine tweaking the treatment plan to keep medicine usage below some threshold.

CNNs are often used with image classifiers, but 1D data sets work just fine.

LTI System Behavior

A linear, time-invariant system means:

  • Linear: Scaling and combining inputs scales and combines outputs by the same amount
  • Time invariant: Outputs depend on relative time, not absolute time. You get 3 units onyourfirst day, and it doesn't matter if it's Wednesday or Thursday.

A fancy phrase is "A LTI system is characterized by its impulse response". Translation: If we send asinglepatient through the hospital[1], we'll discover the treatment plan. Then we can predict the usage foranysequence of patients by convolving it with the plan.

\displaystyle{\text{system response} = \text{impulse response} * \text{inputs}}

If the system isn't LTI, we can't extrapolate based on asingleperson's experience. Scaling the inputs may not scale the outputs, and the actual calendar day, not relative day, may impact the result (imagine fewer rooms available on weekends).

Engineering Analogies

FromDavid Greenspan: "Suppose you have a special laser pointer that makes a star shape on the wall. You tape together a bunch of these laser pointers in the shape of a square. The pattern on the wall now is the convolution of a star with a square."

常规乘法为您提供一个输入的单一比例副本。卷积创建多个重叠的副本,这些副本遵循您指定的模式。

Real-world systems have squishy, not instantaneous, behavior: they ramp up, peak, and drop down. The convolution lets us model systems that echo, reverb and overlap.

Now it's time for the famous sliding window example. Think of a pulse of inputs (red) sliding through a system (blue), and having a combined effect (yellow): the convolution.

(Source)

Summary

Convolution has an advanced technical definition, but the basics can be understood with the right analogy.

Quick rant: I study math for fun, yet it took years to find a satisfying intuition for:

  • Why is one function reversed?
  • Why is convolution commutative?
  • Why does the integral of the convolution = product of integrals?
  • Why are the Fourier Transforms multiplied point-by-point, and not overlapped?

Why'd it take so long? Imagine learning multiplication with $f \times g = z$ instead of $3 \times 5 = 15$. Without an example I can explorein my head, I could only memorize results, not intuit them. Hopefully this analogy can save you years of struggle.

Happy math.

Other Posts In This Series

  1. A Visual, Intuitive Guide to Imaginary Numbers
  2. Intuitive Arithmetic With Complex Numbers
  3. Understanding Why Complex Multiplication Works
  4. Intuitive Guide to Angles, Degrees and Radians
  5. Intuitive Understanding Of Euler's Formula
  6. An Interactive Guide To The Fourier Transform
  7. Intuitive Guide to Convolution
  8. Intuitive Understanding of Sine Waves
  9. An Intuitive Guide to Linear Algebra
  10. A Programmer's Intuition for Matrix Multiplication
  11. Imaginary Multiplication vs. Imaginary Exponents
  12. Intuitive Guide to Hyperbolic Functions

Topic Reference

Join 450k Monthly Readers

喜欢这篇文章吗?还有很多方法可以帮助你对数学建立持久、直观的理解。加入时事通讯的奖金世界杯比利时vs摩洛哥亚盘内容和最新的更新。