Conic Sections

Recently I had a conversation about that classical area of geometry, the study of conic sections: ellipses, parabolae, and hyperbolae. Conics are pretty much the simplest plane curves that one can imagine after straight lines, and have some lovely connections to physical motions, the most well-known of which is that planets follow elliptical orbits. This post will quickly cover some of the basics.

Simply put, a conic section is the intersection of the double cone

K := { (x, y, z) ∈ ℝ³ | x² + y² = z² }

with a (two-dimensional) plane P. The plane P is completely described by two things: its unit normal ν = (a, b, c) ∈ ℝ³ (“unit” because we insist that a² + b² + c² = 1) and a signed distance d ∈ ℝ from the origin in ℝ³ to P along the direction ν. In this notation,

P = { (x, y, z) ∈ ℝ³ | ax + by + cz = d }.

So, we’re interested in C := K ∩ P, where K is the standard cone defined above and P is a plane in ℝ³. For the most part, this intersection is a one-dimensional smooth curve in P; there are a few degenerate cases that will get pointed out along the way. As it turns out, while the normal ν has a great influence on the shape of the curves C, the distance d is relatively unimportant: you get the same curve C simply scaled up or down if you make d larger or smaller; the only thing to watch out for is that things do change radically at d = 0. (Also, I’m restricting attention to the standard cone; if I were being really complete, then I’d consider the possibility that the cone might degenerate into a cylinder.)
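One way to see why d only rescales the curve: every point of K can be written as s·(cos t, sin t, 1) for some angle t and some real s (s > 0 on the upper half of the cone, s < 0 on the lower half). Substituting into ax + by + cz = d gives s = d / (a cos t + b sin t + c), which is linear in d. The Python sketch below samples C this way; the function name `conic_points` and the sampling scheme are illustrative choices of mine, not anything standard. When d = 0 every sampled point collapses to the apex (0, 0, 0), and any generator lying inside P is skipped, which is one way of seeing the radical change at d = 0.

```python
import math

def conic_points(nu, d, n=360, eps=1e-12):
    """Sample C = K ∩ P for K: x^2 + y^2 = z^2 and P: a*x + b*y + c*z = d.

    Every point of K is s * (cos t, sin t, 1) for some angle t and real s
    (s > 0 on the upper half of the cone, s < 0 on the lower half).
    Substituting into the plane equation gives s * g(t) = d with
    g(t) = a cos t + b sin t + c, so s = d / g(t) is linear in d.
    """
    a, b, c = nu
    points = []
    for k in range(n):
        t = 2 * math.pi * k / n
        g = a * math.cos(t) + b * math.sin(t) + c
        if abs(g) < eps:
            continue  # generator parallel to P: it misses P (d != 0) or lies in it (d == 0)
        s = d / g
        points.append((s * math.cos(t), s * math.sin(t), s))
    return points

nu = (0.3, 0.0, math.sqrt(0.91))   # a^2 + b^2 < c^2: the ellipse case
p1 = conic_points(nu, 1.0)
p2 = conic_points(nu, 2.0)
# Doubling d doubles every sampled point: the same curve, scaled about the origin.
print(all(math.isclose(u2, 2 * u1, abs_tol=1e-12)
          for q1, q2 in zip(p1, p2) for u1, u2 in zip(q1, q2)))
```

For a tilted plane with a² + b² > c², g(t) changes sign, so s does too, and the samples land on both halves of the cone: the two branches of a hyperbola.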

One useful trick here is to use capital letters X and Y to denote coordinates on the plane P. I don’t want to go through the rigmarole of precisely relating X and Y to x, y and z, except to say that X and Y will always be orthogonal coordinate directions, with the origin (X, Y) = (0, 0) placed somewhere convenient in P. The following picture, taken from Wikipedia, nicely illustrates how the different conic curves arise as slices of the cone at various angles, although it shows a single, not double, cone.

A natural place to start is with circles. A circle is what arises when the plane P is perpendicular to the z-axis, i.e. the axis about which the cone K is rotationally symmetric. In fact, if you stare at the definition of K for a second, you’ll see that if we fix z = r, thereby defining a plane, the portion of K that has z = r is simply a circle of radius |r|. Thus, C is the circle with the equation

C: X² + Y² = r²

or, to make it look more like the ellipse that’s coming next,

C: (X/r)² + (Y/r)² = 1.

Of course, there’s a degenerate case here, which is r = 0, in which case the “curve” C is just the single point (X, Y) = (0, 0). It may seem trivial, but circles have a nice optical property: if you build a circular mirror about the origin and emit light rays from the origin, then all the light rays will bounce right back to the origin; furthermore, since they all travel the same distance 2r (from the centre to the edge and back) at the same speed, they all arrive back at the origin at the same time as one another.

The next thing to do is to tilt the plane P a little bit. An ellipse is basically just a distorted circle, and arises when the plane P has its normal vector ν = (a, b, c) inside the cone K in the sense that

a² + b² < c²;

circles were simply the special case ν = (0, 0, 1). The equation for an ellipse is

C: (X/A)² + (Y/B)² = 1

A and B are positive parameters: the larger one is called the semi-major radius (and the corresponding X or Y direction the semi-major axis), and the smaller one is the semi-minor radius / axis. Of course there is the usual degeneration to a single point if P passes through the origin, in which case A = B = 0; the case of only one of A or B being 0, and the curve C being a line, will come up in the next paragraph on parabolae. Ellipses have a rather cool optical property. They have two foci: if you build an elliptical mirror and send out light from one focus to reflect off the mirror, then the light all arrives at the same time at the other focus. This principle is actually used in so-called “whispering galleries”, elliptical chambers in which the two conversation partners stand at the relevant foci. One can also construct an ellipse by placing two pins (the foci) in a piece of paper, tying the ends of a string to them, pulling the string taut with a pen, and moving the pen around, all the while keeping the string taut. That gives us another equation for the ellipse in terms of the foci F₁ and F₂ and the Euclidean distance:

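dist((X, Y), F₁) + dist((X, Y), F₂) = 2A,

where the constant 2A is the length of the string and, taking A ≥ B, twice the semi-major radius; by the triangle inequality it is at least dist(F₁, F₂).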

It’s a nice exercise to convert between the two forms and get formulas for A and B in terms of the focal positions and the length of the string, and vice versa; it’s helpful to work in the special case A > B, in which case the foci are on the X-axis, so that both their second coordinates are 0. Circles, of course, correspond to A = B or, equivalently, F₁ = F₂.
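As a check on that exercise, assume the standard placement with A ≥ B: the foci are F₁ = (−f, 0) and F₂ = (f, 0) with f = √(A² − B²), and the string has length 2A. The sketch below “moves the pen” around: for each direction θ out of F₁ it solves r + dist(pen, F₂) = 2A for the pen’s distance r from F₁, which works out to r = B² / (A − f cos θ), and then confirms that the pen lies on (X/A)² + (Y/B)² = 1. The equal total path length 2A is also why light leaving one focus of an elliptical mirror reaches the other all at once. The names `ellipse_foci` and `pen_position` are hypothetical, chosen for this illustration.

```python
import math

def ellipse_foci(A, B):
    """Foci of (X/A)^2 + (Y/B)^2 = 1, assuming A >= B > 0 (foci on the X-axis)."""
    f = math.sqrt(A * A - B * B)  # centre-to-focus distance
    return (-f, 0.0), (f, 0.0)

def pen_position(A, B, theta):
    """Where a taut string of length 2A holds the pen, in direction theta from F1.

    Solving r + dist(pen, F2) = 2A for the distance r from F1 gives
    r = B^2 / (A - f cos(theta)).
    """
    (f1x, f1y), (f, _) = ellipse_foci(A, B)
    r = B * B / (A - f * math.cos(theta))
    return (f1x + r * math.cos(theta), f1y + r * math.sin(theta))

A, B = 5.0, 3.0  # so f = 4 and the foci are (-4, 0) and (4, 0)
F1, F2 = ellipse_foci(A, B)
for k in range(8):
    X, Y = pen_position(A, B, 2 * math.pi * k / 8)
    string = math.dist((X, Y), F1) + math.dist((X, Y), F2)
    print(f"({X:6.3f}, {Y:6.3f})  ellipse eq.: {(X / A)**2 + (Y / B)**2:.6f}  string: {string:.6f}")
```

Up to rounding, every row should report 1 for the ellipse equation and 2A = 10 for the string length.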

The next case is a parabola, which is unlike the ellipse in that it is not a closed curve. In terms of the plane P, a parabola happens when the normal ν to P lies exactly on the surface of the cone K, i.e. when

a² + b² = c².

The equation for a parabola is

C: Y² = 4AX.

The factor of 4 is entirely a matter of convention; it just simplifies a few common calculations. The positive parameter A controls how “pointy” C is: when A is nearly 0, C is very sharply pointed; when A is large, C is very blunt. The nasty degeneracy to look out for here is when P passes through the origin as well as having its normal lie on the surface of K: when that happens, the plane P is tangent to the cone K and C is just a line, say Y = 0. This is the situation of A = 0 in the parabola equation. Whereas an ellipse (at least, one that isn’t a circle) has two foci, a parabola has one: the optical interpretation is that rays of light coming in parallel to the X-axis all get focused onto that one focal point; it’s almost like the other focus has been moved off to “X = +∞”.
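To check the focusing claim numerically: for Y² = 4AX the focus sits at (X, Y) = (A, 0), a standard fact that the text above doesn’t state. The sketch below sends a ray travelling in the −X direction, parallel to the axis, onto the mirror at the point (Y₀²/(4A), Y₀), reflects it using the normal (−4A, 2Y₀) (the gradient of Y² − 4AX), and measures how far the reflected line passes from the focus. The function name `miss_distance` is hypothetical.

```python
import math

def miss_distance(A, Y0):
    """Distance from the focus (A, 0) to the reflected ray, for an incoming
    ray parallel to the X-axis that hits Y^2 = 4AX at height Y0."""
    P = (Y0 * Y0 / (4 * A), Y0)             # point on the mirror
    n = (-4 * A, 2 * Y0)                    # normal: gradient of Y^2 - 4AX
    d = (-1.0, 0.0)                         # incoming direction, along -X
    k = 2 * (d[0] * n[0] + d[1] * n[1]) / (n[0] ** 2 + n[1] ** 2)
    r = (d[0] - k * n[0], d[1] - k * n[1])  # law of reflection: r = d - 2(d.n)/(n.n) n
    # Perpendicular distance from the focus to the line P + s*r (2D cross product).
    fx, fy = A - P[0], -P[1]
    return abs(fx * r[1] - fy * r[0]) / math.hypot(*r)

for Y0 in (-3.0, -0.5, 0.0, 0.1, 1.0, 7.0):
    print(Y0, miss_distance(A=2.0, Y0=Y0))
```

Every printed distance should be zero up to floating-point rounding, whatever the height Y₀ at which the ray strikes the mirror.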

A hyperbola arises when the plane P intersects both the upper and lower halves of K, which happens when the normal ν to P lies strictly outside the cone K, i.e. when

a² + b² > c².
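Taken together, the cases so far depend only on where ν sits relative to the cone (inside, on, or outside it) and on whether d = 0. A small classifier along those lines might look like the sketch below; the tolerance `tol` and the string labels are arbitrary choices of mine, and the degenerate labels follow the text (a point for the ellipse, a line when P is tangent to K, and two crossing lines in the hyperbolic case).

```python
import math

def classify_conic(nu, d, tol=1e-12):
    """Name the curve K ∩ P for the plane nu . (x, y, z) = d, nu a unit normal."""
    a, b, c = nu
    side = a * a + b * b - c * c  # < 0: nu inside K; = 0: on K; > 0: outside K
    if abs(side) <= tol:
        return "line" if abs(d) <= tol else "parabola"
    if side < 0:
        if abs(d) <= tol:
            return "point"
        return "circle" if abs(a) <= tol and abs(b) <= tol else "ellipse"
    return "two crossing lines" if abs(d) <= tol else "hyperbola"

s = 1 / math.sqrt(2)
for nu, d in [((0, 0, 1), 1), ((0.3, 0, math.sqrt(0.91)), 1), ((s, 0, s), 1),
              ((s, 0, s), 0), ((1, 0, 0), 1), ((1, 0, 0), 0)]:
    print(nu, d, "->", classify_conic(nu, d))
```

In floating point the boundary case a² + b² = c² is only hit exactly for special normals like the ones above, which is why the sketch compares against a tolerance rather than testing for equality.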

Generically, this means that the curve C has two disconnected components, or branches; the exception is when P passes through the origin in ℝ3, in which case C consists of two straight lines that cross one another. The equation for the hyperbola is

C: (X/A)² − (Y/B)² = 1,

with positive parameters A and B. In the degenerate case, C is a pair of lines crossing at the origin, Y = ±mX, with a slope m > 0 that depends only on ν; for the vertical plane x = 0, for instance, the equation of C is X² − Y² = 0, or |X| = |Y|. Note that whereas a parabola has no asymptotes (far from its vertex it grows only like Y = ±2√(AX), turning ever more parallel to its axis), a hyperbola is asymptotic to the lines Y = ±(B/A)X. Hyperbolae have an optical property, too. Whereas the optical property of the ellipse was that rays of light coming from one focus end up at the other, the optical property of the hyperbola is that rays of light coming from the convex side of one branch and directed at the focus behind that branch are reflected to the other focus. There’s also a nice description of the hyperbola as the set of points with constant difference of distances to the two foci:

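| dist((X, Y), F₁) − dist((X, Y), F₂) | = 2A,

the distance between the two vertices (±A, 0); by the triangle inequality this constant can be at most dist(F₁, F₂).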
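A numerical check of these hyperbola facts, assuming the standard placement with foci (±f, 0) where f = √(A² + B²): the right-hand branch can be parametrized as (A cosh u, B sinh u), and the sketch below confirms that the difference of focal distances stays at 2A while the ratio Y/X creeps up towards the asymptotic slope B/A far out along the branch. The function name `hyperbola_check` is hypothetical.

```python
import math

def hyperbola_check(A, B, u):
    """Return (|d1 - d2|, Y/X) at the point (A cosh u, B sinh u) of (X/A)^2 - (Y/B)^2 = 1."""
    f = math.sqrt(A * A + B * B)  # centre-to-focus distance
    F1, F2 = (-f, 0.0), (f, 0.0)
    X, Y = A * math.cosh(u), B * math.sinh(u)
    diff = abs(math.dist((X, Y), F1) - math.dist((X, Y), F2))
    return diff, Y / X

A, B = 3.0, 4.0  # foci at (-5, 0) and (5, 0)
for u in (0.0, 0.5, 2.0, 8.0):
    diff, slope = hyperbola_check(A, B, u)
    print(f"u = {u}: |d1 - d2| = {diff:.9f} (2A = {2 * A}), Y/X = {slope:.6f} (B/A = {B / A:.6f})")
```

Since Y/X = (B/A) tanh u here, the approach to the asymptote is exponentially fast in the parameter u.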