The Weak Mordell-Weil Theorem

Let A be an abelian variety over a number field K. A basic object to investigate is the group A(K) of K-rational points. In light of the Birch and Swinnerton-Dyer conjecture and its relation to the class number formula, one should think of A(K) as the analog of the unit group of a number field (the role of the class group being played by the Tate-Shafarevich group).

Thus, one might conjecture some finiteness properties of this group. It is not true that A(K) is finite, as can be seen by looking at some examples of elliptic curves, but it is true that A(K) is finitely generated as an abelian group, and this is the content of the Mordell-Weil Theorem.

The proof is usually broken up into two parts:

  • Weak Mordell-Weil Theorem: A(K)/nA(K) is finite for any positive integer n.
  • Descent using a Height Function: Deduce the full theorem from the above using a measure of size on the points of A(K).

I will focus on the first part in this section and prove it in a motivated (but sophisticated) fashion. This proof is largely the same as the one given in Milne’s Elliptic Curves notes but I find the current presentation far easier to understand.

I will freely use general theory about abelian varieties, algebraic geometry and Galois cohomology. The point of this post is not to fill in the details but to show a framework that makes the proof seem natural.

The Kummer Sequence:

Recall that K is a number field and A is an abelian variety over it. I will denote by A[n] the kernel of the multiplication by n map on A. I will often identify A,A[n] with the \overline K points of A,A[n] respectively. Also denote by G_K the Galois group of \overline K over K.

Now, we can consider the exact sequence of G_K modules:

0 \to A[n](\overline K) \to A(\overline K) \xrightarrow{\times n} A(\overline K) \to 0

Taking Galois invariants, we get the long exact sequence:

0\to A[n](K) \to A(K) \xrightarrow{\times n} A(K) \to H^1(G_K, A[n](\overline K)) \to \dots

and truncating the sequence, we have an injection:

0 \to A(K)/nA(K) \to H^1(G_K,A[n](\overline K)).

So, we now see that to show that A(K)/nA(K) is finite, we simply need to show that H^1(G_K, A[n](\overline K)) is finite. Unfortunately, this is not true! However, we will show that A(K)/nA(K) actually lands in a subgroup of the cohomology group and this subgroup can easily be shown to be finite.

Ramification of Galois Cohomology groups:

A little more precisely, let S be a finite set of primes of K and let G_{K,S} be the Galois group of the maximal extension of K that is unramified outside S. In other words, for every prime \mathfrak p not in S, the inertia group I_{\mathfrak p} \subset G_K lies in the kernel of the quotient map G_K \to G_{K,S}, and G_{K,S} is the largest quotient of G_K with this property.

We want to show that there is a finite set of primes S such that the image of A(K)/nA(K) lands in H^1(G_{K,S}, A[n](\overline K)). (Note that this is a canonical subgroup of H^1(G_{K}, A[n](\overline K)) via the inflation map, provided S is large enough that the inertia groups outside S act trivially on A[n](\overline K).) This is by far the hardest part of the proof.


Controlling the image of the boundary map:

To do this, let us examine the first boundary map in the Kummer sequence more closely. For an element x \in A(K), the boundary map takes x to a cocycle f_x in the following way:

Pick a y \in A(\overline K) such that ny=x and define f_x(g) = gy - y. One can check that f_x takes values in A[n](\overline K), that it is indeed a cocycle, and that its cohomology class is independent of the choice of y.
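For completeness, here is a sketch of those checks, writing the group law additively. Nothing is used beyond ny = x, the fact that g fixes x, and (for the last line) a second choice y' = y + t with t \in A[n](\overline K); the symbols y' and t are introduced here only for this verification.

```latex
\begin{align*}
n\,f_x(g) &= g(ny) - ny = gx - x = 0, \\
f_x(gh) &= ghy - y = g(hy - y) + (gy - y) = g\,f_x(h) + f_x(g), \\
(gy' - y') - (gy - y) &= gt - t .
\end{align*}
```

The first line says f_x lands in the n-torsion, the second is the cocycle condition, and the third shows that a different choice of y changes f_x by a coboundary.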

Now we see that for the image to land in H^1(G_{K,S}, A[n](\overline K)), it suffices to find, for each x \in A(K), a y \in A(\overline K) with ny = x such that the inertia groups at primes outside S fix y.

We will pick S to be the set of places v of K where either A has bad reduction or v lies above a prime dividing n. Now, fix a place v not in S and let \tilde K_v denote the maximal unramified extension of the local field K_v. If the residue field of K at v is k_v, then the residue field of \tilde K_v is \overline k_v.

Since \tilde K_v is henselian and A has good reduction at v, the reduction map A(\tilde K_v) \to A(\overline k_v) is surjective. Furthermore, we have chosen S precisely so that the reduction map is a bijection on the n-torsion, and so the kernel M of the reduction map has no n-torsion. (One could also see this by looking at formal groups.)

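That is, we have the following commutative diagram, in which the rows are given by multiplication by n and the vertical maps are the reduction maps:

\begin{array}{ccccccc} 0 & \to & A[n](\tilde K_v) & \to & A(\tilde K_v) & \xrightarrow{\times n} & A(\tilde K_v) \\ & & \downarrow & & \downarrow & & \downarrow \\ 0 & \to & A[n](\overline k_v) & \to & A(\overline k_v) & \xrightarrow{\times n} & A(\overline k_v) \end{array}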

By what we have said above, the first vertical map is injective while the other two are surjective.  The following easy lemma is all we need to complete the proof:

Lemma: Let P \in A(\tilde K_v) with reduction P' \in A(\overline k_v). Then P = nQ for some Q \in A(\tilde K_v) if and only if P' = nQ' for some Q' \in A(\overline k_v).

Proof: One direction is trivial. Assume then that there exists Q' such that P' = nQ', and lift Q' to some element Q'' of A(\tilde K_v). Then P-nQ'' maps to 0 under the reduction map and so lies in the kernel M. However, M is an n-divisible group (n being prime to the residue characteristic), and so we can find some R \in M such that P-nQ'' = nR or, in other words, P = n(Q'' + R).


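Now apply the lemma to P = x \in A(K): since \overline k_v is algebraically closed, the reduction of x is divisible by n, so x = n(Q'' + R) with Q'' + R \in A(\tilde K_v). Therefore, we can define our cocycle by \sigma \to \sigma(Q''+R) - (Q'' + R), and the inertia group at v, which fixes \tilde K_v, maps to 0 by construction. Since v was arbitrary, we are done with this part of the proof. It only remains to show the following.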


Finiteness of H^1(G_{K,S}, A[n](\overline K)):

Since there are only finitely many points in A[n](\overline K) and A[n] is etale away from S, we can find a finite extension L of K, unramified outside S, over which all of these points are defined, so that G_{L,S} acts trivially on A[n](\overline K). That is, H^1(G_{L,S}, A[n](\overline K)) = Hom(G_{L,S}, A[n](\overline K)). Moreover, this is finite by class field theory, since such homomorphisms correspond to abelian extensions of L of exponent dividing n that are unramified outside S, and there are only finitely many of these.

Now, the rest of the proof follows easily on considering the inflation-restriction sequence:

 0 \to H^1(Gal(L/K), A[n](\overline K)) \to H^1(G_{K,S}, A[n](\overline K)) \to H^1(G_{L,S}, A[n](\overline K))

The first group is finite since both the group and the module are finite, while the third group is finite as discussed above. This establishes finiteness in the middle as required.



Closing Thoughts:

This proof is, in spirit, exactly the same as the one in standard references like Milne that proceed by bounding the Selmer group. However, they tend to use very little Cohomological machinery and do everything by hand. I find the presentation here to be much easier to understand and haven’t seen it elsewhere.



Congruent Numbers and Elliptic Curves

A congruent number n is a positive integer that is the area of a right triangle all of whose sides have rational length. In equations, we are required to find positive rational numbers a,b,c such that:

\displaystyle a^2+b^2 = c^2    and    \displaystyle n = \frac12 ab.                       (1)

The story of congruent numbers is a very old one, beginning with Diophantus. The Arabs and Fibonacci knew of the problem in the following form:

Find three rational numbers whose squares form an arithmetic progression with common difference k.

This is equivalent to finding integers X,Y,Z,T with T\neq 0 such that Y^2 - X^2 = Z^2 - Y^2 = kT^2, which reduces to finding a right triangle with rational sides

\displaystyle \frac{Z+X}{T}, \frac{Z-X}{T}, \frac{2Y}{T}

with area k. This is the congruent number problem for k. The Arabs knew several examples of congruent numbers and Fermat stated that no square is a congruent number. Since we can scale triangles to assume that n is square free, this is equivalent to saying that 1 is not a congruent number.
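As a quick sanity check of this dictionary, here is a minimal Python sketch (the function name is mine, purely for illustration). It takes the squares of 1/2, 5/2 and 7/2, which form an arithmetic progression with common difference 6, so that Y^2 - X^2 = Z^2 - Y^2 = kT^2 with k = 6, and recovers the familiar (3,4,5) triangle of area 6.

```python
from fractions import Fraction

def triangle_from_progression(X, Y, Z, T):
    """Sides of the right triangle attached to the squares of X/T, Y/T, Z/T."""
    return (Fraction(Z + X, T), Fraction(Z - X, T), Fraction(2 * Y, T))

X, Y, Z, T = 1, 5, 7, 2

# Common difference of the three squares (X/T)^2, (Y/T)^2, (Z/T)^2.
k = Fraction(Y * Y - X * X, T * T)
assert Fraction(Z * Z - Y * Y, T * T) == k

p, q, r = triangle_from_progression(X, Y, Z, T)
assert p * p + q * q == r * r   # it is a right triangle
area = p * q / 2
print(k, (p, q, r), area)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities are genuine identities rather than floating point coincidences.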

As with many other problems in number theory, the proof of this statement had to wait four centuries for Fermat. The problem led Fermat to discover his method of infinite descent.

In more recent times, the problem has been fruitfully translated into one about Elliptic Curves. We perform a rational transformation of the defining equations (1) for a congruent number in the following way. Set x = n(a+c)/b and y = 2n^2(a+c)/b^2. A calculation shows that:

\displaystyle y^2 = x^3 - n^2x.                                          (2)

and y \neq 0. Indeed, if y = 0, then a=-c and so b = 0, but then n = \frac12 ab = 0. Conversely, given x,y satisfying (2) with y \neq 0, we find a = (x^2-n^2)/y, b = 2nx/y and c = (x^2+n^2)/y, and one can check that these numbers satisfy (1).
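The two substitutions are easy to get wrong by a factor, so here is a small check in Python with exact rationals. It assumes the normalization x = n(a+c)/b, y = 2n^2(a+c)/b^2 with inverse a = (x^2-n^2)/y, b = 2nx/y, c = (x^2+n^2)/y; the function names are hypothetical helpers, not anything standard.

```python
from fractions import Fraction

def triangle_to_point(a, b, c, n):
    """Map a right triangle (a, b, c) of area n to a point on y^2 = x^3 - n^2 x."""
    x = n * (a + c) / b
    y = 2 * n * n * (a + c) / (b * b)
    return x, y

def point_to_triangle(x, y, n):
    """Inverse map, defined whenever y != 0."""
    return (x * x - n * n) / y, 2 * n * x / y, (x * x + n * n) / y

# The (3, 4, 5) triangle has area 6.
a, b, c, n = Fraction(3), Fraction(4), Fraction(5), Fraction(6)
x, y = triangle_to_point(a, b, c, n)

assert y * y == x ** 3 - n * n * x            # the point lies on E_6
assert point_to_triangle(x, y, n) == (a, b, c)  # and the maps are inverse here
print(x, y)
```

For the (3,4,5) triangle this produces the point (12, 36) on E_6.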

The projective closure of (2) defines an elliptic curve that we will call E_n. We are interested in finding rational points on it that do not satisfy y=0. I will prove that n is a congruent number precisely when E_n has positive rank.

The proof is an interesting use of Dirichlet’s Theorem on Arithmetic Progressions and some neat ideas about Elliptic Curves and their reductions modulo primes. I will essentially assume the material in Silverman’s first book and the aforementioned Dirichlet’s Theorem.

So far, we know that n is a congruent number if and only if E_n has rational points (x,y) with y \neq 0. Recall that for an elliptic curve in the standard Weierstrass form (as in (2)), y = 0 if and only if (x,y) is a point of order 2.

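Therefore, our problem reduces to showing that the only rational torsion points of E_n are the 2-torsion points, for all n: a rational point with y \neq 0 is then automatically of infinite order, and conversely a point of infinite order has y \neq 0. In fact, we always have non-trivial 2-torsion and the points are given by (0,0), (n,0), (-n,0) and the point at infinity.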

Denote the m-torsion by E_n[m]. The rough outline of the proof is as follows:

  1. E_n[m](\mathbb Q) maps injectively into the reduction of E_n modulo p for all but finitely many primes p.
  2. The number of \mathbb F_p points of E_n is independent of n and equal to p+1 whenever p \equiv 3 \pmod 4 (and p does not divide 2n).
  3. This would imply that m|p+1 for a set of primes p of density 1/2, but by Dirichlet’s Theorem on Arithmetic Progressions, the set of such primes has density 1/\varphi(m) < 1/2 for all m>6.

Step 1:

Recall that E_n[m](\mathbb Q) is finite (indeed, E_n[m] has exactly m^2 points over \overline{\mathbb Q}). Also, recall that if (x_1,y_1,z_1) and (x_2,y_2,z_2) are two points in the projective plane (over any field k), then they are equal if and only if

\displaystyle x_1y_2-x_2y_1, x_1z_2-x_2z_1, y_1z_2-y_2z_1    (3)

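are all 0. Thinking of our E_n as embedded in the projective plane, reduction modulo p is simply reduction of each of the co-ordinates (once they are scaled to be coprime integers). Therefore, if we pick any finite set of \mathbb Q points on E_n, then for any two distinct points in the set at least one of the integers in (3) is non-zero, and for any prime p larger than all the prime divisors of these non-zero integers, the reductions of the points remain distinct.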

Since E_n[m](\mathbb Q) is a finite group, this is sufficient to show that the reduction map is injective on it for all but finitely many primes p (once we fix m).
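To make the criterion (3) concrete, here is a minimal sketch in Python applied to the four rational 2-torsion points of E_6, written in projective co-ordinates. The helper names are mine and the list of primes is just a small sample; the point is only to see which primes make two of the points collide.

```python
from itertools import combinations

def minors(P, Q):
    """The three 2x2 minors from (3); all vanish iff P = Q projectively."""
    (x1, y1, z1), (x2, y2, z2) = P, Q
    return (x1 * y2 - x2 * y1, x1 * z2 - x2 * z1, y1 * z2 - y2 * z1)

def distinct_mod_p(P, Q, p):
    """True if the reductions of P and Q modulo p are still different points."""
    return any(m % p != 0 for m in minors(P, Q))

n = 6
# Point at infinity, (0,0), (n,0), (-n,0) in projective co-ordinates (x:y:z).
two_torsion = [(0, 1, 0), (0, 0, 1), (n, 0, 1), (-n, 0, 1)]

def injective_mod(p):
    return all(distinct_mod_p(P, Q, p) for P, Q in combinations(two_torsion, 2))

bad = [p for p in (2, 3, 5, 7, 11, 13) if not injective_mod(p)]
print(bad)
```

For these four points the only non-trivial minors are n and 2n up to sign, so the primes where reduction fails to be injective are exactly those dividing 2n, here 2 and 3.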

Step 2:

Fix a prime p \equiv 3 \pmod 4 that does not divide 2n. One could prove this by an explicit calculation involving quadratic characters but there is a neater way assuming some knowledge about the endomorphism ring R_n = \mathrm{End}_{\overline{\mathbb F}_p}(E_n). The relevant facts are the following:

  1. Over a finite field \mathbb F_q, the endomorphism ring is either an order in a quadratic imaginary field or an order in a quaternion algebra.
  2. Further, over \mathbb F_p with p>5, the latter case occurs precisely when the number of elements on the curve is p+1 and the curve is called supersingular.
  3. In either case, for any endomorphism f, there is a dual \hat f such that N(f) =  f\circ\hat f is multiplication by the degree of f, which is always non-negative.

A few words about the above statements: They are true for all Elliptic curves and not just E_n. A reference for all of the above is the chapter on Finite Fields in Silverman’s first book on Elliptic Curves.

(3) is true over any field. The function N occurring in (3) is the usual norm function on a number field or a quaternion algebra.

One can show using the Tate Module that the dim of R_n\otimes \mathbb Q is at most 4 over \mathbb Q. Since there is always a Frobenius element \varphi such that N(\varphi) = p, this rules out the case of \mathbb Z where the N function maps to the squares. This proves (1).

A sufficient (and necessary) condition for R_n being 4 dimensional is that \hat\varphi be inseparable, but this implies a_p = \varphi + \hat\varphi \equiv 0 \pmod p. As a consequence of the Hasse-Weil inequality, which says |a_p| \leq 2\sqrt p, this would show that a_p = 0 for p>5. One further proves that |E_n(\mathbb F_p)| = p+1-a_p, which shows (2).

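Now, given the three statements above, it is easy to complete step 2. Note that (x,y,z) \to (-x,iy,z) is always an endomorphism of E_n over \overline{\mathbb F}_p, where i = \sqrt{-1} lies in \mathbb F_{p^2} (it lies in \mathbb F_p itself only when p\equiv 1 \pmod 4). Its square is multiplication by -1, so R_n, regarded as the ring of endomorphisms defined over \overline{\mathbb F}_p, contains \mathbb Z[i]. Therefore, if R_n were two dimensional, it would have to be \mathbb Z[i] itself, the maximal order of \mathbb Q(i).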

As mentioned before, we also have the Frobenius \varphi with norm p. This shows that \varphi is a prime in the ring. However, the norm of any prime element in \mathbb Z[i] is either 2, a square, or a prime congruent to 1 modulo 4, and p \equiv 3 \pmod 4 is none of these. This forces R_n to be 4-dimensional and we are done by (2).
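The explicit calculation mentioned at the start of this step is easy to run by brute force. Here is a minimal Python sketch (the function name is hypothetical) that counts the points of y^2 = x^3 - n^2 x over \mathbb F_p directly, for a few primes p \equiv 3 \pmod 4 not dividing 2n, and checks that the answer is p+1 each time.

```python
def count_points(n, p):
    """Number of F_p-points on y^2 = x^3 - n^2 x, including the point at infinity."""
    # sqrt_count[r] = number of y in F_p with y^2 = r
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[y * y % p] += 1
    affine = sum(sqrt_count[(x ** 3 - n * n * x) % p] for x in range(p))
    return affine + 1

for n in (1, 5, 6):
    for p in (7, 11, 19, 23, 31, 43):   # all are 3 mod 4 and prime to 2n
        assert count_points(n, p) == p + 1

print(count_points(1, 5))   # p = 5 is 1 mod 4, where the count need not be p + 1
```

The elementary reason behind the count is that for p \equiv 3 \pmod 4 the value x^3 - n^2x changes sign under x \to -x while -1 is a non-square, so each pair \{x,-x\} away from the 2-torsion contributes exactly two points.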

Step 3:

If E_n(\mathbb Q) has any torsion point that is not 2-torsion, then its torsion subgroup T strictly contains E_n[2], which has order 4. Let m be the order of T, so that 4|m and m \geq 8.

By step (1), T maps injectively into E_n(\mathbb F_p) for all but finitely many primes p and therefore, since E_n(\mathbb F_p) is a group of size p+1 for p\equiv 3\pmod 4 by step (2), m|p+1 for all but finitely many primes p \equiv 3 \pmod 4.

Since the set of primes of the form 4k+3 has (Dirichlet) density 1/2 and a finite set of primes does not contribute to (Dirichlet) density, we can put this differently in the following way:

The Dirichlet density of primes p \equiv -1\pmod m is at least 1/2.

However, we know that the density of primes of the form km-1 is exactly 1/\varphi(m), which is strictly less than 1/2 whenever m > 6. This provides the required contradiction and completes the proof.

The statements about the densities of various primes above all follow from Dirichlet’s Theorem on Arithmetic Progressions.
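The arithmetic behind this step can be seen numerically. The sketch below (Python, with hypothetical helper names) computes the gcd of p+1 over primes p \equiv 3 \pmod 4 not dividing 2n up to a small bound. It assumes the sharper standard fact that rational torsion injects modulo every odd prime of good reduction, not just all but finitely many, so that this gcd bounds the order of the rational torsion subgroup.

```python
from math import gcd

def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def torsion_order_bound(n, limit=200):
    """gcd of p + 1 over primes p = 3 mod 4, p not dividing 2n, p <= limit."""
    g = 0
    for p in primes_up_to(limit):
        if p % 4 == 3 and (2 * n) % p != 0:
            g = gcd(g, p + 1)
    return g

for n in (1, 5, 6, 21):
    print(n, torsion_order_bound(n))
```

In each case the gcd comes out to 4, matching the four 2-torsion points; already two primes such as 7 and 11 (giving 8 and 12) suffice when they do not divide n.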


I would like to end by talking about a conjectural algorithm to detect congruent numbers. It is called Tunnell’s algorithm and is based on the idea above: A number n is congruent if and only if E_n(\mathbb Q) has positive rank.

The algorithm is easy to describe (and execute) but its correctness depends on a very deep conjecture about Elliptic Curves (the Birch and Swinnerton-Dyer conjecture).

The algorithm is as follows: For a square free integer n, define:

{\begin{matrix}A_{n}&=&\#\{(x,y,z)\in {\mathbb  {Z}}^{3}|n=2x^{2}+y^{2}+32z^{2}\}\\B_{n}&=&\#\{(x,y,z)\in {\mathbb  {Z}}^{3}|n=2x^{2}+y^{2}+8z^{2}\}\quad \\C_{n}&=&\#\{(x,y,z)\in {\mathbb  {Z}}^{3}|n=8x^{2}+2y^{2}+64z^{2}\}\\D_{n}&=&\#\{(x,y,z)\in {\mathbb  {Z}}^{3}|n=8x^{2}+2y^{2}+16z^{2}\}.\end{matrix}}

Tunnell proved unconditionally that if n is an odd congruent number, then 2A_n = B_n and if n is an even congruent number, then 2C_n = D_n. The converse is true assuming the Birch and Swinnerton-Dyer conjecture.
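Since the criterion only involves counting representations by ternary quadratic forms, it is straightforward to run. Here is a minimal brute-force sketch in Python; the function names are my own, and a True answer only means that n passes the test (so n is congruent conditionally on the conjecture), while a False answer shows unconditionally that n is not congruent.

```python
from math import isqrt

def count_reps(n, a, b, c):
    """Number of (x, y, z) in Z^3 with n = a x^2 + b y^2 + c z^2, for a, b, c > 0."""
    total = 0
    for x in range(-isqrt(n // a), isqrt(n // a) + 1):
        rem_x = n - a * x * x
        for y in range(-isqrt(rem_x // b), isqrt(rem_x // b) + 1):
            rem = rem_x - b * y * y
            if rem % c == 0:
                z2 = rem // c
                r = isqrt(z2)
                if r * r == z2:
                    total += 1 if r == 0 else 2   # z = 0, or z = +r and z = -r
    return total

def passes_tunnell(n):
    """Tunnell's test for a square-free positive integer n."""
    if n % 2 == 1:
        return 2 * count_reps(n, 2, 1, 32) == count_reps(n, 2, 1, 8)
    return 2 * count_reps(n, 8, 2, 64) == count_reps(n, 8, 2, 16)

def is_squarefree(n):
    return all(n % (d * d) != 0 for d in range(2, isqrt(n) + 1))

print([n for n in range(1, 16) if is_squarefree(n) and passes_tunnell(n)])
```

For square-free n up to 15 this prints [5, 6, 7, 13, 14, 15], which agrees with the classical list of small congruent numbers.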

More precisely, if we denote by L(s)= L(E_n,s) the L-function of E_n over \mathbb Q, recall that the Birch and Swinnerton-Dyer conjecture states that the order of vanishing of L(s) at s = 1 is the rank of E_n(\mathbb Q).

Therefore, assuming the conjecture, n is a congruent number if and only if L(E_n,1) = 0. What Tunnell showed was that:

L(E_n,1) = \begin{cases}\gamma(2A_n-B_n) & n \text{ is odd}\\\gamma(2C_n-D_n) & n \text{ is even}\end{cases}

where \gamma is a non-zero constant. Note that E_n always has complex multiplication by \mathbb Z[i] (defined over \mathbb Q(i)) since (x,y,z) \to (-x,iy,z) is an automorphism of order 4.

The unconditional direction of Tunnell’s criterion follows from the following theorem of Coates-Wiles in 1976:

Theorem[Coates-Wiles(1976)] If an Elliptic Curve over \mathbb Q has complex multiplication by a ring of integers with class number 1 and has positive rank over \mathbb Q, then the corresponding L-function vanishes at s=1.


Descent on Vector Spaces and Cohomology

It is quite often of interest to study the properties of some variety {X/\mathbb Q}. However, it is generally much easier to study varieties over algebraically closed fields and so we need some way of translating a property of {X_{\overline{\mathbb Q}}/\overline{\mathbb Q}} to {X/\mathbb Q} . This idea is known as descent and in this post, I would like to say a little bit about the simplest example of descent – over vector spaces.

Let {L/K} be a Galois extension of fields and {V} a vector space over {K} . Consider {W = V\otimes_K L } . The Galois Group {G = \mathop{Gal}(L/K)} acts on {W} through the second factor, and one can consider the {K} -vector space {W^G} , the subspace of {W} fixed by {G} .

Theorem 1 (Descent of Vector Spaces) The natural map {W^G\otimes_K L \rightarrow W} is an isomorphism. In particular, if {W} is finite dimensional, then {\dim_K W^G = \dim_L W} .

It is not hard to prove this theorem directly but I would like to relate it to another theorem. This is also well known and is a generalization of the famous Hilbert’s Theorem 90. Let {{GL}_n(L)} be the group of invertible {n\times n} matrices over {L} and consider the cohomology {H^1(G,{GL}_n(L))} . This is not a group unless {n=1} since {{GL}_n(L)} is non-commutative in general. However, it is a pointed set and we have the following theorem:

Theorem 2 (Hilbert’s 90) \displaystyle H^1(G,{GL}_n(L)) = \{0\}.

Hilbert stated the above theorem (in a disguised form) for {n=1} and {L/K} a finite cyclic extension. Noether generalized the theorem to arbitrary extensions. I do not know who is responsible for the generalization to general linear groups but I saw this theorem first in Serre’s “Galois Cohomology”.
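In Hilbert's original setting the statement is very concrete. Take {L = \mathbb Q(i)} over {K = \mathbb Q} with {n=1} : a 1-cocycle is determined by its value {a} on complex conjugation, the cocycle condition says {a\bar a = 1} , and triviality of the cohomology says {a = b/\bar b} for some {b} . The Python sketch below checks this on one example, using the explicit choice {b = 1+a} (valid when {a \neq -1} ); elements of {\mathbb Q(i)} are modeled as pairs of rationals and all function names are mine.

```python
from fractions import Fraction

def mul(u, v):
    """Product of two elements (re, im) of Q(i)."""
    return (u[0] * v[0] - u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def conj(u):
    """The non-trivial element of Gal(Q(i)/Q)."""
    return (u[0], -u[1])

def div(u, v):
    """Quotient u / v in Q(i), for v != 0."""
    norm = v[0] * v[0] + v[1] * v[1]
    w = mul(u, conj(v))
    return (w[0] / norm, w[1] / norm)

def hilbert90_witness(a):
    """Given a with a * conj(a) = 1 and a != -1, return b with a = b / conj(b)."""
    assert mul(a, conj(a)) == (1, 0)
    return (1 + a[0], a[1])   # b = 1 + a

a = (Fraction(3, 5), Fraction(4, 5))   # (3 + 4i)/5 has norm 1
b = hilbert90_witness(a)
assert div(b, conj(b)) == a
print(b)
```

The choice {b = 1 + a} works because {\bar b = 1 + \bar a = 1 + 1/a = b/a} ; this averaging trick is the same idea that underlies the general proof.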

In this post, I will show that the above theorems are equivalent in the following sense:

Let {W} be an {L} -vector space. We will say that a group {G} acting on {L} acts semi-linearly on {W} if it acts additively and \sigma(lv) = \sigma(l)\sigma(v) \text{ for all }\sigma \in G, l \in L, v \in W.

The typical example is when {G = \mathop{Gal}(L/K)} acts co-ordinate wise on {W = L^n} or equivalently {W = V\otimes_K L} for a {K} -vector space {V} . We will show that this is essentially the only example by proving:

Theorem 3 There is a bijection:

\displaystyle H^1(G,{GL}_n(L)) \longleftrightarrow \frac{\{\text{n-dimensional L- vector spaces with semilinear G-action}\}}{\text{isomorphisms}}

Proof: I will use {x^g} to denote {g} acting on {x} throughout:

Let us first establish the maps. Given a 1-cocycle {\eta:G \rightarrow {GL}_n(L)} , let the corresponding vector space {W_\eta} be {L^n} with the action for {(g,w) \in G\times W} being given by {(g,w) \rightarrow \eta_g(w^g)} where {w^g} stands for the action of {g} co-ordinate wise. It is easy to verify that this is well defined:

Given {l\in L} , {\eta_g((lw)^g) = \eta_g(l^gw^g) = l^g\eta_g(w^g)} . This shows that the action is semi-linear. Then, given a one-cocycle cohomologous to {\eta} (that is, given {\tau_g = A^{-1}\eta_g A^g} ), we have the following isomorphism of vector spaces with group actions given by:

\displaystyle W_\eta \rightarrow W_\tau, w \rightarrow A^{-1}w.

That is, {A\tau_g((A^{-1}w)^g) = \eta_g(w^g)} as can be easily checked. This establishes that {\eta \rightarrow W_\eta} is well defined.
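Spelled out, the two verifications look as follows. This is only a sketch, writing {g\cdot w = \eta_g(w^g)} for the twisted action (a notation introduced here for convenience) and using the cocycle condition {\eta_{gh} = \eta_g\eta_h^g} together with the convention {w^{gh} = (w^h)^g} .

```latex
\begin{align*}
g\cdot(h\cdot w) &= \eta_g\big((\eta_h\,w^h)^g\big) = \eta_g\,\eta_h^g\,w^{gh} = \eta_{gh}\,w^{gh} = (gh)\cdot w, \\
A\,\tau_g\big((A^{-1}w)^g\big) &= A\,A^{-1}\eta_g A^g\,(A^{-1})^g\,w^g = \eta_g\,w^g .
\end{align*}
```

The first line shows that the twisted action really is a group action, and the second that {w \rightarrow A^{-1}w} intertwines the actions on {W_\eta} and {W_\tau} .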

To construct the inverse, let {W} be a {L} -vector space with a semilinear {G} -action. Fix a basis {e_1,\dots, e_n} . Denote the column vector corresponding to this basis as {[e]} . Define {\eta_g} to be the unique transformation such that {\eta_g[e]^g = [e]} . To check that this is a 1-cocycle, note that:

\displaystyle \eta_g\eta_h^g[e]^{gh} = \eta_g(\eta_h[e]^h)^g = \eta_g[e]^g = [e]

and hence by uniqueness {\eta_{gh} = \eta_g\eta_h^g} .

Finally, it is easy to verify that these maps really are inverses. \Box

To see that we really have shown that Theorem 1 and Theorem 2 are equivalent, note that Theorem 1 is clearly true for {L^n = K^n\otimes_KL} . Then, if {H^1(G,{GL}_n(L)) = \{0\}} , there is a unique (up to isomorphism) {n} -dimensional {L} -vector space with a semi-linear action and we can check Theorem {1} on this unique vector space.

Conversely, if Theorem {1} is true, then {H^1(G,{GL}_n(L))} is a one-element set.

Proofs for Theorem 1 and Theorem 2 can be found in many places. Serre’s Galois Cohomology is a good place to read about Group Cohomology generally and Theorem 2 in particular.

UPDATE: I later discovered that the content of this post is Exercise 1.9 in Poonen’s “Rational Points on Varieties”. He also proves Theorem 1 as Lemma 1.3.10.