So this morning was unusual not in my having an interesting idea, but in pursuing it. This idea, not likely to be wholly or even partly original—I don’t want to search through journal articles only to disappoint myself just yet, & wouldn’t know where to look anyway—is about the clustering of points.

(This sort of thinking is something I do a lot of; when introducing myself at meetings, if there are too many people who say they study ‘the clustering of galaxies’, I usually wheel out ‘the clustering of points that represent galaxies’, ostensibly to demonstrate my understanding of the difference between theory and observation. Sometimes folks chuckle. But I digress: one may presume in what follows that I am talking about galaxies, or dark matter haloes, or whatever you like, really.)

The Delaunay triangulation is not unheard of in studies of cosmological structure. More common, though, is the use of its dual graph; see, *e.g.*, Marinoni *et al.* (2001) for one application. My assumption in using these statistics—which I am going to call statistics of the triangulated galaxy distribution, to distinguish them from, say, statistics of the smoothed galaxy distribution or indeed the raw distribution of points itself—is that when points are more clustered (relative to a Poisson distribution) this will show up as a relative overabundance of distorted triangles (or Voronoi cells or some such). Voronoi methods used in group finding don’t really focus on this, instead using adjacent Voronoi cells to associate points as members of a larger structure. I know that Voronoi methods at least are mentioned in the book by Martinez, which I don’t have to hand, but I can’t recall about triangulation.

But enough introduction! Here is the plan of attack: for a distribution of points (in two dimensions here for ease of presentation, but things generalise), calculate the Delaunay triangulation (easy), then characterise each triangle with a handful of functionals (should be easy; haven’t really solved it yet), then determine how the distributions of such functionals differ for clustered and non-clustered distributions (we’ll come back to that one). Matlab will once again be the tool of choice.

Question: what is the complete set of quantities that characterise the triangle for the task at hand? To be clear: one isn’t interested in the orientation of the triangle (though the relative orientation of adjacent triangles might be worth pursuing at a later time), which is too ‘geometric’ to be of use here, or in things like, say, the number of sides, which is obviously telling us only about the definition of the word ‘triangle’. Here is a great point to get out a pencil and try the exercise for yourself. When you’re ready, I’ll describe what I tried. After that, you can leave corrections in the comments.

So, this is what I tried at first: I think it is obvious that area is going to be one of the quantities. I think that perimeter might be one as well. I mean, they're different, aren't they? Well, let's see. Here is the m-script:

```matlab
%% Delaunay triangle playground #1
% Written in Matlab 04/11/08 by J. Berian James
clear all; close all; clc; format compact; % This is known as 'Brewer booting' the script.

%% Make random points
np = 1e3;
x = rand(np,1); % Or use randn for a simple non-uniform distribution
y = rand(np,1);

%% Calculate and plot triangulation
TRI = delaunay(x,y);
nt = numel(TRI)/3;
triplot(TRI,x,y);

%% Get triangle areas and perimeters
for i = 1:nt
    % First, get co-ordinates
    c1 = [x(TRI(i,1)) y(TRI(i,1))];
    c2 = [x(TRI(i,2)) y(TRI(i,2))];
    c3 = [x(TRI(i,3)) y(TRI(i,3))];
    % Then, get lengths of triangle sides
    v1 = sqrt( sum( (c1-c2).^2 ) );
    v2 = sqrt( sum( (c2-c3).^2 ) );
    v3 = sqrt( sum( (c3-c1).^2 ) );
    % Now get meta-statistics
    p(i) = v1 + v2 + v3;
    s = p(i)/2; % Semi-perimeter
    A(i) = sqrt( s*(s-v1)*(s-v2)*(s-v3) ); % Area (Heron's formula)
end
```
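For anyone following along without Matlab, here is a rough translation into Python using `scipy.spatial.Delaunay`, which mirrors the m-script's calls closely. The loop is vectorised, and the clamp to zero inside the square root is my addition, guarding against round-off in sliver triangles; the function name `triangle_stats` is mine.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_stats(points):
    """Perimeter and Heron area of every triangle in the Delaunay mesh."""
    tri = Delaunay(points)
    corners = points[tri.simplices]               # shape (nt, 3, 2)
    # Side lengths: distances between cyclically adjacent vertices
    diffs = corners - np.roll(corners, -1, axis=1)
    sides = np.sqrt((diffs ** 2).sum(axis=2))     # shape (nt, 3)
    p = sides.sum(axis=1)                         # perimeters
    s = p / 2                                     # semi-perimeters
    # Clamp at zero: round-off can make the product slightly negative
    # for very thin ('sliver') triangles
    A = np.sqrt(np.maximum(s * (s - sides[:, 0])
                             * (s - sides[:, 1])
                             * (s - sides[:, 2]), 0.0))
    return p, A

rng = np.random.default_rng(0)
pts = rng.random((1000, 2))   # uniform; use rng.standard_normal for the clustered case
p, A = triangle_stats(pts)
```

As a sanity check, the triangle areas should sum to the area of the convex hull of the points, since the Delaunay triangulation tiles it exactly.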

I hope that is clear enough; I guess the only quirks are the use of Heron’s formula (thanks Maths Challenge!) and to point out that TRI is an (nt x 3) array containing the indices of the points that make up the vertices of each of the nt triangles in the triangulation! Phew! I’ve used this code twice: the first time using rand to generate the points uniformly; the second time, using randn for a normal distribution.

So what happens? Well, here are the triangulations:

There are obvious visual differences: the uniform sampling is pretty in a stained-glass sort of way, while the normal distribution triangulation looks like those webs made by caffeinated spiders. Next, I measure the distributions of areas and perimeters of the member triangles, normalising by the total area of the convex hull and total length of the triangular mesh respectively. The results, in log scale (here using 10^5 points to beat down the noise):

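A note on those normalising constants, since one of them is subtler than it looks: summing the triangle perimeters double-counts every interior edge (each is shared by two triangles), so the total mesh length needs the unique edge set. A self-contained sketch in Python, hedging that this is my reading of "total length of the triangular mesh"; note the scipy naming quirk that in two dimensions `ConvexHull.volume` is the enclosed area.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

rng = np.random.default_rng(1)
pts = rng.random((200, 2))
tri = Delaunay(pts)

# Collect all three edges of every triangle, then deduplicate:
# each interior edge appears in exactly two triangles
edges = np.vstack([tri.simplices[:, [0, 1]],
                   tri.simplices[:, [1, 2]],
                   tri.simplices[:, [2, 0]]])
edges = np.unique(np.sort(edges, axis=1), axis=0)
mesh_length = np.linalg.norm(pts[edges[:, 0]] - pts[edges[:, 1]], axis=1).sum()

# In 2-D, ConvexHull.volume is the enclosed area (and .area the boundary length)
hull_area = ConvexHull(pts).volume
```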
Well what now? I have two questions:

- Are area and perimeter the whole story? There is a result, from memory due to Hadwiger (1957), that an (n-1)-dimensional surface embedded in n-dimensional space can be characterised completely up to topological equivalence by (n+1) functionals. I am aware of this via the n = 3 case arising in the context of the topology of the smoothed galaxy distribution, where the functionals are volume (boring!), area (boring!), integrated mean curvature (hmm!) and integrated Gaussian curvature (a.k.a. the genus statistic; winner!). Is there something similar to be used here? At what level is one trying to characterise this triangle, or the triangulation-at-large? Specifying the locations of the vertices of the triangle is too much—one doesn’t care about absolute location or absolute orientation—but I’m not convinced that two numbers does the trick either. And even then, are area and perimeter the best choices?
- With that decision made, how to proceed? If differences are apparent even with basic distributions, it seems reasonable to conclude that it might be possible to characterise those differences either in relation to the underlying probability fields (by, say, using cumulants, Edgeworth expansion-style) or perhaps to polyspectra. Applying any of this to data would be difficult.

That is as far as I can take this in a few hours of work, and it’s enough fun for one day. It’s poll time!

on November 7, 2008 at 7:16 am | Owen
He he, Brewer booting – sounds like there’s a story behind that one!

I am forced to wonder whether the n+1 functionals are unique, i.e., you say for n=3 the functionals are volume etc. Are these the only functionals that work? Or could there be a different set of 4 functionals (that we perhaps haven’t invented yet, and certainly would be more difficult to work with than volume and area) that would do the trick?

This may be hampered by my lack of understanding of what, exactly, a ‘functional’ is.

However, it certainly seems that one could get somewhere with this – if the points are more ‘clustery’ then you’d get relatively more triangles with smaller area.

on November 7, 2008 at 11:08 am | Brendon Brewer
Sounds promising. Does this only work in two dimensions?

Choice of the “best” summary statistics is an interesting topic that I have some philosophical thoughts on, but I haven’t gone much further than that.

Oh, and I picked up the ‘Brewer booting’ habit from Nigel Marks. 🙂

on November 12, 2008 at 10:50 pm | T. J. McKenzie
Area and perimeter aren’t enough on their own. Imagine drawing the base of a triangle first. The endpoints of the base will be two of the vertices. Then, to get the right area (half base times height), you need to get the height right, so the third vertex must be on a line parallel to the base and at the correct distance from it. Varying the position of the vertex along this line will vary the perimeter continuously without changing the area, so as long as the length of the base wasn’t absurd, you should be able to get whatever perimeter you want. In particular, there will probably be infinitely many lengths that you can choose for the base, without making it impossible to get the correct area and perimeter.
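A quick numeric check of the argument above: two triangles sharing perimeter 12 and area 6 but with different side lengths. The second triangle is my own construction, found by fixing one side at 4.5 and solving the resulting quadratic; the numbers are not from the comment.

```python
import math

def heron(a, b, c):
    """Area of a triangle from its side lengths."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

t1 = (3.0, 4.0, 5.0)       # the familiar right triangle: perimeter 12, area 6
r = math.sqrt(4.25)        # discriminant of the quadratic for the other two sides
t2 = (4.5, (7.5 - r) / 2, (7.5 + r) / 2)
# t2 also has perimeter 12 and area 6, yet t1 and t2 are not congruent
```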

I don’t think any two (real) numbers will be sufficient, if you want things to be nice and continuous. You mentioned that six numbers would specify the coordinates of all three vertices, but that this is more than you want. The six numbers put us in a 6-dimensional something-or-other over the reals. Because you’re not interested in the absolute position, we can throw away two dimensions, since we could recover the position of the triangle by specifying the x and y coordinates of a particular vertex. Since you’re not interested in the orientation, either, we can throw away another dimension, since we could recover the orientation by specifying an angle. This leaves us in a three-dimensional something-or-other over the reals, so I think you’ll need three real numbers to characterize the shape and size of the triangle.

Which three numbers? Area, perimeter, and length of the shortest side should do it. Then you can reconstruct the original triangle as in my first paragraph. The length of the longest side would also work. Or, if you’re worried about whether you might end up reconstructing the mirror-image, then you could take the length of the side immediately anticlockwise from the longest side. You might also be able to do it by taking the area, perimeter, and geometric mean of the lengths of the sides, or any other mean (except the arithmetic mean, which duplicates information that the perimeter gives you). You can certainly do it if your three numbers are the lengths of the sides of the triangle.

My instinct suggests that any three real numbers should be sufficient, as long as they don’t duplicate information you already have. Obviously, I haven’t been very rigorous about what I mean by “duplicating information”, but hopefully its intuitive meaning is relatively clear and not too misleading.
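The last concrete suggestion in the comment above, taking the three side lengths themselves, gives a particularly clean canonical form: sort them. A small sketch (the function name is mine), which is invariant under translation, rotation, and reflection:

```python
import math

def shape_signature(c1, c2, c3):
    """Sorted side lengths of a triangle: identical for congruent
    triangles regardless of position, orientation, or reflection."""
    pairs = ((c1, c2), (c2, c3), (c3, c1))
    return tuple(sorted(math.dist(u, v) for u, v in pairs))
```

For example, a 3-4-5 right triangle and a rotated, translated copy of it produce the same signature.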
