## Archive for July, 2010

PRL report the theoretical discovery of a time-reversed laser, i.e., one that absorbs light monochromatically. Both the report and the paper itself are worth reading. Given the astounding breadth of laser applications in modern technology and industry, I find it very hard to believe that this is not a major advance.

## Public Speaking For Scientists 2a: Jokes

Having considered what scientists can learn from comedians, now is a good time to give a few guidelines on the place of humour in scientific talks. Here are my opinions on the matter:

1. No separate feedline

The classic joke format consists of a feedline and a punchline. Typically, the feedline invites the audience to make an assumption, which is then shown to be erroneous in the punchline. For example, here’s Jimmy Carr:

Feedline: I can’t forgive the Germans for the way they treated my grandfather during the war … (more…)

## Endorsement: Vote Greens in the Senate

(N.b. Especially with this post, these remarks reflect views that belong to Berian and should not be taken to reflect the views of the co-authors of this blog, or of the institutions with which Berian is affiliated.)

There will be an Australian Federal Election on the 21st of August. For the first time, I am endorsing support for the Australian Greens, particularly in the Upper House, and in the strongest terms I can muster. Regrettably, this is equal measure endorsement of some of their policies and dis-endorsement of the policies of the two opposing parties. I would like to explain, briefly, why I think the Greens deserve support and why this election in particular is a good opportunity to vote for them.

## Public Speaking For Scientists 2: How Comedians Do It

Continuing my series on public speaking for scientists, we look at what we can learn from comedians. There is no more attentive audience than that watching a world-class comedian. They are completely within his or her power, hanging off every word. The easy answer to the question “how do they do it?” is “by being funny”. While we could throw one or two jokes in, a scientist can’t turn a conference presentation into a comedy routine. However, there are a few lessons to be learned.

1. Learn From Others

The book “Comic Insights: The Art of Stand-up Comedy” by Franklyn Ajaye begins with these words:

“The first and most important step for anybody who wants to be a good and stand-up comedian is to make sure that you watch the good ones and study them intently”.

I found the same advice in almost every “how to be a stand-up comedian” guide. It’s what I’m attempting to do in these posts. Everyone knows what good public speaking is because we know when we have enjoyed a talk or lecture. So if you find yourself enjoying a talk or lecture, try to work out why you are enjoying it. Reflect on your undergraduate lectures: what made the good lecturers good and the bad lecturers bad? Remember: you are the audience. You are the ultimate judge of what constitutes good public speaking.

Here are some resources for great speakers: Great Speeches, American Rhetoric, some links, podcast.

2. Get Feedback (more…)

## Horse meat down under

It’s time to re-read one of Matt’s old posts, as Perth (Australia) butcher Vince Garreffa gets the all clear from the state authorities to vend equus, apparently under threat of death from grossed-out locals, who believe that their own revulsion is an effective argument against sale of the meat. I won’t expect to hear a butcher espouse meat-free eating, so instead I’ll praise Garreffa’s admirable consistency (h/t Crikey!):

“If somebody told me that we were going to start eating Jack Russells tomorrow, I would be horrified. I love my Jack Russell… but hey, emotions are one thing.”

This is a good demonstration of one of only two logically consistent positions with regard to eating meat.

P.S. Election time! Remember to check your enrollment with the AEC!

## Public Speaking for Scientists 1: Introduction

It’s time for another series – one that I promised quite a while ago. I’ve spent a lot of my life listening to scientists give talks in one form or another – four years of undergraduate lectures, a few weeks’ worth of conferences, a few hours a week in seminars and colloquia. I have long pondered this question:

Why are scientists, with precious few exceptions, such appalling public speakers?

(Thankfully, my fellow bloggers are some of the exceptions!)

At every public speaking course I’ve attended, the attendees have complained that the advice given was too obvious. And yet, as I think over all the public speaking “laws”, I can’t think of a single one that isn’t regularly broken by scientists in front of an audience. If you are a scientist (or even if you have sat through enough university lectures), how often have you witnessed:

• Speakers talking too quickly, too softly, and addressing their remarks to the front row
• Monotone voices, and a single speed of delivery
• No variety of content
• Speakers who don’t emphasize the important points, and present a summary slide that would take 5 minutes to read
• Talks that consist of a single, half-hour-long sentence, constructed by taking a normal talk and replacing all the full stops with “and”, “I mean” or “um”.
• Mindless, nauseating, impenetrable, replaceable jargon
• Half an hour of the back of the speaker’s head as he or she talks exclusively to the projector screen
• Plots displayed: too small, too crowded, too briefly, in invisible colours, with lines too thin to see
• Slides that look like an entire presentation has been swallowed and vomited back onto the screen
• A speaker whose every word, tone, gesture, posture, expression and slide betray their complete indifference to their audience?

Why does it feel like a chore to attend a talk about astronomy when I’m an astronomer? I am constantly flabbergasted by the ability of speakers to make a subject in which I am intensely interested sound incredibly dull. In a profession where getting your work known in the community, sharing your ideas and generally making a name for yourself is of great importance, why do so many care so little about being interesting, concise, non-coma-inducing?

The best way to learn is by example. Over the next few posts I’m going to look at five professions in which public speaking is held in high regard. They are: (more…)

## A Tale of Two Entropies

For those of us who work with degree-of-plausibility (“Bayesian”) probabilities, two situations regularly arise. The first is the need to update probabilities to take into account new information. This is usually done using Bayes’ Rule, when the information comes in the form of a proposition that is known to be true. An example of such a proposition is “The data are 3.444, 7.634, 1.227”.

More generally, information is any justified constraint on our probabilities. For example, “P(x > 3) should be 0.75” is information. If our current probability distribution $q(x)$ doesn’t satisfy the constraint, then we had better change to a new distribution $p(x)$ that does. This doesn’t mean that any old $p(x)$ will do – our $q(x)$ contained hard-won information and we want to preserve that. To proceed, we choose the $p(x)$ that is as close as possible to $q(x)$, but satisfies the constraint. Various quite persuasive arguments (see here) suggest that the correct notion of closeness that we should maximise is the relative entropy:

$H(p; q) = -\int p(x) \log \frac{p(x)}{q(x)} dx$

With no constraints, the best possible $p(x)$ is equal to $q(x)$.
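For a constraint like “P(x > 3) should be 0.75”, the maximum relative entropy update takes a simple form: rescale $q$ separately on each side of the threshold so that the two masses come out right. Here is a minimal Python sketch on a discrete grid. The six-point prior `q` and the competing distribution `rival` are made-up illustrations, and the integral is replaced by a sum.

```python
import math

def relative_entropy(p, q):
    """Discrete analogue of H(p; q) = -sum p log(p/q); it is never positive."""
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

xs = [1, 2, 3, 4, 5, 6]
q = [0.30, 0.25, 0.20, 0.15, 0.07, 0.03]  # hypothetical prior, with P(x > 3) = 0.25

target = 0.75  # the constraint: P(x > 3) should be 0.75
mass_above = sum(qi for x, qi in zip(xs, q) if x > 3)

def rescale(x, qi):
    # Scale q up above the threshold and down below it, keeping its shape on each side.
    if x > 3:
        return qi * target / mass_above
    return qi * (1 - target) / (1 - mass_above)

p = [rescale(x, qi) for x, qi in zip(xs, q)]

# A rival that also satisfies the constraint but throws away the shape of q.
rival = [0.25 / 3] * 3 + [0.25] * 3

print("P(x > 3) under p:", sum(pi for x, pi in zip(xs, p) if x > 3))
print("H(p; q)     =", relative_entropy(p, q))
print("H(rival; q) =", relative_entropy(rival, q))
```

Both `p` and `rival` satisfy the constraint, but `p` has the higher relative entropy because it keeps the relative probabilities that $q$ assigned within each region.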

Another situation that arises often is the need to simplify complex problems. For example, we might have some probability distribution $q(x)$ that is non-Gaussian, but for some reason we only want to use Gaussians for the rest of the calculation, perhaps for presentation or computational reasons. Which Gaussian should we choose to become our $p(x)$? Many people recommend maximising the relative entropy for this also: in the literature, this is known as a variational approximation, variational Bayes, or the Bogoliubov approximation (there are also variations (pun not intended) on this theme).

There are known problems with this technique. For instance, as David MacKay notes, the resulting probability distribution $p(x)$ is usually narrower than the original $q(x)$. This makes sense, since the variational approximation basically amounts to pretending you have information that you don’t actually have. This issue raises the question of whether there is something better that we could do.

I suggest that the correct functional to maximise in the case of approximating one distribution by another is actually the relative entropy, but with the two distributions reversed:

$H(q; p) = -\int q(x) \log \frac{q(x)}{p(x)} dx$

Why? Well, for one, it just works better in extreme examples I’ve concocted to magnify (a la Ed Jaynes) the differences between using $H(p; q)$ and $H(q; p)$. See the figure below:

If the blue distribution represented your actual state of knowledge, but out of necessity you could only use the red or the green distribution, which would you prefer? I find it very hard to imagine an argument that would make me choose the red distribution over the green. Another argument supporting the use of this ‘reversed’ entropy is that it is equivalent to generating a large number of samples from q, and then doing a maximum likelihood fit of p to these samples. I know maximum likelihood isn’t the best, most principled thing in the world, but in the limit of a large number of samples it’s pretty hard to argue with.
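One way to see the maximum likelihood connection (a sketch, with $x_1, \dots, x_N$ denoting hypothetical samples drawn from $q$) is to split the reversed entropy into two terms:

$H(q; p) = -\int q(x) \log q(x) \, dx + \int q(x) \log p(x) \, dx$

The first term does not depend on $p$, so maximising $H(q; p)$ over $p$ is the same as maximising the second term. For large $N$, that term is approximated by the average log-likelihood of the samples:

$\int q(x) \log p(x) \, dx \approx \frac{1}{N} \sum_{i=1}^{N} \log p(x_i)$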

A further example supporting the ‘reversed’ entropy is what happens if $q(x)$ is zero at some points. According to the regular entropy, any distribution $p(x)$ that is nonzero where $q(x)$ is zero is infinitely bad. I don’t think that’s true, in the case of approximations – some leakage of probability to values we know are impossible is no catastrophe. This is manifestly different to the case where we have legitimate information – if $q(x)$ is zero somewhere then of course we want to have $p(x)$ zero there as well. If we’re updating probabilities, we’re trying to narrow down the possibilities, and resurrecting some is certainly unwarranted – but the goal in doing an approximation is different.

Maximising the reversed entropy also has some pretty neat properties. If the approximating distribution is a Gaussian, then the first and second moments should be chosen to match the moments of $q(x)$. If the original distribution is over many variables, but you want to approximate it by a distribution where the variables are all independent, just take all of the marginal distributions and multiply them together, and there’s your optimal approximation.
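As a rough numerical check of the moment-matching property, and of the narrowness of the usual variational fit, here is a Python sketch that fits a Gaussian to a bimodal $q(x)$ both ways by brute-force grid search. The mixture weights, the grid ranges and the step sizes are arbitrary choices of mine for illustration. The integrals are approximated by simple sums, so the results are only accurate to the grid spacing.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def log_gauss(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - LOG_SQRT_2PI

dx = 0.1
xs = [-20 + dx * i for i in range(401)]

# Hypothetical bimodal q(x): mean -0.2, variance 8.56.
q = [0.7 * math.exp(log_gauss(x, -2, 1)) + 0.3 * math.exp(log_gauss(x, 4, 1))
     for x in xs]
log_q = [math.log(v) for v in q]

def forward_entropy(mu, sigma):
    """H(p; q) = -int p log(p/q) dx, the usual variational objective."""
    total = 0.0
    for x, lq in zip(xs, log_q):
        lp = log_gauss(x, mu, sigma)
        total -= math.exp(lp) * (lp - lq) * dx
    return total

def reversed_entropy(mu, sigma):
    """H(q; p) = -int q log(q/p) dx, the reversed objective."""
    total = 0.0
    for x, qv, lq in zip(xs, q, log_q):
        total -= qv * (lq - log_gauss(x, mu, sigma)) * dx
    return total

# Candidate (mu, sigma) pairs: mu from -4 to 6, sigma from 0.5 to 4.5.
candidates = [(-4 + 0.2 * i, 0.5 + 0.1 * j) for i in range(51) for j in range(41)]

forward_fit = max(candidates, key=lambda c: forward_entropy(*c))
reversed_fit = max(candidates, key=lambda c: reversed_entropy(*c))

print("maximising H(p; q): mu, sigma =", forward_fit)
print("maximising H(q; p): mu, sigma =", reversed_fit)
print("moments of q:       mu, sigma =", (-0.2, math.sqrt(8.56)))
```

The reversed fit should land on the grid point nearest the mean and standard deviation of $q$, while the fit that maximises $H(p; q)$ comes out narrower.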

If $H(p; q)$ isn’t the best thing to use for approximations, that means that something in the derivation of $H(p; q)$ applies to legitimate information but does not apply to approximations. Most of the axioms (coordinate independence, consistency for independent systems, etc) make sense, and both entropies discussed in this post satisfy those. It is only at the very end of the derivation that the reversed entropy is ruled out, and by some pretty esoteric arguments that I admit I don’t fully understand. I think the examples I’ve presented in this post are suggestive enough that there is room here for a proof that the reversed entropy $H(q; p)$ is the thing to use for approximations. This means that maximum relative entropy is a little less than universal, but that’s okay – the optimal solutions to different problems are allowed to be different!