A New Year of Study Begins!

September has come around already!

My virology work placement came and went so quickly. Great experience. I was certainly not expecting it to be such a creative process. Or should I say "necessarily creative process". It seems with research like this, you really do need to be mindful of other results popping out of your work. If the tangent appears to be more important/meaningful than the original work, then it's best to follow it!

Of course this creative approach gives rise to a bit of a problem. The project has the potential to meander. I suppose this would be fine on a longer time scale, but for my all-too-brief 8 weeks it meant I wasn't able to neatly draw a line under it by the end of the placement.

Though having said that, the work that I was looking at wasn't time-dependent so I still have however long I want to complete the project in my spare time. After this next year of study I'd love to return to it.

Speaking of which... my next year of study has started! I won't be informed of my assigned tutor for about another month, but I have my learning materials. So learning has begun! This year it's complex analysis. Very excited to be looking at this subject, and to be writing up the areas I have difficulty with on here.

(Side note: Wow, I've been writing on this blog for 5 years?!)

A Brief Change

Normally I'd take a well-earned break from mathematics during the Summer. Recharge for my next module that starts up in September.

Not this year!

This year, I've managed to take a short career break from my normal job to work as a work placement student in mathematics research!

So for eight weeks I'll be getting a taste of real life mathematics research! I've been lucky enough to be accepted into the mathematical biology research group at the University of York. Specifically, I'll be looking at mathematical virology, but the relevance to the current times is purely by chance: I first started arranging this placement about a year ago.

In my placement I'll be using group theory and linear algebra to produce predictions of virus structure.

It seems that many viruses have the same symmetry as an icosahedron. But it turns out that you can find more icosahedral symmetry by translating an icosahedron along its axes of symmetry to create a larger non-crystallographic structure. When you do this according to strict rules, it turns out that you can start to predict overall virus structure. You can predict not just what it looks like on the outside, but what it may look like inside too.

I'm very early on in the position, but it's already fascinating. I'll be updating here when I can about how I get on with the experience.

Depth of Field

I'm glad I finally managed to get this working. I have almost no experience in real-time graphics like these, so this was a bit of a battle. Really interesting effect though. It's not perfect, as it's some kind of 2D blur process, but I think it's effective for what it is!


GL Lines

Thought I'd play a bit more with WebGL through three.js. I wanted to play with lines a bit more, as wireframe just looks cool.


Exams

I had my statistics exam at the beginning of the week.

This exam was weirder than most. Because of the pandemic, the OU had decided to turn the usual three-hour sit-down exam into an "end-of-module" assignment that could be done at home within a period of 24 hours.

As soon as I heard this news, I had mostly negative feelings. The exam at the end of a 9-month module is a chance to really show off what you've learned. In three hours you have to recall, at speed, a very large assortment of problem-solving skills and take full advantage of nurtured intuition. Some people can just stroll into an exam and do well, but I need to work very hard to walk into that exam hall with any confidence. Preparing for these exams for me is like training for a marathon, or a mountain climb. It's exhausting.

I start attempting past papers under exam conditions on Saturdays and Sundays four to five weeks before exam day. Then the week before the exam I take a whole week off work to spend practically ten days straight doing past papers, marking them harshly, then reviewing them and revising further.

By the time I arrive at the exam in the exam hall in June, those three hours feel exactly like I'm running that marathon or climbing that mountain I've been training for.

Walking out of an exam feeling like you were prepared and knowing it's all over is an enormous feeling. The final punctuation of nine months' hard work.

Hearing that that wasn't happening this year was a let-down. I'd be denied completing my marathon.

Though despite the fact that the "exam" was to be completed at home, I trained just the same. To the point where I felt I couldn't have been more prepared. I was comfortable and determined to complete the at-home exam in three hours regardless of how long I was given.

However.

On the day, I downloaded the exam pdf. I scrolled through it. And I realised that they had changed the distribution of the questions in the sections just enough that I was not prepared for it in the way that I was hoping. For the past seven years of past papers, you could guarantee certain topics would appear.

Not here.

For the past seven years, you could guarantee that within each topic, you'd be given a certain set of sub-topics.

Not here.

Immediately I was glad that I was not running my marathon. If I had been, I would have had to be stretchered away from the starting line by medics.

That day was a battle. Over the entire course of the exam (which took way way longer than three hours) I thought "how was I not prepared for this?". It shook me, and it was the only thing I could think about.

Here, in the third and final stage of this degree, I may have found that there is something fundamentally wrong with the way I learn.

Being kind to myself, this was generally a hard exam. It was statistics, which by its nature is non-intuitive (a lot of people find it so anyway). I did think that this module was aimed at students studying an actual degree in (just) statistics, and that I probably didn't have the background knowledge that other stats students did. And as my mathematician friend has pointed out, it's unlikely it was hard for just me. If an exam is hard, it's generally hard for everyone.

So where do I go from here? It's difficult, isn't it? Amongst those 500 or so pages I learn from, should I pay attention and make notes on 'the fleeting comments on page 274 that I never got tested on once and seemed insignificant'? Regardless, it seems my revision technique as it stands isn't sufficient.

Assuming I will be in an actual physical exam hall for three hours in June 2021 for my complex analysis exam, I need a better revision strategy.

Statistics Assignment 3

So I've finally completed all my assignments! I've just had the very last one returned to me and again, although I did well, there are very obviously some areas for improvement:

 

Models for Populations

Very small thing here, but when asked to describe the shape of the age-specific death rate, it's important to describe it in terms of the rate (of change).

 

Genetics

Wow. My weakest area by a long way here... It'd be wise for me to avoid any exam questions on genetics... But for the moment, let's examine what I did wrong to try and understand the assignment questions better at least. (It's mainly an issue with conditional probabilities).

Proportions

When calculating the probabilities of genotype combinations of parents, if you're given the genotype of one parent you don't need to use that genotype's population proportion in the calculation! E.g. despite the proportion of a genotype in a population being 0.2, if you're told that one parent has that genotype, then the chance of them being that genotype is 1.0, not 0.2! Making a mistake like this obviously has a knock-on effect: the probabilities worked out for the children's genetics will be incorrect too.

But I compounded my issue with the children. It took me a while to review the next bit to work out where I went wrong, but here we go...

When working out the parent-child genetics, you start by working out two sets of probabilities:

  1. The probability of the parents being certain combinations of genotype (easy in this specific case, as the probability of one parent is 1.0). The probability of the parent of unknown genotype follows from the Hardy-Weinberg law. We'll call the probability of all the mating types P(E_{i}).
  2. The offspring probabilities, which follow from Mendel's first law. Though we'll talk about them in terms of phenotype, so P(\text{Hilary M}) means the probability of Hilary being of phenotype M.

So the next question asked just that: what is the probability of Hilary being phenotype M (which was just one genotype, "MM")?

This question I managed to get correct based on my initial incorrect probabilities of the parents, but it's important to explain it for the next question. So it turns out that:

P(\text{Hilary M}) = \sum_{i=1}^{3} P(\text{Hilary M}| E_{i})P(E_{i})

So you multiply each offspring probability (the probability of Hilary being phenotype M given the mating type) by the associated mating type probability, and you sum them across all mating types. Easy.
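To make that sum concrete, here's a minimal Python sketch. The numbers are illustrative assumptions rather than the assignment's: the known parent is taken to be MN, the frequency of allele M is taken to be 1/2, and the helper names `hardy_weinberg` and `offspring_probs` are made up for this example.

```python
from fractions import Fraction
from itertools import product

def hardy_weinberg(p):
    """Genotype proportions (MM, MN, NN) when allele M has frequency p."""
    q = 1 - p
    return {"MM": p * p, "MN": 2 * p * q, "NN": q * q}

def offspring_probs(parent1, parent2):
    """Mendel's first law: each parent passes on one of its two alleles at random."""
    probs = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "NM" and "MN" are the same genotype
        probs[genotype] = probs.get(genotype, 0) + Fraction(1, 4)
    return probs

known_parent = "MN"      # assumed genotype of the parent we're told about
p = Fraction(1, 2)       # assumed frequency of allele M in the population

# P(E_i): the known parent has probability 1, so only the unknown parent varies.
mating_types = hardy_weinberg(p)

# P(Hilary M) = sum_i P(Hilary M | E_i) P(E_i), where phenotype M means genotype MM.
p_hilary_m = sum(
    offspring_probs(known_parent, other).get("MM", 0) * p_e
    for other, p_e in mating_types.items()
)
print(p_hilary_m)  # 1/4
```

Using `Fraction` keeps the arithmetic exact, which makes it easier to compare against a hand-written table of probabilities.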

But the next question was:

"Calculate the probability that sisters Hilary and Jane both have phenotype M."

This was the bit that I got completely wrong, and it took me a while to review. I ended up squaring the result I got from the last question. Very not correct. 🙁 From the above, we know we start with:

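P(\text{Hilary M and Jane M})

= \sum_{i=1}^{3} P(\text{Hilary M and Jane M}| E_{i})P(E_{i})

and, since the sisters share the same parents, their phenotypes are independent of each other once the mating type E_{i} is fixed, so it turns out:

= \sum_{i=1}^{3} P(\text{Hilary M}| E_{i})P(\text{Jane M}| E_{i})P(E_{i})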

Which suddenly makes it all very very clear. I suppose this goes to show that when you come across something convoluted, it's worth taking extra time out to run through it in depth and make detailed notes on it. Doing so here would've paid off. I think the problem I have with genetics questions is that there are quite a number of ways in which these questions can be phrased.
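A quick numerical check shows why squaring doesn't work. This is only a sketch with made-up numbers: the three mating type probabilities and offspring probabilities below are hypothetical, not the ones from the assignment.

```python
from fractions import Fraction

# Hypothetical values: (P(E_i), P(child is M | E_i)) for three mating types.
mating_types = [
    (Fraction(1, 4), Fraction(1, 2)),
    (Fraction(1, 2), Fraction(1, 4)),
    (Fraction(1, 4), Fraction(0)),
]

# One daughter: sum_i P(M | E_i) P(E_i)
p_one = sum(p_m * p_e for p_e, p_m in mating_types)

# Two sisters: they share parents, so they're only independent *given* E_i.
p_both = sum(p_m * p_m * p_e for p_e, p_m in mating_types)

print(p_one ** 2)  # 1/16 (the squaring mistake)
print(p_both)      # 3/32 (conditioning on the shared mating type)
```

The two answers only agree when the offspring probability is the same for every mating type, which is not generally the case.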

 

Writing Conditional Probabilities

Well this went really wrong. This is probably my weakest area, and is related to the above slip-ups in the questions with Hilary and Jane.

"Show that the proportion of male offspring for the second mating that you should expect to have plain wings (gene contains dominant allele A) is \frac{3}{4}."

Here, I wrote the definition incorrectly, but calculated the correct result. Kind of double-bad. 🙁 Here, I wrote:

P(male A)
(which is the joint probability of an offspring being male and having the allele A)

When I should have written:
P(A | male)
(the conditional probability of offspring having the allele A given that they're male.)
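The difference between the two is easy to see numerically. The sketch below assumes a sex-linked setup with four equally likely mating types and assumes offspring are equally likely to be male or female; all the numbers are illustrative guesses, chosen only so that the conditional probability comes out at 3/4.

```python
from fractions import Fraction

# Assumed values: four equally likely mating types E_i, and the probability
# that a son carries allele A given each one.
p_mating = [Fraction(1, 4)] * 4
p_a_given_male = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 2)]
p_male = Fraction(1, 2)  # assumed: sons and daughters equally likely

# Conditional: P(A | male) = sum_i P(A | male, E_i) P(E_i)
p_cond = sum(pa * pe for pa, pe in zip(p_a_given_male, p_mating))

# Joint: P(male and A) = P(A | male) P(male)
p_joint = p_cond * p_male

print(p_cond)   # 3/4
print(p_joint)  # 3/8
```

So writing the joint probability when the conditional one is meant would describe a quantity half the size under these assumptions.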

 

The Hardy-Weinberg Law

A lengthier title to this subsection would be: "When to calculate the proportions of subsequent generations of a certain type using Hardy-Weinberg, and when to use your own table of probabilities".

As above, the table of probabilities includes the probabilities of the parents of certain types mating, and the probabilities of the associated offspring genotypes.

The question:

"One male and one female are chosen at random from all the offspring of the mating, and are themselves mated. What is the proportion of female offspring of the second mating to have a dominant allele?"

In this case, there were two genotypes which had a dominant allele, AA and Aa. But how do I parse this question? This question is asking about grandchildren of the initial parents! It's also asking about "proportion" which hints that I should be using Hardy-Weinberg proportions. Turns out not. It seems that you can only use the Hardy-Weinberg law when you're given the proportion of three genotypes of a starting generation.

So what are we left with?

P(AA | female) + P(Aa | female)

Which in this case is equivalent to:

\sum_{i=1}^{4} P(\text{female AA}| E_{i})P(E_{i}) + \sum_{i=1}^{4} P(\text{female Aa}| E_{i})P(E_{i})

Notice how this differs from the sum in the last section (the Hilary and Jane example), because there's no assumption made about them both having the same father.

The last related one that tripped me up was:

"What proportion of dominant-alleled females in this second mating would you expect to be AA?"

Again, I used the Hardy-Weinberg law to calculate this, when I should've been using conditional probability.

So it seems I needed to go through the process of parsing the question, and translating it into stats language: "What's the probability of offspring being genotype AA given that they're a female with a dominant allele?". The probability we require here is:

P(AA | dominant allele female)

Using the standard, straightforward rule for conditional probability I learned in my first section back in September, this is equivalent to:

\frac{P(AA \cap \text{dominant allele female})}{P(\text{dominant allele female})}

What's the numerator here? The probability of being AA and a dominant-allele female? Well yeah, AA is dominant, we know that. So this is just the probability of being AA and female:

\sum_{i=1}^{4} P(\text{female AA}| E_{i})P(E_{i})

It's just one part of the previous question.

Then what's the denominator? The probability of being (proportion of) a dominant-allele female generally? So AA female and Aa female?  Well that was the actual answer to the last question!
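Putting the numerator and denominator together, here's a small Python sketch of the whole calculation. The table of four mating types and the daughters' genotype probabilities is hypothetical, made up for illustration rather than taken from the assignment.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
half = Fraction(1, 2)

# Hypothetical table: for each mating type E_i, P(E_i) and the genotype
# probabilities of a daughter. Values are assumed for illustration only.
table = [
    (quarter, {"AA": Fraction(1), "Aa": Fraction(0), "aa": Fraction(0)}),
    (quarter, {"AA": Fraction(0), "Aa": Fraction(1), "aa": Fraction(0)}),
    (quarter, {"AA": half, "Aa": half, "aa": Fraction(0)}),
    (quarter, {"AA": Fraction(0), "Aa": half, "aa": half}),
]

def proportion(genotype):
    """sum_i P(female genotype | E_i) P(E_i)"""
    return sum(p_e * daughters[genotype] for p_e, daughters in table)

# Denominator: proportion of daughters with a dominant allele (AA or Aa).
p_dominant = proportion("AA") + proportion("Aa")

# Conditional probability: P(AA | dominant-allele female)
p_aa_given_dominant = proportion("AA") / p_dominant

print(p_dominant)           # 7/8
print(p_aa_given_dominant)  # 3/7
```

The numerator reuses one part of the previous answer and the denominator is the previous answer itself, so most of the work is in building the table correctly.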

So that's it. There's a lot of parsing that needs to be done generally:

Have I been given proportions? Use Hardy-Weinberg.
No proportions? Use a table of parents and offspring probabilities.
What am I given, what don't I have to calculate?
What are they asking me, is the probability conditional?
If it's conditional, I can separate it out but then I need to parse what each of these new probabilities mean.

Armed with this little checklist, I may have done a bit better in my genetics questions!

General Stuff

Range

If your answer is an equation in terms of x, always state the range of possible values of x:

Q(x) =1-\frac{x^{2}}{100},\:\:\:\: 0\leq x < 10
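One way to keep the range attached to the equation is to build it into the function itself. This is just a sketch: the function name `q` is arbitrary, and raising an error outside the range is my own choice, since the equation above says nothing about values outside 0 ≤ x < 10.

```python
def q(x):
    """Q(x) = 1 - x^2/100, stated only for 0 <= x < 10."""
    if not 0 <= x < 10:
        raise ValueError(f"Q(x) is only stated for 0 <= x < 10, got {x}")
    return 1 - x ** 2 / 100

print(q(5))  # 0.75
```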

Variance

Annoying oversight here. When stating the variance of the lifetime of something was 42.92 months, I should've said it was 42.92 \text{months}^{2}. Not often you think of months-squared, but here, it's relevant. Variance!
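A tiny sketch of the units, using the variance quoted above: the variance is in months squared, and taking the square root brings you back to months.

```python
import math

variance_months_sq = 42.92                       # units: months^2
std_dev_months = math.sqrt(variance_months_sq)   # units: months

print(f"variance = {variance_months_sq} months^2")
print(f"standard deviation = {std_dev_months:.2f} months")
```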

Log and Ln

Concentrate when typing one or the other into your calculator. There's a big difference, people... Thankfully I only slipped up once here.
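The same trap exists in code. In Python's `math` module, `math.log` is the natural log (the calculator's "ln" key), while `math.log10` is base 10 (the "log" key):

```python
import math

x = 100
print(math.log10(x))  # 2.0, base 10: the calculator's "log" key
print(math.log(x))    # natural log, about 4.605: the calculator's "ln" key
```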

 

And that's it! Now it's just revision time until my exam on the 8th of June. Of course, due to our new friend covid-19, I'll be taking my exam at home which will be a bit weird. Plenty to revise though, so I'll get started...

 

International Lockdown Effectiveness

"How effective is each country's lockdown strategy?"

Once again, I'm thinking about covid-19 statistics rather than doing any actual statistics homework...

There are suddenly a lot of very large numbers flying around, and a lot of graphs saying "country A" is worse off than "country B", though none of them have appeared clear to me. Some of them try to compare too much, and others specifically attack "country A" for "reasons".

So I decided to take a snapshot of several countries from today, April 23rd 2020, to try and paint a more accurate picture of where we currently are. Adjust the populations below as you see fit, but they're more or less accurate. I just wanted to get a general idea.

First off, the UK. Currently 133,495 infected, population 67.82 million. That's 0.2% of the population infected.

Looking at just England's stats (99,137 infected) I thought damn, the English are doing really badly here. That's 99,137 of the total 133,495 in the whole of the UK! But then I compared all the countries in the UK together... Check this out:

| Country | Infected | Population (million) | Percentage of population infected |
|---|---|---|---|
| UK | 133495 | 67.82 | 0.20% |
| England | 99137 | 55.98 | 0.18% |
| Scotland | 9038 | 5.45 | 0.17% |
| Wales | 8124 | 3.136 | 0.26% |
| Northern Ireland | 2874 | 1.88 | 0.15% |

 

No joke. England actually has 0.18% infected, while Wales has shot ahead with 0.26%.

So that got me thinking further... what about Europe? Germany's doing really badly in the UK's press at the moment...

| Country | Infected | Population (million) | Percentage of population infected |
|---|---|---|---|
| UK | 133495 | 67.82 | 0.20% |
| France | 119151 | 65.25 | 0.18% |
| Germany | 148046 | 83.73 | 0.18% |
| Italy | 187327 | 60.48 | 0.31% |

 

Oh.

Well America's always in the news at the moment! Trump's screwing that country, right?

| Country | Infected | Population (million) | Percentage of population infected |
|---|---|---|---|
| United States | 800926 | 330.64 | 0.24% |
| Washington State | 12494 | 7.62 | 0.16% |
| California State | 35396 | 39.51 | 0.09% |
| New York State | 258589 | 19.45 | 1.33% |

 

Oh. Right, I guess we're not that far behind them.

Wait, can we even talk about how "The United States" is doing? I don't think so, not when you have California at 0.09% and New York at 1.33%. Rather than "Trump and America", we should probably be talking about "State Governor and State".

Well New Zealand is doing very well, right? Press is practically hailing Prime Minister Ardern as a hero.

| Country | Infected | Population (million) | Percentage of population infected |
|---|---|---|---|
| New Zealand | 1112 | 4.89 | 0.02% |

 

Okay, I suppose this is kind of what we expected.

So how's China at the moment? They've been the benchmark this entire time.

| Country | Infected | Population (million) | Percentage of population infected |
|---|---|---|---|
| China | 84302 | 1433.78 | 0.01% |

 

Wow, what?

I found all this eye-opening. Here's all of them together in percentage order:

| Country | Infected | Population (million) | Percentage of population infected |
|---|---|---|---|
| China | 84302 | 1433.78 | 0.01% |
| New Zealand | 1112 | 4.89 | 0.02% |
| California State | 35396 | 39.51 | 0.09% |
| Northern Ireland | 2874 | 1.88 | 0.15% |
| Washington State | 12494 | 7.62 | 0.16% |
| Scotland | 9038 | 5.45 | 0.17% |
| Germany | 148046 | 83.73 | 0.18% |
| England | 99137 | 55.98 | 0.18% |
| France | 119151 | 65.25 | 0.18% |
| UK | 133495 | 67.82 | 0.20% |
| United States | 800926 | 330.64 | 0.24% |
| Wales | 8124 | 3.136 | 0.26% |
| Italy | 187327 | 60.48 | 0.31% |
| New York State | 258589 | 19.45 | 1.33% |
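For anyone who wants to adjust the populations and redo the sums, here's a minimal Python sketch of the calculation behind the table. The figures are the April 23rd 2020 snapshot from the tables in this post; the names `snapshot` and `percent_infected` are just my own labels.

```python
# (region, infected, population in millions), copied from the tables in this post
snapshot = [
    ("UK", 133495, 67.82),
    ("England", 99137, 55.98),
    ("Scotland", 9038, 5.45),
    ("Wales", 8124, 3.136),
    ("Northern Ireland", 2874, 1.88),
    ("France", 119151, 65.25),
    ("Germany", 148046, 83.73),
    ("Italy", 187327, 60.48),
    ("United States", 800926, 330.64),
    ("Washington State", 12494, 7.62),
    ("California State", 35396, 39.51),
    ("New York State", 258589, 19.45),
    ("New Zealand", 1112, 4.89),
    ("China", 84302, 1433.78),
]

def percent_infected(infected, population_millions):
    """Infected as a percentage of the total population."""
    return infected / (population_millions * 1_000_000) * 100

# Sort by the unrounded percentage, lowest first.
rows = sorted(snapshot, key=lambda row: percent_infected(row[1], row[2]))

for name, infected, population in rows:
    print(f"{name:<18}{percent_infected(infected, population):.2f}%")
```

Sorting on the unrounded percentage also settles the order of the three regions that all round to 0.18%.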

 

This showed me that as members of the public, we don't really have the whole statistical picture at the moment. But this only begins to look at lockdown effectiveness. How is actual healthcare working in each country, and what is the percentage of deaths in each country?

What are the percentages of infectives and deaths in each country over time? How can we compare effectiveness of strategy?

Why isn't any of this being reported?
[UPDATE as of May 27th, 2020. Now they're reporting it... Goddamn.]

 

Resources:

https://who.maps.arcgis.com/apps/opsdashboard/index.html#/ead3c6475654481ca51c248d52ab9c61
https://covid19.who.int/
https://nymag.com/intelligencer/article/new-york-coronavirus-cases-updates.html
https://www.doh.wa.gov/emergencies/coronavirus
https://www.worldometers.info/population/world/