Conversation
|
OK, have figured it out. format_name in the md file has to be |
PCA.md
Outdated
|
|
||
| ```{code-cell} ipython3 | ||
| # load required libraries | ||
| import msprime as msp |
There was a problem hiding this comment.
I think I would prefer simply import msprime, which matches the rest of the documentation
| fstmat | ||
| ``` | ||
|
|
||
| ## PCA 'by hand' |
There was a problem hiding this comment.
Should we perhaps give a quick demo of ts.pca first, in case people just want to copy/paste the code? I know that doesn't quite fit with your logical flow, however.
There was a problem hiding this comment.
I agree that would be better tutorial-wise, simple approach first, then details
|
A few comments, one minor, the other a bit more major (should we demo the built-in method quickly first?). Happy to merge either way, and we can change later. One thing though: could you possibly squash all the commits into one? Thanks @hannesbecher ! |
PCA.md
Outdated
| demography=dmg, | ||
| random_seed=1234, | ||
| sequence_length=1e6, | ||
| recombination_rate=1e-9) |
There was a problem hiding this comment.
Maybe increase to the same order as mut rate?
There was a problem hiding this comment.
Good, makes the plots look clearer.
| This command should do the trick: | ||
| ``` | ||
| $ R -e 'install.packages(c("reticulate", "IRkernel")); IRkernel::installspec()' | ||
| $ R -e 'options(repos=c(CRAN="http://cran.r-project.org")); install.packages(c("reticulate", "IRkernel")); IRkernel::installspec()' |
There was a problem hiding this comment.
Isn't this the default anyway?
There was a problem hiding this comment.
Failed on my Mac without that addition. I guess it is not going to do any harm?
| fstmat = np.zeros([nPop,nPop]) | ||
| for i in range(nPop-1): | ||
| for j in range(i+1,nPop): | ||
| fstmat[i,j] = tsm.Fst([range(i*nHap,(i+1)*nHap), range((i+1)*nHap,(i+2)*nHap)]) |
There was a problem hiding this comment.
Would there be value in computing branch Fst too? I assume that below you will compute branch and site PCA ...
There was a problem hiding this comment.
I would leave this for the stats tutorial(s).
PCA.md
Outdated
|
|
||
| ```{code-cell} ipython3 | ||
| # haplotypes, each sample haplotype is ues by default | ||
| hapBranchPca=ts.pca(num_components=10) |
There was a problem hiding this comment.
This is a nitpick, but above you used ht/gt to denote haplotypes and genotypes, so maybe use the same style for results too, htBranchPca etc.
| c=np.repeat([1,2,3,4,5], [nHap] * nPop)) | ||
| axs[0, 0].set_title('Haplotypes (sites)') | ||
|
|
||
| axs[0,0].set_ylabel("PC2") |
There was a problem hiding this comment.
Do you need to add PC1 axis label too?
There was a problem hiding this comment.
Maybe not ... doing this in a browser;)
There was a problem hiding this comment.
I think it's beautiful as it is, GG.
|
|
||
| The plots on the left show one dot per haplotype. These have twice as many dots as the plots on the right, which show individuals. The colours indicate from which of the five islands a haplotype or individual was sampled. As expected with low geneflow, there is some grouping by island. Feel free to re-run with higher or lower values of `migRate` to see how the separations between the island samples changes. | ||
|
|
||
| +++ |
There was a problem hiding this comment.
I quite like it. But happy should somebody send a PR...
| ## Empirical data | ||
| Here, we demonstrated using simulated data how SNPs and ARG branches lead to equivalent PCA results. For empirical data, the ancestral states of variant sites are not known a priori, which will in practice often lead to polarisation differences. That may affect the outcome of PCA. | ||
|
|
||
| **TODO:** Extend Tutorial to empirical data. |
There was a problem hiding this comment.
It might be simplest to demonstrate this by randomly? repolarising SNP matrices and repeating the PCA on these?
There was a problem hiding this comment.
Or major allele -> ancestral, etc. Random as in 50:50, did not make much difference as far as I remember from the randPedPCA project. Needs more planning.
Absolutely. I can do this with the merge button, though, correct, @hyanwong ? |
Added tutorial on comparing branch and SNP-based PCA using msprime and tskit, including code examples for simulating ARGs and performing PCA. Add PCA file to the table of contents Modify R installation command in README Updated R package installation command to specify CRAN repository. Refactor PCA.md for clarity and organization Updated kernel specifications and code formatting for clarity. Added comments and organized code cells for better readability. Revise PCA tutorial and update Jupyter metadata Updated Jupyter Notebook metadata and improved PCA tutorial content. Update kernelspec display name and language in PCA.md Update PCA.md Co-authored-by: Gregor Gorjanc <gregor.gorjanc@gmail.com> Update PCA.md Co-authored-by: Gregor Gorjanc <gregor.gorjanc@gmail.com> Update PCA.md Co-authored-by: Gregor Gorjanc <gregor.gorjanc@gmail.com> PR comments implemented heading
Hi,
here is my proposed PCA tutorial. This covers:
Looking forward to opinions/comments/requests!
HELP please: for some reason the cell outputs of this tutorial don't render when I run
makeon my system (but they do for all other existing python tutorials). What I might be going on? I had written this as an ipynb and then converted to md.Cheers,
Hannes