1. Phenotype
- Study these phenes (for example, mood, hallucinations, suicidality) in high-risk populations (bipolar disorder, schizophrenia), which provide an enriched pool.
- Study these phenes longitudinally. Wireless devices and mobile health applications are crucial for this.
2. Cohorts
- When enrolling subjects, use a longitudinal within-subject design whenever possible, for reliability of phene measures.
- Validate phene measures by the convergence of internal feelings and thoughts (as measured by self-report scales) and external actions and behaviors (as measured by external raters).
- Separate cohorts by gender and by ethnicity, as this homogeneity leads to a reduction in noise.
3. Gene expression (biomarker discovery) is much more powerful than genetics (mutation discovery)
- One expressed gene may integrate the effects of up to ~10^3 SNPs and of epigenetic changes, as well as the current effects of the environment.
- Focus first on discovering state biomarkers, which are correlated with phenes measured at the time of biomarker testing, rather than trait biomarkers, unless you have good longitudinal phene data. State over time is trait.
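As a minimal sketch of what "correlated with phenes measured at the time of biomarker testing" can mean in practice, the snippet below computes a within-subject Pearson correlation between a gene's expression level and a phene score across visits. The gene names, visit values, and the choice of Pearson correlation are all illustrative assumptions, not data or methods from the source.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one subject, four visits.
# Phene score (e.g. a self-report mood scale) recorded at each blood draw.
mood_scores = [20, 45, 70, 35]

# Hypothetical expression levels measured at the same four visits.
expression = {
    "GENE_A": [5.1, 6.0, 7.2, 5.6],  # tracks the phene closely
    "GENE_B": [3.3, 3.1, 3.4, 3.2],  # barely moves with the phene
}

# A candidate state biomarker is a gene whose level follows the phene
# within the same person over time.
state_correlation = {
    gene: pearson(levels, mood_scores) for gene, levels in expression.items()
}
```

With only a handful of visits per subject, such a correlation is a screening signal at best; it would still need to reproduce in independent cohorts.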
4. Study design
- A within-subject design is the best, as it factors out genetic variability. You can do aggregates of n-of-1 studies. You may need n ~ 10^1 for gene expression studies, and n ~ 10^4 for family-based genetic studies (the closest you can come to "within-subject" in genetics).
- A case-case design is second best, as it factors out some disease-related variability. You may need n ~ 10^2 for gene expression studies, and n ~ 10^5 for genetic studies.
- A case-control design is the least powerful, due to heterogeneity and noise that are not factored out, i.e. many of the differences have nothing directly to do with the phenotype you are studying. You may need n ~ 10^3 for gene expression studies, and n ~ 10^6 for genetic studies.
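The reason a within-subject design needs far fewer subjects can be sketched with the textbook normal-approximation sample-size formula: in a paired comparison the stable between-subject variability cancels, so only measurement noise remains, whereas a case-control comparison carries both. The effect size and standard deviations below are made-up values chosen for illustration; the orders of magnitude quoted in the list above are the source's own heuristics and are not derived from this calculation.

```python
import math
from statistics import NormalDist

def n_required(effect, variance, alpha=0.05, power=0.80):
    """Subjects needed (pairs, or per group) under a normal approximation.

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * variance / effect^2
    For a paired design, pass the per-measurement noise variance.
    For a two-group design, pass the total per-subject variance.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * z ** 2 * variance / effect ** 2)

# Hypothetical values, in arbitrary expression units.
effect = 1.0       # shift in expression between the two states
sd_noise = 0.5     # visit-to-visit measurement noise within a subject
sd_between = 5.0   # stable subject-to-subject (e.g. genetic) variability

# Within-subject: each person is their own control, so sd_between cancels.
n_within = n_required(effect, sd_noise ** 2)

# Case-control: different people in each group, so both sources remain.
n_case_control = n_required(effect, sd_between ** 2 + sd_noise ** 2)
```

With these assumed values the case-control design needs roughly a hundredfold more subjects than the within-subject design, because the between-subject variance dominates the total.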
5. Convergent Functional Genomics (CFG) at a gene level
- ~10^2-fold more reproducibility at a gene level than at a SNP level.
- Reproducibility in independent cohorts is more important than strength of signal in the discovery cohort, as the latter could be a fit-to-cohort effect.
- CFG is like a magnet that finds the biomarker "needle" in the genetic or blood gene expression "haystack". It uses the whole prior body of work in the field, in a Bayesian way, to identify, prioritize and give credence to disease-related genes from the long lists of differentially tagged genes in genetic association studies (GWAS), and from differentially expressed genes in the brain or blood.
- Moreover, the genes and biomarkers prioritized by CFG are fit-to-disease, not fit-to-cohort. Because of that, they travel well, reproduce and are predictive in independent cohorts, which is the ultimate litmus test for any genetic or biomarker finding.
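The prioritization idea can be illustrated with a simple additive score: a candidate gene earns points for its signal in the discovery cohort and further points for each independent line of prior evidence. This is a hypothetical sketch only. The weights, gene names, and evidence labels are assumptions made for illustration, and an additive score is a crude stand-in for "using prior work in a Bayesian way", not a formal posterior and not the published CFG scoring scheme.

```python
# Hypothetical weights for independent lines of prior evidence.
EVIDENCE_WEIGHTS = {
    "genetic_association": 2.0,  # e.g. prior GWAS hits for the disorder
    "brain_expression": 2.0,     # prior differential expression in brain
    "blood_expression": 1.0,     # prior differential expression in blood
}

def cfg_score(discovery_score, prior_evidence):
    """Combine discovery-cohort signal with points for prior evidence."""
    return discovery_score + sum(EVIDENCE_WEIGHTS[line] for line in prior_evidence)

# Hypothetical candidates: (discovery-cohort score, lines of prior evidence).
candidates = {
    "GENE_A": (1.0, {"genetic_association", "brain_expression", "blood_expression"}),
    "GENE_B": (2.0, {"blood_expression"}),
    "GENE_C": (2.5, set()),  # strongest discovery signal, no prior support
}

scores = {
    gene: cfg_score(discovery, evidence)
    for gene, (discovery, evidence) in candidates.items()
}

# Rank by combined score, highest first.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy example the gene with the strongest discovery-cohort signal ranks last because nothing else in the field supports it, which mirrors the point that strength of signal in one cohort may be a fit-to-cohort effect.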