Abstract
The present invention relates to methods, apparatus and computer systems for assigning a numerical value to a genotype at a single- or multi-base segment of an individual's genome. The value denotes whether the nucleic acid base sequence of each of one or more chromosomal copies of the segment matches or mismatches the nucleic acid base sequence of the corresponding segment of a reference genome. The methods involve assigning a single-digit numerical value to the match or mismatch of each chromosomal copy of the segment, such that the value assigned to a mismatch is greater than the value assigned to a match. A null symbol is assigned to a no-call determination. The assigned values are summed to obtain a total numerical value consisting of a single digit or a fixed number of digits. Repeating these steps for each genome in a set of genomes yields a vector of total numerical values for the segment, thereby providing a segment-specific pattern of genotype match/mismatch between the set of genomes and the nucleic acid base sequence of the reference genome segment. This segment-specific pattern, also referred to as a "diff pattern", can be used to filter or uncover specific trends or sub-patterns across the set of genomes. It can also be used to identify genotypic/phenotypic relationships more quickly by locating sites where the distribution of genotypes in the set of genomes relates in a distinctive, causal way to the distribution of a given phenotype among the individuals whose genomes are under study.
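The encoding can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the claimed implementation. The specific values are hypothetical choices: 0 for a match and 1 for a mismatch, which satisfies "mismatch greater than match". Python's `None` stands in for the null symbol. The abstract does not say how a null combines with the sum, so this sketch assumes that any no-call copy makes the total null. The function names and the `"N"` rendering of the null symbol are also hypothetical. `tracks_phenotype` is a deliberately crude co-occurrence filter that shows how a diff pattern might be screened against a phenotype. It does not establish causality.

```python
from typing import List, Optional, Sequence

# Hypothetical encoding: match = 0, mismatch = 1 (mismatch > match),
# and None used as the null symbol for a no-call determination.
MATCH = 0
MISMATCH = 1


def copy_value(call: Optional[str], ref: str) -> Optional[int]:
    """Score one chromosomal copy of a (single- or multi-base) segment."""
    if call is None:  # no-call determination -> null symbol
        return None
    return MATCH if call == ref else MISMATCH


def total_value(copies: Sequence[Optional[str]], ref: str) -> Optional[int]:
    """Sum per-copy values; ASSUMPTION: any no-call copy makes the total null."""
    values = [copy_value(c, ref) for c in copies]
    if any(v is None for v in values):
        return None
    # With fewer than ten copies (e.g. diploid), the total is a single digit.
    return sum(values)


def diff_pattern(genomes: Sequence[Sequence[Optional[str]]],
                 ref: str) -> List[Optional[int]]:
    """Vector of total values for one segment across a set of genomes."""
    return [total_value(copies, ref) for copies in genomes]


def pattern_string(vector: Sequence[Optional[int]]) -> str:
    """Compact rendering of a diff pattern; 'N' (hypothetical) marks null."""
    return "".join("N" if v is None else str(v) for v in vector)


def tracks_phenotype(vector: Sequence[Optional[int]],
                     has_phenotype: Sequence[bool]) -> bool:
    """Hypothetical filter: every affected genome has at least one mismatching
    copy and every unaffected genome matches the reference on all copies.
    Genomes with a null total are skipped."""
    for v, affected in zip(vector, has_phenotype):
        if v is None:
            continue
        if affected and v == MATCH:
            return False
        if not affected and v != MATCH:
            return False
    return True


if __name__ == "__main__":
    # Four diploid genomes at a segment whose reference base is "A".
    genomes = [("A", "A"), ("A", "G"), ("G", "T"), ("A", None)]
    vec = diff_pattern(genomes, "A")
    print(vec, pattern_string(vec))
    print(tracks_phenotype(vec, [False, True, True, True]))
```

Run as a script, this prints `[0, 1, 2, None] 012N` and then `True`. Storing each genome's total as one fixed-width character keeps a segment's pattern compact. That makes it cheap to compare or group identical patterns across many segments.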