Pseudomarker - Data Format Specifications
September 25, 2001 - Genevieve Monsees, modified by Saunak Sen
To use Pseudomarker, your data files must be in a specific format. The tools need five files, all in the same directory, named geno.dat, pheno.dat, chrid.dat, mnames.txt, and markerpos.txt. Optionally, you may also have a file called pnames.txt. All of these files must be stored as text files. To make the format easier to follow, the data file below (which may be stored in a non-text format, such as Excel) will be used as an example.
Sample Data File
Mouse_number | Phenotype | Covariate | D1Mit49 | D1Mit102 | D2Mit2 | D2Mit297 | D3Mit164 | D3Mit203 |
1 | 0.1 | 0.25 | AA | AA | BB | BB | AB | - |
2 | 0.2 | - | AB | AB | AA | - | AA | AA |
3 | 0.17 | 0.3 | BB | BB | AA | AB | AA | AA |
4 | 0.45 | 0.5 | AB | AB | AA | AA | AB | AB |
5 | 0.2 | 0.3 | - | BB | AB | AB | AA | AA |
Genotypes (geno.dat)
The genotype array, geno.dat, contains one row for each subject and
one column for each marker. Genotypes should be coded as follows:
Homozygous AA=0
Heterozygous AB=1
Homozygous BB=2
Missing Genotypes=9
Given the sample F2 data, geno.dat would contain:
0 | 0 | 2 | 2 | 1 | 9 |
1 | 1 | 0 | 9 | 0 | 0 |
2 | 2 | 0 | 1 | 0 | 0 |
1 | 1 | 0 | 0 | 1 | 1 |
9 | 2 | 1 | 1 | 0 | 0 |
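The coding above can be applied mechanically. The following Python sketch is illustrative only and is not part of Pseudomarker. The function name encode_genotypes is hypothetical, and the use of "-" for a missing genotype is an assumption taken from the sample data file.

```python
# Numeric codes for geno.dat; "-" as the missing label is an assumption
# taken from the sample data file.
GENOTYPE_CODES = {"AA": 0, "AB": 1, "BB": 2, "-": 9}

def encode_genotypes(rows):
    """Convert rows of genotype labels (one row per mouse) to numeric codes."""
    return [[GENOTYPE_CODES[label] for label in row] for row in rows]

# Genotype columns of the sample data file, one row per mouse.
sample = [
    ["AA", "AA", "BB", "BB", "AB", "-"],
    ["AB", "AB", "AA", "-", "AA", "AA"],
    ["BB", "BB", "AA", "AB", "AA", "AA"],
    ["AB", "AB", "AA", "AA", "AB", "AB"],
    ["-", "BB", "AB", "AB", "AA", "AA"],
]

encoded = encode_genotypes(sample)
for row in encoded:
    print(" ".join(str(code) for code in row))
```

An unrecognized label raises a KeyError, which is deliberate in this sketch: a typo in the source data is reported instead of being coded silently.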
Phenotypes (pheno.dat)
The phenotype array, pheno.dat, contains one column for each individual phenotype or covariate and one row for each subject. Phenotypes should be recorded numerically. The code for missing phenotypes must be numeric and not equal to any observed phenotype value. For example, in a cross where the phenotypes range from 0 to 100, missing phenotypes can be coded as -999.
Given the sample data, pheno.dat would contain:
1 | 0.1 | 0.25 |
2 | 0.2 | -999 |
3 | 0.17 | 0.3 |
4 | 0.45 | 0.5 |
5 | 0.2 | 0.3 |
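The missing-value rule can be checked while the data are being converted. The Python sketch below is an illustration and not part of Pseudomarker. The function name encode_phenotypes is hypothetical, and the "-" label for a missing entry is an assumption taken from the sample data file.

```python
def encode_phenotypes(rows, missing_label="-", missing_code=-999):
    """Convert phenotype entries to numbers, coding missing entries numerically.

    Raises ValueError if the missing code equals an observed value, since the
    code must be distinguishable from every real phenotype.
    """
    observed = [float(v) for row in rows for v in row if v != missing_label]
    if float(missing_code) in observed:
        raise ValueError("missing code collides with an observed phenotype value")
    return [
        [float(missing_code) if v == missing_label else float(v) for v in row]
        for row in rows
    ]

# Mouse number, phenotype, and covariate columns of the sample data file.
sample = [
    ["1", "0.1", "0.25"],
    ["2", "0.2", "-"],
    ["3", "0.17", "0.3"],
    ["4", "0.45", "0.5"],
    ["5", "0.2", "0.3"],
]

pheno = encode_phenotypes(sample)
for row in pheno:
    print(" ".join(str(value) for value in row))
```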
Chromosome ID (chrid.dat)
The file chrid.dat is a single column giving the chromosome ID of each marker, in marker order.
Given the sample data, chrid.dat would contain:
1 |
1 |
2 |
2 |
3 |
3 |
Marker Names (mnames.txt)
The file mnames.txt is a single column listing the name of each marker used.
Given the sample data, mnames.txt would contain:
D1Mit49 |
D1Mit102 |
D2Mit2 |
D2Mit297 |
D3Mit164 |
D3Mit203 |
Marker Positions (markerpos.txt)
The file markerpos.txt gives the position in cM of each marker used.
Given the sample data, markerpos.txt would contain:
D1Mit49 | 54.5 |
D1Mit102 | 73 |
D2Mit2 | 4 |
D2Mit297 | 29 |
D3Mit164 | 2.4 |
D3Mit203 | 11.2 |
The Chromosome Committee Reports in the Mouse Genome Database (MGD), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine, are an excellent source of mouse marker positions.
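Because geno.dat, chrid.dat, mnames.txt, and markerpos.txt all describe the same set of markers, it can help to confirm that they agree before starting an analysis. The Python sketch below is one possible check and is not a Pseudomarker function. The name check_marker_files is hypothetical, and the check assumes, as in the sample data, that each row of geno.dat holds one genotype per marker.

```python
def check_marker_files(geno_rows, chrid, mnames, markerpos):
    """Return a list of problems; an empty list means the marker counts agree."""
    problems = []
    n_markers = len(mnames)
    if len(chrid) != n_markers:
        problems.append(f"chrid.dat has {len(chrid)} entries, expected {n_markers}")
    if len(markerpos) != n_markers:
        problems.append(f"markerpos.txt has {len(markerpos)} entries, expected {n_markers}")
    for i, row in enumerate(geno_rows, start=1):
        if len(row) != n_markers:
            problems.append(f"geno.dat row {i} has {len(row)} genotypes, expected {n_markers}")
    return problems

# Values from the sample files, already read into Python lists.
mnames = ["D1Mit49", "D1Mit102", "D2Mit2", "D2Mit297", "D3Mit164", "D3Mit203"]
chrid = [1, 1, 2, 2, 3, 3]
markerpos = [54.5, 73, 4, 29, 2.4, 11.2]
geno_rows = [
    [0, 0, 2, 2, 1, 9],
    [1, 1, 0, 9, 0, 0],
    [2, 2, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [9, 2, 1, 1, 0, 0],
]

problems = check_marker_files(geno_rows, chrid, mnames, markerpos)
print(problems or "marker files are consistent")
```

The sketch works on in-memory lists so that it makes no assumption about how the text files are delimited.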
Phenotype Names (pnames.txt)
The file pnames.txt has a list of the phenotype names. Please do not put any spaces in the phenotype names. For our sample data this would be:
Mouse_number |
Phenotype |
Covariate |
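If column headings in the original spreadsheet contain spaces, they need to be rewritten before they go into the phenotype names file. The Python sketch below shows one way to do this. The function name clean_phenotype_names is hypothetical, and replacing spaces with underscores is an assumption modeled on the Mouse_number entry above.

```python
def clean_phenotype_names(names):
    """Replace each run of whitespace in a name with a single underscore."""
    return ["_".join(name.split()) for name in names]

# Hypothetical spreadsheet headings, the first of which contains a space.
names = clean_phenotype_names(["Mouse number", "Phenotype", "Covariate"])
for name in names:
    print(name)
```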