Pseudomarker - Data Format Specifications

September 25, 2001 - Genevieve Monsees, modified by Saunak Sen

To use Pseudomarker, your data must be stored in a specific format. The tools require five files, all residing in the same directory, named geno.dat, pheno.dat, chrid.dat, mnames.txt, and markerpos.txt. Optionally, you may also provide a file called pnames.txt. All of these files must be stored as plain text. To make the format easier to follow, the sample data file below (which may itself be kept in a non-text format, such as an Excel spreadsheet) is used as an example throughout.

Sample Data File

Mouse_number Phenotype Covariate D1Mit49 D1Mit102 D2Mit2 D2Mit297 D3Mit164 D3Mit203
1 0.1 0.25 AA AA BB BB AB -
2 0.2 - AB AB AA - AA AA
3 0.17 0.3 BB BB AA AB AA AA
4 0.45 0.5 AB AB AA AA AB AB
5 0.2 0.3 - BB AB AB AA AA

 

Genotypes (geno.dat)

The genotype array, geno.dat, contains one row for each subject and one column for each marker. Genotypes should be coded as follows:
Homozygous AA=0
Heterozygous AB=1
Homozygous BB=2
Missing Genotypes=9

Given the sample F2 data, geno.dat would contain:

0 0 2 2 1 9
1 1 0 9 0 0
2 2 0 1 0 0
1 1 0 0 1 1
9 2 1 1 0 0
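
The coding above can be applied mechanically. The following is a minimal Python sketch, not part of Pseudomarker itself; the function name encode_genotypes and the use of "-" as the missing-genotype marker are assumptions taken from the sample data file. It follows the layout of the sample geno.dat, where each line holds the genotypes of one subject.

```python
# Codes from the table above; "-" is assumed to mark a missing genotype,
# as in the sample data file.
GENOTYPE_CODES = {"AA": 0, "AB": 1, "BB": 2, "-": 9}


def encode_genotypes(rows):
    """Convert rows of genotype strings (one row per subject) to geno.dat lines."""
    lines = []
    for row in rows:
        # An unknown genotype string raises KeyError rather than being guessed.
        lines.append(" ".join(str(GENOTYPE_CODES[g]) for g in row))
    return lines


# Genotype columns of the sample data file, one list per mouse.
sample = [
    ["AA", "AA", "BB", "BB", "AB", "-"],
    ["AB", "AB", "AA", "-", "AA", "AA"],
    ["BB", "BB", "AA", "AB", "AA", "AA"],
    ["AB", "AB", "AA", "AA", "AB", "AB"],
    ["-", "BB", "AB", "AB", "AA", "AA"],
]
geno_lines = encode_genotypes(sample)
print("\n".join(geno_lines))
```

Writing geno_lines to a text file, one entry per line, reproduces the geno.dat listing shown above.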

 

Phenotypes (pheno.dat)

The phenotype array, pheno.dat, contains one row for each subject and one column for each phenotype or covariate. Phenotypes should be recorded numerically. The code for missing phenotypes must be numeric and must not equal any observed phenotype value. For example, in a cross where the phenotypes range from 0 to 100, missing phenotypes can be coded as -999.

Given the sample data, pheno.dat would contain:

1 0.1 0.25
2 0.2 -999
3 0.17 0.3
4 0.45 0.5
5 0.2 0.3
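
The rule that the missing-value code must not coincide with a real value can be checked before the file is written. The sketch below is one possible way to do this in Python; the function name encode_phenotypes and the "-" missing marker are assumptions based on the sample data file, and -999 is simply the example code mentioned above.

```python
MISSING_CODE = -999  # example code from the text; any unused numeric value works


def encode_phenotypes(rows, missing_marker="-", missing_code=MISSING_CODE):
    """Replace the missing marker with a numeric code and return pheno.dat lines."""
    # Collect every observed value so the code can be checked for collisions.
    observed = {float(v) for row in rows for v in row if v != missing_marker}
    if float(missing_code) in observed:
        raise ValueError(f"missing code {missing_code} equals an observed value")
    return [
        " ".join(str(missing_code) if v == missing_marker else v for v in row)
        for row in rows
    ]


# Mouse_number, Phenotype, and Covariate columns of the sample data file.
sample = [
    ["1", "0.1", "0.25"],
    ["2", "0.2", "-"],
    ["3", "0.17", "0.3"],
    ["4", "0.45", "0.5"],
    ["5", "0.2", "0.3"],
]
pheno_lines = encode_phenotypes(sample)
print("\n".join(pheno_lines))
```

If the chosen code already occurs in the data, the function raises an error instead of silently producing an ambiguous file.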

 

Chromosome ID (chrid.dat)

The file chrid.dat is a single column giving the chromosome ID of each marker, with the markers listed in order.

Given the sample data, chrid.dat would contain:

1
1
2
2
3
3
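
In the sample files, chrid.dat has one entry per marker and every line of geno.dat has one value per marker, so the two counts should agree. The following Python sketch checks this; the function name check_marker_counts is hypothetical, and the assumption that the counts must match is inferred from the sample data rather than stated by Pseudomarker.

```python
def check_marker_counts(geno_lines, chrid_lines):
    """Return the 1-based numbers of geno.dat lines whose length differs
    from the number of chromosome IDs."""
    n_markers = len(chrid_lines)
    return [
        i
        for i, line in enumerate(geno_lines, start=1)
        if len(line.split()) != n_markers
    ]


# Contents of the sample geno.dat and chrid.dat files.
geno_lines = [
    "0 0 2 2 1 9",
    "1 1 0 9 0 0",
    "2 2 0 1 0 0",
    "1 1 0 0 1 1",
    "9 2 1 1 0 0",
]
chrid_lines = ["1", "1", "2", "2", "3", "3"]

bad_rows = check_marker_counts(geno_lines, chrid_lines)
print(bad_rows)
```

An empty list means every subject has a genotype value for every marker listed in chrid.dat.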

 

Marker Names (mnames.txt)

The file mnames.txt is a single column containing the name of each marker used.

Given the sample data, mnames.txt would contain:

D1Mit49
D1Mit102
D2Mit2
D2Mit297
D3Mit164
D3Mit203

 

Marker Positions (markerpos.txt)

The file markerpos.txt lists each marker used together with its position in cM, one marker per line.

Given the sample data, markerpos.txt would contain:

D1Mit49 54.5
D1Mit102 73
D2Mit2 4
D2Mit297 29
D3Mit164 2.4
D3Mit203 11.2
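
Because marker names appear in both mnames.txt and markerpos.txt, it can be useful to confirm that the two files agree. The Python sketch below parses the name and position pairs and compares the names with a marker list; the function names are hypothetical, and the requirement that both files list markers in the same order is an assumption drawn from the sample data.

```python
def parse_markerpos(lines):
    """Parse 'name position' lines into (name, position_in_cM) tuples."""
    entries = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, pos = line.split()
        entries.append((name, float(pos)))
    return entries


def names_match(entries, marker_names):
    """True if the position file lists the same markers in the same order."""
    return [name for name, _ in entries] == list(marker_names)


# Contents of the sample markerpos.txt and mnames.txt files.
markerpos_lines = [
    "D1Mit49 54.5",
    "D1Mit102 73",
    "D2Mit2 4",
    "D2Mit297 29",
    "D3Mit164 2.4",
    "D3Mit203 11.2",
]
marker_names = ["D1Mit49", "D1Mit102", "D2Mit2", "D2Mit297", "D3Mit164", "D3Mit203"]

entries = parse_markerpos(markerpos_lines)
print(names_match(entries, marker_names))
```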

The Chromosome Committee Reports in the Mouse Genome Database (MGD), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine, are an excellent source of mouse marker positions.

Phenotype Names (pnames.txt)

The optional file pnames.txt lists the phenotype names, one per line. Do not put any spaces in the phenotype names. For our sample data, this file would contain:

Mouse_number
Phenotype
Covariate
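
The restriction on spaces is easy to check automatically. The sketch below is a hypothetical helper, not part of Pseudomarker; it flags names containing whitespace and also checks that the number of names equals the number of columns in pheno.dat, which is an assumption inferred from the sample data.

```python
def check_phenotype_names(names, pheno_lines):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name in names:
        if any(ch.isspace() for ch in name):
            problems.append(f"name contains whitespace: {name!r}")
    for i, line in enumerate(pheno_lines, start=1):
        n_cols = len(line.split())
        if n_cols != len(names):
            problems.append(
                f"pheno.dat line {i} has {n_cols} columns, expected {len(names)}"
            )
    return problems


# Phenotype names and pheno.dat contents from the sample data.
names = ["Mouse_number", "Phenotype", "Covariate"]
pheno_lines = [
    "1 0.1 0.25",
    "2 0.2 -999",
    "3 0.17 0.3",
    "4 0.45 0.5",
    "5 0.2 0.3",
]

problems = check_phenotype_names(names, pheno_lines)
print(problems)
```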