Managing Genotypes

The Genotypes Module

  • The left pane displays the genotype import folders, which hold all the genotypes that have been imported into the database.
  • The upper right pane displays the Genotype Import list. This is a list of all the imports that are contained in a specific genotype import folder. The pane displays the following information for an import:
    • Import Name
    • Date – Import date.
    • Marker Set – The name of the Marker set used for comparing and error checking the import.
    • Total – The total number of Markers in the set.
    • Imported – The total number of Markers imported.
    • MM – Total number of missing Markers (MM) in the import.
    • ZA – Total number of zero alleles (ZA) in the import.
    • ME – Total number of Mendelian errors (ME) in the import.
    • DE – Total number of discrepancies (DE) in the import.
    • CE – Total number of control errors (CE) in the import.
    • RJ – Total number of data lines rejected from the import.
  • The middle right pane displays the Rerun Import list, which is a list of all the imports that were rerun for an original import. The pane displays the same information for a rerun as the top pane displays for an original import.
  • The Error Check pane displays expanded information for each error check that was carried out for an original import or a rerun. To view the results of a specific type of error check for an import or rerun, select the import or rerun in the appropriate pane of the Genotypes window, and then open the needed tab in the Error Check pane.

To Import Genotypes

  1. On the navigation bar, click the Genotypes button to open the Genotypes module.
  2. Select the folder in which to store the imported genotypes.
  3. On the Genotypes module toolbar, click the New Import button to open the Genotype Import dialog box.
  4. In the top pane of the dialog box, select the folder in which to import the genotype data.
  5. In the Import text field, enter a name for the genotype import.
  6. On the Marker Set dropdown list, select the Marker set against which the genotype data is to be verified. Progeny verifies the imported genotype data by matching the Markers that are contained in the import against the Markers that are listed in the selected Marker Set. All error checking that is carried out on the import is based on this selected Marker set. For example, Markers that are defined in the Marker set, but are not found in the import file, are flagged as missing Markers.
  7. Indicate the level at which the genotype information is to be stored – Individual Level, Sample Level, or Both Sample and Individual. NOTE: If you select Both Sample and Individual, you are duplicating the genotype data in the database. The recommended option is Sample Level.
  8. Select the type of error checking that is to be carried out – Missing Markers, Zero Alleles, Mendelian Errors, Discrepancies, and/or Control Errors.
  9. Continue to one of the following:
    • To complete a standard import
      1. Select one of the following standard import formats: Standard Import Format, Illumina Final Report Format, Affymetrix CHP File, Affymetrix GDAS Text Output Format, or Affymetrix GTYPE Text Output Format. NOTE: If you are using an Affymetrix Library, then you must also specify the Library Path by clicking Browse to open the browse for folder dialog box and browsing to the folder for the library.
      2. Click Browse… to browse to and select the genotype import files.
      3. Click Import. A dialog box opens, indicating the progress of the import. When the process is complete, a message opens indicating that the import was successfully completed.
      4. Click OK to close the message and the dialog box. You return to the Genotypes module. The import is listed in the top pane of the Genotypes module.
      5. To view the results of a specific type of error check for the import, select the import in the top pane of the Genotypes module, and then open the appropriate tab in the Error Check (bottom) pane.
    • To complete a custom import
      1. Select Custom Import Format for the file type.
      2. Click Browse to browse to and select the genotypes import file. An Import dialog box opens. The imported file is displayed in the Import Preview pane (bottom pane) of the dialog box.
      3. Indicate the File Format of your custom file.
      4. Select the File Delimiter that is used for the data in the import file. If you select Other, you must specify the character that is used for the delimiter.
      5. Indicate the Allele Format for the allele calls in the file.
        • Two columns per call—Each allele call is in a separate column.
        • One column per call—Both allele calls are in the same column.
        • Optionally, do one or both of the following:
          • Alleles delimited by – Specify the delimiter.
          • Custom Allele Values – Enter the appropriate allele values.
      6. Start import at row – Specify the starting row for the import.
      7. In the Import Preview pane (the bottom pane of the Import dialog box), for each field, right-click in the column header and manually assign the appropriate heading.
        • One row per call – Requires a Sample Name column and a Marker Name column. If Two columns per call is selected, you must indicate the Allele A and Allele B columns. If One column per call is selected, you must indicate the Allele AB column.
        • One row per sample – Requires a Sample Name column. If Two columns per call is selected, you must indicate the Allele A and Allele B columns. If One column per call is selected, you must indicate the Allele AB column.
        • One row per Marker – Requires a Marker Name column. If Two columns per call is selected, you must indicate the Allele A and Allele B columns. If One column per call is selected, you must indicate the Allele AB column.
      8. Click Import. A dialog box opens, indicating the progress of the import. When the process is complete, a message opens indicating that the import was successfully completed.
      9. Click OK to close the message and the dialog box. You return to the Genotypes module. The import is listed in the top pane of the Genotypes module.
      10. To view the results of a specific type of error check for the import, select the import in the top pane of the Genotypes module, and then open the appropriate tab in the Error Check (bottom) pane.
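The custom import settings above (File Delimiter, Start import at row, Allele Format, and the column assignments) can be pictured with a short Python sketch. This is only an illustration of how those settings relate to the file contents for a One row per call file; it is not Progeny's importer, and the function name, parameter names, and column-role keys are hypothetical.

```python
import csv
import io


def parse_one_row_per_call(text, delimiter=",", start_row=1,
                           columns=None, allele_delimiter=None):
    """Read a 'one row per call' file into (sample, marker, a1, a2) tuples.

    `columns` maps a role to a zero-based column index. Roles (hypothetical
    names): 'sample', 'marker', and either 'allele_a' + 'allele_b'
    (two columns per call) or 'allele_ab' (one column per call).
    """
    calls = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        if row_no < start_row or not row:
            continue  # rows before the start row (e.g. headers) are skipped
        sample = row[columns["sample"]]
        marker = row[columns["marker"]]
        if "allele_ab" in columns:
            ab = row[columns["allele_ab"]]
            if allele_delimiter:
                a1, a2 = ab.split(allele_delimiter, 1)
            elif len(ab) == 2:
                a1, a2 = ab[0], ab[1]  # assumes single-character alleles
            else:
                raise ValueError(f"row {row_no}: cannot split call {ab!r}")
        else:
            a1 = row[columns["allele_a"]]
            a2 = row[columns["allele_b"]]
        calls.append((sample, marker, a1, a2))
    return calls


one_column = "Sample;Marker;Call\nS1;rs1;A/G\nS2;rs1;G/G\n"
calls = parse_one_row_per_call(
    one_column, delimiter=";", start_row=2,
    columns={"sample": 0, "marker": 1, "allele_ab": 2},
    allele_delimiter="/",
)
```

In this sketch the header row is skipped by setting the start row to 2, which mirrors the Start import at row option.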

Manually Entering Genotype Data

Instead of importing genotype data into a Progeny database, you can manually enter data.

NOTE: If you manually enter genotype data, no error checking of the data is automatically carried out.

Four options are available for manually entering genotype data:

  • You can enter genotype data using icon Markers on a pedigree.
  • You can enter genotype data into the Marker fields on a datasheet for an individual.
  • You can enter genotype data directly into the Marker fields in a spreadsheet for an individual. For example, select the pedigree on the Pedigrees module and then on the toolbar, click the Indiv SS button. Select the system fields of Pedigree Name and UPN and the appropriate Marker fields, and then run the spreadsheet. The Marker field is automatically split into two fields representing the paternal allele in a1 and the maternal allele in a2.
  • You can import data from a text file into the Marker fields in a spreadsheet for an individual. For example, select a field that matches the individuals in the spreadsheet to the individuals in the text file such as Patient ID # and the appropriate Marker fields and then run the spreadsheet. The Marker field is automatically split into two fields representing the paternal allele in a1 and the maternal allele in a2. After running the spreadsheet, import the Marker data.

Importing Genotype Formats

Progeny Lab provides functionality for importing genotype files in one of three ways:

  • Using a Standard Import format.
  • Using one of the following third-party formats—Illumina or Affymetrix.
  • Using a Custom Import format.

The Custom Import Format is the preferred option because you can specify the exact layout of the file you are importing, which includes designating the data that is contained in each column of the file to ensure compatibility.

Standard Import format

A Standard Import format must be a tab-delimited text file with a specific structure.

  • Column 1 – The sample name. All alphanumeric characters are allowed and there is no limit to the number of characters.
  • Column 2 – The Marker name. All alphanumeric characters are allowed and there is no limit to the number of characters.
  • Column 3 – The value for allele 1. For microsatellites, there are no restrictions on the data that can be displayed in this column. For SNPs, the value must be A, B, C, G, or T.
  • Column 4 – The value for allele 2. For microsatellites, there are no restrictions on the data that can be displayed in this column. For SNPs, the value must be A, B, C, G, or T.
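As an illustration of the four-column layout described above, the following Python sketch pre-checks a tab-delimited file before it is imported. It is a hedged example rather than Progeny's own validation: the function name and the `snp_markers` parameter (a set of Marker names to treat as SNPs) are assumptions made for the example.

```python
import csv
import io

SNP_ALLELES = {"A", "B", "C", "G", "T"}  # values the format allows for SNPs


def read_standard_import(text, snp_markers=frozenset()):
    """Parse rows of: sample name, Marker name, allele 1, allele 2.

    Returns (calls, problems). Markers not named in `snp_markers`
    are treated as unrestricted (for example, microsatellites).
    """
    calls, problems = [], []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 4:
            problems.append((line_no, "expected 4 tab-separated columns"))
            continue
        sample, marker, a1, a2 = (field.strip() for field in row)
        if marker in snp_markers and not {a1, a2} <= SNP_ALLELES:
            problems.append((line_no, "SNP allele outside A, B, C, G, T"))
            continue
        calls.append((sample, marker, a1, a2))
    return calls, problems


example = "S001\trs123\tA\tG\nS001\tD1S80\t142\t146\nS002\trs123\tA\tX\n"
calls, problems = read_standard_import(example, snp_markers={"rs123"})
```

Here the third line is reported as a problem because `X` is not one of the allowed SNP values, while the microsatellite row passes without restriction.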

Illumina Final Report Format

The Illumina Final Report Format is an output format that is generated by the Illumina platform. Because the format exists in a number of iterations, some of which can cause compatibility issues with the Progeny application, Progeny strongly recommends that you use the Custom Import Format instead. By using the Custom Import Format, you can specify the file layout and import the data as indicated in the file.

Affymetrix files

The Affymetrix CHP file, the GDAS Text Output Format, and the GTYPE Text Output Format are output formats that are generated by the Affymetrix platform.

Each Affymetrix CHP file contains a single sample. The file name is either the exact sample name or the sample name followed by an underscore and additional text, such as 223339_axt3343. Progeny ignores the underscore and the additional text during the import. For instance, 223339_axt3343 is imported as 223339.
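The file naming rule can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the handling of a file extension and of names with more than one underscore (everything after the first underscore is dropped) are assumptions, because the rule above covers only the single-underscore case.

```python
from pathlib import Path


def sample_name_from_chp(file_name):
    """Derive the sample name from a CHP file name.

    Assumption: any file extension is removed first, and everything from
    the first underscore onward is ignored.
    """
    stem = Path(file_name).stem
    return stem.split("_", 1)[0]


name_with_suffix = sample_name_from_chp("223339_axt3343.CHP")
name_exact = sample_name_from_chp("223339.CHP")
```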

Because these formats exist in a number of iterations, some of which can cause compatibility issues with the Progeny application, Progeny strongly recommends that you use the Custom Import Format in lieu of either the Affymetrix GDAS Text Output Format or the Affymetrix GTYPE Text Output Format. By using the Custom Import Format, you can specify the file layout and import the data as indicated in the file.

Custom import file format

A custom genotype file can have one of three formats:

  • One row per call—Calls in a row, with the following columns: Unique ID, Marker Name, Allele 1, and Allele 2.
  • One row per sample—Samples in rows with Markers in columns.
  • One row per Marker—Markers in rows with samples in columns.
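The three layouts carry the same information arranged differently. The Python sketch below shows one way to see the relationship: a one-row-per-Marker table is transposed into a one-row-per-sample table, which is then flattened into one row per call. The function names are hypothetical, and the example assumes each cell holds both alleles separated by a slash.

```python
def transpose(header, rows):
    """Swap rows and columns of a table given as a header plus data rows."""
    table = [header] + rows
    flipped = [list(column) for column in zip(*table)]
    return flipped[0], flipped[1:]


def sample_rows_to_calls(header, rows, allele_delimiter="/"):
    """Flatten a one-row-per-sample table into (sample, marker, a1, a2).

    The first column holds the sample name; every other column is a Marker.
    """
    markers = header[1:]
    calls = []
    for row in rows:
        sample, cells = row[0], row[1:]
        for marker, cell in zip(markers, cells):
            a1, a2 = cell.split(allele_delimiter)
            calls.append((sample, marker, a1, a2))
    return calls


# One row per Marker: Markers in rows, samples in columns.
marker_header = ["Marker", "S1", "S2"]
marker_rows = [["rs1", "A/G", "G/G"], ["rs2", "C/T", "C/C"]]

# Transposing yields one row per sample: samples in rows, Markers in columns.
sample_header, sample_rows = transpose(marker_header, marker_rows)
calls = sample_rows_to_calls(sample_header, sample_rows)
```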

To Rerun a Genotypes Import

On the Genotypes module, in the Import List pane, select the original import that is to be rerun, and then on the window toolbar, click the Rerun button to open the Rerun Genotype Import dialog box.

  1. The Marker Set that was used for the original import is automatically selected for the rerun and you cannot change this value.
  2. The Error Checking options that were selected for the original import are also automatically selected; however, you can change these selections.
  3. The File Type that was used for the original import is automatically selected for the rerun; however, you can change this selection.
  4. The files that were used for the original import are displayed in the lower pane of the dialog box. You can change this list of files, if needed, including deleting files and selecting new files.
  5. After you have made all the necessary modifications for the rerun, click Rerun Import. A dialog box opens, indicating the progress of the import. When the process is complete, a message opens indicating that the import was successfully completed.
  6. Click OK to close the message and the dialog box. You return to the Genotypes module. The import is listed in the middle pane of the Genotypes module. Any discrepancies between the original import and the rerun are listed on the Discrepancies tab.

Clearing Genotype Data

  1. On the Genotypes module toolbar, click the Clear Genos button. The Clear Genotype dialog box opens.
  2. In the Database Folders pane, select the folder that contains the pedigrees for which the genotype data is to be cleared.
  3. In the Pedigrees and Individuals pane, select the pedigree (CTRL-click to select multiple pedigrees) for which the genotype data is to be cleared. The individuals that are contained in a selected pedigree are displayed in the right middle pane of the dialog box. By default, the UPN is the identifier used for the individuals.
  4. Drag the selected pedigrees to the Selected Pedigrees and Individuals pane.
  5. Do one of the following:
    • Include all selected individuals – Deletes the genotype data for all individuals in all selected pedigrees.
    • Query within selected individuals – Deletes the genotype data for only those individuals who meet a specific criterion. Click the button to open the Specify query dialog box and define a new query or load a saved query format.
  6. Open the Markers tab, and then do one of the following:
    • To clear an entire Marker set, drag the Marker set from the Sets pane to the Selected Markers pane.
    • To clear all the Markers for a specific chromosome, drag the chromosome from the Chromosomes pane to the Selected Markers pane.
    • To clear only specific Markers, select the Marker (CTRL-click to select multiple Markers) in the top right pane and drag it to the Selected Markers pane.

  7. Click on the Pedigrees tab, and then click OK on the tab. A message opens, asking you if you are sure that you want to clear the genotype data.
  8. Click Yes to close the message. A dialog box opens, indicating the status of clearing the genotype data.
  9. When the status is complete, click Close to close the dialog box and return to the Genotypes module.

Error Checks for Genotype Imports

Progeny Lab provides functionality for importing genotypes directly from supported file formats such as Illumina, Affymetrix, ABI, and Sequenom, as well as other formats. When you import genotypes, an option is available for carrying out specific error checks (Missing Markers, Zero Alleles, Mendelian Errors, Discrepancies, Control Errors, and Rejected) on the data. Error Check results are displayed in the bottom pane of the Genotypes module. To view the results of a specific type of error check for an import, select the import in the top pane of the Genotypes module, and then open the appropriate tab in the Error Check pane.

Some tabs display a range of columns and include the Run and Jump to page buttons. To change the view on the tab, enter new values for the beginning and ending range and then click the Run button. You can click the Jump buttons at either end of the filter to update the display according to the range that you specified.

Missing Markers tab

When you import genotypes, you must select a Marker set to compare the import file against. In a Missing Markers error check, Progeny compares the Markers that are contained in the import file against the Markers that are listed in the selected Marker set. All error checking that is carried out on the import is based on this selected Marker set. For example, Markers that are defined in the Marker set, but are not found in the import file, are flagged as missing Markers.
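Conceptually, the Missing Markers check is a set difference between the Marker set and the Markers found in the file. The Python sketch below illustrates that idea only; the function name and the tuple layout of the calls are assumptions for the example.

```python
def find_missing_markers(marker_set, imported_calls):
    """Return Markers defined in the Marker set but absent from the import.

    `imported_calls` is a list of (sample, marker, a1, a2) tuples.
    """
    imported = {marker for _sample, marker, _a1, _a2 in imported_calls}
    return sorted(set(marker_set) - imported)


marker_set = ["rs1", "rs2", "rs3"]
imported_calls = [("S1", "rs1", "A", "G"), ("S2", "rs3", "C", "C")]
missing = find_missing_markers(marker_set, imported_calls)
```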

Zero Alleles tab

In a Zero Alleles error check, Progeny checks each sample for all Markers and identifies any Markers that contain zero alleles, which occurs when the value of a microsatellite cannot be determined. As a result, a zero is displayed for the Marker value. The Zero Alleles tab displays the sample name and the corresponding Marker that contains a zero allele.
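A minimal Python sketch of this kind of check follows. It assumes that an undetermined allele is recorded in the file as the string "0"; the function name and the call layout are hypothetical.

```python
def find_zero_alleles(calls, zero_values=("0",)):
    """Return (sample, marker) pairs where either allele is a zero value.

    `calls` is a list of (sample, marker, a1, a2) tuples.
    """
    return [
        (sample, marker)
        for sample, marker, a1, a2 in calls
        if a1 in zero_values or a2 in zero_values
    ]


calls = [
    ("S1", "D1S80", "142", "146"),
    ("S2", "D1S80", "0", "0"),
    ("S3", "D1S80", "142", "0"),
]
zero_rows = find_zero_alleles(calls)
```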

Mendelian Errors tab

In a Mendelian Errors check, Progeny compares the data for each allele against the relationship structure of the pedigree and verifies the compatibility. Any discrepancies are identified as Mendelian errors and are displayed on the Mendelian Errors tab.
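The general idea behind a Mendelian consistency check for a child with two genotyped parents can be sketched in Python as follows. This is the textbook autosomal rule (one allele inherited from each parent), not a description of Progeny's internal algorithm; the function name is hypothetical, and missing or zero alleles and sex-linked Markers are not handled.

```python
def is_mendelian_consistent(child, father, mother):
    """Check that a child's two alleles can come one from each parent.

    Each argument is a pair of alleles, for example ("A", "G").
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


consistent = is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G"))
conflict = is_mendelian_consistent(("A", "A"), ("G", "G"), ("A", "G"))
```

In the second case the father carries only G, so the child cannot have inherited an A from him, and the genotype would be flagged as a conflict.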

The first four columns in this tab (Pedigree, Individual Name, Conflict Type, and Marker) are system-generated fields and, therefore, are not editable. The final three columns (A1, A2, and Comment) are editable fields. You can change the value of either A1 or A2 to resolve any conflicts, and you can enter comments in the Comment field for record-keeping purposes. You can edit a Mendelian error directly on the Mendelian Errors tab, or you can edit the Mendelian error directly in the pedigree.

Edit a Mendelian Error

To edit a Mendelian error on the Mendelian Errors tab

  1. Select the error that you want to correct on the Mendelian Errors tab.
  2. Edit the value as needed, and then right-click on the edited value, and on the context menu that opens, click Save Changes. Repeat Step 1 and Step 2 to correct all the Mendelian errors.
  3. Select a row for which you corrected the Mendelian error (CTRL-click to select multiple rows), right-click on any of the selected rows, and then on the context menu that opens, click Rerun Mendelian Checks on All Selected Rows. The Mendelian checks are rerun for all the rows to ensure that the error was properly corrected. A message opens, indicating that the Mendelian error rerun was successfully completed. It also indicates the number of errors that were corrected, and that the corrected rows have been disabled.
  4. Click OK to close the message and return to the Mendelian Errors tab.
  5. Optionally, to delete a disabled row, select the row (CTRL-click to select multiple rows), right-click on any selected row, and on the context menu that opens, click Delete All Selected Rows.

To edit a Mendelian error in the pedigree

  1. On the Mendelian Errors tab, right-click on the error that you are editing, and on the context menu that opens, click Open Error to Pedigree. The pedigree that contains the individual with the Mendelian error opens. The specific Marker is loaded in the pedigree, and the allele information is displayed as a haplotype for the affected individual.
  2. To edit the value, click on the allele value and enter the correct value.
  3. Close the pedigree, making sure to answer the prompt about saving the changes. After you correct the error in the pedigree, you return to the Mendelian Errors tab. The row that contained the error is disabled.
  4. Right-click on any corrected error row, and on the context menu that opens, click Rerun Mendelian Checks on All Selected Rows. The Mendelian checks are rerun for all the rows to ensure that the error was properly corrected. A message opens, indicating that the Mendelian error rerun was successfully completed. It also indicates the number of errors that were corrected, and that the corrected rows have been disabled.
  5. Click OK to close the message and return to the Mendelian Errors tab.
  6. Optionally, to delete a disabled row, select the row (CTRL-click to select multiple rows), right-click on any selected row, and on the context menu that opens, click Delete All Selected Rows.

Discrepancies tab

The Discrepancies tab lists any discrepancies that were found between an original genotype import and a rerun of the import. The first three fields on the tab, Type, Item ID, and Marker are system fields and are not editable. The next four fields display the original value for A1, the new value for A1, the original value for A2, and the new value for A2. When the tab first opens, the original values for A1 and A2 are the values that are stored in the database. You can resolve these discrepancies from the Discrepancies tab, or you can resolve these discrepancies from the pedigree. You can also export the information that is displayed on the Discrepancies tab to a text file.
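The comparison between an original import and a rerun can be pictured with the Python sketch below. It is illustrative only: the function name and dictionary keys are hypothetical, and the sketch treats allele order as significant (A/G versus G/A counts as a discrepancy), which is an assumption rather than documented Progeny behavior.

```python
def find_discrepancies(original, rerun):
    """Compare two imports keyed by (item_id, marker) -> (a1, a2).

    Returns one row per key whose allele pair differs between the imports.
    """
    rows = []
    for key in sorted(original.keys() & rerun.keys()):
        old, new = original[key], rerun[key]
        if old != new:
            item_id, marker = key
            rows.append({
                "item_id": item_id,
                "marker": marker,
                "a1_original": old[0],
                "a1_new": new[0],
                "a2_original": old[1],
                "a2_new": new[1],
            })
    return rows


original = {("S1", "rs1"): ("A", "G"), ("S2", "rs1"): ("G", "G")}
rerun = {("S1", "rs1"): ("A", "A"), ("S2", "rs1"): ("G", "G")}
discrepancies = find_discrepancies(original, rerun)
```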

To resolve discrepancies from the Discrepancies tab

  1. To resolve the discrepancy for an allele, click on the new value for the allele. The new value for the allele is now highlighted in yellow, indicating this is the value that is to be stored for the allele in the database, and the original value now has no background color (white), indicating that the original value is to be rejected.
  2. After you have determined which values to store in the database and which values to reject for both A1 and A2 for each Marker, select a row (CTRL-click to select multiple rows), and then right-click on any selected row and on the context menu that opens, click Resolve Selected Discrepancies. The selected discrepancies are removed from the Discrepancies tab and the changes that you made are saved. The selections that you made are used when data is exported for analysis.

To resolve discrepancies from the pedigree

  1. Right-click on the discrepancy on the Discrepancies tab, and on the context menu that opens, click Open Pedigree to Error. The pedigree is displayed onscreen.
  2. Make the changes directly in the displayed pedigree.
  3. Close the pedigree to return to the Discrepancies tab.
  4. Select the changed value in either the A1 New or A2 New column. The new value for the allele is now highlighted in yellow, indicating this is the value that is to be stored for the allele in the database, and the original value now has no background color (white), indicating that the original value is to be rejected.
  5. After you have determined which values to store in the database and which values to reject for both A1 and A2 for each Marker, select a row (CTRL-click to select multiple rows), and then right-click on any selected row and on the context menu that opens, click Resolve Selected Discrepancies. The selected discrepancies are removed from the Discrepancies tab and the changes that you made are saved. The selections that you made are used when data is exported for analysis.

To export discrepancies to a text file

  1. Right-click on any row on the Discrepancies tab, and on the context menu that opens, click Export to File.
  2. Specify the name and location for the export file. The exported file contains all the discrepancies, including the original call and the change that was committed to the database.

Control Errors tab

The Control Errors tab lists any controls for which the expected value for A1 or A2 differs from the value that was returned in the imported genotype file. The expected calls are listed in the A1 Control and A2 Control columns. The returned incorrect calls are listed in the A1 and A2 columns. To export the control errors to a text file, right-click anywhere on the tab, and on the context menu that opens, click Export to File. After a control is identified as an error, all the data for the Marker in the genotype import is rejected. The resulting rejected Markers are listed on the Rejected tab with the reason “Entire Marker invalid due to control error.”
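The control check and its effect on the rest of the import can be sketched in Python as follows. The sketch is an illustration under stated assumptions: the function name and data layout are hypothetical, and only the rejection reason text is taken from the description above.

```python
REJECT_REASON = "Entire Marker invalid due to control error."


def check_controls(calls, expected_controls):
    """Compare returned control calls against their expected values.

    `calls` is a list of (sample, marker, a1, a2) tuples.
    `expected_controls` maps (sample, marker) -> (a1, a2) for control samples.
    Returns (errors, accepted, rejected).
    """
    errors = []
    for sample, marker, a1, a2 in calls:
        expected = expected_controls.get((sample, marker))
        if expected is not None and expected != (a1, a2):
            errors.append((sample, marker, expected, (a1, a2)))
    # Every call for a Marker with a failed control is rejected.
    bad_markers = {marker for _sample, marker, _exp, _got in errors}
    accepted = [call for call in calls if call[1] not in bad_markers]
    rejected = [(call, REJECT_REASON) for call in calls if call[1] in bad_markers]
    return errors, accepted, rejected


calls = [
    ("CTRL1", "rs1", "A", "G"),
    ("S1", "rs1", "A", "A"),
    ("S1", "rs2", "C", "T"),
]
expected_controls = {("CTRL1", "rs1"): ("A", "A")}
errors, accepted, rejected = check_controls(calls, expected_controls)
```

In this example the control call for rs1 does not match its expected value, so both rs1 rows are rejected and only the rs2 call is kept.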

Rejected tab

The Rejected tab lists all the data in the genotype import file that was rejected for one of the following reasons:

  • There is no match in the database with a sample name that appears in the genotype import. The sample name must exist in the database to store the genotype information.
  • There is no match in the database with a Marker that appears in the genotype import. The Marker must exist in the database to store the genotype information.
  • If duplicate Markers and samples are contained in the genotype import, and the values for both are zero, the second Marker/sample entry is rejected.
  • If the expected Allele A value or Allele B value is not stored for a given SNP or Marker in the Markers module, then the SNP or Marker is rejected.
  • If a field cannot be updated during the import process (for example, another user is updating a Sample Name field during your import process), then the data that references the field is rejected.
  • If any database error occurs during the import process, then the field that failed to record data is listed on the Rejected tab along with the value that could not be recorded.

None of the fields on this tab is editable. The first three columns, File, Line Number, and Sample, list the file that contained the rejected data, the line number where the error check occurred in the file, and the sample that contained the error. The remaining three columns, Marker, Reason, and Count, list the Marker that contains the error, the reason for the rejection, and the count for multiple error rows, respectively.
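The first two rejection reasons (no matching sample name and no matching Marker in the database) can be illustrated with a short Python sketch. The function name, the default file name, the dictionary keys, and the wording of the reason strings are all hypothetical; the sketch only mirrors the File, Line Number, Sample, Marker, and Reason columns described above.

```python
def screen_calls(calls, known_samples, known_markers, file_name="import.txt"):
    """Split calls into accepted tuples and rejected report rows.

    `calls` is a list of (sample, marker, a1, a2) tuples in file order.
    """
    accepted, rejected = [], []
    for line_no, (sample, marker, a1, a2) in enumerate(calls, start=1):
        if sample not in known_samples:
            reason = "Sample name not found in database"
        elif marker not in known_markers:
            reason = "Marker not found in database"
        else:
            accepted.append((sample, marker, a1, a2))
            continue
        rejected.append({
            "file": file_name,
            "line": line_no,
            "sample": sample,
            "marker": marker,
            "reason": reason,
        })
    return accepted, rejected


calls = [
    ("S1", "rs1", "A", "G"),
    ("S9", "rs1", "A", "A"),
    ("S1", "rs9", "C", "T"),
]
accepted, rejected = screen_calls(calls, {"S1"}, {"rs1"})
```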

Updated on April 13, 2018
