Extracting and Characterizing Gene-Drug Relationships from Literature

Supplemental Website For:

Chang JT and Altman RB. Extracting and Characterizing Gene-Drug Relationships from Literature.
Pharmacogenetics 14(9), 2004.

Some of the data sets and results did not fit within the space constraints of the publication, so we provide them here as supplementary information.

Supplementary Table: Manually Reviewed Gene-Drug Co-occurrences
Supplementary Table: Genes and Drugs from PharmGKB Classified into Relationships

All Relationships

We scanned the MEDLINE database for citations in which a gene and a drug from selected lists co-occur. We derived the lists from two sources that focus on genes and drugs of pharmacogenetic interest: the PharmGKB database and a review article (Evans WE, Relling MV (1999). "Pharmacogenomics: translating functional genomics into rational therapeutics." Science. 286(5439):487-91).
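The scan described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the code used in the paper: the function name find_cooccurrences, the whole-word case-insensitive matching, the choice of the whole citation as the unit of co-occurrence, and the three toy citations are all hypothetical.

```python
import re
from itertools import product


def find_cooccurrences(citations, genes, drugs):
    """Map each (gene, drug) pair to the ids of citations mentioning both.

    Assumption: names are matched as whole words, case-insensitively,
    anywhere in the citation text.
    """
    def compile_names(names):
        return {
            name: re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
            for name in names
        }

    gene_patterns = compile_names(genes)
    drug_patterns = compile_names(drugs)

    hits = {}
    for citation_id, text in citations.items():
        found_genes = [g for g, p in gene_patterns.items() if p.search(text)]
        found_drugs = [d for d, p in drug_patterns.items() if p.search(text)]
        # Every gene found in the text co-occurs with every drug found in it.
        for pair in product(found_genes, found_drugs):
            hits.setdefault(pair, []).append(citation_id)
    return hits


# Toy input, invented for illustration only.
example_citations = {
    "c1": "CYP2D6 polymorphisms alter codeine metabolism.",
    "c2": "TPMT deficiency increases mercaptopurine toxicity; CYP2D6 was not studied.",
    "c3": "This abstract mentions neither list.",
}
example_hits = find_cooccurrences(
    example_citations,
    genes=["CYP2D6", "TPMT"],
    drugs=["codeine", "mercaptopurine"],
)
for pair, ids in sorted(example_hits.items()):
    print(pair, ids)
```

Note that citation c2 yields a (CYP2D6, mercaptopurine) pair even though the text asserts no relationship between them. A co-occurrence is only a candidate relationship, which is why co-occurrences need further review or classification.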

In this experiment, we focused on extracting the relationships between the genes and drugs necessary to quantify the performance of our algorithm. We did not use comprehensive lexicons for gene and drug names, and thus relationships between other genes/drugs may be missing from this list. Nevertheless, many important genes and drugs are included, and we make available all the co-occurrences found (~35,000 co-occurrences, some redundant) for the research community. The two lists allow 98 genes × 258 drugs = 25,284 possible gene-drug pairs, and we find evidence for 1,611 of them.
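As a quick check of the counts above, the following lines recompute the number of possible pairs and the share for which evidence was found. The variable names are illustrative, and the percentage is derived here from the stated counts rather than quoted from the paper.

```python
n_genes, n_drugs = 98, 258
possible = n_genes * n_drugs   # every gene paired with every drug
observed = 1611                # pairs with at least one co-occurrence

coverage = observed / possible
print(possible)                # 25284
print(f"{coverage:.1%}")       # 6.4%
```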

Sentence Boundary Detection Heuristic

Finding sentence boundaries is not straightforward because sentence-ending punctuation is ambiguous. Periods also appear in decimal numbers (0.05), abbreviations (N.I.H.), and initials (J. Watson). Therefore, determining whether a period or other punctuation mark indicates a sentence boundary requires special processing.

Fortunately, MEDLINE abstracts are relatively well-structured. In general, the text is regular and the sentences are well-formed. Thus, we use a simple set of heuristics, consisting of the following rules, to find sentence boundaries.
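To make the idea concrete, here is a small Python sketch of a rule-based splitter. It is not the authors' rule set: the three rules, the function names, and the sample text are assumptions chosen only to handle the ambiguous cases mentioned above (0.05, N.I.H., J. Watson).

```python
import re

# Rule 1 (assumed): a candidate boundary is ".", "?" or "!" followed by
# whitespace and then an uppercase letter. This already skips "0.05"
# (no whitespace after the period) and "N.I.H. funded" (lowercase follows).
CANDIDATE = re.compile(r"([.?!])\s+(?=[A-Z])")

# Rule 2 (assumed): a single capital letter plus a period is an initial.
INITIAL = re.compile(r"[A-Z]\.")
# Rule 3 (assumed): letter-period repeated is a dotted abbreviation.
DOTTED_ABBREVIATION = re.compile(r"(?:[A-Za-z]\.){2,}")


def is_non_terminal(token):
    """True if a token ending in a period is unlikely to end a sentence."""
    token = token.lstrip("([")
    return bool(INITIAL.fullmatch(token) or DOTTED_ABBREVIATION.fullmatch(token))


def split_sentences(text):
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        chunk = text[start:match.start() + 1]   # include the punctuation mark
        last_token = chunk.split()[-1]
        if match.group(1) == "." and is_non_terminal(last_token):
            continue                            # not a boundary, keep scanning
        sentences.append(chunk.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = (
    "The effect was significant (p < 0.05) in N.I.H. funded trials. "
    "J. Watson reviewed the data. Was CYP2D6 involved? Yes."
)
for sentence in split_sentences(sample):
    print(sentence)
```

On the sample text this yields four sentences, with "0.05", "N.I.H." and "J. Watson" left intact. The sketch has known gaps: a sentence that genuinely ends in an initial-like token or a dotted abbreviation (for example "vitamin C." or "N.I.H.") will not be split, so a practical rule set would need more context than these three rules use.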

