LSAT was generated semi-automatically using a two-step procedure.
In the Information Retrieval step, an SVM classifier was trained using
inductive learning to identify sentences from MEDLINE describing generation of
In the Information Extraction step, information including gene
names, tissues, species, specificity, number of isoforms, and experimental
methods was extracted.
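The extraction step can be pictured with a small dictionary-based tagger. This is only a minimal sketch, not the taggers actually used to build LSAT: the lexicons, the `extract` function, and the isoform-count pattern are all hypothetical and chosen for illustration, and gene-name and specificity tagging are left out because they need far larger resources.

```python
import re

# Hypothetical mini-lexicons for illustration only; the real pipeline
# relied on dedicated entity taggers with much larger vocabularies.
LEXICONS = {
    "species": ["human", "mouse", "rat"],
    "tissue": ["brain", "liver", "testis", "heart"],
    "method": ["RT-PCR", "Northern blot", "Western blot"],
}

# Assumed pattern for isoform counts such as "two isoforms" or
# "3 alternatively spliced variants" (up to two words in between).
NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5}
ISOFORM_COUNT = re.compile(
    r"\b(\d+|two|three|four|five)\s+(?:\w+\s+){0,2}"
    r"(?:isoforms|transcripts|variants)\b",
    re.IGNORECASE,
)


def extract(sentence):
    """Return lexicon hits and an isoform count (or None) for one sentence."""
    found = {}
    for label, terms in LEXICONS.items():
        # Whole-word, case-insensitive lookup of each lexicon term.
        found[label] = [
            term
            for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE)
        ]
    match = ISOFORM_COUNT.search(sentence)
    if match:
        token = match.group(1).lower()
        found["isoform_count"] = int(token) if token.isdigit() else NUMBER_WORDS[token]
    else:
        found["isoform_count"] = None
    return found


if __name__ == "__main__":
    example = (
        "Two alternatively spliced isoforms of the gene were detected "
        "in mouse brain and testis by RT-PCR."
    )
    print(extract(example))
```

In a pipeline of this shape, only sentences accepted by the classifier in the first step would be passed to `extract`, so the lexicon lookup runs on a much smaller set than all of MEDLINE.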
LSAT entries contain identifiers from databases like
Bow Toolkit,
Stanford lexical parser while generating LSAT.
Sentences extracted by the SVM classifier and subsequently tagged by
entity taggers are available here.
Shah PK, Jensen LJ, Boue S and Bork P.
Extraction of Transcript Diversity from Scientific Literature.
PLoS Computational Biology 1(1): e10.
Shah PK and Bork P.
Learning About Alternative Transcripts in MEDLINE using Support Vector Machines