< Data Management in Bioinformatics
Chapter Navigation
Top E/R Theory - Normalization - Data Querying - Integrating SQL and Programming Languages
A Wikibookian has nominated this page for cleanup.
You can help make it better. Please review any relevant discussion.
A Wikibookian has nominated this page for cleanup.
You can help make it better. Please review any relevant discussion.

Data Querying

Query Languages

  • SQL (Structured Query Language ). SQL is a Data Definition and Manipulation Language (both DDL and DML)
  • RA (Relational Algebra)
  • Datalog

Creating Tables

  • Command : CREATE TABLE

For the relations:

Gene(gid, name, annotation)

Experiment(eid, name,date)

Microarray(gid, eid, exprlevel)

Tables can be created using the following set of commands in SQL:

CREATE TABLE Gene( gid INTEGER, name VARCHAR(20), annotation  
VARCHAR(50), PRIMARY KEY (gid));

CREATE TABLE Experiment(eid INTEGER, name VARCHAR(10), date  
DATE, PRIMARY KEY (eid));

CREATE TABLE Microarray (a INTEGER, b INTEGER, exprlevel REAL,  
PRIMARY KEY (gid, eid));

A primary key is used to uniquely identify each row in a table. It can either be part of the actual record itself, or it can be an artificial field. SQL will automatically index the table based on the primary key. Indexing with other keys (other than the primary key) is also possible; it can be done by using the CREATE INDEX command.

Note that the primary key for the "Microarray" table is composed of keys from the other two tables i.e. Gene and Experiment. These fields are called Foreign Keys. It is important to reference where the table gets its primary key from. The syntax is as follows.

CREATE TABLE Microarray (a INTEGER, b INTEGER, exprlevel REAL,  
PRIMARY KEY (gid, eid), FOREIGN KEY (a) REFERENCES Gene 
(gid), FOREIGN KEY (b) Experiment (eid) );

These types of constraints are often called Referential Integrity Constraints.

Storing Data

  • Command : INSERT INTO

Example:

INSERT INTO Gene VALUES(1, "1433E", "enzyme binding");

Most database management systems have the capability to load bulk data at once.

Querying

Projection

  1. To display all the data stored in the Gene table
    • SQL
      SELECT * 
      FROM Gene;
      
    • RA
    • Datalog
  2. To display the names of all genes in the Gene table
    • SQL
      SELECT name
      FROM Gene;
      
    • RA
    • Datalog
    • Note: to list out only distinct values use SELECT DISTINCT in SQL. RA and Datalog return distinct values by default.
  3. To display gene names that meet certain criterion
    • SQL
      SELECT * 
      FROM Gene
      WHERE annotation like '%phosphates%';
      
    • RA
    • Datalog

Cartesian Product

Definition : The Cartesian product of two tables of size nxa and mxb is a table having n*m rows and a+b columns

  • SQL
    SELECT * 
    FROM Gene, Experiment;
    
  • RA
  • Datalog

Join

Example 1

  • Table
Genes
Expression
Experiments
gidnameanno
.........
.........
.........
gideidexplevel
.........
.........
.........
eidnamedate
.........
.........
.........
  • E-R Diagram

File:Cs5634-ER-pic1.gif

  • SQL
    SELECT Genes.*, Expts.*, exp_level
      FROM Genes, Expression, Expts
     WHERE Genes.gid = Expression.gid
       AND Expression.eid = Expts.gid;
    
  • RA
     Genes ?? Expression ?? Expts
    
     (A ?? B) ?? C = A ?? (B ?? C)
    
  • Datalog
     Answer(x, y, t, a, b, w) ← Genes(x, y, z), Expression(x, t, w), Expts(t, a, b)
    
     Genes' x match Expression's x
     Expts' t match Expression's t
    

Example 2

Question: ...

Expression
gideidexplevel
.........
.........
.........
  • RA CannotbeMax = ... Allpairs = ... Allpairs - CannotbeMax
  • SQL
    CREATE VIEW CannotbeMax(gid, eid) AS
        SELECT E1.gid, E1.eid
          FROM Expression AS E1, Expression AS E2
         WHERE E1.explevel < E2.explevel;
    
    CREATE VIEW Allpairs(gid, eid) AS
        SELECT gid, eid
          FROM expression;
    
    SELECT * FROM Allpairs
    EXCEPT
    SELECT * FROM CannotbeMax;
    
    - EXCEPT
    ∪ UNION
    ∩ INTERSECT
    
  • Datalog
     CannotbeMax(x, y) <- Expression(x, y, t), Expression(a, b, w), t < w.
    
     Answer(x, y) <- Expression(x, y, a), NOT CannotbeMax(x, y).
    
  • Interesting Query

    Without MAX() operator

    • max
    - ref. Join section Example 2
    • min
    • second largest
    • median

    Relational Division

    A X B = C C / B = ?
    Student & Courses
    StudentIdCourseID
    oneCS2204
    oneCS4104
    oneCS6354
    twoCS2204
    twoCS4104
    Required For Graduation(R4G)
    CourseID
    CS2204
    CS4104
    CS6354

    Through Relational Algebra
    1. sid =sidStudent&Courses
    2. Ideal = sid X R4G
    3. Reality = Student&Courses
    4. Defaulters = Ideal - Reality
    5. Answer = sid - sidDefaulters
    Through SQL

    1.
    CREATE VIEW SID(sid)<BR>
              SELECT sid<BR>
              FROM   Student&Courses<BR>
    
    2.
    CREATE VIEW IDEAL(sid,courseId)<BR>
               SELECT sid,courseID<BR>
               FROM  SID,R4G<BR>
    
    3.
    CREATE VIEW REALITY(sid,courseId)<BR>
               SELECT *<BR>
               FROM Student&Courses<BR>
    
    4.
    CREATE VIEW Defaulters(sid,courseId)<BR>
               SELECT * <BR>
               FROM IDEAL<BR>
               EXCEPT<BR>
               SELECT *<BR>
               FROM REALITY<BR>
    
    5.
    SELECT * <BR>
      FROM SID<BR>
      EXCEPT <BR>
      SELECT sid<BR>
      FROM Defaulters <BR>
    


    Exercise:

    • Find students who satisfied graduate requirements except for 1.
    • Find students who satisfied most of the graduate requirements.


    A View is not really computed, an row most of times cannot be inserted because of ambiguities.
    SQL Injection, is a hacking technique where the input is formatted such that it executes a SQL query which is not expected.


    Bells and Whistles
    1. ORDER BY
    2. MAX,AVG,SUM,MIN
    3. GROUP BY, HAVING
    4. SELECT has a functionality of Printing

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.