Biostatistics 2016

Class time: Tuesdays 10:00-12:00; Thursdays 10:00-12:00 (even-numbered weeks only)

Location: Room 404, Teaching Building 2 (二教404)

Announcements
Announcement of March 2, 2016
The following sessions will be taught by Prof. Theis Lange:
March 29th, Tuesday on power analysis etc.
April 12th, Tuesday on Survival Analysis
April 14th, Thursday on Survival Analysis
April 19th, Tuesday on Survival Analysis

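The enrollment cap for undergraduates has been raised to 40. Students who want to take the course but did not get a seat should enroll as soon as possible.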

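Announcement of April 26, 2016
We plan to finish all project proposal presentations on April 28; any that cannot be completed will be postponed to the next class. The presentation order will be determined by a combination of volunteering and roll call. Anyone who is absent when called receives no score for the proposal presentation.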

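Announcement of May 15, 2016
On May 17 and 18 the Center for Statistical Science is hosting a conference on high-dimensional statistics in the era of big data. Many of you will likely be interested, so the May 17 class is cancelled and you are encouraged to attend the conference. Please pass this on to your classmates. Conference information can be found here.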

Course Overview

Reference books:

Generalized Additive Models: an introduction with R by Simon Wood

The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi

Mixed Effects Models and Extensions in Ecology with R by Alain F. Zuur, Anatoly A. Saveliev, Elena N. Ieno, and Graham M. Smith

An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani

R Resources
The R project
Bioconductor
RStudio
Cookbook for R
R Graph Gallery
Swirl
DataCamp

Useful Online Resources

Marshall Hampton's Class on Bioinformatics

Bioinformatics and Functional Genomics

Computational Genomics: A Case Studies Approach

Introductory Biology (MIT open course)

Lecture Materials

Lecture1

Lecture2/code

Lecture3/code1,code2

Lecture4,5/Reading Material/Code/cnvData

Lecture6&7/code

Lecture8

Lecture9/code

Lecture10/code

Lecture11/leukemia data

Lecture12

Lecture 13

About the data in Lecture 13:
The Stanford heart transplant data ships with R in the survival package. Load it as follows:

library(survival)  # the survival package provides the heart data sets
data(heart)        # make the heart data sets available
jasa               # this is the data to use
A fuller description of the data is available here:
http://stat.ethz.ch/R-manual/R-patched/library/survival/html/heart.html
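As a quick check that the data loaded correctly, here is a minimal sketch that inspects jasa and fits a Kaplan-Meier curve for overall survival. It assumes the columns futime (follow-up time) and fustat (death indicator) documented on the page linked above; the choice of a Kaplan-Meier fit that ignores transplant status is only an illustration and is not necessarily the analysis used in Lecture 13.

```r
library(survival)  # jasa is available once the package is attached

# Inspect the structure of the data: one row per patient.
str(jasa)

# Kaplan-Meier estimate of overall survival, ignoring transplant status.
# futime = follow-up time, fustat = 1 if the patient died (0 = censored).
km <- survfit(Surv(futime, fustat) ~ 1, data = jasa)
print(km)

# Plot the estimated survival curve with its confidence band.
plot(km, xlab = "Follow-up time", ylab = "Survival probability")
```

The object km is illustrative only; summary(km) lists the estimated survival probability at each event time.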


Lecture 14/code

Lecture 15/code

Lecture 16&17/code

Lecture 18


Homework

All homework should be in PDF format and emailed to the TA (cheung1990 AT 126 DOT com). If a homework involves coding, you should also send your code to the TA. The TA must be able to run your code easily, so please include a short document explaining how to run it.


Homework 1 (Due: March 17)
Read the paper by Cleveland and McGill.
1. Summarize the paper. Based on the paper, give some general recommendations for making a plot.
2. Give at least one example that you have encountered (in scientific papers, social media, or elsewhere) where the plot could be redesigned to convey the data more accurately. Provide the source of the example so that others can easily find it.

Homework 2 (Due: April 5)
1. Exercise 9 of Chapter 10 in the book An Introduction to Statistical Learning with Applications in R. Note that the USArrests data set is part of the base R distribution; you can load it with data(USArrests).
2. Exercise 11 of Chapter 10 in the book An Introduction to Statistical Learning with Applications in R.
3. The problems in this file.
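Before starting item 1, it may help to confirm that the data loads and to look at its shape. The following is a minimal sketch using only base R; it inspects the data and is not part of the exercise solution.

```r
# USArrests ships with base R (datasets package); no download is needed.
data(USArrests)

# One row per US state, one column per variable.
dim(USArrests)
names(USArrests)

# Numeric summary of each variable.
summary(USArrests)

# Per-variable standard deviations, to see how the scales differ.
sapply(USArrests, sd)
```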

Homework 3 (Due: May 2)
The problems in this file. The data is here.

Final Project


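The final project may address a problem of your own choosing or one of the two suggested topics below. Groups are encouraged to have three members where possible; a group with two or more undergraduates may have up to four members. Students from the School of Mathematical Sciences are encouraged to form groups with students from other departments.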

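If you choose your own topic, your group must present it in advance (on April 28): the problem you plan to work on, its background and significance, and your preliminary research plan. Self-chosen projects will be graded on the significance of the problem and on how well you complete the work.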

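For the two suggested topics: because the second is easier than the first, groups that choose the second topic will receive a penalty. That is, if a group's preliminary score on the second topic is X, its actual final project score will be only 0.9X. Groups that choose either suggested topic must also present their preliminary research plan on April 28 or May 3.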

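Each group must present its results on June 7 or June 9. The final version of the project must be sent to the instructor (ruibinxi AT hotmail.com) and the TA by 23:00 on June 19. The email subject must be "2016生物统计大作业+本组成员名单" (that is, "2016 Biostatistics Final Project" followed by the list of group members). The report must state each member's contribution clearly; members who contributed little will have their final score reduced.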

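Both presentations (the proposal and the final results) are graded. Groups that do not present, or that are absent when their turn comes, receive no score for that item. Proposal presentations are 5 minutes per group plus 2 minutes for questions. Results presentations are 10 minutes per group plus 2 minutes for questions.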

Each group should send its member list and chosen topic to the TA before April 27.

Suggested topic 1 (Replication Timing)
Reading material for topic 1

Suggested topic 2 (Parkinsons)/data