Data Carpentry Workshop - Day 1

Data Carpentry

A Chance Encounter

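In early May, by chance, my overseas supervisor found out that I was teaching myself R and recommended a few courses to me. One of them, a notice for a data workshop run by a university organisation, caught my attention. The course was free, had limited places and was open only to PhD students, but I applied anyway on the off-chance. After the application deadline I emailed the organisers and was told I would hear back within the following week. At the end of May I received official confirmation for two courses, and learned that many other applicants were still on the waiting list. I feel lucky to have such a hard-won learning opportunity and am determined to make the most of it.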

Introduction to Data Carpentry

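Data Carpentry is widespread overseas. It is a kind of training that volunteers run regularly for students and researchers. This particular event was a data analysis course that several KCL instructors organised for wet-lab PhD students.

This first course was aimed at students with no prior R experience but a pressing need to analyse data. It was taught as a small class of 40, with two instructors taking turns to explain and demonstrate live. Breakfast, lunch and break-time refreshments were provided free of charge. Most participants came from the life sciences, and the venue was the Seminar Room at the well-known Guy's Hospital campus.

The course ran over two days. Day 1 started from the familiar Excel and covered the problems and pitfalls of entering and processing research data in Excel. It then introduced R with a simple getting-started session. Day 2 began with common operations on data frames and then introduced two widely used packages: tidyverse and ggplot2.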


Lunch

Impressions of the Course

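The course was compact and packed with content, and it is a genuinely excellent introduction to R for beginners. It was taught entirely in English and demonstrated mostly on a Mac. So even with some self-taught background, I found the second day quite hard going.

Having lived in the UK for three months, I could follow the instructors' explanations. However, I was sitting at the back and could not see the code clearly, and I was not yet fluent at typing code or using the keyboard shortcuts. The later parts were therefore still difficult, and I need to consolidate them promptly after class.

Of the 40 students, I came across two who might have been Chinese. Both spoke fluent English, though, so I felt too shy to speak Chinese with them and never found out. The non-Chinese students were very engaged and quick to respond in class. A few guys near me were working on their own data analysis and slides while following the lecture, and I admired their efficiency. The young woman sitting next to me kept pace with the instructors the whole time and helped me a lot. During the breaks I chatted with her about research life, which made me feel even more that I need to work harder.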

Class Notes, Day 1

1. Data organization in spreadsheets

1.1 Don'ts in spreadsheets

DON'T:

  • modify your raw data. Always make a copy and work on the duplicate before making any changes.
  • combine multiple values in one cell (e.g. a number together with its unit).
  • make calculations in the spreadsheet. When you export the data, your formulae will not be exported.
  • color-code things. The computer does not care.

DO:

  • export the data as a text-based file (csv or txt) so that R can read it.
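To see why one value per cell matters once the file reaches R, here is a minimal sketch that reads two small CSV exports with `read.csv()`. The data and column names are hypothetical. The `character` result assumes R 4.0 or later, where `stringsAsFactors` defaults to `FALSE`.

```r
# Two hypothetical CSV exports of the same made-up measurements.
# Messy: the unit is typed into every cell alongside the number.
messy_csv <- "sample,weight
A,5 kg
B,7 kg"

# Tidy: one value per cell; the unit moves into the column name.
tidy_csv <- "sample,weight_kg
A,5
B,7"

messy <- read.csv(text = messy_csv)
tidy <- read.csv(text = tidy_csv)

class(messy$weight)    # "character": R cannot do arithmetic on "5 kg"
class(tidy$weight_kg)  # "integer": ready for calculations
mean(tidy$weight_kg)   # 6
```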
1.2 Names for columns:
  • do not use spaces
  • use UpperCaseLikeThis
  • use Underscore_to_separate_words
1.3 Dealing with missing values:
  • Do not use 0, because 0 can be a genuine data value
  • NA is the best choice
  • blank cells also work
  • Do not combine columns
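The sketch below shows how R treats these options when it reads a CSV, using made-up counts. By default `read.csv()` turns both the text `NA` and an empty cell in a numeric column into `NA`, while `0` stays a real value.

```r
# Hypothetical counts: plot 2 is marked NA, plot 3 is left blank,
# plot 4 is a genuine zero.
survey_csv <- "plot,count
1,12
2,NA
3,
4,0"

survey <- read.csv(text = survey_csv)
survey$count                      # 12 NA NA 0
sum(is.na(survey$count))          # 2 missing values
mean(survey$count, na.rm = TRUE)  # 6, the mean of 12 and 0

# If 0 had been typed for the two missing plots, the mean would
# silently drop to 3.
mean(c(12, 0, 0, 0))              # 3
```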
1.4 Dealing with Dates
  • Use built-in functions
  • create new columns to split the year from the month and day
  • use the formula =YEAR(cell) (click on the cell you want to split); =MONTH(cell) and =DAY(cell) work the same way
  • double-click the bottom-right corner of the cell where the little cross appears. This applies the formula all the way down the column
  • reconstructing the date: =DATE(cell1; cell2; cell3) rebuilds the date from the year, month and day (depending on your locale, the argument separator is ; or ,)
  • string format: store the date as a single succession of numbers, e.g. YYYYMMDD
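The notes above use Excel. The same split-and-rebuild idea can be sketched in R with base date functions. This illustration with made-up dates is not part of the workshop material.

```r
# Two made-up dates stored as proper Date objects
dates <- as.Date(c("2018-11-05", "2019-02-28"))

# Split into separate columns, like =YEAR(), =MONTH() and =DAY() in Excel
year  <- as.integer(format(dates, "%Y"))  # 2018 2019
month <- as.integer(format(dates, "%m"))  # 11 2
day   <- as.integer(format(dates, "%d"))  # 5 28

# Rebuild the date from its parts, like =DATE() in Excel
rebuilt <- as.Date(sprintf("%04d-%02d-%02d", year, month, day))
all(rebuilt == dates)     # TRUE

# A single YYYYMMDD string, the "succession of numbers" format
format(dates, "%Y%m%d")   # "20181105" "20190228"
```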

2. Introduction to R

2.1 R and RStudio
  • R allows you to handle large datasets. It has lots of 'packages'.
  • RStudio is an integrated development environment (IDE) that lets you work with R in a more user-friendly environment.
  • Every pane gives you information. The upper-left pane is where you write your script: you write your instructions like a lab protocol. The Console is the pane in which your commands are executed. The upper right is the Environment. The bottom right includes Files, Plots, Packages and Help.
  • Pipeline: a sequence of scripts, one after the other, that takes you through all the steps you need to process your data.
2.2 Advantages of R
  • it is free
  • it has thousands of built-in functions, so you don't have to write them yourself!
  • it is much more user-friendly than other programs
  • there is a large community to ask questions (and you will get an answer!)
2.3 Tips to start using R
  • Be very organised. Make sub-folders that organise your project (data, outputs, figures, scripts).
  • A path shows you the way: it is the series of folders and subfolders that tells you where your documents are.
  • Start a New Project: do this whenever you start something new.
  • Start a new R Script: this is where you type all of your commands, i.e. your script.
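The folder layout suggested above can also be created from R. This is a minimal sketch that assumes a hypothetical project called `my_project`, placed in R's temporary directory so the example is safe to run. The project itself would normally be created with RStudio's File > New Project menu. `file.path()` joins folder names into a path that works on Windows, Mac and Linux.

```r
# Hypothetical project location (a temporary folder, so nothing real is touched)
project_dir <- file.path(tempdir(), "my_project")

# One sub-folder per kind of file
subfolders <- c("data", "outputs", "figures", "scripts")
for (sub in subfolders) {
  # recursive = TRUE also creates project_dir itself if it is missing
  dir.create(file.path(project_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

list.files(project_dir)  # "data" "figures" "outputs" "scripts"
```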
2.4 You can change the appearance of RStudio
  • Windows: Tools --> Global Options
  • Mac: in the 'RStudio' menu, choose 'Preferences'
2.5 Objects
  • <- is the assignment operator. This is how we assign a value to an object.
  • Shortcut: Windows/Linux: "Alt" + "-" | Mac: "Option" + "-".
  • Object names should be meaningful, can use underscores, and must not start with a number.
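Here is a short example of assignment with `<-`, using a made-up weight:

```r
weight_kg <- 55               # assign the value 55 to an object called weight_kg
weight_lb <- 2.2 * weight_kg  # objects can be used in further calculations
weight_lb                     # 121

weight_kg <- 57.5             # assigning again overwrites the old value
weight_lb                     # still 121: it was computed from the old weight_kg

# A name like 2weight is not allowed (it starts with a number);
# weight_2 or weight_kg_day2 are fine.
```

Note that `weight_lb` does not update automatically. It only changes if you run its assignment again.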
2.6 Useful Commands
  • getwd(): shows where you are on your computer, i.e. the working directory
  • setwd(): sets the working directory
  • ls(): lists the objects in your 'workspace'
  • rm(): removes objects from the workspace. THERE IS NO WAY OF RECOVERING THEM!
  • sqrt(): square root
  • round(): rounds a number to however many decimal places you want/need
  • length(): tells you the number of values in a vector
  • class(): tells you the type of object you are dealing with
  • str(): tells you the structure of an object
  • ?function_name: opens the help page for that function
  • print(): prints a value to the screen
  • (x <- value): wrapping an assignment in parentheses also prints the value to the screen
  • #: starts a comment, which lets you annotate your script
  • mean(): calculates the mean of a set of numbers
  • args(function_name): tells you the arguments of a function
  • c(): combines values into one vector
  • [ ]: subsets elements from a vector. Indexing starts at 1.
  • !: logical NOT ('the opposite')
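The sketch below runs several of the commands listed above on a small made-up vector. Outputs are shown as comments only where they are fixed. The output of `ls()` and `args()` depends on your session and R version, so it is not shown.

```r
x <- c(4, 9, 16)
length(x)                   # 3
class(x)                    # "numeric"
str(x)                      # num [1:3] 4 9 16
sqrt(x)                     # 2 3 4
round(3.14159, digits = 2)  # 3.14
(m <- mean(x))              # assigns AND prints: 9.666667
args(round)                 # shows the arguments round() accepts
ls()                        # lists the objects in the workspace
rm(m)                       # removes m for good
exists("m")                 # FALSE
!TRUE                       # FALSE
```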
2.7 Types of Data
  • character: text
  • numeric: numbers
  • integer: whole numbers (written with an L suffix, e.g. 2L)
  • double: numbers that can have decimals (the default numeric type)
  • logical: TRUE or FALSE
  • When different types are mixed in one vector, R converts them all to the most general type, following this hierarchy: logical → numeric → character ← logical
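The coercion rule can be checked directly in the console. When `c()` receives values of different types, it converts them all to the most general one:

```r
class(c(1, "a"))     # "character": 1 becomes "1"
class(c(TRUE, 2))    # "numeric": TRUE becomes 1
class(c(TRUE, "a"))  # "character": TRUE becomes "TRUE"

class(2L)            # "integer": the L suffix makes an integer
class(2.5)           # "numeric"
typeof(2.5)          # "double": the underlying storage type
```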
2.8 Vectors
  • A vector is another type of object in R.
  • It is simply a series of values that you put together in one object using c().
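Here is how to build and grow vectors with `c()`, using made-up values:

```r
animals <- c("mouse", "rat", "dog")
weights <- c(50, 60, 65, 82)

animals <- c(animals, "cat")   # add an element to the end
animals <- c("frog", animals)  # add an element to the start
animals           # "frog" "mouse" "rat" "dog" "cat"
length(animals)   # 5
class(weights)    # "numeric"
```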
2.9 Functions
  • A function is a command that performs some action on its input.
  • A function takes 'arguments': the inputs and options you pass to it so that it runs with your particular parameters.
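The example below shows how arguments work with `round()` and `mean()`, plus a small function of our own. `kg_to_lb()` and its `factor` argument are hypothetical, made up purely for illustration.

```r
# Built-in functions: arguments can be matched by name or by position
round(3.14159)               # 3     (digits defaults to 0)
round(3.14159, digits = 2)   # 3.14  (argument given by name)
round(3.14159, 2)            # 3.14  (same argument, given by position)

x <- c(2.345, NA, 7.891)
mean(x)                      # NA, because one value is missing
mean(x, na.rm = TRUE)        # 5.118

# A hypothetical function of our own: one required argument
# and one argument with a default value
kg_to_lb <- function(weight_kg, factor = 2.2) {
  weight_kg * factor
}
kg_to_lb(10)                 # 22
kg_to_lb(10, factor = 2.2046)  # overrides the default
```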
2.10 Subsetting vectors
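  • To extract values from a vector, use [ ].
Conditional subsetting
  • AND: &
  • OR: |
  • Equal to: ==
  • Greater than or equal to: >=
  • Less than or equal to: <=
  • Greater than: >
  • Less than: <
  • %in%: TRUE for each element that appears in a given set of values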
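The example below subsets two made-up vectors by position and by condition. A negative index drops an element instead of selecting it.

```r
animals <- c("mouse", "rat", "dog", "cat")
weights <- c(21, 34, 39, 54, 55)

animals[2]          # "rat"  (indexing starts at 1)
animals[c(3, 2)]    # "dog" "rat"
animals[-1]         # every element except the first

weights > 50                                   # FALSE FALSE FALSE TRUE TRUE
weights[weights > 50]                          # 54 55
weights[weights < 30 | weights > 50]           # 21 54 55
weights[weights >= 30 & weights <= 50]         # 34 39
animals[animals == "cat"]                      # "cat"
animals[animals %in% c("rat", "cat", "duck")]  # "rat" "cat"
animals[!animals %in% c("rat", "cat")]         # "mouse" "dog"
```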
2.11 Missing Data
  • Missing data are represented as NA in a vector.
  • na.rm = TRUE: ignore the missing data in a calculation such as mean()
  • x[!is.na(x)]: extract the elements that are not missing
  • na.omit(x): returns the object with incomplete cases removed
  • x[complete.cases(x)]: returns only the complete cases
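The example below applies the missing-data tools above to a small made-up vector and data frame:

```r
heights <- c(2, 4, 4, NA, 6)

mean(heights)                     # NA
mean(heights, na.rm = TRUE)       # 4
heights[!is.na(heights)]          # 2 4 4 6
heights[complete.cases(heights)]  # 2 4 4 6
na.omit(heights)                  # 2 4 4 6, plus an "na.action" attribute
                                  # recording which element was dropped

# On a data frame, na.omit() and complete.cases() work row by row
samples <- data.frame(dose = c(1, NA, 3), group = c("a", "b", NA))
na.omit(samples)                    # keeps only row 1
samples[complete.cases(samples), ]  # same rows
```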

Coming Up Next

Starting with Data
Data Manipulation using dplyr and tidyr
Data visualization with ggplot2
© Copyright belongs to the author. For reprints or content collaboration, please contact the author.