COMP9313_WEEK1_1_Course Introduction

Introduction to COMP9313 (Big Data Management)

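Lecturer: Dr. 曹欣 (Cao Xin)

Email: please look it up yourself

Background: bachelor's and master's degrees from the computer science school at Zhejiang University; PhD from the computer science school at Nanyang Technological University.

Number of papers: 22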

Research interests: Data Management (in particular, on geo-textual data), Databases, Information Retrieval, and Data Mining

Current research:

1) Filtering geo-textual data stream, e.g., geo-tagged tweets (SIGMOD13, ICDE15)

2) Keyword-aware route planning (PVLDB12, IJCAI15)

3) Efficient processing of spatial keyword queries (PVLDB10, SIGMOD11, PVLDB14, SIGMOD15, TODS15, PVLDB16, and an invited paper in ER12)

4) Mining significant semantic locations from user generated GPS data (PVLDB10)

5) Link structure analysis (PVLDB10, SIGMOD17)

Past research:

1) Using categorization information to improve question search in community-based question answering services (CIKM09, WWW10, TOIS12)

2) Indoor distance-aware query processing (ICDE12)

3) Streaming graph clustering (ICDE16)

Tutor's email: please look it up yourself

Aims:

This course aims to introduce you to the concepts behind Big Data, the core technologies used in managing large-scale data sets, and a range of technologies for developing solutions to large-scale data analytics problems.

This course is intended for students who want to understand modern large-scale data analytics systems. It covers a wide range of topics and technologies, and will prepare students to be able to build such systems as well as use them efficiently and effectively to address challenges in big data management.

Lectures:

Lectures focus on frontier technologies in big data management and their typical applications

The lecturer will try to run them in a more interactive mode and provide more examples

A few lectures may run in a more practical manner (e.g., like a lab/demo) to cover the applied aspects

Lecture length varies slightly depending on the progress of that lecture

Textbooks:

1) Hadoop: The Definitive Guide. Tom White. 4th Edition - O'Reilly Media

2) Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. 2nd edition - Cambridge University Press

3) Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer. University of Maryland, College Park.

4) Learning Spark. Matei Zaharia, Holden Karau, Andy Konwinski, Patrick Wendell. O'Reilly Media

References:

1) Apache MapReduce Tutorial

2) Apache Spark Quick Start

Topics covered:

1) Topic 1. Big data management tools

Apache Hadoop

MapReduce

YARN/HDFS/HBase/Hive/Pig (briefly introduced)

Spark

AWS platform

Mahout [tentative]

2) Topic 2. Typical big data applications

Finding similar items

Graph data processing

Data stream mining

Recommender Systems

Prerequisites:

1) have experience with and good knowledge of algorithm design (equivalent to COMP9024)

2) have a solid background in database systems (equivalent to COMP9311)

3) have solid programming skills in Java

4) be familiar with working on Unix-style operating systems

5) have basic knowledge of linear algebra (e.g., vector spaces, matrix multiplication), probability theory and statistics, and graph theory

Expected learning outcomes:

1) elaborate the important characteristics of Big Data

2) develop an appropriate storage structure for a Big Data repository

3) utilize the map/reduce paradigm to manipulate Big Data

4) utilize the Spark platform to manipulate Big Data

5) develop efficient solutions for analytical problems involving Big Data
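To give a feel for the map/reduce paradigm named in the outcomes above, here is a minimal word-count sketch in plain Python. It is not Hadoop code and is not taken from the course materials; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only imitate, on a single machine, the three stages a MapReduce framework would run across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data management", "Big Data tools"]))
```

In a real Hadoop job the map and reduce functions would be Java classes and the shuffle would be handled by the framework; the sketch only shows how the key/value pairs flow between the stages.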

Assignments and assessment:

4 projects:

1 warm-up programming project on Hadoop MapReduce

1 harder project on Hadoop MapReduce

1 project on Spark

1 project on AWS (MapReduce/Spark)

Because the CSE computers run Linux:

Use Linux/command line (virtual machine image will be provided)

Projects marked on Linux servers

You need to be able to upload, run, and test your program under Linux

Submission:

Use Give to submit (either command line or web page)

Classrun: check your submission, marks, etc. Read https://wiki.cse.unsw.edu.au/give/Classrun

(Note: late submissions incur a 10% penalty on the first day and a 30% penalty after that.)

Final Exam:

1) Double pass: the final exam mark must be >= 40%

2) Final written exam (100 pts)

Course schedule:

Schedule

Labs (11 in total):

5 labs on MapReduce; 3 labs on Spark; 1 lab on high-level MapReduce tools; 1 lab on AWS; 1 lab on a big data machine learning platform [tentative]

Environment setup (installed as a virtual machine):

1) Pure Xubuntu 14.04: http://www.cse.unsw.edu.au/~z3515164/Raw_Xubuntu.zip

2) Xubuntu 14.04 with pre-installed Hadoop and Eclipse plugin: http://mirror.cse.unsw.edu.au/pub/cs9313/Xubuntu.zip

Installation steps:

(1) Download the zip file, uncompress it, and rename the file "xubuntu-disk.vmdk" to "xubuntu-disk2.vmdk"

(2) Open VirtualBox, File->Import Appliance

(3) Browse to the image folder and select the "*.ovf" file

(4) The image will be imported to your computer, which may take 10 minutes

(5) comp9313 is used as both username and password. The Hadoop installation path is the same as in the virtual machine on lab computers.
