<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1415982618773243541</id><updated>2011-07-28T18:38:55.424-07:00</updated><category term='topic'/><category term='reference'/><category term='chapter'/><title type='text'>Blog for Data Mining Course @ NCU</title><subtitle type='html'>Join the &lt;a href="http://groups.google.com/group/dmCourse2007/"&gt;Discussion Board&lt;/a&gt; to earn more marks</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>15</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-7211005404536149634</id><published>2009-03-16T19:21:00.000-07:00</published><updated>2009-03-16T21:13:51.478-07:00</updated><title type='text'>Google Tech Talks on Statistical Aspects of Data Mining</title><content type='html'>The following are links to the videos for the reading assignments.
Details of this video course are available at &lt;a href="http://www.stats202.com/"&gt;www.stats202.com&lt;/a&gt;.&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=zRsMEl6PHhM"&gt;Statistical Aspects of Data Mining (Lecture 1)&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=YFC2KUmEebc"&gt;Statistical Aspects of Data Mining (Lecture 2)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=1HAAF4UT75o"&gt;Statistical Aspects of Data Mining (Lecture 3)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=qBcI9WakS2o"&gt;Statistical Aspects of Data Mining (Lecture 4)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=iXCPJNT9ZOQ"&gt;Statistical Aspects of Data Mining (Lecture 5)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=XzxGnF_eiNo"&gt;Statistical Aspects of Data Mining (Lecture 6)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=FoKxzorQIhU"&gt;Statistical Aspects of Data Mining (Lecture 7)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=N5i85v0ckzY"&gt;Statistical Aspects of Data Mining (Lecture 8)&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=xpuB9ydmBsM"&gt;Statistical Aspects of Data Mining (Lecture 9)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=CzvgrcQhWGg"&gt;Statistical Aspects of Data Mining (Lecture 10)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=l4a3e__QzoY"&gt;Statistical Aspects of Data Mining (Lecture 11)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=fmZYH3rmqDQ"&gt;Statistical Aspects of Data Mining (Lecture 12)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.youtube.com/watch?v=-tWS0tN8sW0"&gt;Statistical Aspects of Data Mining (Lecture 13)&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img
width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-7211005404536149634?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/7211005404536149634/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=7211005404536149634' title='40 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/7211005404536149634'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/7211005404536149634'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2009/03/google-tech-talks-on-statistial-aspects.html' title='Google Tech Talks on Statistical Aspects of Data Mining'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>40</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-6143260937102293028</id><published>2009-03-02T21:45:00.000-08:00</published><updated>2009-03-16T21:09:40.955-07:00</updated><title type='text'>Reading Assignments</title><content type='html'>The updated reading assignments are available at the following &lt;a href="http://sites.google.com/site/dataminingcourse2009/Home/reading-assignments"&gt;URL&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-6143260937102293028?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml'
href='http://dm07course.blogspot.com/feeds/6143260937102293028/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=6143260937102293028' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/6143260937102293028'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/6143260937102293028'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2009/03/reading-assignment-of-2009.html' title='Reading Assignments'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-7866687003480986692</id><published>2009-02-22T18:50:00.000-08:00</published><updated>2009-02-24T00:45:35.508-08:00</updated><title type='text'>Course Syllabus for 2009</title><content type='html'>&lt;span style="color: rgb(0, 153, 0);"&gt;General Information&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Graduate course&lt;br /&gt;Term: Spring 2009&lt;br /&gt;Date and Time: Tuesday 2:00pm - 5:00pm&lt;br /&gt;Location: E6-A203&lt;br /&gt;Number of credits: 3 credits&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;&lt;br /&gt;Instructor: &lt;a href="http://www.csie.ncu.edu.tw/%7Echia"&gt;Chia-Hui Chang&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;Office: E6-B302&lt;br /&gt;Phone: 35302&lt;br /&gt;E-mail: chia at csie dot ncu dot edu dot tw&lt;br /&gt;Office Hours: Monday 3:00 - 4:50 or by appointment.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 204, 0);"&gt;Course Goals&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Introducing students to the
basic concepts and techniques of data mining.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Learning by doing, i.e., developing skills in applying DM algorithms to knowledge discovery.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Gaining experience in &lt;b&gt;independent study and research.&lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;Course Topics&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Preliminaries (2-3 weeks)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Predictive data mining (4 weeks)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Association rule mining (3 weeks)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Cluster analysis (3 weeks)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Sequence labeling (2 weeks)&lt;/li&gt;&lt;li&gt;Learning to rank (2 weeks)&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;&lt;b&gt;Evaluations&lt;/b&gt;&lt;/span&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;ul&gt;&lt;li&gt;Two to three assignments: 40%&lt;/li&gt;&lt;li&gt;Project: 30%&lt;/li&gt;&lt;li&gt;Attendance: 10%&lt;/li&gt;&lt;li&gt;Exam: 20%&lt;/li&gt;&lt;/ul&gt;&lt;span style="color: rgb(0, 153, 0);"&gt;&lt;b&gt;Teaching Assistant&lt;br /&gt;&lt;/b&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;Stanley Fan&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Email:&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Office Hours:&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-7866687003480986692?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/7866687003480986692/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=7866687003480986692' title='0 Comments'/><link rel='edit' type='application/atom+xml'
href='http://www.blogger.com/feeds/1415982618773243541/posts/default/7866687003480986692'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/7866687003480986692'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2009/02/course-syllabus-for-2009.html' title='Course Syllabus for 2009'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-5621285631659853945</id><published>2009-02-03T21:09:00.000-08:00</published><updated>2009-02-22T21:46:18.936-08:00</updated><title type='text'>Software</title><content type='html'>&lt;span style="color: rgb(51, 204, 0);"&gt;Commercial Software&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.xlminer.net/"&gt;XLMiner&lt;/a&gt;, a data mining add-in for Excel.&lt;/li&gt;&lt;/ul&gt;&lt;span style="color: rgb(51, 204, 0);"&gt;Free Software and Shareware&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.cs.waikato.ac.nz/ml/weka/index.html"&gt;Weka&lt;/a&gt;, a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform.
&lt;/li&gt;&lt;/ul&gt;&lt;a href="http://www.kdnuggets.com/software/suites.html"&gt;&lt;span style="color: rgb(51, 204, 0);"&gt;Other software&lt;/span&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-5621285631659853945?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/5621285631659853945/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=5621285631659853945' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/5621285631659853945'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/5621285631659853945'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2009/02/software.html' title='Software'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-5148644191719671892</id><published>2009-02-03T19:22:00.000-08:00</published><updated>2009-02-22T19:32:46.086-08:00</updated><title type='text'>Book summary for Tan, Steinbach and Kumar's DM book</title><content type='html'>&lt;ol&gt;&lt;br /&gt;&lt;li&gt;&lt;span style="font-size:100%;"&gt;&lt;a href="http://dm07course.blogspot.com/2007/05/data-mining-chapter-6-overview.html"&gt;Data Mining Chapter 6 Overview&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;span style="font-size:100%;"&gt;&lt;/span&gt;&lt;li&gt; &lt;span style="font-size:100%;"&gt;&lt;a 
href="http://dm07course.blogspot.com/2007/05/data-mining-chapter-5-overview.html"&gt;Data Mining Chapter 5 Overview&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;span style="font-size:100%;"&gt;&lt;/span&gt;&lt;li&gt; &lt;span style="font-size:100%;"&gt;&lt;a href="http://dm07course.blogspot.com/2007/05/data-mining-chapter-4-overview.html"&gt;Data Mining Chapter 4 Overview&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;span style="font-size:100%;"&gt;&lt;/span&gt;&lt;li&gt; &lt;span style="font-size:100%;"&gt;&lt;a href="http://dm07course.blogspot.com/2007/03/data-mining-chapter-3-overview.html"&gt;Data Mining Chapter 3 Overview&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;span style="font-size:100%;"&gt;&lt;/span&gt;&lt;li&gt; &lt;span style="font-size:100%;"&gt;&lt;a href="http://dm07course.blogspot.com/2007/03/data-mining-chapter-2-overview.html"&gt;Data Mining Chapter 2 Overview&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;span style="font-size:100%;"&gt;&lt;/span&gt;&lt;li&gt; &lt;span style="font-size:100%;"&gt;&lt;a href="http://dm07course.blogspot.com/2007/03/data-mining-chapter-1-overview.html"&gt;Data Mining Chapter 1 Overview&lt;/a&gt;&lt;/span&gt;   &lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-5148644191719671892?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/5148644191719671892/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=5148644191719671892' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/5148644191719671892'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1415982618773243541/posts/default/5148644191719671892'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2009/02/course-summary-for-2007.html' title='Book summary for Tan, Steinbach and Kumar&apos;s DM book'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-8978427747085688954</id><published>2007-05-07T22:55:00.001-07:00</published><updated>2007-05-07T22:55:52.042-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chapter'/><title type='text'>Data Mining Chapter 6 Overview</title><content type='html'>This chapter introduces another mining technique that is widely used in business: association mining. Association mining uses a support threshold to filter out unimportant itemsets and a confidence threshold to obtain more reliable association rules. The main research question in association mining is how to find frequent itemsets and association rules efficiently. Frequent itemset generation in particular has been the subject of many papers that speed up the search with special data structures or algorithms. The best known of these algorithms is Apriori. Apriori exploits the anti-monotone property of support (if an itemset is infrequent, all of its supersets are also infrequent) to avoid generating unnecessary candidate itemsets, which shrinks the search space. Another method, which does not enumerate candidate itemsets at all, is the FP-Growth algorithm: it extracts frequent itemsets directly from an FP-tree. However, when the transactions share few common items, the FP-tree can consume a great deal of memory, so the choice of method should depend on how the data at hand are distributed.&lt;br /&gt;&lt;br /&gt;In general, support and confidence alone still produce a large number of patterns. Many of these patterns are uninteresting or even misleading, so another research topic is how to select the meaningful ones. Section 6.7 introduces many domain-independent measures, such as the Interest Factor and the IS Measure. It classifies them by whether they are symmetric or asymmetric and by whether they satisfy the inversion, null addition, and scaling properties. No single measure suits every situation, so we should understand each measure's properties, strengths, and weaknesses in order to choose an appropriate one for selecting meaningful patterns.&lt;br /&gt;&lt;br /&gt;The chapter closes by describing two problems that can arise in association mining: Simpson’s Paradox and skewed support distributions. Simpson’s Paradox occurs when we overlook some factor, so that the patterns we derive mislead us. Avoiding it may require the domain knowledge to interpret association rules correctly, or further analysis of the data distribution underlying a rule.
When the support distribution is skewed, the intuitive response is to lower the support threshold. However, besides increasing computation time, this also produces many cross-support patterns. Section 6.8 presents h-confidence, an anti-monotone measure that filters out cross-support patterns and so reduces the flood of patterns caused by an overly low support threshold.&lt;br /&gt;&lt;br /&gt;Learning Objectives&lt;br /&gt;&lt;br /&gt;1. Understand the various methods for generating frequent itemsets&lt;br /&gt;2. Understand the properties of the various interestingness measures&lt;br /&gt;3. Understand what Simpson’s Paradox, skewed support distributions, and cross-support patterns are&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-8978427747085688954?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/8978427747085688954/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=8978427747085688954' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/8978427747085688954'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/8978427747085688954'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/05/data-mining-chapter-6-overview.html' title='Data Mining Chapter 6 Overview'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-6382425426044925390</id><published>2007-05-07T22:50:00.000-07:00</published><updated>2007-05-07T22:53:10.610-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chapter'/><title type='text'>Data Mining Chapter 5 Overview</title><content
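type='html'>Chapter 5 continues the classification theme of the previous chapter and introduces other common classification techniques. It starts with the more basic rule-based and nearest-neighbor classifiers and then moves on to the more complex Bayesian classifiers, artificial neural networks (ANN), and support vector machines (SVM).&lt;br /&gt;&lt;br /&gt;Rule-based classifiers can be built in two ways, directly or indirectly. Direct methods mainly use the sequential covering algorithm to generate rules, while indirect methods convert another model, such as a decision tree, into rules. Rule-based classifiers are often compared with decision trees, and students should explore the relative strengths and weaknesses of the two further. The nearest-neighbor classifier is a form of instance-based learning. Its main difference from the other classifiers is that it does not build a model in advance, which is why it is called a lazy learner.&lt;br /&gt;&lt;br /&gt;Bayesian classifiers build their models from conditional probabilities. This makes them particularly suitable when records with identical attribute values can still belong to different classes. Note, however, that the naive Bayes classifier assumes the attributes are independent of one another given the class. In some situations it therefore classifies less accurately than Bayesian belief networks, which model the dependencies among attributes. When choosing a classifier, students should pay particular attention to how strongly the attributes depend on one another.&lt;br /&gt;&lt;br /&gt;An ANN mimics the way neurons in the human brain operate in order to build a model with similar functionality. Before using this technique, one must choose parameters such as the number of neurons and the number of layers, which determine the model's learning capacity. In theory, a multilayer ANN with enough hidden units can approximate virtually any target function, but setting these parameters appropriately is difficult. This chapter covers ANNs only briefly; interested students can consult other books.&lt;br /&gt;&lt;br /&gt;SVM has its roots in statistical learning theory. It finds the maximal margin hyperplane in order to minimize the generalization error. There are linear and nonlinear SVMs, and nonlinear SVMs are particularly suited to data that are not linearly separable. However, to handle a multiclass problem, SVM must build several classifiers.&lt;br /&gt;&lt;br /&gt;Besides classification techniques, the chapter also discusses two ensemble methods for improving classification accuracy: bagging and boosting. The difference between them is that boosting adjusts the weights of the training records after each round, so that later classifiers concentrate on the records that earlier ones misclassified. The final section examines the class imbalance problem. Sometimes the rare class is the one we care about most, but many classification techniques are dominated by the majority class. In that case the accuracy measure we normally use becomes meaningless. For example, if 99% of the records belong to one class, a classifier that always predicts that class reaches 99% accuracy while never detecting the rare class. This section introduces several alternative evaluation tools, such as the Confusion Matrix, the F-Measure, and the ROC Curve. It also covers Cost-Sensitive Learning for avoiding costly misclassifications.&lt;br /&gt;&lt;br /&gt;Learning Objectives&lt;br /&gt;&lt;br /&gt;1. Understand the characteristics of each classification technique so as to choose an appropriate one in different situations&lt;br /&gt;2. Understand how ensemble methods work and why they improve classification accuracy&lt;br /&gt;3. 
Be able to decide which evaluation measure to use in a given situation, rather than always choosing accuracy&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-6382425426044925390?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/6382425426044925390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=6382425426044925390' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/6382425426044925390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/6382425426044925390'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/05/data-mining-chapter-5-overview.html' title='Data Mining Chapter 5 Overview'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-7762588329258176390</id><published>2007-05-07T22:49:00.000-07:00</published><updated>2007-05-07T22:50:34.663-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chapter'/><title type='text'>Data Mining Chapter 4 Overview</title><content type='html'>In the previous two chapters we learned about the various types of data and many preprocessing methods. Starting with this chapter, we turn to the algorithms behind the mining techniques themselves. Classification is one of the most common data analysis tasks, so this chapter begins with the basic concepts of classification and then introduces the simplest and most representative classification technique: the decision tree. Building a decision tree mainly consists of deciding the splitting condition used at each node. The most common approach is to use information gain to evaluate which splitting attribute is best. Building a tree is therefore easiest for discrete attributes. Numeric attributes are more troublesome, because we must also decide at which value to split them to obtain the best result.&lt;br /&gt;&lt;br /&gt;The quality of a classification model can be measured by its accuracy or its error rate. In general, the more complex the model or the longer it is trained, the lower its training error. The test error, by contrast, may decrease (when the model is underfitting) or increase (when the model is overfitting). Because the test error is unknown during training, the chapter discusses several methods for estimating the generalization error to supplement the training error. These estimates help avoid the two possible problems, overfitting and underfitting.&lt;br /&gt;&lt;br /&gt;Finally, the chapter describes how to use test data to measure the performance of different classification models. Comparing models by their raw classification accuracy alone is not objective, even on the same test set, and even less so on different test sets. Instead, we can use statistical tests to determine whether the performance difference between models is statistically significant, which gives a more objective result.&lt;br /&gt;&lt;br /&gt;Learning Objectives&lt;br /&gt;1. Decision tree induction algorithms&lt;br /&gt;- Splitting Evaluation: Entropy, Gini Index, Misclassification Error.&lt;br /&gt;- Methods for estimating generalization error&lt;br /&gt;2. Understand the problems of overfitting and underfitting&lt;br /&gt;3. Understand the various methods for comparing classifiers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-7762588329258176390?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/7762588329258176390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=7762588329258176390' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/7762588329258176390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/7762588329258176390'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/05/data-mining-chapter-4-overview.html' title='Data Mining Chapter 4 Overview'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-4015607400389358703</id><published>2007-03-22T21:37:00.000-07:00</published><updated>2007-03-22T21:38:39.490-07:00</updated><category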
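scheme='http://www.blogger.com/atom/ns#' term='chapter'/><title type='text'>Data Mining Chapter 3 Overview</title><content type='html'>This chapter can be seen as an extension of the previous chapter's topic, data. Its main focus is exploratory data analysis. This analysis helps us understand the characteristics of the data and choose appropriate preprocessing and subsequent analysis techniques. The chapter covers three important topics: Summary Statistics, Visualization, and OLAP. Summary statistics use common statistical measures to describe the characteristics of a data set. Visualization relies on the human eye, which is far better than any machine at spotting patterns, to reveal the characteristics of the data. The challenge in this second part is how to visualize data so that those characteristics are easy to observe. OLAP operations are usually provided as part of a database or data warehouse. Besides the functions of the first two topics (summary statistics and visualization), OLAP offers five operations for analyzing high-dimensional data cubes: Slicing, Dicing, Roll-Up, Drill-Down, and Pivot. On the whole, this chapter is not difficult. You should not only be able to read the figures, but also understand which characteristics of, and relationships in, the data each figure is meant to convey. For OLAP, there will be a homework assignment to help you become familiar with it. &lt;br /&gt;&lt;br /&gt;Learning Objectives &lt;br /&gt;&lt;br /&gt;1.      Understand what each statistical measure represents &lt;br /&gt;          variance, covariance, correlation, the p-th percentile&lt;br /&gt;2.      Understand existing data visualization methods and their strengths and weaknesses &lt;br /&gt;          parallel coordinates, star coordinates&lt;br /&gt;3.      Understand how OLAP and its operators serve as data visualization tools &lt;br /&gt;          multi-dimensional data model, cube operations: slice and dice, roll-up and drill-down&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-4015607400389358703?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/4015607400389358703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=4015607400389358703' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/4015607400389358703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/4015607400389358703'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/03/data-mining-chapter-3-overview.html' title='Data Mining Chapter 3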
Overview'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-7852275654530966447</id><published>2007-03-09T17:01:00.000-08:00</published><updated>2007-03-09T17:13:18.112-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chapter'/><title type='text'>Data Mining Chapter 2 Overview</title><content type='html'>In general, before mining we must first assess the input data and preprocess it appropriately so that it can be used in the subsequent mining tasks. This chapter begins with the types of data and describes the characteristics of each type. It then examines three common data quality problems and how to clean the data. In addition, to make the data suitable for a particular mining technique, we usually need to preprocess it. This means transforming the raw data through steps such as Data Reduction, Feature Selection, Normalization, and Discretization. Finally, the chapter covers how to compute the relationships between data objects (Similarity and Dissimilarity). Overall, the chapter is a conceptual introduction with no difficult algorithms. The only mathematics is PCA from statistics and SVD from linear algebra, so please spend a little extra time on the Appendix.&lt;br /&gt;&lt;br /&gt;Learning Objectives&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Understand what a data matrix, transactional data, sequential data, sequence data, and time series data are&lt;/li&gt;&lt;li&gt;Distinguish between attribute types based on 4 properties: distinctness, order, addition, multiplication &lt;/li&gt;&lt;li&gt;Data quality problems: outliers and noise, missing data, duplicates &lt;/li&gt;&lt;li&gt;Data preprocessing tasks: &lt;/li&gt;&lt;ul&gt;&lt;li&gt;Data Reduction by Sampling&lt;/li&gt;&lt;li&gt;Dimension Reduction via PCA&lt;/li&gt;&lt;li&gt;Feature Selection by specific filtering algorithms&lt;/li&gt;&lt;li&gt;Feature Creation by feature extraction, mapping to a new space, and feature construction&lt;/li&gt;&lt;li&gt;Normalization&lt;/li&gt;&lt;li&gt;Discretization by equal width, equal depth, or clustering&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Proximity Measures (distance and similarity) &lt;/li&gt;&lt;ul&gt;&lt;li&gt;Similarity: Jaccard measure, cosine measure, Pearson's correlation&lt;/li&gt;&lt;li&gt;Distance: Euclidean distance, Mahalanobis distance&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Other problems that may arise, for example: 
&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Too much or too little data&lt;/li&gt;&lt;li&gt;More attributes than an algorithm can handle&lt;/li&gt;&lt;li&gt;Regression accepts only numeric attributes &lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-7852275654530966447?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/7852275654530966447/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=7852275654530966447' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/7852275654530966447'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/7852275654530966447'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/03/data-mining-chapter-2-overview.html' title='Data Mining Chapter 2 Overview'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-5382763499961851629</id><published>2007-03-09T16:39:00.000-08:00</published><updated>2007-03-09T16:41:41.865-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chapter'/><title type='text'>Data Mining Chapter 1 Overview</title><content type='html'>The main purpose of this chapter is to give beginners an initial understanding of data mining. The book opens from the perspective of people who need to grasp quickly the meaning hidden in large amounts of data. It gives many examples of data mining applications to show why data mining is necessary and how powerful it is. Section 1.4 explains that data mining tasks fall into predictive tasks and descriptive tasks. Based on different views of the data, these are further divided into six tasks: Classification, Clustering, Association analysis, Regression, Sequential 
Pattern Discovery, and Deviation Detection. Finally, the chapter also discusses the challenges that data mining will face in its future development.&lt;br /&gt;&lt;p&gt;Learning Objectives &lt;/p&gt;&lt;ul&gt;&lt;li&gt;What is data mining? What is its knowledge discovery process? (ppt 5) &lt;/li&gt;&lt;li&gt;Why do we need data mining? What benefits can it bring? (ppt 2, 3, 4) &lt;/li&gt;&lt;li&gt;Definitions of the six common data mining tasks&lt;br /&gt;Classification (ppt 10)&lt;br /&gt;Clustering (ppt 17)&lt;br /&gt;Association Rule Discovery (ppt 23)&lt;br /&gt;Sequential Pattern Discovery (ppt 27)&lt;br /&gt;Regression (ppt 29)&lt;br /&gt;Deviation Detection (ppt 30)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1415982618773243541-5382763499961851629?l=dm07course.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/5382763499961851629/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=5382763499961851629' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/5382763499961851629'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/5382763499961851629'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/03/data-mining-chapter-1-overview.html' title='Data Mining Chapter 1 Overview'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-4961494831969722777</id><published>2007-03-02T00:58:00.000-08:00</published><updated>2007-03-02T01:02:41.163-08:00</updated><title
type='text'>Tentative Schedule</title><content type='html'>&lt;ul&gt;&lt;li&gt;Week 1 to 8: Instructor Teaching &lt;/li&gt;&lt;li&gt;Week 9: Mid-term Exam &lt;/li&gt;&lt;li&gt;Week 10 to 12: Reading Assignment&lt;/li&gt;&lt;li&gt;Week 13 to 15: Case Study&lt;/li&gt;&lt;li&gt;Week 16 to 18: Project presentation &lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/4961494831969722777/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=4961494831969722777' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/4961494831969722777'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/4961494831969722777'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/03/tentative-schedule.html' title='Tentative Schedule'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-3494290396633652253</id><published>2007-03-02T00:56:00.000-08:00</published><updated>2007-03-02T00:58:18.561-08:00</updated><title type='text'>Grading Policy</title><content type='html'>&lt;ul&gt;&lt;li&gt;Homework (25%) &lt;/li&gt;&lt;li&gt;Mid-term (25%) &lt;/li&gt;&lt;li&gt;Paper presentation (20%) &lt;/li&gt;&lt;li&gt;Final project (20%) 
&lt;/li&gt;&lt;li&gt;Course involvement (10%)&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/3494290396633652253/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=3494290396633652253' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/3494290396633652253'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/3494290396633652253'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/03/grading-policy.html' title='Grading Policy'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-8240848257911132936</id><published>2007-02-26T21:34:00.000-08:00</published><updated>2009-07-23T10:46:47.261-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='topic'/><title type='text'>Course Syllabus for 2007</title><content type='html'>&lt;span style="color: rgb(255, 102, 0);"&gt;Course Topics&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Introduction &lt;/li&gt;&lt;li&gt;Data Preprocessing &lt;/li&gt;&lt;li&gt;Exploring Data &lt;/li&gt;&lt;li&gt;Classification: Basic Concepts, Decision Trees, and Model Evaluation &lt;/li&gt;&lt;li&gt;Classification: Alternative Techniques &lt;/li&gt;&lt;li&gt;Association Analysis: 
Basic Concepts and Algorithms &lt;/li&gt;&lt;li&gt;Association Analysis: Advanced Concepts &lt;/li&gt;&lt;li&gt;Cluster Analysis: Basic Concepts and Algorithms &lt;/li&gt;&lt;li&gt;Cluster Analysis: Additional Issues and Algorithms &lt;/li&gt;&lt;/ul&gt;&lt;a style="color: rgb(255, 102, 0);" href="http://dm07course.blogspot.com/2007/03/tentative-schedule.html" target="_new"&gt;Schedule&lt;/a&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Week 1 to 8: Instructor Teaching &lt;/li&gt;&lt;li&gt;Week 9: Mid-term Exam &lt;/li&gt;&lt;li&gt;Week 10 to 12: Reading Assignment&lt;/li&gt;&lt;li&gt;Week 13 to 15: Case Study&lt;/li&gt;&lt;li&gt;Week 16 to 18: Project presentation &lt;/li&gt;&lt;/ul&gt;&lt;a style="color: rgb(255, 102, 0);" href="http://dm07course.blogspot.com/2007/02/course-materials.html" target="_new"&gt;Textbooks&lt;/a&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Pearson International Edition, 2005. Slides: &lt;a href="http://www-users.cs.umn.edu/%7Ekumar/dmbook/index.php"&gt;http://www-users.cs.umn.edu/~kumar/dmbook/index.php&lt;/a&gt;&lt;/li&gt;&lt;li&gt;J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000. Slides: &lt;a href="http://www.cs.sfu.ca/%7Ehan/DM_Book.html"&gt;http://www.cs.sfu.ca/~han/DM_Book.html&lt;/a&gt; &lt;/li&gt;&lt;li&gt;Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2004. 
Slides: &lt;a href="http://www.cs.umd.edu/class/spring2004/cmsc726/courseTopicsPage.html"&gt;http://www.cs.umd.edu/class/spring2004/cmsc726/courseTopicsPage.html&lt;/a&gt; &lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;a style="color: rgb(255, 102, 0);" href="http://dm07course.blogspot.com/2007/03/grading-policy.html" target="_new"&gt;Grading Policy&lt;/a&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Homework (25%) &lt;/li&gt;&lt;li&gt;Mid-term (25%) &lt;/li&gt;&lt;li&gt;Paper presentation (20%) &lt;/li&gt;&lt;li&gt;Final project (20%) &lt;/li&gt;&lt;li&gt;Course involvement (10%)&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/8240848257911132936/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=8240848257911132936' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/8240848257911132936'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/8240848257911132936'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/02/course-topics.html' title='Course Syllabus for 2007'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1415982618773243541.post-4095096440916812530</id><published>2007-02-26T21:28:00.000-08:00</published><updated>2007-03-02T00:14:40.683-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reference'/><title type='text'>Textbooks</title><content type='html'>&lt;ul&gt;&lt;li&gt;Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Pearson International Edition, 2005. Slides: &lt;a href="http://www-users.cs.umn.edu/~kumar/dmbook/index.php"&gt;http://www-users.cs.umn.edu/~kumar/dmbook/index.php&lt;/a&gt;&lt;/li&gt;&lt;li&gt;J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000. Slides: &lt;a href="http://www.cs.sfu.ca/~han/DM_Book.html"&gt;http://www.cs.sfu.ca/~han/DM_Book.html&lt;/a&gt; &lt;/li&gt;&lt;li&gt;Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2004. Slides: &lt;a href="http://www.cs.umd.edu/class/spring2004/cmsc726/courseTopicsPage.html"&gt;http://www.cs.umd.edu/class/spring2004/cmsc726/courseTopicsPage.html&lt;/a&gt; &lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://dm07course.blogspot.com/feeds/4095096440916812530/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1415982618773243541&amp;postID=4095096440916812530' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1415982618773243541/posts/default/4095096440916812530'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1415982618773243541/posts/default/4095096440916812530'/><link rel='alternate' type='text/html' href='http://dm07course.blogspot.com/2007/02/course-materials.html' title='Textbooks'/><author><name>Jahui</name><uri>http://www.blogger.com/profile/04407009593178832508</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
