Instructor: Prof. S. Muthu Muthukrishnan
Topic: Data Mining on Streams
With the rapid development of Internet technology, large amounts of data
are transmitted online at very high speed. Traditional data mining
algorithms can no longer satisfy the resulting efficiency and accuracy
requirements. Because data is generated much faster than traditional
algorithms can mine it, there is not enough time to process all of it, and
the unbounded memory these algorithms require no longer fits in the limited
memory available. These constraints motivate the search for algorithms that
mine streams using constant memory and constant time per item, and that see
each data item at most once. Algorithm efficiency is therefore critical.
The incoming data may represent a very complex model. If we rely only on
samples, we may learn only a very simple model, which cannot accurately
represent the true, current model.
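
As a small illustration of the one-pass, constant-memory constraint described above, the sketch below keeps a running mean and variance with Welford's update. It is not taken from any of the readings; the names `RunningStats` and `summarize` are hypothetical and chosen only for this example.

```python
class RunningStats:
    """One-pass mean and variance in O(1) memory (Welford's update)."""

    def __init__(self):
        self.n = 0        # number of items seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        # Constant work per item; the item itself is not stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self):
        # Population variance; 0.0 if nothing has been seen yet.
        return self._m2 / self.n if self.n else 0.0


def summarize(stream):
    """Consume an iterable once and return its running statistics."""
    stats = RunningStats()
    for x in stream:  # each item is seen exactly once, then discarded
        stats.update(x)
    return stats
```

The state is three numbers regardless of how long the stream runs, which is the property the paragraph above asks of a streaming algorithm. Richer models, such as decision trees or clusterings, need more elaborate summaries.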
For time-changing data, the underlying concept may drift over time. The
streaming model must therefore detect such changes and refine itself as
they occur.
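
One simple way to make the idea of drift detection concrete is to compare a model's recent error rate with its older error rate. The sketch below is an illustrative stand-in under that assumption, not the method of any paper in the reading list; the class name `WindowDriftDetector` and the parameters `window` and `threshold` are hypothetical.

```python
from collections import deque


class WindowDriftDetector:
    """Flag drift when the recent error rate exceeds the older error rate."""

    def __init__(self, window=100, threshold=0.2):
        self.window = window
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # last `window` 0/1 error flags
        self.recent_sum = 0                 # errors inside the window
        self.old_n = 0                      # flags that have left the window
        self.old_sum = 0                    # errors among those older flags

    def update(self, error):
        """Record one 0/1 error flag; return True if drift is suspected."""
        error = int(bool(error))
        if len(self.recent) == self.window:
            # The oldest flag is about to leave the window: move it to history.
            oldest = self.recent[0]
            self.recent_sum -= oldest
            self.old_sum += oldest
            self.old_n += 1
        self.recent.append(error)  # deque drops the oldest flag when full
        self.recent_sum += error
        if self.old_n < self.window:
            return False  # not enough history to compare against
        recent_rate = self.recent_sum / self.window
        old_rate = self.old_sum / self.old_n
        return recent_rate - old_rate > self.threshold
```

Memory use is bounded by `window`, so the detector fits the constant-memory setting. When it fires, a streaming learner would typically rebuild or adjust the affected part of its model.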
Reading List:
Traditional Learning Algorithms
Online Learning Algorithms
Streaming Algorithms
- B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom. Models and Issues
  in Data Stream Systems [survey].
- P. Domingos, G. Hulten. Mining High-Speed Data Streams. Proceedings of
  the Sixth International Conference on Knowledge Discovery and Data Mining
  (pp. 71-80), 2000. Boston, MA: ACM Press.
- P. Domingos, G. Hulten, L. Spencer. Mining Time-Changing Data Streams.
  Proceedings of the Seventh International Conference on Knowledge
  Discovery and Data Mining (pp. 97-106), 2001. San Francisco, CA: ACM
  Press.
- P. Domingos, G. Hulten. A General Method for Scaling Up Machine Learning
  Algorithms and its Application to Clustering. Proceedings of the
  Eighteenth International Conference on Machine Learning (pp. 106-113),
  2001. Williamstown, MA: Morgan Kaufmann.
- P. Domingos, G. Hulten. Catching Up with the Data: Research Issues in
  Mining Data Streams. Workshop on Research Issues in Data Mining and
  Knowledge Discovery, 2001.
- S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. Clustering Data Streams.
  IEEE Symposium on Foundations of Computer Science, 2000.
- A. C. Gilbert, Y. Kotidis, S. M. Muthukrishnan, M. J. Strauss. Surfing
  Wavelets on Streams: One-Pass Summaries for Approximate Aggregate
  Queries. The VLDB Journal, 2001.
- J. Kleinberg. Bursty and Hierarchical Structure in Streams [Concept
  Drift]. Cornell. Proceedings of the 8th ACM SIGKDD International
  Conference on Knowledge Discovery and Data Mining, 2002.