Interval Set Clustering of Web Users with Rough K-means

		Pawan Lingras and Chad West

Data collection and analysis in web mining face certain unique
challenges. For a variety of reasons inherent in web browsing
and web logging, bad or incomplete data are more likely than
in conventional applications. The analytical techniques in web
mining need to accommodate such data. Fuzzy and rough sets provide the
ability to deal with incomplete and approximate information.
Fuzzy set theory has been shown to be useful in three important
aspects of web and data mining, namely clustering, association,
and sequential analysis. However, there is limited research
on clustering based on rough set theory. Clustering is an important part 
of web mining that involves finding natural groupings of web resources 
or web users. Researchers have pointed out some important differences 
between clustering in conventional applications and clustering in web 
mining. For example, the clusters and associations in web mining do not 
necessarily have crisp boundaries. As a result, researchers have studied 
the possibility of using fuzzy sets in web mining clustering applications.
Recent attempts have used genetic algorithms based on
rough set theory for clustering. However, genetic-algorithm-based
clustering may not be able to handle the large amounts of data typical
of web mining applications. This paper proposes a variation
of the K-means clustering algorithm based on properties of rough sets.
The proposed algorithm represents clusters as interval or rough sets.
The paper also describes the design of an experiment including
data collection and the clustering process. The experiment is used to 
create interval set representations of clusters of web visitors.
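The abstract does not state the algorithm's update rules. As a minimal sketch, assuming one common formulation of rough K-means, each cluster can be kept as an interval set: a lower bound of objects that certainly belong to it and an upper bound of objects that may belong to it. The function `rough_kmeans` and its parameters `w_lower`, `w_upper`, and `threshold` are hypothetical illustrations, not notation taken from this paper. The sketch assumes that an object whose distances to several centroids differ by at most `threshold` is placed only in the upper bounds of those clusters, and that centroids weight lower-bound and boundary members differently.

```python
import numpy as np

def rough_kmeans(X, k, w_lower=0.7, w_upper=0.3, threshold=0.5,
                 max_iter=100, seed=0):
    """Hypothetical rough K-means sketch; clusters are (lower, upper) interval sets.

    Assumed rules (illustrative, not from the abstract):
      * if an object's distances to two or more centroids differ by at most
        `threshold` (assumed >= 0), it joins only the upper bounds of all
        those clusters;
      * otherwise it joins both the lower and upper bound of its nearest cluster;
      * a centroid weights lower-bound members by w_lower and boundary
        members (upper minus lower) by w_upper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Start from k distinct objects chosen at random.
    centroids = X[rng.choice(n, size=k, replace=False)]
    for _ in range(max_iter):
        # Distance of every object to every centroid, shape (n, k).
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        lower = np.zeros((n, k), dtype=bool)
        upper = np.zeros((n, k), dtype=bool)
        for i in range(n):
            nearest = int(np.argmin(dist[i]))
            # Clusters almost as close as the nearest one (includes nearest).
            close = np.flatnonzero(dist[i] - dist[i, nearest] <= threshold)
            upper[i, close] = True
            if len(close) == 1:
                lower[i, nearest] = True  # unambiguous member
        new_centroids = centroids.copy()
        for j in range(k):
            low = X[lower[:, j]]
            boundary = X[upper[:, j] & ~lower[:, j]]
            if len(low) and len(boundary):
                new_centroids[j] = (w_lower * low.mean(axis=0)
                                    + w_upper * boundary.mean(axis=0))
            elif len(low):
                new_centroids[j] = low.mean(axis=0)
            elif len(boundary):
                new_centroids[j] = boundary.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, lower, upper

# Usage on toy 2-D data: two tight groups plus one point between them.
data = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [2.5, 2.5]])
centroids, lower, upper = rough_kmeans(data, k=2, threshold=0.5)
```

In this sketch every lower bound is contained in its upper bound, and an object that falls in no lower bound sits in the boundary region of two or more clusters, which is how an interval set captures a cluster without a crisp boundary. The weighting of boundary members and the choice of `threshold` are design assumptions that would need tuning for real web-usage data.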