Mining Top-K Click Stream Sequences Patterns

Mehdi Haj Ali, Qun-Xiong Zhu, Yan-Lin He

Abstract


Sequential pattern mining is not only important in the data mining field but is also the basis of many applications. However, running these applications costs time and memory, especially on dense datasets. Setting a proper minimum support threshold is one of the factors that drives up memory and time consumption. Moreover, it is difficult for users to obtain the appropriate patterns: a poorly chosen threshold may produce too many sequential patterns, which makes the results hard to comprehend. The problem grows worse with long click stream sequences or huge datasets. As a solution, we developed an efficient algorithm called TopK (Top-K click stream sequence pattern mining), which outputs the top-k patterns, where k is the number of most important and relevant patterns (those with the highest support) to return. Our algorithm is based on pseudo-projection to avoid excessive time and memory consumption, and it uses several efficient search space pruning methods together with bi-directional extension. Our extensive study and experiments on real click stream datasets show that TopK significantly outperforms previous algorithms.
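The abstract describes top-k mining only at a high level, so the following is a minimal sketch of the general idea under stated assumptions; it is not the authors' TopK algorithm. It assumes each click stream is a list of single page identifiers, stores projected databases as (sequence id, offset) pairs (pseudo-projection) instead of copying suffixes, and starts the minimum support at 1, raising it to the support of the current k-th best pattern as results accumulate so that extensions which cannot enter the top k are pruned. The function name `top_k_sequential_patterns` and the sample data are hypothetical, and the bi-directional extension and the paper's other pruning methods are not modeled.

```python
import heapq
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
Projection = List[Tuple[int, int]]  # pseudo-projection: (sequence id, start offset) pairs


def top_k_sequential_patterns(db: List[List[str]], k: int) -> List[Tuple[int, Pattern]]:
    """Hypothetical sketch (not the paper's TopK): return up to k (support, pattern)
    pairs with the highest support, assuming one page click per sequence element."""
    heap: List[Tuple[int, Pattern]] = []  # min-heap keyed on support; heap[0] is the k-th best
    min_sup = 1  # starts low and is raised dynamically

    def extensions(proj: Projection) -> Dict[str, Projection]:
        # For each item, record its first occurrence after the projection point
        # in every sequence; the new offset points just past that occurrence.
        ext: Dict[str, Projection] = {}
        for sid, start in proj:
            seen = set()
            seq = db[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    seen.add(item)
                    ext.setdefault(item, []).append((sid, pos + 1))
        return ext

    def save(pattern: Pattern, support: int) -> None:
        nonlocal min_sup
        if len(heap) < k:
            heapq.heappush(heap, (support, pattern))
        elif support > heap[0][0]:
            heapq.heapreplace(heap, (support, pattern))  # evict the current k-th best
        if len(heap) == k:
            min_sup = heap[0][0]  # raise the threshold to the k-th best support

    def grow(prefix: Pattern, proj: Projection) -> None:
        ext = extensions(proj)
        # Visit the most frequent extensions first so min_sup rises quickly.
        for item, new_proj in sorted(ext.items(), key=lambda kv: -len(kv[1])):
            support = len(new_proj)  # number of sequences containing prefix + item
            if len(heap) == k and support <= min_sup:
                # Support is anti-monotone, so neither this extension nor any
                # later (less frequent) one can beat the current k-th best.
                break
            pattern = prefix + (item,)
            save(pattern, support)
            grow(pattern, new_proj)

    grow((), [(sid, 0) for sid in range(len(db))])
    return sorted(heap, key=lambda sp: (-sp[0], sp[1]))


if __name__ == "__main__":
    # Hypothetical toy click streams (page identifiers).
    clicks = [
        ["home", "cart", "pay"],
        ["home", "search", "cart"],
        ["home", "cart"],
        ["search", "pay"],
    ]
    print(top_k_sequential_patterns(clicks, 3))
    # [(3, ('cart',)), (3, ('home',)), (3, ('home', 'cart'))]
```

Raising `min_sup` while mining means the user supplies only k rather than a support threshold, which is the usability problem the abstract raises; visiting frequent extensions first is a common heuristic for making that threshold climb early.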


Keywords


Pattern mining; Stream sequence; Sequence database; Top-k; Data mining



DOI: http://doi.org/10.11591/ijeecs.v4.i3.pp655-664



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
