Difference between revisions of "Publications/diop.22.ieeebigdata"
From LRDE
(Created page with "{{Publication | published = true | date = 2022-12-12 | authors = Lamine Diop, Cheikh Talibouya Diop, Arnaud Giacometti, Dominique Li, Arnaud Soulet | booktitle = 2022 IEEE Int...") |
|||
(One intermediate revision by the same user not shown) | |||
Line 2: | Line 2: | ||
| published = true |
| published = true |
||
| date = 2022-12-12 |
| date = 2022-12-12 |
||
− | | authors = Lamine Diop, Cheikh Talibouya Diop, |
+ | | authors = Lamine Diop, Cheikh Talibouya Diop, GiacomettiArnaud, Dominique Li, Arnaud Soulet |
| booktitle = 2022 IEEE International Conference on Big Data (Big Data) |
| booktitle = 2022 IEEE International Conference on Big Data (Big Data) |
||
| title = Trie-based Output Itemset Sampling |
| title = Trie-based Output Itemset Sampling |
||
| address = Osaka, Japan |
| address = Osaka, Japan |
||
− | | abstract = Pattern sampling algorithms produce interesting patterns with a probability proportional to a given utility measure. Utility changes need quick re-preprocessing when sampling patterns from large databases. In this context, existing sampling techniques require storing all data in |
+ | | abstract = Pattern sampling algorithms produce interesting patterns with a probability proportional to a given utility measure. Utility changes need quick re-preprocessing when sampling patterns from large databases. In this context, existing sampling techniques require storing all data in memorywhich is costly. To tackle these issues, this work enriches D. Knuth's trie structure, avoiding 1) the need to access the database to sample since patterns are drawn directly from the enriched trie and 2) the necessity to reprocess the whole dataset when the utility changes. We define the trie of occurrences that our first algorithm TPSpace (Trie-based Pattern Space) uses to materialize all of the database patterns. Factorizing transaction prefixes compresses the transactional database. TPSampling (Trie-based Pattern Sampling), our second algorithm, draws patterns from a trie of occurrences under a length-based utility measure. Experiments show that TPSampling produces thousands of patterns in seconds. |
| pages = 1 to 10 |
| pages = 1 to 10 |
||
+ | | publisher = IEEE |
||
| lrdekeywords = IA |
| lrdekeywords = IA |
||
| lrdenewsdate = 2022-12-12 |
| lrdenewsdate = 2022-12-12 |
||
Line 43: | Line 44: | ||
thousands of patterns in seconds.<nowiki>}</nowiki>, |
thousands of patterns in seconds.<nowiki>}</nowiki>, |
||
pages = <nowiki>{</nowiki>1-10<nowiki>}</nowiki>, |
pages = <nowiki>{</nowiki>1-10<nowiki>}</nowiki>, |
||
+ | publisher = <nowiki>{</nowiki>IEEE<nowiki>}</nowiki>, |
||
doi = <nowiki>{</nowiki>10.1109/BigDataXXXX<nowiki>}</nowiki>, |
doi = <nowiki>{</nowiki>10.1109/BigDataXXXX<nowiki>}</nowiki>, |
||
note = <nowiki>{</nowiki>accepted<nowiki>}</nowiki> |
note = <nowiki>{</nowiki>accepted<nowiki>}</nowiki> |
Latest revision as of 16:55, 14 December 2022
- Authors
- Lamine Diop, Cheikh Talibouya Diop, GiacomettiArnaud, Dominique Li, Arnaud Soulet
- Where
- 2022 IEEE International Conference on Big Data (Big Data)
- Place
- Osaka, Japan
- Type
- inproceedings
- Publisher
- IEEE
- Keywords
- IA
- Date
- 2022-12-12
Abstract
Pattern sampling algorithms produce interesting patterns with a probability proportional to a given utility measure. Utility changes need quick re-preprocessing when sampling patterns from large databases. In this context, existing sampling techniques require storing all data in memorywhich is costly. To tackle these issues, this work enriches D. Knuth's trie structure, avoiding 1) the need to access the database to sample since patterns are drawn directly from the enriched trie and 2) the necessity to reprocess the whole dataset when the utility changes. We define the trie of occurrences that our first algorithm TPSpace (Trie-based Pattern Space) uses to materialize all of the database patterns. Factorizing transaction prefixes compresses the transactional database. TPSampling (Trie-based Pattern Sampling), our second algorithm, draws patterns from a trie of occurrences under a length-based utility measure. Experiments show that TPSampling produces thousands of patterns in seconds.
Bibtex (lrde.bib)
@InProceedings{ diop.22.ieeebigdata, author = {Diop, Lamine and Diop, Cheikh Talibouya and Giacometti, Arnaud and Li, Dominique and Soulet, Arnaud}, booktitle = {2022 IEEE International Conference on Big Data (Big Data)}, title = {Trie-based Output Itemset Sampling}, year = {2022}, address = {Osaka, Japan}, month = dec, abstract = {Pattern sampling algorithms produce interesting patterns with a probability proportional to a given utility measure. Utility changes need quick re-preprocessing when sampling patterns from large databases. In this context, existing sampling techniques require storing all data in memory, which is costly. To tackle these issues, this work enriches D. Knuth's trie structure, avoiding 1) the need to access the database to sample since patterns are drawn directly from the enriched trie and 2) the necessity to reprocess the whole dataset when the utility changes. We define the trie of occurrences that our first algorithm TPSpace (Trie-based Pattern Space) uses to materialize all of the database patterns. Factorizing transaction prefixes compresses the transactional database. TPSampling (Trie-based Pattern Sampling), our second algorithm, draws patterns from a trie of occurrences under a length-based utility measure. Experiments show that TPSampling produces thousands of patterns in seconds.}, pages = {1-10}, publisher = {IEEE}, doi = {10.1109/BigDataXXXX}, note = {accepted} }