Full text
![]() |
PDF
- Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (1MB) |
Reviriego Vasallo, Pedro ORCID: https://orcid.org/0000-0003-2273-1341, Pontarelli, Salvatore and Martínez, Jorge
(2023).
Bitwise Signature Comparison: Enabling More Efficient Similarity Estimation.
"IEEE Transactions on Emerging Topics in Computing", v. 11
(n. 3);
pp. 798-804.
https://doi.org/10.1109/TETC.2022.3221872.
Title: | Bitwise Signature Comparison: Enabling More Efficient Similarity Estimation |
---|---|
Author/s: |
|
Item Type: | Article |
Título de Revista/Publicación: | IEEE Transactions on Emerging Topics in Computing |
Date: | 2023 |
Volume: | 11 |
Subjects: | |
Faculty: | E.T.S.I. Telecomunicación (UPM) |
Department: | Ingeniería de Sistemas Telemáticos |
UPM's Research Group: | Internet de Nueva Generación |
Creative Commons Licenses: | None |
![]() |
PDF
- Requires a PDF viewer, such as GSview, Xpdf or Adobe Acrobat Reader
Download (1MB) |
Estimating the similarity of sets of data is a common operation in computing. Minhash is widely used to estimate similarity by computing a signature for each set and then comparing their signatures. Therefore, signature comparison is an important part of similarity estimation. To make the comparison efficient, the size of the signature components is commonly set to the word size of the processor or to one half or one fourth of it. This enables efficient data manipulation and comparison but is not optimal in terms of storage. For example, 48-bit signatures may be more than enough in many applications but since that size cannot be easily manipulated by most processors, 64-bit signatures are used. This implies a 33.3\% memory overhead. In this paper, Bitwise Signature Comparison (BSC), a method that enables the efficient comparison of signature components of any bitwidth is presented and evaluated. The results show that BSC achieves a similar speed to that of the traditional comparison implementation regardless of the size of the signature components. This enables the use of any signature component size enabling better trade-offs in the implementation of similarity estimation sketches.
Item ID: | 76678 |
---|---|
DC Identifier: | https://oa.upm.es/76678/ |
OAI Identifier: | oai:oa.upm.es:76678 |
DOI: | 10.1109/TETC.2022.3221872 |
Official URL: | https://ieeexplore.ieee.org/document/9954421 |
Deposited by: | Profesor Pedro Reviriego |
Deposited on: | 20 Nov 2023 07:37 |
Last Modified: | 20 Nov 2023 07:37 |