
A Practical Approach to Novel Class Discovery in Tabular Data: Appendix


Authors:

(1) Troisemaine Colin, Department of Computer Science, IMT Atlantique, Brest, France, and Orange Labs, Lannion, France;

(2) Reiffers-Masson Alexandre, Department of Computer Science, IMT Atlantique, Brest, France;

(3) Gosselin Stephane, Orange Labs, Lannion, France;

(4) Lemaire Vincent, Orange Labs, Lannion, France;

(5) Vaton Sandrine, Department of Computer Science, IMT Atlantique, Brest, France.

Abstract and Intro

Related work

Approaches

Hyperparameter optimization

Estimating the number of novel classes

Full training procedure

Experiments

Conclusion

Declarations

References

Appendix A: Additional result metrics

Appendix B: Hyperparameters

Appendix C: Cluster Validity Indices numerical results

Appendix D: NCD k-means centroids convergence study

Appendix A Additional result metrics

Appendix B Hyperparameters

Table B3 shows the hyperparameters found by the full procedure described in Section 6.

Appendix C Cluster Validity Indices numerical results

An estimate of the number of clusters in the 7 datasets considered in this paper can be found in Table C4. Among the 6 CVIs reported here, the Silhouette coefficient performed the best. Furthermore, compared to the original feature space, its average estimation error decreased significantly in the latent space, validating our approach. For some datasets, the Davies-Bouldin index continued to decrease and the Dunn index continued to increase as the number of clusters grew, resulting in very large overestimations. Note that the estimates of the number of novel classes in Table C4 are not needed in the experiments of Section 7.2.2, since Algorithm 1 directly incorporates such estimates into the training procedure; this table only served to identify the most appropriate CVI for our problem. The only exception is the TabularNCD method, which requires an a priori estimation of the number of novel classes in the original feature space.
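For illustration, the sketch below shows one way such an estimate can be computed with the Silhouette coefficient: k-means is run in the latent space for a range of candidate numbers of clusters, and the value maximizing the coefficient is kept. The function name, the candidate range, and the latent_unlab array (unlabeled samples projected into the latent space of PBN) are illustrative assumptions, not the exact procedure of the paper; other CVIs such as Davies-Bouldin can be plugged in the same way.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_n_novel_classes(latent_unlab: np.ndarray, k_min: int = 2, k_max: int = 30) -> int:
    """Return the number of clusters maximizing the Silhouette coefficient."""
    best_k, best_score = k_min, -np.inf
    for k in range(k_min, k_max + 1):
        # Cluster the latent representations of the unlabeled data with k-means.
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(latent_unlab)
        score = silhouette_score(latent_unlab, labels)  # higher is better
        if score > best_score:
            best_k, best_score = k, score
    return best_k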

Table C4: Estimates of the number of novel classes obtained with several CVIs in the latent space of PBN.

Appendix D NCD k-means centroids convergence study

In this appendix, we aim to determine how to achieve the best performance with NCD k-means. Specifically, after the centroid initialization described in Section 3.2, we investigate: (1) whether it is more effective to update the centroids of both known and novel classes, or only the centroids of novel classes; (2) whether the centroids need to be updated using data from both known and novel classes, or only using data from novel classes. The results are presented in Table D5 and show that for 5 out of 7 datasets, the best results are obtained when only the centroids of the novel classes are updated on the unlabeled data. Updating the centroids of the known classes always leads to worse performance, as the class labels are not used in this process. Thus, the centroids of the known classes run the risk of capturing data from the novel classes (and vice versa).
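A minimal sketch of the best-performing variant of Table D5 (only the novel-class centroids are updated, using only the unlabeled data) could look as follows. The variable names (x_lab, y_lab, x_unlab) and the random initialization of the novel centroids are assumptions and simplifications of the initialization described in Section 3.2.

import numpy as np

def ncd_kmeans(x_lab, y_lab, x_unlab, n_novel, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Frozen centroids of the known classes: per-class means of the labeled data.
    known_centroids = np.stack([x_lab[y_lab == c].mean(axis=0) for c in np.unique(y_lab)])
    # Novel centroids: initialized from random unlabeled points (simplified initialization).
    novel_centroids = x_unlab[rng.choice(len(x_unlab), n_novel, replace=False)].astype(float)

    def assign(points, centroids):
        # Index of the nearest centroid (known or novel) for each point.
        return np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(x_unlab, np.concatenate([known_centroids, novel_centroids]))
        # Update only the novel centroids, using only the unlabeled data;
        # the known-class centroids are never moved.
        for j in range(n_novel):
            members = x_unlab[labels == len(known_centroids) + j]
            if len(members) > 0:
                novel_centroids[j] = members.mean(axis=0)

    final_labels = assign(x_unlab, np.concatenate([known_centroids, novel_centroids]))
    return final_labels, novel_centroids

Keeping the known-class centroids fixed is what prevents them from drifting onto unlabeled points of the novel classes, which is the failure mode observed when all centroids are updated.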

Table D5: ACC of NCD k-means averaged over 10 runs.
