Posting some thoughts here as we've tackled similar challenges over the years:
"Clustering workflows that better address noise. I know HDBSCAN is useful here. I often still lean towards a first K-Means step, but then disregard clusters with a few number of papers as noise, because they might not actually coherently be a part of a coherent grouping despite their semantic similarity."
I find the main issue with HDBSCAN is that areas of low density are treated as noise, but in unstructured data it's often the highest-density areas that are actually noise (slop, bot spam and such). Those are virtually guaranteed to be labelled as clusters as long as there are enough nearest neighbours to satisfy `min_cluster_size`.
Conversely, kMeans assumes nothing is noise, and there is always noise.
So I prefer to run both kMeans and HDBSCAN over the data independently, kMeans with a low-ish k, and then see how the HDBSCAN clusters fall within the kMeans clusters. I find it tends to be the case that:
- HDBSCAN clusters have high precision, low recall
- kMeans clusters have high recall, low precision
- ~100% of individual HDBSCAN clusters tend to fall into single kMeans clusters
- Similar HDBSCAN clusters fall into the same kMeans cluster
If we can label the HDBSCAN clusters appropriately, we should be able to use them to label the kMeans clusters. Eventually we end up with labelled high-level clusters/topics (kMeans) and specific clusters/subtopics (HDBSCAN).
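The roll-up of subtopic names into the coarse clusters is just a grouped count. A minimal sketch with toy label arrays (the subtopic names here are hypothetical placeholders; in practice they'd come from whatever labelling step you use):

```python
from collections import Counter, defaultdict

# Toy assignments: index i is document i. -1 is HDBSCAN noise.
km_labels  = [0, 0, 0, 0, 1, 1, 1, 1]
hdb_labels = [0, 0, 1, -1, 2, 2, -1, -1]
hdb_names  = {0: "diffusion models", 1: "flow matching", 2: "RL theory"}

# Each kMeans cluster inherits the names of the HDBSCAN clusters inside it,
# weighted by how many documents carry each name.
km_topics = defaultdict(Counter)
for km, hdb in zip(km_labels, hdb_labels):
    if hdb != -1:
        km_topics[km][hdb_names[hdb]] += 1

print(dict(km_topics))
```

The dominant names per kMeans cluster then become the high-level topic label, with the full `Counter` giving the subtopics.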
There's still the challenge of what to do with the HDBSCAN outliers. At this point I prefer to treat the HDBSCAN -1 category as k different noise categories: for each kMeans cluster, determine whether its slice of the -1 points (e.g. `kmeans_cluster == 1 and hdbscan_cluster == -1`) is clearly part of the named kMeans cluster, or whether it appears to be genuine noise. If it is noisy, drop those points from the kMeans cluster. Now our kMeans clusters will tend towards higher precision and lower recall.
Love this overview, Jay! It’s exciting to see where AI is headed. The landscape is evolving so quickly, and NeurIPS always feels like a window into the future. Can’t wait to see how these trends shape our everyday lives in the coming years! #AI #NeurIPS
It is amazing; however, for me it lacks a list view, so I could browse the papers that way. It would also be great if there were options to filter the dots, for example to a specific category or a specific poster session at NeurIPS. Can the data be downloaded in some easy way, for example to build such custom views?
I have created a simple browser: https://kotwic4.github.io/Neurips2025Papers/
Great list, thanks for sharing. This is a big list of gems worth reading.
Very cool application of topic modelling. The approach is very reminiscent of Anthropic’s recent Clio paper.