@TheYorouzoya
Contributor

@TheYorouzoya TheYorouzoya commented Aug 15, 2025

Closes #13476: Add support for automatic ICORE conference ranking lookup

This PR adds the required feature to enable ICORE conference ranking lookups whenever a BibTeX entry includes a conference title.

Task list mentioned in the original issue:

  • Move DuplicateCheck#similarity from org.jabref.logic.database to a new utility class: org.jabref.logic.util.strings.StringSimilarity.
  • Add ICORE.csv to src/main/resources/
  • Create a class that loads and indexes the ICORE data at instantiation.
  • Implement the logic to detect and return the ranking from a full conference name.
  • Create a new field: ICORANKING ('icoranking') and add it to org.jabref.model.entry.field.StandardField at the // JabRef-specific fields section.
  • Create a new field editor ICoreRankingEditor (inspired by IdentifierEditor) and integrate it into the UI by modifying FieldEditors#getForField(...). Add a lookup button (like the DOI lookup) that looks up the ranking in the CSV file.
  • Write unit tests for acronym matching and similarity fallback.

Steps to test

  1. By default, the Icoreranking field shows up in the General Tab under the DOI field.
image
  2. Add a New Entry of type InProceedings and enter a conference acronym (in parentheses) in the Booktitle field. Then, navigate to the General Tab again and click the lookup rank button to see the ICORE rank for the conference.
image image
  3. Clicking the Open Conference Page button will open your default browser and take you to the ICORE conference page for the conference (for SIGCOMM in the screenshot, it would be here).

  4. In case an acronym isn't present in the title, the tool will then try to look up the entire Booktitle in the ranking data, with a fuzzy match fallback of 90% similarity.

image image image
  5. The feature allows lookups for InProceedings, InCollection, and Article entry types and looks for conference titles in Booktitle, Journaltitle, or Title fields.
image image
  6. In case an acronym is present but it doesn't match anything, the feature will still fall back to searching for the entire title string. If a match is not found for the full title either, a notification with "not found" will be displayed and the Open Conference Page Button will be disabled.
image image

Some caveats:

  • The feature will always look for the acronym in the first, deepest set of parentheses it encounters. This is a direct consequence of how the regex in the ConferenceAcronymExtractor works (see related tests for details). Some examples to illustrate this:
    • (This doesn't get pulled (this does)) -> this does
    • (First) acronym is pulled, not the (second) one. -> First
    • (This doesn't (I DO)) and (this won't (either)). -> I DO
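
The three examples above follow from the pattern quoted later in the review, which only matches a pair of parentheses with no parentheses inside. A minimal sketch, assuming the extractor simply takes the first regex match; the class name, the null check, and the trimming are illustrative and not the actual ConferenceAcronymExtractor code:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AcronymExtractionSketch {
    // Same pattern as in the reviewed snippet: "(" + anything without parentheses + ")".
    private static final Pattern PATTERN = Pattern.compile("\\(([^()]*)\\)");

    // Returns the content of the first innermost pair of parentheses, if it is non-blank.
    static Optional<String> extract(String input) {
        if (input == null) {
            return Optional.empty(); // assumption: treat null as "nothing to extract"
        }
        Matcher matcher = PATTERN.matcher(input);
        if (matcher.find()) {
            String candidate = matcher.group(1).trim();
            if (!candidate.isEmpty()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("(This doesn't get pulled (this does))"));
        System.out.println(extract("(First) acronym is pulled, not the (second) one."));
        System.out.println(extract("(This doesn't (I DO)) and (this won't (either))."));
    }
}
```

Because an outer pair always contains a "(" before its ")", the regex engine cannot match it and moves on to the first inner pair, which is why "deepest" and "first" both apply.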

Mandatory checks

  • I own the copyright of the code submitted and I license it under the MIT license
  • Change in CHANGELOG.md described in a way that is understandable for the average user (if change is visible to the user)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (if change is visible to the user)
  • TODO Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.

It seems that new rules expect only 6 points here.

double exactMatch = 1.0;
double similarity = similarityChecker.similarity(a, b);

assertTrue(similarity >= EPSILON_SIMILARITY && similarity < exactMatch);

Using assertTrue with a boolean condition instead of asserting the actual contents. Should compare the actual similarity value with expected bounds using assertEquals.

@TheYorouzoya
Contributor Author

For the last few days, I've just been browsing the code, reading the docs, and interacting with the application on my local machine. Since this is the first time I'm interacting with the JabRef ecosystem, and ICORE by extension, I have some questions regarding the app itself and the feature's use-case. I'll post each one as a separate comment. Apologies if some of these are too obvious.

@TheYorouzoya
Contributor Author

The issue post mentions: "When a BibTeX entry includes a conference title". What does "conference title" here refer to?

A. Following the Getting Started guide on the app, when you add a new entry via Library->New Entry, the "Select entry type" dialogue box asks for an entry type. One of them is the "Conference" type. Hovering over it says that it is a legacy alias for "InProceedings". Both of these have a "Title" required field. My question is if the feature's exclusively for these types of BibTeX entries or for ALL entries regardless of their type.

I'm guessing the answer is any entry regardless of type, but I still have to ask to be sure.

B. If I'm querying all entries regardless of type, do I have to search only the Title field? There are entries where the conference name isn't in the title field. Case in point: I used the Web Search feature in the app to lookup the conference mentioned in the issue post: "ACIS Conference on Software Engineering Research, Management and Applications". I selected and imported the following entry: https://ieeexplore.ieee.org/document/9509045. It gets imported as an InProceedings, but the conference name is not in the Title field but in the Journal field under the Other Fields tab.

I'm assuming that I should be looking for the conference title and its acronym in all of the fields. Is there some sort of a standard way here regarding how entries are imported into JabRef so that I only have to look for the title inside a subset of fields rather than all of them?

@TheYorouzoya
Contributor Author

The ICORE ranking data and its presentation.

A. Since each BibTeX entry is annotated with a year, I'm assuming that the user wants to see the ICORE ranking for that particular year. Again, very obvious, but I want to confirm. This would also mean that if a conference was added to ICORE later on, any entries from previous years should have a "Not Available" or "N/A" in the ICORE rank field. Or do we use the mismatched ranking as a fallback?

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

C. The exported ICORE ranking data provided from the website (https://portal.core.edu.au/conf-ranks/) does not contain a header row in the CSV file. This isn't a big deal as the important bits can be made out quite clearly (all except one, that is). Each line contains 9 columns: ID, Title, Acronym, Source, Rank, UNKNOWN, FoR-1, FoR-2, FoR-3. The UNKNOWN field in column 6 is a "Yes" or "No" for every line. It is always present, but I can't seem to figure what it corresponds to (it isn't the Note or DBLP column from the website). Consequently, I cannot determine whether it is important. If you know what it is or if it is relevant to the feature, please let me know.

@TheYorouzoya
Contributor Author

@koppor can you please help answer the questions I've posted above?

@koppor
Member

koppor commented Aug 19, 2025

The issue post mentions: "When a BibTeX entry includes a conference title". What does "conference title" here refer to?

Oh, did you ever read about bibtex? - it is @InProceedings plus @InCollection and booktitle.

@koppor
Member

koppor commented Aug 19, 2025

A. Following the Getting Started guide on the app, when you add a new entry via Library->New Entry, the "Select entry type" dialogue box asks for an entry type. One of them is the "Conference" type. Hovering over it says that it is a legacy alias for "InProceedings". Both of these have a "Title" required field. My question is if the feature's exclusively for these types of BibTeX entries or for ALL entries regardless of their type.

I'm guessing the answer is any entry regardless of type, but I still have to ask to be sure.

title is context-dependent. We refer to booktitle. You can read on at https://ctan.org/pkg/biblatex for more information on bibtex if you want.

@koppor
Member

koppor commented Aug 19, 2025

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

Always use the latest ranking. We are not interested in historic data. - Only one CSV should be used.

@koppor
Member

koppor commented Aug 19, 2025

C. The exported ICORE ranking data provided from the website (portal.core.edu.au/conf-ranks) does not contain a header row in the CSV file. This isn't a big deal as the important bits can be made out quite clearly (all except one, that is). Each line contains 9 columns: ID, Title, Acronym, Source, Rank, UNKNOWN, FoR-1, FoR-2, FoR-3. The UNKNOWN field in column 6 is a "Yes" or "No" for every line. It is always present, but I can't seem to figure what it corresponds to (it isn't the Note or DBLP column from the website). Consequently, I cannot determine whether it is important. If you know what it is or if it is relevant to the feature, please let me know.

Web site has:

image

Which is

  • Title
  • Acronym
  • Source
  • Rank
  • Note
  • DBLP
  • Primary FoR
  • Comments
  • Average Rating

Example CSV line

9,"ACIS Conference on Software Engineering Research, Management and Applications",SERA,CORE2023,C,No,4612,,
1825,ACM International Joint Conference on Pervasive and Ubiquitous Computing (PERVASIVE and UbiComp combined from 2013),UbiComp,CORE2023,journal published,No,4608,,

(NOTE: It would be good if this was included in your question to make it self-contained)

I cannot quickly see it, but we need "Title", "Acronym" and "Rank" only. The other columns can be omitted, can't they?
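
To make the "three columns only" idea concrete, here is a minimal sketch of reading one export line. It assumes the column order given in the question (ID, Title, Acronym, Source, Rank, ...), handles the double-quoted title seen in the first example line, and uses hypothetical names (IcoreCsvSketch, Conference) that are not taken from the PR:

```java
import java.util.ArrayList;
import java.util.List;

final class IcoreCsvSketch {
    // Only the three columns needed for the lookup are kept.
    record Conference(String title, String acronym, String rank) {}

    // Splits one CSV line on commas, honoring double-quoted fields ("" is an escaped quote).
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');
                    i++; // skip the second quote of the escaped pair
                } else if (c == '"') {
                    inQuotes = false;
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Assumed column positions: 1 = Title, 2 = Acronym, 4 = Rank.
    static Conference parse(String line) {
        List<String> fields = splitCsvLine(line);
        return new Conference(fields.get(1), fields.get(2), fields.get(4));
    }

    public static void main(String[] args) {
        System.out.println(parse("9,\"ACIS Conference on Software Engineering Research, Management and Applications\",SERA,CORE2023,C,No,4612,,"));
    }
}
```

A plain split on "," would break the first example line, because its title contains a comma inside quotes. In a real code base a CSV library would likely be the safer choice.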

@koppor
Member

koppor commented Aug 19, 2025

Please make your question numbers unique. "A" is used twice, isn't it?

A. Since each BibTeX entry is annotated with a year, I'm assuming that the user wants to see the ICORE ranking for that particular year. Again, very obvious, but I want to confirm. This would also mean that if a conference was added to ICORE later on, any entries from previous years should have a "Not Available" or "N/A" in the ICORE rank field. Or do we use the mismatched ranking as a fallback?

No, always the latest year.

@koppor
Member

koppor commented Aug 19, 2025

B. Since ICORE rankings are released roughly every 2-3 years, what about some weird edge cases within that time period? Say a conference was there for 2014, then it wasn't in the 2017 list, but then it was added again in 2018. Do we worry about entries for 2017 and give them an "N/A" rank? What if a conference's rank changed during that time period? Of course, if the answer to the previous question was that we do not use a fallback and stick to the exact date, then none of these questions matter.

Always use the latest CSV - there is one export. This CSV should be used.

@koppor
Member

koppor commented Aug 19, 2025

B. If I'm querying all entries regardless of type, do I have to search only the Title field? There are entries where the conference name isn't in the title field. Case in point: I used the Web Search feature in the app to lookup the conference mentioned in the issue post: "ACIS Conference on Software Engineering Research, Management and Applications". I selected and imported the following entry: ieeexplore.ieee.org/document/9509045. It gets imported as an InProceedings, but the conference name is not in the Title field but in the Journal field under the Other Fields tab.

For @Article, use getFieldOrAlias(StandardField.Title), which will use JournalTitle (for BibLaTeX) first and then check Title.

@koppor
Member

koppor commented Aug 19, 2025

@koppor can you please help answer the questions I've posted above?

I hope, I got all questions, I am a bit confused since the questions are all labeled with "A" and I could have missed something.

@koppor
Member

koppor commented Aug 19, 2025

@TheYorouzoya I wonder if you have seen the "Helpful resources" section at the issue description (#13476)

image

It links to #13512

Did you know that one can click on "Files changed"?

image

You are routed to https://github.com/JabRef/jabref/pull/13512/files

You then might have seen

image

I know that code reading is not easy; but it is an essential skill to produce maintainable code.

@TheYorouzoya
Contributor Author

I hope, I got all questions, I am a bit confused...

Thank you for your patience with answering my questions. I appreciate it.

Please make your question numbers unique...

I bundled my questions for a specific context under a singular comment so that it would be easier to reply to all of them like this

image

That said, this is my way of doing things, and I'm the guest here. Sorry about the confusion it led to. Moving forward, I will post one individual question per comment. No labels included.

@TheYorouzoya
Contributor Author

Oh, did you ever read about bibtex? - it is @InProceedings plus @InCollection and booktitle.

Starting from the issue post

image

I wanted to see for myself what the feature might look like inside of JabRef to the user. So I downloaded the current version and searched for the conference mentioned in the issue post. I imported one of the entries and saw that the conference name showed up inside the "Journal" field

image

Even though the manual says that the optional field for a venue exists

image

So I, then, booted up the build on my local repo, i.e., the jabgui:run task. I imported the same article again, but this time, the conference name shows up in the "Booktitle" field

image

even though there is a clearly indicated "Venue" field which is empty

image

Do you see why I would ask such an obvious question after this?

@koppor
Member

koppor commented Aug 20, 2025

@TheYorouzoya Thank you for your patience. It's all voluntary work here. It needs time to explain the domain of scientific references. Maybe you can be a guest a little longer here and improve our documentation at https://docs.jabref.org. Currently, we see guests stay only a short time, do a task, and then leave. I always hope that a guest will make the place better as a whole, especially because all guests seem to be learning software engineering and not just programming.

Data sourced from ICORE website here: https://portal.core.edu.au/conf-ranks/ to enable ICORE rank lookups.

As discussed here: JabRef#13699 (comment), only the latest data from ICORE is to be used. At this time, it is the ICORE2023 ranking data.

Part of JabRef#13476
@subhramit
Member

I bundled my questions for a specific context under a singular comment so that it would be easier to reply to all of them like this

Moving forward, I will post one individual question per comment. No labels included.

Hey @TheYorouzoya - That is not needed. You can just use the labels to bundle questions under their respective contexts like you do, just add numbering to them (like A1, A2, etc.) so that they can be specifically and easily referred to when answering.

@koppor
Member

koppor commented Aug 22, 2025

I am more used to Gitter (Matrix) chat for a bulk of questions 😅. Sorry for that!

I have to confess that I did not check the terms properly while writing. I used "venue" as a scientist indicating a conference. And I did not check whether BibLaTeX has some "definition" of venue. A "venue" meant in the issue is some @InCollection and @InProceedings entry. We identify it by the booktitle.

I also meant journal articles, which are defined by @Article having title or journaltitle. You can receive the value by getFieldOrAlias(StandardField.JOURNAL).

I hope, I could answer your question now and you are unblocked to move forward.

- Append a header row to resources/icore/ICORE2023.csv
- Add ConferenceEntry record to represent ICORE conference data
- Add ConferenceRepository to load conference data and allow conference lookups using an acronym or a bookTitle with fuzzy match as a fallback
- Add utility class to extract an acronym from a bookTitle
- Add tests

Part of JabRef#13476
// A slight modification of: https://stackoverflow.com/a/17759264
private static final Pattern PATTERN = Pattern.compile("\\(([^()]*)\\)");

public static Optional<String> extract(String input) {

Method lacks input validation for null parameter which could lead to NullPointerException. While Optional return is good, the input should be validated.

Member

@NonNull jspecify annotation is OK

Please add JavaDoc.

String acronym,
String rank
) {
private final static String URL_PREFIX = "https://portal.core.edu.au/conf-ranks/";

Incorrect order of modifiers. According to Java conventions and effective Java principles, it should be 'private static final' instead of 'private final static'.

Contributor Author

Will fix that in the next commit.

}

@Test
void extractReturnsEmptyforEmptyParentheses() {

Method name contains a typo: 'forEmptyParentheses' should be 'ForEmptyParentheses' to maintain consistent camelCase naming convention in test methods.

Contributor Author

Will fix that in the next commit.


@Test
void getConferenceFromBookTitleReturnsConferenceForFuzzyMatchAboveThreshold() {
// String similarity > 0.9

Comment merely states what can be derived from the code and test name, not providing additional information about reasoning or implementation details.

@TheYorouzoya
Contributor Author

I hope, I could answer your question now and you are unblocked to move forward.

Thank you! I'll work on the GUI side next. I do have some questions there, but I'll post those once I'm done looking around the code a bit more.

@subhramit
Member

I am more used to Gitter (Matrix) chat for a bulk of questions 😅.

Also, here is a link to our gitter chat.

- Add ICORERankingEditor and ICORERankingEditorViewModel classes (inspired by other editors and their view models) to the GUI package.
- Add ICORERankingEditor.fxml for the editor's layout.
- Add field creation logic to FieldEditors#getForField.
- Update jablib/module-info to export the icore logic and model packages for use on the GUI side.
- Add ICORE rank field to FieldFactory#getDefaultGeneralFields and update preferences migrations as per JabRef#13512 (comment).
- Fix typos in ConferenceAcronymExtractorTest.
- Fix order of modifiers in ConferenceEntry.
- Update CHANGELOG.

Part of JabRef#13476
@TheYorouzoya
Contributor Author

TheYorouzoya commented Sep 3, 2025

@koppor I was just about to push an updated matching algorithm before heading to sleep. I was doing some finishing touches here. Should I push it now? Or do I do it on another issue/PR later on?
I also had a writeup explaining the decisions. I can post just that if you wanna take a look.

@koppor koppor disabled auto-merge September 3, 2025 21:55
@koppor
Member

koppor commented Sep 3, 2025

@koppor I was just about to push an updated matching algorithm before heading to sleep. I was doing some finishing touches here. Should I push it now? Or do I do it tomorrow? I also had a writeup explaining the decisions. I can post just that if you wanna take a look.

No rush - then we wait to merge the PR.

(The alternative would be to merge this as is and have you base your new commits on the latest "main", or create a magic merge commit: https://github.com/koppor/magic-merge-commit/)

@TheYorouzoya
Contributor Author

I've improved the search algorithm to include some of the stuff I mentioned in my last comment. This new version finds 23/31 matches from this test.

The 8 failing tests all fail for only two reasons: either the data has too much noise with jumbled up conference titles or the conference title/acronym has been changed in the latest data (there is no matching entry for it).

The New Algorithm

1. Generate acronym candidates

Firstly, to deal with the Group A offenders, i.e., acronyms inside parentheses with other text like CLOSER'12 or CLOSER 2018, I added a function to generate acronym candidates from a string.

The idea is to define a set of delimiters, mark all the positions of the delimiters in the string, and generate all substrings between the computed bounds.
image

The above example of ACM_WiSec'2018 would generate, in order: ACM_WiSec'2018, WiSec'2018, ACM_WiSec (bingo!), WiSec, 2018, ACM.

Note that simply splitting on delimiters would not work in our favor since the acronyms themselves can contain those delimiters within them.

Also, it is important here that a length-based ordering is maintained in the final result so as to avoid looking up composite acronyms like IV in IEEE-IV before we've looked up a more likely match. So we order the substrings by their length, longest first, in our final result. Here, a simple TreeSet solves our problem and also keeps duplicates at bay.

To further trim down the number of generated candidates, we also pass in a CUTOFF value which is equal to the length of the longest acronym in our acronym data. This way, we do not bother generating longer substrings.

Substrings are also delimiter-trimmed since no acronym starts with a delimiter such as a comma, an underscore, or a colon.

Since the number of substrings can still blow up pretty quickly, we have a hard cap of 50 candidates. As soon as we hit that number, we stop the generation and return the candidates collected so far.

Overall, the probability of a false positive is quite low since we only split on a set of delimiters rather than in the middle of strings.
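
The generation step described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the delimiter set, the class name, and the tie-break between equally long candidates are guesses, while the bounds (-1, every delimiter index, input length), the length cutoff, the cap of 50, and the TreeSet come from the description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class AcronymCandidatesSketch {
    private static final String DELIMITERS = " ,_:;'-/."; // assumed delimiter set
    private static final int MAX_CANDIDATES = 50;         // hard cap from the description

    static Set<String> generate(String input, int cutoff) {
        // Longest candidates first; ties broken alphabetically (assumption). The set drops duplicates.
        Set<String> candidates = new TreeSet<>(
                Comparator.<String>comparingInt(String::length).reversed()
                          .thenComparing(Comparator.naturalOrder()));
        if (input.isEmpty() || cutoff <= 0) {
            return candidates;
        }

        // Bounds: -1 (start), every delimiter index, and input length (end).
        List<Integer> bounds = new ArrayList<>();
        bounds.add(-1);
        for (int i = 0; i < input.length(); i++) {
            if (DELIMITERS.indexOf(input.charAt(i)) >= 0) {
                bounds.add(i);
            }
        }
        bounds.add(input.length());

        // Every substring between two bounds is a candidate.
        for (int i = 0; i < bounds.size(); i++) {
            for (int j = i + 1; j < bounds.size(); j++) {
                String candidate = trimDelimiters(input.substring(bounds.get(i) + 1, bounds.get(j)));
                if (candidate.isEmpty() || candidate.length() > cutoff) {
                    continue; // longer than the longest known acronym, so not worth a lookup
                }
                candidates.add(candidate);
                if (candidates.size() >= MAX_CANDIDATES) {
                    return candidates;
                }
            }
        }
        return candidates;
    }

    private static String trimDelimiters(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && DELIMITERS.indexOf(s.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && DELIMITERS.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(generate("ACM_WiSec'2018", 20));
    }
}
```

With a cutoff of at least 14, the input ACM_WiSec'2018 has bounds -1, 3, 9, and 14, which yields the six candidates listed above in the same longest-first order.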

2. Normalize input

The main obstacle hindering Group B matches was the abundance of noise in the input. So we need a way to trim as much out as we can without hurting our odds for a good match. Take this for example,

image

I've classified noise into the following categories:

  • Delimiters: anything other than a letter or a digit.
  • Years of the form 19XX or 20XX.
  • Ordinals like 1st, 2nd, 3rd, etc., as well as their LaTeX syntax counterparts like 7\textsuperscript{th} (for 7th) and so on.
  • Strings in parentheses. If there was useful data there, our acronym lookup would've found it.
  • Stopwords like Proceedings, Papers, Volume, etc.

Note that none of the above can contain things which are found in the ICORE conference title data.

I've implemented a normalizer that strips away all the noise defined above and smashes the input into one long string composed only of letters and digits. So our example of Proceedings of the 3rd International Conference on Cloud Computing and Service Science, CLOSER 2013, 8-10 May 2013, Aachen, Germany gets squashed down to internationalconferenceoncloudcomputingandservicesciencecloser810aachengermany.

While loading the conference data, we also fed all the conference titles through the same normalizer to get a normalized title to conference map. In many cases, a normalized input query can match directly in our map.

We also do an acronym lookup after normalizing, which catches cases like CoopIS 2009 (OTM 2009): it gets normalized down to coopis, matching the acronym.
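
A minimal sketch of such a normalizer, covering the noise categories listed above. The stopword list here (including "of", "the", and month names) is an assumption chosen only so that the two examples from this comment reproduce; the actual list in ConferenceUtils#normalize may differ, and the class and constant names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class TitleNormalizerSketch {
    private static final Pattern PARENTHESIZED = Pattern.compile("\\([^()]*\\)");
    // LaTeX ordinals such as 7\textsuperscript{th}
    private static final Pattern LATEX_ORDINAL = Pattern.compile("\\d+\\\\textsuperscript\\{(st|nd|rd|th)\\}");
    private static final Pattern ORDINAL = Pattern.compile("\\b\\d+(st|nd|rd|th)\\b");
    private static final Pattern YEAR = Pattern.compile("\\b(19|20)\\d{2}\\b");
    // Assumed stopword list, for illustration only.
    private static final Pattern STOPWORD = Pattern.compile(
            "\\b(proceedings|papers|volume|of|the|january|february|march|april|may|june|july"
            + "|august|september|october|november|december)\\b");
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^\\p{L}\\p{Nd}]");

    static String normalize(String input) {
        String s = input.toLowerCase(Locale.ROOT);
        // Remove parenthesized text, repeating so that nested groups disappear as well.
        String previous;
        do {
            previous = s;
            s = PARENTHESIZED.matcher(s).replaceAll(" ");
        } while (!s.equals(previous));
        s = LATEX_ORDINAL.matcher(s).replaceAll(" ");
        s = ORDINAL.matcher(s).replaceAll(" ");
        s = YEAR.matcher(s).replaceAll(" ");
        s = STOPWORD.matcher(s).replaceAll(" ");
        // Finally squash everything that is not a letter or digit.
        return NON_ALPHANUMERIC.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Proceedings of the 3rd International Conference on Cloud Computing "
                + "and Service Science, CLOSER 2013, 8-10 May 2013, Aachen, Germany"));
        System.out.println(normalize("CoopIS 2009 (OTM 2009)"));
    }
}
```

The order of the steps matters: the word-boundary patterns run while spaces and punctuation are still present, and the final squash runs last. Feeding the ICORE titles through the same function keeps both sides of the map lookup comparable.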

3. Introduce another metric for matching

Normalization is just a preliminary step to improving the odds of matching titles. But even with much of the noise removed, there is still enough clutter left in the resulting string which can throw off our Levenshtein matching. Since it is a reasonable assumption to make that we will often find conference titles as a substring inside the query, matching based on substrings should improve our odds quite a bit.

Following this, I've introduced a Longest Common Substring Similarity rating similar to the Levenshtein Similarity. We compute the length of the longest common substring between the query and a conference title and divide the result by the length of the shorter string. This gives us a value between 0 and 1 which tells us how much of the shorter string exists as-is in the longer one.

If a conference title exists as a substring inside the query, we'll get back a value of 1 for an exact match.

To avoid overfitting while matching the query against conference titles, we only compute the similarity values when the normalized query string is either equal to or longer than the conference title in length. This prevents incomplete queries like International Conference on Information and Communication from matching against multiple entries like International Conference on Information and Communication Technologies and Development or International Conference on Information and Communication Technologies in Tourism.

We also use the LCS similarity to compute a combined metric along with Levenshtein Similarity as follows:

Combined Score = 0.6 * Levenshtein + 0.4 * LCS >= 0.75

These ratios and the threshold aren't exactly "grounded" in hard data, but they're more of an educated guess based on certain assumptions and fiddling with the various inputs to find a pattern. The core assumption here is that if a user is intending to lookup a conference's ICORE rank, the probability that the correct conference title is embedded somewhere in the booktitle is quite high.

image

We prioritize Levenshtein so that edit distances do matter more than substrings, but this combination should give us a reasonable metric for matching.

For example, for our noisy input with a misspelling in the conference title:

Proceedings of the 3rd International Conference on Cloud Computing and Service Science, CLOSER 2013, 8-10 May 2013, Aachen, Germany

normalized down to: internationalconferenceoncloudcomputingandservicesciencecloser810aachengermany

Matches with the correct conference title of: International Conference on Cloud Computing and Services Science,

which gets normalized to: internationalconferenceoncloudcomputingandservicesscience

with Levenshtein similarity of 0.705, LCS similarity of 0.877, and a combined score of 0.7739.

There is still a possibility of false positives here, but I believe the tradeoff is worth it.
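
The worked example above can be reproduced with a short sketch. It assumes the Levenshtein similarity is 1 minus the edit distance divided by the longer string's length, which is consistent with the 0.705 figure (distance 23 over 78 characters); the class and method names are illustrative and not JabRef's StringSimilarity API.

```java
final class CombinedSimilaritySketch {
    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshteinDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1 - distance / length of the longer string.
    static double levenshteinSimilarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) levenshteinDistance(a, b) / longer;
    }

    // Length of the longest run of characters shared by both strings.
    static int longestCommonSubstring(String a, String b) {
        int best = 0;
        int[] prev = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }

    // LCS length divided by the length of the shorter string, as described above.
    static double lcsSimilarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        return shorter == 0 ? 0.0 : (double) longestCommonSubstring(a, b) / shorter;
    }

    static double combinedScore(String a, String b) {
        return 0.6 * levenshteinSimilarity(a, b) + 0.4 * lcsSimilarity(a, b);
    }

    public static void main(String[] args) {
        String query = "internationalconferenceoncloudcomputingandservicesciencecloser810aachengermany";
        String title = "internationalconferenceoncloudcomputingandservicesscience";
        System.out.printf("%.3f %.3f %.4f%n", levenshteinSimilarity(query, title),
                lcsSimilarity(query, title), combinedScore(query, title));
    }
}
```

For the two normalized strings, the longest common substring is the shared 50-character prefix of the 57-character title (50/57 ≈ 0.877), and the combined score lands just above the 0.75 threshold.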

@TheYorouzoya
Contributor Author

TheYorouzoya commented Sep 3, 2025

It seems I've bungled up a test when I changed the expected value to be '' which gets treated by Optional.ofNullable(expectedResult) to return an empty string optional rather than the literal Optional.empty(). I'll fix that.
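
The pitfall described here is easy to reproduce in isolation. In the sketch below, toExpected is a hypothetical helper (not from the PR) showing one possible way to map an empty-string marker in test data to Optional.empty():

```java
import java.util.Optional;

final class OptionalEmptyStringSketch {
    // Hypothetical helper: treats both null and "" as "no expected result".
    static Optional<String> toExpected(String expectedResult) {
        return Optional.ofNullable(expectedResult).filter(s -> !s.isEmpty());
    }

    public static void main(String[] args) {
        // ofNullable only maps null to Optional.empty(); "" is a present value.
        System.out.println(Optional.ofNullable("").isPresent());   // true
        System.out.println(Optional.ofNullable(null).isPresent()); // false
        System.out.println(toExpected("").isPresent());            // false
    }
}
```

Passing null in the parameterized test data is the other straightforward option, since Optional.ofNullable(null) already yields Optional.empty().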

@TheYorouzoya
Contributor Author

Also seems like my JavaDoc is malformed for the ConferenceUtils#normalize method. I'll fix that too.

static Stream<Arguments> generateAcronymCandidateTestData() {
return Stream.of(
// Edge cases
Arguments.of("", 2, Collections.emptySet()), // Empty string returns empty set

Uses Collections.emptySet() instead of the modern Java Set.of() for creating empty sets. Modern Java practices prefer Set.of() for better readability and consistency.

Member

@subhramit subhramit Sep 4, 2025

@TheYorouzoya please push your changes if you've taken this suggestion ^, as I think your PR is near completion, so two of us can approve and concretely plan follow-ups if any.

P.S. thank you so much for the detailed write-ups!

Contributor Author

Oh, I was waiting for any feedback regarding the updated algorithm before pushing in case I needed to make some changes there as well. I'll update the test with Set.of() and push.

@TheYorouzoya TheYorouzoya requested a review from koppor September 4, 2025 12:13
@subhramit
Member

Regarding your comment elaborating the algorithm, we think it should go somewhere in the docs folder so that this information is not buried in GitHub.

import static org.junit.jupiter.api.Assertions.assertEquals;

public class ConferenceUtilsTest {
@ParameterizedTest(name = "Extract from \"{0}\" should return \"{1}\"")

The test method uses @ParameterizedTest with a display name pattern. According to JabRef practices, method names should be comprehensive enough without @DisplayName or name patterns.

@TheYorouzoya
Contributor Author

Regarding your comment elaborating the algorithm, we think it should go somewhere in the docs folder so that this information is not buried in GitHub.

Are you referring to the docs/decisions folder? I'd be happy to add it there once everything is finalized.

@koppor
Member

koppor commented Sep 4, 2025

Regarding your comment elaborating the algorithm, we think it should go somewhere in the docs folder so that this information is not buried in GitHub.

Are you referring to the docs/decisions folder? I'd be happy to add it there once everything is finalized.

I think it's more than "just" a decision; it's documentation of how things work. Maybe docs/components, because I don't see any other good place, and this describes (part of) the consistency check component.

Member

@subhramit subhramit left a comment

Minor cosmetic comments

Comment on lines 99 to 100
public static Set<String> generateAcronymCandidates(@NonNull String input, int CUTOFF) {
if (input.isEmpty() || CUTOFF <= 0) {
Member

capital letters are used for constants, please use lowercase here (and adjust the javadoc as well)

*/
public static Set<String> generateAcronymCandidates(@NonNull String input, int CUTOFF) {
if (input.isEmpty() || CUTOFF <= 0) {
return Collections.emptySet();
Member

Set.of and remove the collections import

return Collections.emptySet();
}

final int MAX_CANDIDATES_THRESHOLD = 50;
Member

move to class level declarations

Comment on lines 106 to 107
// Collect delimiter boundaries: -1 (start), every delimiter index, and input length (end).
bounds.add(-1);
Member

Extract as class-level constant DELIMITER_START

import org.jspecify.annotations.NonNull;

public class ConferenceUtils {
// Regex that'll extract the string within the first deepest set of parentheses

Comment is stating the obvious which can be derived from the code and regex pattern itself. It doesn't provide additional value or reasoning behind the implementation.

Member

@subhramit subhramit left a comment

This was some fine work, lgtm :)
Documentation can be added as a follow-up.

@koppor koppor added this pull request to the merge queue Sep 4, 2025
@koppor
Member

koppor commented Sep 4, 2025

@TheYorouzoya You have nice screenshots on the Icore ranking field - and a step-by-step guide. Can you add this to https://github.com/JabRef/user-documentation/tree/main/en/advanced/entryeditor? (It is OK if the screenshots do not match the color theme of the current ones - this is future work to fix it)

Merged via the queue into JabRef:main with commit 7823af5 Sep 4, 2025
37 checks passed
@TheYorouzoya TheYorouzoya deleted the add-ICORE-ranking-support branch September 5, 2025 08:39
Siedlerchr added a commit that referenced this pull request Sep 8, 2025
* upstream/main: (54 commits)
  Split relativizeSymlinks parameterized tests in separate tests (#13782)
  Update the search syntax highlight for web search (#13801)
  Chore(deps): Bump ai.djl:bom from 0.33.0 to 0.34.0 in /versions (#13833)
  Fix typos in CHANGELOG.md (#13826)
  Chore(deps): Bump com.konghq:unirest-modules-gson in /versions (#13831)
  Chore(deps): Bump org.gradlex:extra-java-module-info in /build-logic (#13830)
  Chore(deps): Bump org.apache.logging.log4j:log4j-to-slf4j in /versions (#13832)
  Chore(deps): Bump io.zonky.test.postgres:embedded-postgres-binaries-bom (#13834)
  Chore(deps): Bump jablib/src/main/resources/csl-locales (#13829)
  Chore(deps): Bump jablib/src/main/resources/csl-styles (#13827)
  Chore(deps): Bump jablib/src/main/abbrv.jabref.org (#13828)
  add: CAYW endpoint formats (#13785)
  New Crowdin updates (#13823)
  chore(deps): update dependency org.kohsuke:github-api to v2.0-rc.5 (#13822)
  Add support for automatic ICORE conference ranking lookup [#13476] (#13699)
  New Crowdin updates (#13820)
  Initialize search bar auto-completion with real database context (no tab switch needed) (#13816)
  Fixes #13274: Allow cygwin-paths on Windows (#13297)
  Refine "REDACTED" replacement of API key value in web fetcher search URL (#13814)
  changed ISSNCleanup into NormalizeIssn, refactored respective tests #13748 (#13767)
  ...