Skip to content

Conversation

@btangmu
Copy link
Member

@btangmu btangmu commented Nov 14, 2025

-Update ExampleDependencies.java by running GenerateExampleDependencies

-Improve GenerateExampleDependencies performance roughly 30% by conditionally skipping intern() in XMLSource: new XMLSource.setDoInternStrings()

-Add options to GenerateExampleDependencies for testing with subset of locales (e.g., -f^[^_]+$ to skip locales whose names contain underscore)

-Report how many minutes it takes to run GenerateExampleDependencies

-Fix a bug in GenerateExampleDependencies: call clearRecordedPaths after loop

-Refactor ICUServiceBuilder and related code: new constructor with CLDRFile, instead of calling setCldrFile separately

-Remove ICUServiceBuilder.setCldrFile and the no-arg ICUServiceBuilder constructor

-Only call the ICUServiceBuilder constructor locally from forLocale, except for fictional test locales xx, mul

-Do not call icuServiceBuilder.clearCache when it is superfluous immediately after constructing

-Call ICUServiceBuilder.clearCache instead of duplicating the code; remove ISB_CAN_CLEAR_CACHE

-New constructors for DateTimeFormats and ZoneFormats obviate calling set()

-New FlexibleDateFromCLDR constructor with CLDRFile replaces FlexibleDateFromCLDR.set

-Change CheckNumbers.defaultNumberingSystem to a local variable

-New constant LocaleNames.XX_TEST (compare existing LocaleNames.MUL)

-Make data private/final/static where possible; ICUServiceBuilder.supplementalData can be static since it does not depend on the locale

-Some automated refactoring by IntelliJ, from accepting suggestions linked to compiler warnings

-Comments

CLDR-19035

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

-Update ExampleDependencies.java by running GenerateExampleDependencies

-Improve GenerateExampleDependencies performance roughly 30% by conditionally skipping intern() in XMLSource: new XMLSource.setDoInternStrings()

-Add options to GenerateExampleDependencies for testing with subset of locales (e.g., -f^[^_]+$ to skip locales whose names contain underscore)

-Report how many minutes it takes to run GenerateExampleDependencies

-Fix a bug in GenerateExampleDependencies: call clearRecordedPaths after loop

-Refactor ICUServiceBuilder and related code: new constructor with CLDRFile, instead of always calling setCldrFile separately

-Do not call icuServiceBuilder.clearCache when it is superfluous immediately after constructing

-Call ICUServiceBuilder.clearCache instead of duplicating the code; remove ISB_CAN_CLEAR_CACHE

-New constructors for DateTimeFormats and ZoneFormats obviate calling set()

-Change data/methods to private and/or final where possible

-Some automated refactoring by IntelliJ, from accepting suggestions linked to compiler warnings

-Comments
"//ldml/personNames/personName[@order=\"([^\"]*+)\"][@length=\"([^\"]*+)\"][@usage=\"([^\"]*+)\"][@formality=\"([^\"]*+)\"]/namePattern")
.putAll(
"//ldml/dates/timeZoneNames/zone[@type=\"([^\"]*+)\"]/short/generic",
"//ldml/personNames/personName[@order=\"([^\"]*+)\"][@length=\"([^\"]*+)\"][@usage=\"([^\"]*+)\"][@formality=\"([^\"]*+)\"]/namePattern")
Copy link
Member Author

@btangmu btangmu Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only example dependency removed. I don't know why. The implication is that examples for these namePattern paths no longer depend on values for these timeZoneNames paths.

…aths

-The idea of rootOnlyPaths is that if a path only occurs in root.xml, and root is read-only as far as ST is concerned, then it does not matter if that examples for other paths depend on that path value, since the value will not change. But, that is untrue, since a non-root locale could override the root value if ST allows. The idea might be worth further investigation in the case of paths that really cannot be changed in ST.
// .../beforeCurrency/insertBetween
// .../afterCurrency/currencyMatch
// .../afterCurrency/surroundingMatch
// .../afterCurrency/insertBetween
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these paths occur only in root.xml and can't be changed in ST, then their values are ordinarily effectively constants and need not be taken into account for example dependencies. There may be other such paths. That implies:

  • ExampleDependencies.java could be smaller
  • Maybe ICUServiceBuilder caching needn't be completely disabled for GenerateExampleDependencies, so it could run faster

}

public ICUServiceBuilder() {
// This constructor without CLDRFile requires setCldrFile to be called subsequently.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The no-arg ICUServiceBuilder constructor still has 4 remaining usages. For example, CheckDates.icuServiceBuilder has an instance initializer and icuServiceBuilder.setCldrFile is called from CheckDates.handleSetCldrFileToCheck. The same ICUServiceBuilder instance gets handleSetCldrFileToCheck called repeatedly with a different CLDRFile (and locale) each time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea to make ICUServiceBuilder be immutable, thus no set() method. (Seems like a good idea).

Also, splitting out the functionality of ICUServiceBuilder into separate classes with smaller chunks of data that they need would allow us to be finer-grained when flushing a cache.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be helpful to make ICUServiceBuilder immutable. I'm still trying to understand it better, especially in relation to caching and example dependencies. It was mostly being constructed like new ICUServiceBuilder().setCldrFile(cldrFile). It seems better to use the constructor like new ICUServiceBuilder(cldrFile) where possible. Sometimes clearCache() was needlessly called between new ICUServiceBuilder() and setCldrFile(cldrFile).

However, for some purposes like CheckDates(), currently the same ICUServiceBuilder is re-used with many different CLDRFiles. I don't know whether that gives a significant performance benefit. I'd like to measure that before deciding whether to make ICUServiceBuilder immutable.

Copy link
Member Author

@btangmu btangmu Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just made a third commit, making ICUServiceBuilder almost immutable. Specifically to find any performance difference from calling new ICUServiceBuilder(cldrFile) for each locale instead of calling icuServiceBuilder.setCLdrFile(cldrFile) on the same instance, I tested performance of TestCheckCLDR.TestAllLocales which is an expensive test taking between 91 and 107 seconds locally. On average it seems to run slightly faster with that change, but I suspect that's only noise and the change makes no significant difference to performance.

One part of ICUServiceBuilder that is still not immutable is collationFile, and there's something strange related to that. ICUServiceBuilder.collationFile is only initialized in ICUServiceBuilder.forLocale(). Instances of ICUServiceBuilder that are constructed without ICUServiceBuilder.forLocale() never get ICUServiceBuilder.collationFile.

Also, ICUServiceBuilder.forLocale() uses a cache of ICUServiceBuilder instances, one for each locale. Presumably this cache has a large performance benefit. What's puzzling, then, is why is the cache not always used? There are many callers of new ICUServiceBuilder() that bypass the cache.

Copy link
Member Author

@btangmu btangmu Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fourth commit replaces nearly all new ICUServiceBuilder(cldrFile) with ICUServiceBuilder.forLocale() for consistency and improved performance. The only exception is for some unit tests that use fictitious locale names "xx" and "mul", with special test data, for which forLocale would not work.

There is a performance improvement for TestCheckCLDR.TestAllLocales which was taking between 91 and 107 seconds locally and now takes between 75 and 79 seconds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before 4th commit, "Test with maven" took 22m 28s; after 4th commit, "Test with maven" took 21m 25s. Only 63 seconds shorter, so not a big performance improvement as far as running tests is concerned. The code is cleaner though, and these changes may pave the way for more improvements to GenerateExampleDependencies and other features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be helpful to make ICUServiceBuilder immutable. I'm still trying to understand it better, especially in relation to caching and example dependencies.

This is a good topic for discussion in the infra group.

When the underlying CLDRFile is immutable (or frozen), then the ICUServiceBuilder can be cached. So for charts, unit tests, etc that can be the case. We should check, however, that CLDRFile::isFrozen! (In general, we should not be caching mutable/unfrozen objects.

The problem is when the CLDRFile can change underneath, as in the survey tool. In that case, it cannot be cached, since the change to CLDRFile can require that the data in the ICUServiceBuilder be refreshed. Now, ideally, we would have objects like ICUServiceBuilder have Listeners on their mutable CLDRFile; then we would know to refresh the ICUServiceBuilder only when that CLDRFile changes a value that affects the ICUServiceBuilder; that would cut down on needless recreation. (But before undertaking anything like that, we should measure the performance cost of needless recreation first!)

The other nice cleanup would be to split ICUServiceBuilder into different parts, based on the CLDRData that they need. For example, we could have ICUNumberBuilder that just fetched the number information and ICUDateBuilder that just fetched the date information (but also has-a ICUNumberBuilder, since that is used in building dates). But again, only worth doing if we know it would improve performance significantly.

Only 63 seconds shorter

That is actually pretty good! That is 5% of 22m 28s. With ten 63s reductions, we'd be taking half the time for the tests (and probably a measurable effect on ST; since checks and examples would run faster).

@btangmu btangmu marked this pull request as ready for review November 14, 2025 18:34
@btangmu btangmu requested a review from macchiati November 14, 2025 18:35
…d refactoring

-Remove the no-arg ICUServiceBuilder constructor

-Remove ICUServiceBuilder.setCldrFile

-New FlexibleDateFromCLDR constructor with CLDRFile replaces FlexibleDateFromCLDR.set

-Change CheckNumbers.defaultNumberingSystem to a local variable

-Make data private and/or final where possible
-Only call the ICUServiceBuilder locally from forLocale, except for fictional test locales xx, mul

-New constants LocaleNames.XX_TEST (compare existing LocaleNames.MUL)

-Make data private/final/static where possible; ICUServiceBuilder.supplementalData can be static since it does not depend on the locale
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants