-
Notifications
You must be signed in to change notification settings - Fork 445
feat: Add support for reading shared formulas in XLSX files #693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0a71d19 to
6c87bba
Compare
|
In code-review process.Just waiting :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for reading shared formulas from XLSX files by implementing detection and conversion of shared formula references. Currently, Fesod cannot read formula text from reference cells in shared formula groups, resulting in missing formula information. The implementation uses Apache POI's formula conversion utilities to parse master formulas and convert them to cell-specific formulas based on row and column offsets.
- Adds shared formula detection using XML attributes (
t="shared"andsiindex) - Implements formula conversion for reference cells using Apache POI's
SharedFormulaclass - Adds comprehensive test coverage with various formula types (volatile, relative, absolute, mixed, addition)
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| fesod/src/main/java/org/apache/fesod/sheet/analysis/v07/handlers/CellFormulaTagHandler.java | Implements core shared formula handling logic including master cell storage and reference cell conversion |
| fesod/src/main/java/org/apache/fesod/sheet/read/metadata/holder/xlsx/XlsxReadSheetHolder.java | Adds shared formula state tracking with map storage and SharedFormulaInfo data class |
| fesod/src/main/java/org/apache/fesod/sheet/constant/ExcelXmlConstants.java | Defines XML attribute constants for shared formula detection (ATTRIBUTE_SI, ATTRIBUTE_SHARED) |
| fesod/src/test/java/org/apache/fesod/sheet/formula/FormulaDataTest.java | Adds test case for reading shared formulas from XLSX file |
| fesod/src/test/java/org/apache/fesod/sheet/formula/FormulaDataListener.java | Implements test listener to verify shared formulas are read correctly |
| fesod/src/test/java/org/apache/fesod/sheet/formula/FormulaData.java | Defines test data model with various formula types |
| fesod/src/test/resources/formula/shared_formula.xlsx | Test Excel file containing shared formulas for validation |
Comments suppressed due to low confidence (1)
fesod/src/main/java/org/apache/fesod/sheet/analysis/v07/handlers/CellFormulaTagHandler.java:50
- Potential uncaught 'java.lang.NumberFormatException'.
xlsxReadSheetHolder.setTempFormulaSharedIndex(sharedIndex != null ? Integer.parseInt(sharedIndex) : null);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| xlsxReadSheetHolder.setTempFormulaSharedIndex(sharedIndex != null ? Integer.parseInt(sharedIndex) : null); | ||
| } |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling for Integer.parseInt(sharedIndex). If the shared index attribute contains an invalid integer value, this will throw a NumberFormatException that will propagate up the call stack. You should either wrap this in a try-catch block or validate the value before parsing.
| xlsxReadSheetHolder.setTempFormulaSharedIndex(sharedIndex != null ? Integer.parseInt(sharedIndex) : null); | |
| } | |
| if (sharedIndex != null) { | |
| try { | |
| xlsxReadSheetHolder.setTempFormulaSharedIndex(Integer.parseInt(sharedIndex)); | |
| } catch (NumberFormatException e) { | |
| // Optionally log the error or handle it as needed | |
| xlsxReadSheetHolder.setTempFormulaSharedIndex(null); | |
| } | |
| } else { | |
| xlsxReadSheetHolder.setTempFormulaSharedIndex(null); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I checked the codebase and found that Integer.parseInt is used without try-catch in similar XML parsing contexts such as CellTagHandler.java:66. Meanwhile, according to the Office Open XML standard, the si attribute is defined as:
<attribute name="si" type="xsd:unsignedInt" use="optional"/>
The spec guarantees si is always an unsigned integer in valid Excel files. If it contains an invalid value, the Excel file is malformed and the NumberFormatException should propagate to fail fast rather than silently swallowing the error.
| } catch (Exception e) { | ||
| // If conversion fails, return the master formula as-is | ||
| // This handles cases like volatile functions with no cell references | ||
| // where the formula should be identical across all cells anyway | ||
| return masterInfo.getFormulaText(); | ||
| } |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silent exception swallowing can make debugging difficult. Consider adding logging to record when formula conversion fails. This would help users understand why they might be seeing unexpected master formulas instead of properly converted cell-specific formulas. For example:
} catch (Exception e) {
log.warn("Failed to convert shared formula at row {} col {}, using master formula instead: {}",
currentRow, currentCol, masterInfo.getFormulaText(), e);
return masterInfo.getFormulaText();
}| if (masterInfo == null) { | ||
| return ""; | ||
| } |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding logging or a more informative return value when the master formula is not found. Returning an empty string silently could make it difficult to debug issues where:
- A reference cell is processed before its master cell (unusual but possible with malformed files)
- The shared index doesn't match any existing master formula
For example:
if (masterInfo == null) {
log.warn("Master formula not found for shared index {} at row {} col {}",
sharedIndex, currentRow, currentCol);
return "";
}| /** | ||
| * Map to store master shared formulas by their shared index. | ||
| */ | ||
| private Map<Integer, SharedFormulaInfo> sharedFormulaMap; |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getSharedFormulaMap exposes the internal representation stored in field sharedFormulaMap. The value may be modified after this call to getSharedFormulaMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! However, this class uses @Getter at the class level, which already exposes all mutable fields directly including Deque<String> tagDeque. XlsxReadSheetHolder is an internal holder class designed to store temporary parsing state where direct map access is consistent with how other mutable fields in this class are used. The map is only accessed by CellFormulaTagHandler during the parsing process.
Related: #571 #272
Purpose of the pull request
This PR adds support for reading shared formulas from XLSX files. Currently, when there are shared formulas in the file, Fesod cannot read the formula text from reference cells. This results in missing formula information for users who need to extract formulas from their Excel data.
What's changed?
This PR implements shared formula support by leveraging Apache POI's formula conversion utilities. When parsing XLSX files, the code now detects shared formulas by checking for the shared type attribute and shared index attribute in the formula XML tags. For master cells that contain the actual formula text, the formula and its position are stored in a map. For reference cells that only contain a shared index, the code retrieves the master formula and converts it to the cell-specific formula by calculating row and column offsets using Apache POI's SharedFormula conversion method. A comprehensive unit test with various formula types has been added to verify the functionality, following the test cases referenced in PR #581.
Current test file: shared_formula.xlsx
Checklist