Skip to content

Conversation

@ongdisheng
Copy link
Contributor

@ongdisheng ongdisheng commented Nov 17, 2025

Related: #571 #272

Purpose of the pull request

This PR adds support for reading shared formulas from XLSX files. Currently, when there are shared formulas in the file, Fesod cannot read the formula text from reference cells. This results in missing formula information for users who need to extract formulas from their Excel data.

What's changed?

This PR implements shared formula support by leveraging Apache POI's formula conversion utilities. When parsing XLSX files, the code now detects shared formulas by checking for the shared type attribute and shared index attribute in the formula XML tags. For master cells that contain the actual formula text, the formula and its position are stored in a map. For reference cells that only contain a shared index, the code retrieves the master formula and converts it to the cell-specific formula by calculating row and column offsets using Apache POI's SharedFormula conversion method. A comprehensive unit test with various formula types has been added to verify the functionality, following the test cases referenced in PR #581.

Current test file: shared_formula.xlsx

Checklist

  • I have read the Contributor Guide.
  • I have written the necessary doc or comment.
  • I have added the necessary unit tests and all cases have passed.

@ongdisheng ongdisheng force-pushed the fix/support-shared-formulas branch from 0a71d19 to 6c87bba Compare November 17, 2025 09:53
@psxjoy psxjoy added the PR: reviewing Currently under active review. label Nov 17, 2025
@psxjoy psxjoy requested review from Copilot and psxjoy and removed request for Copilot November 17, 2025 12:36
@psxjoy
Copy link
Member

psxjoy commented Nov 17, 2025

In code-review process.Just waiting :)

Copilot finished reviewing on behalf of psxjoy November 17, 2025 12:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for reading shared formulas from XLSX files by implementing detection and conversion of shared formula references. Currently, Fesod cannot read formula text from reference cells in shared formula groups, resulting in missing formula information. The implementation uses Apache POI's formula conversion utilities to parse master formulas and convert them to cell-specific formulas based on row and column offsets.

  • Adds shared formula detection using XML attributes (t="shared" and si index)
  • Implements formula conversion for reference cells using Apache POI's SharedFormula class
  • Adds comprehensive test coverage with various formula types (volatile, relative, absolute, mixed, addition)

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
fesod/src/main/java/org/apache/fesod/sheet/analysis/v07/handlers/CellFormulaTagHandler.java Implements core shared formula handling logic including master cell storage and reference cell conversion
fesod/src/main/java/org/apache/fesod/sheet/read/metadata/holder/xlsx/XlsxReadSheetHolder.java Adds shared formula state tracking with map storage and SharedFormulaInfo data class
fesod/src/main/java/org/apache/fesod/sheet/constant/ExcelXmlConstants.java Defines XML attribute constants for shared formula detection (ATTRIBUTE_SI, ATTRIBUTE_SHARED)
fesod/src/test/java/org/apache/fesod/sheet/formula/FormulaDataTest.java Adds test case for reading shared formulas from XLSX file
fesod/src/test/java/org/apache/fesod/sheet/formula/FormulaDataListener.java Implements test listener to verify shared formulas are read correctly
fesod/src/test/java/org/apache/fesod/sheet/formula/FormulaData.java Defines test data model with various formula types
fesod/src/test/resources/formula/shared_formula.xlsx Test Excel file containing shared formulas for validation
Comments suppressed due to low confidence (1)

fesod/src/main/java/org/apache/fesod/sheet/analysis/v07/handlers/CellFormulaTagHandler.java:50

  • Potential uncaught 'java.lang.NumberFormatException'.
        xlsxReadSheetHolder.setTempFormulaSharedIndex(sharedIndex != null ? Integer.parseInt(sharedIndex) : null);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +50 to 51
xlsxReadSheetHolder.setTempFormulaSharedIndex(sharedIndex != null ? Integer.parseInt(sharedIndex) : null);
}
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding error handling for Integer.parseInt(sharedIndex). If the shared index attribute contains an invalid integer value, this will throw a NumberFormatException that will propagate up the call stack. You should either wrap this in a try-catch block or validate the value before parsing.

Suggested change
xlsxReadSheetHolder.setTempFormulaSharedIndex(sharedIndex != null ? Integer.parseInt(sharedIndex) : null);
}
if (sharedIndex != null) {
try {
xlsxReadSheetHolder.setTempFormulaSharedIndex(Integer.parseInt(sharedIndex));
} catch (NumberFormatException e) {
// Optionally log the error or handle it as needed
xlsxReadSheetHolder.setTempFormulaSharedIndex(null);
}
} else {
xlsxReadSheetHolder.setTempFormulaSharedIndex(null);
}

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I checked the codebase and found that Integer.parseInt is used without try-catch in similar XML parsing contexts such as CellTagHandler.java:66. Meanwhile, according to the Office Open XML standard, the si attribute is defined as:

<attribute name="si" type="xsd:unsignedInt" use="optional"/>

The spec guarantees si is always an unsigned integer in valid Excel files. If it contains an invalid value, the Excel file is malformed and the NumberFormatException should propagate to fail fast rather than silently swallowing the error.

Comment on lines 123 to 128
} catch (Exception e) {
// If conversion fails, return the master formula as-is
// This handles cases like volatile functions with no cell references
// where the formula should be identical across all cells anyway
return masterInfo.getFormulaText();
}
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent exception swallowing can make debugging difficult. Consider adding logging to record when formula conversion fails. This would help users understand why they might be seeing unexpected master formulas instead of properly converted cell-specific formulas. For example:

} catch (Exception e) {
    log.warn("Failed to convert shared formula at row {} col {}, using master formula instead: {}", 
        currentRow, currentCol, masterInfo.getFormulaText(), e);
    return masterInfo.getFormulaText();
}

Copilot uses AI. Check for mistakes.
Comment on lines 96 to 98
if (masterInfo == null) {
return "";
}
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding logging or a more informative return value when the master formula is not found. Returning an empty string silently could make it difficult to debug issues where:

  1. A reference cell is processed before its master cell (unusual but possible with malformed files)
  2. The shared index doesn't match any existing master formula

For example:

if (masterInfo == null) {
    log.warn("Master formula not found for shared index {} at row {} col {}", 
        sharedIndex, currentRow, currentCol);
    return "";
}

Copilot uses AI. Check for mistakes.
/**
* Map to store master shared formulas by their shared index.
*/
private Map<Integer, SharedFormulaInfo> sharedFormulaMap;
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSharedFormulaMap exposes the internal representation stored in field sharedFormulaMap. The value may be modified after this call to getSharedFormulaMap.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! However, this class uses @Getter at the class level, which already exposes all mutable fields directly including Deque<String> tagDeque. XlsxReadSheetHolder is an internal holder class designed to store temporary parsing state where direct map access is consistent with how other mutable fields in this class are used. The map is only accessed by CellFormulaTagHandler during the parsing process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PR: reviewing Currently under active review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants