Merged
Changes from 4 commits
src/tools/mongodb/metadata/collectionSchema.ts: 29 changes (24 additions, 5 deletions)
@@ -1,19 +1,35 @@
 import type { CallToolResult } from "@modelcontextprotocol/sdk/types.js";
 import { DbOperationArgs, MongoDBToolBase } from "../mongodbTool.js";
-import type { ToolArgs, OperationType } from "../../tool.js";
+import type { ToolArgs, OperationType, ToolExecutionContext } from "../../tool.js";
 import { formatUntrustedData } from "../../tool.js";
 import { getSimplifiedSchema } from "mongodb-schema";
+import z from "zod";
+import { ONE_MB } from "../../../helpers/constants.js";
+import { collectCursorUntilMaxBytesLimit } from "../../../helpers/collectCursorUntilMaxBytes.js";

 export class CollectionSchemaTool extends MongoDBToolBase {
     public name = "collection-schema";
     protected description = "Describe the schema for a collection";
-    protected argsShape = DbOperationArgs;
+    protected argsShape = {
+        ...DbOperationArgs,
+        sampleSize: z.number().optional().default(50).describe("Number of documents to sample for schema inference"),
+        responseBytesLimit: z.number().optional().default(ONE_MB).describe(`The maximum number of bytes to return in the response. This value is capped by the server’s configured maxBytesPerQuery and cannot be exceeded.`),

Check failure on line 16 in src/tools/mongodb/metadata/collectionSchema.ts (GitHub Actions / check-style): Prettier wants the `responseBytesLimit` chain split so that `.number()`, `.optional()`, `.default(ONE_MB)` and `.describe(...)` each sit on their own line, with the description string on its own line inside `.describe(`.
+    };

     public operationType: OperationType = "metadata";

-    protected async execute({ database, collection }: ToolArgs<typeof DbOperationArgs>): Promise<CallToolResult> {
+    protected async execute(
+        { database, collection, sampleSize, responseBytesLimit }: ToolArgs<typeof this.argsShape>,
+        { signal }: ToolExecutionContext
+    ): Promise<CallToolResult> {
         const provider = await this.ensureConnected();
-        const documents = await provider.find(database, collection, {}, { limit: 5 }).toArray();
+        const cursor = provider.aggregate(database, collection, [{ $sample: { size: Math.min(sampleSize, this.config.maxDocumentsPerQuery) } }]);

Check failure on line 26 in src/tools/mongodb/metadata/collectionSchema.ts (GitHub Actions / check-style): Prettier wants the pipeline array broken across lines, with the `$sample` stage on its own line followed by a trailing comma.

Collaborator:
I wonder if we want to limit the sample to maxDocumentsPerQuery. The way I interpreted this config option, it deals with the number of documents we return to the LLM, not necessarily the number of documents we fetch internally. For example, the LLM shouldn't care whether we sample 50 or 1000 docs, since it only sees the inferred schema anyway.

Collaborator (author):
It could be another option; I just wanted a limit in case a model goes wild and tries to sample thousands and thousands of documents. $sample is a bit more expensive than a plain find, so the limit is just for safety.

No strong opinion here, by the way. We can have a specific hard-coded upper limit for sampling in a constant.

Collaborator (author):
Changed to a constant for the upper limit.
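The thread settles on a hard-coded upper bound for `$sample` instead of `maxDocumentsPerQuery`. That constant is not part of this diff, so the following is only a minimal sketch of the idea: the name `MAX_SCHEMA_SAMPLE_SIZE`, its value of 1000, the lower bound of 1, and the helpers `clampSampleSize` and `buildSamplePipeline` are all hypothetical.

```typescript
// Hypothetical upper bound for $sample; the real constant's name and value may differ.
const MAX_SCHEMA_SAMPLE_SIZE = 1000;

// Shape of the single $sample stage passed to aggregate().
type SampleStage = { $sample: { size: number } };

// Clamp the model-supplied sample size into [1, MAX_SCHEMA_SAMPLE_SIZE]
// so a runaway request cannot trigger an arbitrarily large $sample.
function clampSampleSize(requested: number): number {
    // Assumption: NaN falls back to 1 and fractional sizes are rounded down.
    if (Number.isNaN(requested)) {
        return 1;
    }
    return Math.min(Math.max(Math.floor(requested), 1), MAX_SCHEMA_SAMPLE_SIZE);
}

// Build the aggregation pipeline used for schema inference.
function buildSamplePipeline(requested: number): SampleStage[] {
    return [{ $sample: { size: clampSampleSize(requested) } }];
}

console.log(JSON.stringify(buildSamplePipeline(50))); // [{"$sample":{"size":50}}]
console.log(JSON.stringify(buildSamplePipeline(1000000))); // [{"$sample":{"size":1000}}]
```

Clamping inside the tool leaves the model-facing `sampleSize` argument unchanged while bounding the cost of the `$sample` stage, which is the safety concern raised in the thread.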

+        const { cappedBy, documents } = await collectCursorUntilMaxBytesLimit({
+            cursor,
+            configuredMaxBytesPerQuery: this.config.maxBytesPerQuery,
+            toolResponseBytesLimit: responseBytesLimit,
+            abortSignal: signal,
+        });
         const schema = await getSimplifiedSchema(documents);

         const fieldsCount = Object.entries(schema).length;
@@ -28,9 +44,12 @@
             };
         }

+        const header = `Found ${fieldsCount} fields in the schema for "${database}.${collection}"`;
+        const cappedWarning = cappedBy !== undefined ? `\nThe schema was inferred from a subset of documents due to the response size limit. (${cappedBy})` : "";

Check failure on line 48 in src/tools/mongodb/metadata/collectionSchema.ts (GitHub Actions / check-style): Prettier wants the ternary split across lines, with `cappedBy !== undefined` on the first line and each branch (`? ...` and `: ""`) on its own indented line.

         return {
             content: formatUntrustedData(

Check failure on line 51 in src/tools/mongodb/metadata/collectionSchema.ts (GitHub Actions / check-style): Prettier wants both `formatUntrustedData` arguments, `${header}${cappedWarning}` and `JSON.stringify(schema)`, joined onto the same line as the call.
-                `Found ${fieldsCount} fields in the schema for "${database}.${collection}"`,
+                `${header}${cappedWarning}`,
                 JSON.stringify(schema)
             ),
         };
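The diff delegates size capping to `collectCursorUntilMaxBytesLimit`, whose implementation is not shown here. As an illustration only, this is a minimal synchronous sketch of the idea: use the smaller of the server's `maxBytesPerQuery` and the tool's `responseBytesLimit` as the budget, accumulate documents until the next one would exceed it, and report which limit did the capping. It assumes a document's size is the UTF-8 length of its JSON serialization (a real implementation over a MongoDB cursor would more likely measure BSON size), and the function name `collectUntilBytesLimit` and the `cappedBy` labels are hypothetical.

```typescript
// Hypothetical labels; the real helper's cappedBy values are not shown in this diff.
type CappedBy = "config.maxBytesPerQuery" | "tool.responseBytesLimit";

interface CollectResult<T> {
    documents: T[];
    cappedBy?: CappedBy; // undefined when every document fit within the budget
}

// Assumption: UTF-8 bytes of the JSON form stand in for the document's BSON size.
function approximateSize(doc: unknown): number {
    return new TextEncoder().encode(JSON.stringify(doc)).length;
}

function collectUntilBytesLimit<T>(
    docs: Iterable<T>,
    configuredMaxBytesPerQuery: number,
    toolResponseBytesLimit: number
): CollectResult<T> {
    // The server-side setting always wins: the tool limit can only lower the budget.
    const budget = Math.min(configuredMaxBytesPerQuery, toolResponseBytesLimit);
    const limitingFactor: CappedBy =
        toolResponseBytesLimit < configuredMaxBytesPerQuery ? "tool.responseBytesLimit" : "config.maxBytesPerQuery";

    const documents: T[] = [];
    let used = 0;
    for (const doc of docs) {
        const size = approximateSize(doc);
        if (used + size > budget) {
            // Stop before the document that would overflow the budget.
            return { documents, cappedBy: limitingFactor };
        }
        documents.push(doc);
        used += size;
    }
    return { documents };
}

// Each {"a":n} serializes to 7 bytes, so a 15-byte budget fits two of them.
const result = collectUntilBytesLimit([{ a: 1 }, { a: 2 }, { a: 3 }], 100, 15);
console.log(result.documents.length, result.cappedBy); // 2 tool.responseBytesLimit
```

Stopping before the overflowing document keeps the collected set within the budget; the schema is then inferred from fewer documents, which is the situation the `cappedWarning` text reports.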
tests/integration/tools/mongodb/metadata/collectionSchema.test.ts
@@ -15,11 +15,25 @@
 import { describe, expect, it } from "vitest";

 describeWithMongoDB("collectionSchema tool", (integration) => {
     validateToolMetadata(

Check failure on line 18 in tests/integration/tools/mongodb/metadata/collectionSchema.test.ts (GitHub Actions / check-style): Prettier wants `integration, "collection-schema", "Describe the schema for a collection",` moved onto the same line as `validateToolMetadata(`.
         integration,
         "collection-schema",
         "Describe the schema for a collection",
-        databaseCollectionParameters
+        [
+            ...databaseCollectionParameters,

Check failures on lines 23 to 27 in tests/integration/tools/mongodb/metadata/collectionSchema.test.ts (GitHub Actions / check-style): Prettier wants four spaces (`····`) of indentation deleted from each of these lines.
+            {
+                name: "sampleSize",
+                type: "number",
+                description: "Number of documents to sample for schema inference",
+                required: false,
+            },
+            {
+                name: "responseBytesLimit",
+                type: "number",
+                description: `The maximum number of bytes to return in the response. This value is capped by the server’s configured maxBytesPerQuery and cannot be exceeded.`,
+                required: false,
+            }
+        ]
     );

     validateThrowsForInvalidArguments(integration, "collection-schema", databaseCollectionInvalidArgs);