This issue is meant to be a tracking issue for where we as a team think we want ES module loaders to go. I’ll start it off by writing what I think the next steps are, and based on feedback in comments I’ll revise this top post accordingly.
I think the first priority is to finish the WIP PR that @jkrems started to slim down the main four loader hooks (resolve, getFormat, getSource, transformSource) into two (resolveToURL and loadFromURL, or should they be called resolve and load?). This would solve the issue discussed in #34144 / #34753.
Next I’d like to add support for chained loaders. A PR was already opened to achieve this, but as far as I can tell it doesn’t actually implement chaining as I understand it: it allows the transformSource hook to be chained but not the other hooks, and therefore doesn’t really solve the user request.
A while back I had a conversation with @jkrems to hash out a design for what we thought a chained loaders API should look like. Starting from a base where we assume #35524 has been merged in and therefore the only hooks are resolve and load and getGlobalPreloadCode (which probably should be renamed to just globalPreloadCode, as there are no longer any other hooks named get*), we were thinking of changing the last argument of each hook from default<hookName> to next, where next is the next registered function for that hook. Then we hashed out some examples for how each of the two primary hooks, resolve and load, would chain.
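To make the next idea concrete, here is a minimal sketch of how a loader runtime could build a chain for one hook type. buildChain is a hypothetical name, not Node’s internal API. The sketch assumes the last-registered loader is the outermost one, though the opposite order would work just as well.

```javascript
// Hypothetical helper, not Node's actual implementation: turn a list of hooks
// (in registration order) plus Node's default hook into a single function.
// Each hook is called as hook(input, context, next).
function buildChain(hooks, defaultHook) {
  // Fold outwards from the first-registered hook: its `next` is the default
  // hook, and the last-registered hook ends up being called first.
  return hooks.reduce(
    (next, hook) => (input, context) => hook(input, context, next),
    defaultHook,
  );
}

// Usage: record the order in which two toy resolve hooks and the default run.
const order = [];
const nodeResolve = async (specifier) => {
  order.push('node');
  return { url: `file:///app/${specifier}` };
};
const makeHook = (name) => async (specifier, context, next) => {
  order.push(name);
  return next(specifier, context);
};
const resolve = buildChain([makeHook('first'), makeHook('second')], nodeResolve);
resolve('main.mjs', {}).then(({ url }) => {
  console.log(order.join(' -> '), url); // second -> first -> node file:///app/main.mjs
});
```

With an empty hook list, buildChain simply returns the default hook, so Node’s own behaviour is unchanged when no loaders are registered.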
Chaining resolve hooks
So, for example, say you had a chain of three loaders, unpkg, http-to-https and cache-buster:
- The unpkg loader resolves a specifier foo to a URL http://unpkg.com/foo.
- The http-to-https loader rewrites that URL to https://unpkg.com/foo.
- The cache-buster loader takes the URL and adds a timestamp to the end, such as https://unpkg.com/foo?ts=1234567890.
These could be implemented as follows:
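unpkg loader
```js
// Hypothetical helper: "bare" means no ./, ../ or / prefix and no URL scheme.
const isBareSpecifier = (specifier) =>
  !/^\.{0,2}\//.test(specifier) && !/^[a-z][a-z\d+.-]*:/i.test(specifier);

export async function resolve(specifier, context, next) { // next is Node’s resolve
  if (isBareSpecifier(specifier)) {
    return { url: `http://unpkg.com/${specifier}` };
  }
  return next(specifier, context);
}
```
http-to-https loader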
```js
export async function resolve(specifier, context, next) { // next is the unpkg loader’s resolve
  const result = await next(specifier, context);
  if (result.url.startsWith('http://')) {
    result.url = `https${result.url.slice('http'.length)}`;
  }
  return result;
}
```
cache-buster loader
```js
export async function resolve(specifier, context, next) { // next is the http-to-https loader’s resolve
  const result = await next(specifier, context);
  const url = new URL(result.url);
  // Only add a query string to schemes that support one (exclude data: & friends).
  if (['http:', 'https:', 'file:'].includes(url.protocol)) {
    // searchParams.set() keeps any existing query string and replaces an earlier ts.
    url.searchParams.set('ts', String(Date.now()));
    result.url = url.href;
  }
  return result;
}
```
These chain “backwards” in the same way that function calls do, along the lines of cacheBusterResolve(httpToHttpsResolve(unpkgResolve(nodeResolve(...)))) (though in this particular example, the positions of cache-buster and http-to-https can be swapped without affecting the result). The point, though, is that the hook functions nest: each one returns a { url } object, like Node’s resolve, and the chaining happens as a result of calling next; if a hook doesn’t call next, the chain short-circuits. I’m not sure whether it’s preferable for the API to be node --loader unpkg --loader http-to-https --loader cache-buster or the reverse, but it would be easy to flip that if we get feedback that one way is more intuitive than the other.
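A small hand-wired chain shows both properties described above: hooks nest like function calls, and a hook that returns without calling next short-circuits everything after it. The virtual: scheme and the hook names are hypothetical, and nodeResolve is a stand-in that only resolves relative specifiers against context.parentURL.

```javascript
const calls = []; // records which hooks ran, in order

// Stand-in for Node's resolve (assumption): resolve relative to the parent URL.
async function nodeResolve(specifier, context) {
  calls.push('node');
  return { url: new URL(specifier, context.parentURL).href };
}

// Hypothetical hook that answers virtual: specifiers itself and never calls next.
async function virtualResolve(specifier, context, next) {
  calls.push('virtual');
  if (specifier.startsWith('virtual:')) {
    return { url: `data:text/javascript,${encodeURIComponent('export default 42;')}` };
  }
  return next(specifier, context);
}

// Hypothetical outer hook that simply delegates and passes the result through.
async function outerResolve(specifier, context, next) {
  calls.push('outer');
  return next(specifier, context);
}

// Equivalent of outerResolve(virtualResolve(nodeResolve(...)))
const resolve = (specifier, context) =>
  outerResolve(specifier, context, (s, c) => virtualResolve(s, c, nodeResolve));

async function main() {
  const context = { parentURL: 'file:///app/main.mjs' };
  const a = await resolve('./util.mjs', context);
  console.log(a.url, calls.join(',')); // file:///app/util.mjs outer,virtual,node
  calls.length = 0;
  const b = await resolve('virtual:config', context);
  console.log(b.url, calls.join(',')); // data:text/javascript,export%20default%2042%3B outer,virtual
}
main();
```

For virtual:config, nodeResolve never runs, because virtualResolve returns without calling next.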
Chaining load hooks
Chaining load hooks would be similar to resolve hooks, though slightly more complicated in that instead of returning a single string, each load hook returns an object { format, source } where source is the loaded module’s source code/contents and format is the name of one of Node’s ESM loader’s “translators”: commonjs, module, builtin (a Node internal module like fs), json (with --experimental-json-modules) or wasm (with --experimental-wasm-modules).
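As a rough illustration of this contract, a loader runtime could check the final load result against the translator names listed above. validateLoadResult and its error messages are hypothetical. Which formats are actually accepted would also depend on the experimental flags in use. Not requiring a source for builtin modules is an assumption of this sketch.

```javascript
// Translator names listed above; json and wasm additionally need
// --experimental-json-modules / --experimental-wasm-modules.
const KNOWN_FORMATS = new Set(['builtin', 'commonjs', 'json', 'module', 'wasm']);

// Hypothetical check applied to the value returned at the top of the load chain.
function validateLoadResult(url, result) {
  if (result === null || typeof result !== 'object') {
    throw new TypeError(`load hook for ${url} must return an object`);
  }
  const { format, source } = result;
  if (!KNOWN_FORMATS.has(format)) {
    throw new TypeError(`Unknown module format "${format}" for ${url}`);
  }
  // Assumption: builtin modules come from Node itself, so no source is required.
  if (format !== 'builtin' && source === undefined) {
    throw new TypeError(`load hook for ${url} returned no source`);
  }
  return result;
}

try {
  validateLoadResult('https://example.com/module.coffee', { format: undefined, source: 'x = 1' });
} catch (err) {
  console.log(err.message); // Unknown module format "undefined" for https://example.com/module.coffee
}
```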
Currently, Node’s internal ESM loader throws an error on unknown file types: import('file.javascript') throws, even if the contents of that file are perfectly acceptable JavaScript. This error happens during Node’s internal resolve when it encounters a file extension it doesn’t recognize; hence the current CoffeeScript loader example has lots of code to tell Node to allow CoffeeScript file extensions. We should move this validation check to after the format is determined, since the format is one of the return values of load; in other words, it’s up to load to return a format that Node recognizes. Node’s internal load doesn’t know to map a URL ending in .coffee to module, so without a loader Node would continue to error as it does now; but under this new design the CoffeeScript loader no longer needs to hook into resolve at all, since it can determine the format of CoffeeScript files within load. In code:
coffeescript loader
```js
import CoffeeScript from 'coffeescript';

// CoffeeScript files end in .coffee, .litcoffee or .coffee.md
const extensionsRegex = /\.coffee$|\.litcoffee$|\.coffee\.md$/;

export async function load(url, context, next) {
  const result = await next(url, context);
  // The first check is technically not needed but ensures that
  // we don’t try to compile things that already _are_ compiled.
  if (result.format === undefined && extensionsRegex.test(url)) {
    // For simplicity, all CoffeeScript URLs are ES modules.
    const format = 'module';
    // source may be a string or a Buffer/typed array; compile() expects a string.
    const code = typeof result.source === 'string'
      ? result.source : new TextDecoder().decode(result.source);
    const source = CoffeeScript.compile(code, { bare: true });
    return { format, source };
  }
  return result;
}
```
And the other example loader in the docs, to allow import of https:// URLs, would similarly only need a load hook:
https loader
```js
import { get } from 'https';

export async function load(url, context, next) {
  if (url.startsWith('https://')) {
    let format; // default: format is undefined
    const source = await new Promise((resolve, reject) => {
      get(url, (res) => {
        // Determine the format from the MIME type of the response,
        // ignoring parameters such as "; charset=utf-8"
        const mimeType = (res.headers['content-type'] || '').split(';')[0].trim();
        switch (mimeType) {
          case 'application/javascript':
          case 'text/javascript': // etc.
            format = 'module';
            break;
          case 'application/node':
          case 'application/vnd.node.node':
            format = 'commonjs';
            break;
          case 'application/json':
            format = 'json';
            break;
          // etc.
        }
        res.setEncoding('utf8');
        let data = '';
        res.on('data', (chunk) => data += chunk);
        res.on('end', () => resolve(data)); // resolve with the source text itself
      }).on('error', (err) => reject(err));
    });
    return { format, source };
  }
  return next(url, context);
}
```
If these two loaders are used together, where the coffeescript loader’s next is the https loader’s hook and the https loader’s next is Node’s native hook, as in coffeeScriptLoad(httpsLoad(nodeLoad(...))), then for a URL like https://example.com/module.coffee:
- The https loader would load the source over the network, but return format: undefined, assuming the server supplied a correct Content-Type header like application/vnd.coffeescript, which our https loader doesn’t recognize.
- The coffeescript loader would get that { source, format: undefined } early on from its call to next, and set format: 'module' based on the .coffee at the end of the URL. It would also transpile the source into JavaScript. It then returns { format: 'module', source }, where source is runnable JavaScript rather than the original CoffeeScript.
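The two steps above can be simulated without network access or the coffeescript package. In this sketch, fakeHttpsLoad stands in for the https loader and returns a canned response, and fakeCompile is a placeholder for CoffeeScript.compile. Both are hypothetical, so only the way format and source flow through the chain is meaningful.

```javascript
// Stand-in for Node's own load: the https stand-in never calls it for https: URLs.
async function nodeLoad(url) {
  throw new Error(`nodeLoad should not be reached for ${url}`);
}

// Stand-in for the https loader: pretend the server sent a Content-Type
// (e.g. application/vnd.coffeescript) that the loader doesn't recognize.
async function fakeHttpsLoad(url, context, next) {
  if (url.startsWith('https://')) {
    return { format: undefined, source: 'square = (x) -> x * x' };
  }
  return next(url, context);
}

// Placeholder for CoffeeScript.compile(source, { bare: true }).
const fakeCompile = (source) => `/* compiled from CoffeeScript */ ${source}`;

const extensionsRegex = /\.coffee$|\.litcoffee$|\.coffee\.md$/;
async function coffeeScriptLoad(url, context, next) {
  const result = await next(url, context);
  if (result.format === undefined && extensionsRegex.test(url)) {
    return { format: 'module', source: fakeCompile(result.source) };
  }
  return result;
}

// coffeeScriptLoad(httpsLoad(nodeLoad(...)))
const load = (url, context) =>
  coffeeScriptLoad(url, context, (u, c) => fakeHttpsLoad(u, c, nodeLoad));

load('https://example.com/module.coffee', {}).then(({ format, source }) => {
  console.log(format); // module
  console.log(source); // /* compiled from CoffeeScript */ square = (x) -> x * x
});
```

A non-CoffeeScript URL fetched over https keeps format: undefined here, which is exactly the case where Node’s post-chain validation would be expected to report an error.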
Chaining globalPreloadCode hooks
For now, I think that this wouldn’t be chained the way resolve and load would be. This hook would just be called sequentially for each registered loader, in the same order as the loaders themselves are registered. If this is insufficient, for example for instrumentation use cases, we can discuss and potentially change this to follow the chaining style of load.
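A minimal sketch of this non-chained behaviour: each loader’s globalPreloadCode hook is called in registration order, with no next argument, and the returned snippets are concatenated. The loader objects and collectGlobalPreloadCode are hypothetical. How Node would actually run the combined code is out of scope here.

```javascript
// Hypothetical: call every registered loader's globalPreloadCode hook in
// registration order and join the returned code snippets.
function collectGlobalPreloadCode(loaders) {
  const snippets = [];
  for (const loader of loaders) {
    if (typeof loader.globalPreloadCode !== 'function') continue; // hook is optional
    const code = loader.globalPreloadCode();
    if (typeof code === 'string' && code !== '') snippets.push(code);
  }
  return snippets.join('\n');
}

const loaders = [
  { globalPreloadCode: () => 'globalThis.first = 1;' },
  { resolve: async () => ({ url: 'file:///x' }) }, // no preload hook
  { globalPreloadCode: () => 'globalThis.second = globalThis.first + 1;' },
];
console.log(collectGlobalPreloadCode(loaders));
// globalThis.first = 1;
// globalThis.second = globalThis.first + 1;
```

Because later snippets run after earlier ones, a later loader can rely on globals set up by an earlier one, which is the main ordering guarantee this simpler model offers.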
Next Steps
Based on the above, here are the next few PRs as I see them:
- Finish esm: merge and simplify loader hooks #35524, simplifying the hooks to resolve, load and globalPreloadCode.
- Refactor Node’s internal ESM loader’s hooks into resolve and load. Node’s internal loader already has no-ops for transformSource and getGlobalPreloadCode, so all this really entails is merging the internal getFormat and getSource into one function, load.
- Refactor Node’s internal ESM loader to move its exception on unknown file types from within resolve (on detection of unknown extensions) to within load (if the resolved extension has no defined translator).
- Implement chaining as described here, where default<hookName> becomes next and references the next registered hook in the chain.
- Get a load return value of format: 'commonjs' to work, or at least error informatively. See esm: Modify ESM Experimental Loader Hooks #34753 (comment).
- Investigate and potentially add an additional transform hook (see below).
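The second item, merging the internal getFormat and getSource into a single load, might look roughly like the following. getFormat and getSource here are simplified, hypothetical stand-ins that use an extension table and an in-memory file map, not Node’s real internals. The point is only that one call now returns { format, source } and leaves format undefined for unknown extensions instead of throwing, so a loader later in the chain can still supply it.

```javascript
// Simplified, hypothetical stand-ins for Node's internal getFormat/getSource:
// an extension table and an in-memory "file system".
const extensionFormats = { '.mjs': 'module', '.cjs': 'commonjs', '.json': 'json' };
const files = new Map([
  ['file:///app/main.mjs', 'export default 1;'],
  ['file:///app/script.coffee', 'x = 1'],
]);

function getFormat(url) {
  const match = /\.[^./]+$/.exec(new URL(url).pathname);
  // Unknown extensions yield undefined instead of throwing.
  return match ? extensionFormats[match[0]] : undefined;
}

async function getSource(url) {
  if (!files.has(url)) throw new Error(`Cannot find module ${url}`);
  return files.get(url);
}

// The merged hook: a single call returns both the format and the source.
// (context is accepted to match the hook signature but unused in this sketch.)
async function load(url, context) {
  return { format: getFormat(url), source: await getSource(url) };
}

load('file:///app/main.mjs', {}).then(console.log);
// { format: 'module', source: 'export default 1;' }
load('file:///app/script.coffee', {}).then(console.log);
// { format: undefined, source: 'x = 1' }
```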
This work should complete many of the major outstanding ES module feature requests, such as supporting transpilers, mocks and instrumentation. If there are other significant user stories that still wouldn’t be possible with the loaders design as described here, please let me know. cc @nodejs/modules