- 
                Notifications
    You must be signed in to change notification settings 
- Fork 79
Preprocessing API rework #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am so glad you got the ball rolling on an RFC for this change.
        
          
                text/preprocessing-api-rework.md
              
                Outdated
          
        
      | function extractStyles(code: string): { | ||
| start: number; | ||
| end: number; | ||
| content: { text: string; start: number; end: number }; | ||
| attributes: Array<{ | ||
| name: string; | ||
| value: string; | ||
| start: number; | ||
| end: number; | ||
| }>; | ||
| }; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to start, end, etc, it would probably be useful to also have a meta property (or some other name) whose value is an object which can contain other pertinent information about the style tag in relation to the rest of the file, or even the rest of the build.
For example, this would be a decent place to add some kind of indication of if it is a top-level style tag or nested as you mention in the Outstanding Questions section of this RFC. Then, rather than having some kind of mode, a preprocessor can just choose to filter on that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also, since you mentioned that there are non-top-level styles and scripts that could be present (which I didn't know until reading this RFC), shouldn't the return value of the function be an array of these objects you've described?
|  | ||
| ### Roadmap, backwards-compatibility | ||
|  | ||
| This new functionality could be implemented in Svelte 3, where the `preprocess` function checks if the passed in preprocessor is an object (current API) or a function (proposed API). In Svelte 4, this would become the default, and we could provide a function for preprocessors that don't support the new API yet. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a clarification is in order that we'd provide a function which developers could use in their config files to wrap preprocessors that have not yet taken the necessary steps to adopt the new API.
As written, you might interpret it as "we will provide a function to preprocessor authors".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, there needs to be more work done here to explain what we do in Svelte 3, I think.
If you pass an array of preprocessors, some of which use the new API, and some of which use the old API, then what do you do about ordering? If preprocessor A uses the old API, but B uses the new API, you've got a condition where B is expecting that A will run first. But if A uses script or style, it won't have done anything when B runs. In this case, my ugly workaround is still necessary to get PostCSS turned into proper CSS before Svelte Image does its thing.
Perhaps we still need to have developers (not preprocessor authors) signal that they want to opt in to the new API via some config setting. Then we can just force them to use the provided function to wrap old-style preprocessors (as they would have to do in Svelte 4 anyway).
So with no "experimental" flag set, we respect either style of API, but preserve the old ordering mechanism: any new-style API usage will be ordered in the same fashion as markup is today. Additionally, when we encounter the new-style API, we log a warning message to the terminal with a brief message about ordering and a link to visit for more info.
When the experimental flag set to true, we remove the old-style API entirely which fixes ordering automatically. Additionally, when we encounter the old-style API, we throw an error with an explanation that "Preprocessor {name} attempted to use the deprecated API, but you've opted in to the new style. Either wrap your preprocessor in the legacyPreprocessor function or opt out of the new behavior." Also with a link to read more.
| ## Alternatives | ||
| None that I can think of right now | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ordering
- just a config flag: opt in to order preprocessors top to bottom with no special second or third pass on scriptorstyle.
- Some nesting mechanism as I created in my PR that you linked at the top. (For those who have not been following, I personally prefer this RFC solution to the PR that I originally opened)
Attr adjustments within script/style
- You could fix the inability to edit tag information in scripts and styles by adding another optional property in the return of either scriptorstylethat we'd then act on internally. The bit below is adapted from current Svelte docs. See the commented lines which indicate the change I mean.
type Preprocessor = {
	markup?: (input: { content: string, filename: string }) => Promise<{
		code: string,
		dependencies?: Array<string>
	}>,
	script?: (input: { content: string, markup: string, attributes: Record<string, string>, filename: string }) => Promise<{
		code: string,
		dependencies?: Array<string>
		// Add the following property
		attributes?: Record<string, string>
	}>,
	style?: (input: { content: string, markup: string, attributes: Record<string, string>, filename: string }) => Promise<{
		code: string,
		dependencies?: Array<string>
		// Add the following property
		attributes?: Record<string, string>
	}>
}- We could still provide a helper function for grabbing scriptandstyledirectly out ofmarkupas described in this RFC, but without changing the underlying API at all. Preprocessor authors could more easily transition fromstyleorscripttomarkupusing the function we provide.
As for these alternatives, this RFC feels like the better, more thorough approach to me.
| ## Drawbacks | ||
| None that I can think of right now | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main drawback is that this is planned to be a breaking change in Svelte 4. But that is mitigated by providing the legacyPreprocessor function.
There are other potential drawbacks... but I cannot guess at them because I do not know the original thinking that led to doing three separate passes on preprocessors to begin with. That decision led to this current ordering problem which is addressed in this RFC, but it ostensibly was solving some other problem at the time. In theory, this change would re-introduce whatever that problem was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked the other maintainers about the decision for the API back then and there was no conscious "this is good because X" reason for that. The main argument was to provide some ease of use, but that would be achieved just as well with the helper functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am hoping that this change is a slam dunk, then.
        
          
                text/preprocessing-api-rework.md
              
                Outdated
          
        
      | ## Unresolved questions | ||
| - We could expand the functionality of `extractScripts`/`extractStyles`. Right now, every script/style is processed, not just the top level ones. Enhance the Svelte parser with a mode that only parses the top level script/style locations, but not its contents? | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd set this as a priority (albeit a low one). This came up a few times in the svelte-preprocess issues. However, the question persists: should we make it possible for someone to write, let's say, typescript code inside a <svelte:head> tag? Or is it outside of the scope of the preprocessing step? One code is for the compiler, the other for the browser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since after preprocessing all script tags should be JavaScript it doesn't make much of a difference I'd say. But it certainly feels weird to write the non-top-level script/style tags in another language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it certainly feels weird to write the non-top-level script/style tags in another language.
100% with you. I'd focus on top-level only for the time being.
| This looks very promising indeed and will also simplify quite a lot the preprocessors. I'm travelling right now, but will read it again once I'm back. | 
Co-authored-by: Christian Kaisermann <[email protected]>
| looks like a very good idea. i think the current design is due to a hope that plugins would compose nicely with a predictable order, but i think in practice composing within one step like this would achieve the same goal + be far more flexible. | 
| Some things i remember from working with preprocessor arrays in svite and vite-plugin-svelte: 
 So maybe instead of a function and object signature would still be helpful? {
  name: 'foo';
  preprocess: function
  capabilites: tbd
} | 
| ) | ||
| ``` | ||
|  | ||
| Additionally, `svelte/preprocess` exports new utility functions which essentially establish the current behavior: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the preprocess function live in svelte/compiler but having the utility functions live in svelte/preprocess seems a bit confusing to me. I'm not sure what I'd suggest though. Having preprocess in svelte/compiler be the "old" preprocessor and preprocess in svelte/preprocessor be the "new" preprocessor would eliminate the need for different behavior based on what types of arguments were passed (and then we'd eventually get rid of the old one in Svelte 4), but that might be even more confusing.
Having the compiler and the preprocessor be separate concerns (exported from separate modules) seems appealing to me, but I'm not sure how valuable of a distinction that would be to force users to make. Almost everyone is going to be interacting with this with svelte.config.js, not by directly calling preprocess and compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that even though most users will interact with the whole system inside the config file, there is still merit to the idea of moving everything new into svelte/preprocess for the sake of internal organization. I like the idea.
| 
 I like the idea of preprocessors creating some kind of registration object that contains information about the preprocessor itself. Perhaps at this stage, we don't worry so much about the  (To be clear, I only suggest skipping on capabilities because that seems quite a complex problem to solve and feels out of scope for this RFC. Not that it wouldn't give us some interesting capabilities.) | 
- make it an object with at least name/preprocess properties - add isTopLevel boolean - enhance how we teach this section
| Thanks for all your suggestions! I like the idea of making the new preprocessor API an object, too, with a  | 
| I really like this way of writing preprocessors, feels more streamlined and predictable. I have some questions below: If we do have sveltejs/svelte#6611 expose a public API, do we need  Regarding  Another concern with  Also, about the removal of  With the above said, I'm not too much a sourcemap wizard so I might be babbling stuff. Anyways, really looking forward to this! | 
| Quoting @pngwn from Discord: 
 So essentially preprocessing would be a two-stage process: 
 Problem with that is that there are preprocessors already AFAIK which rely on script being transformed but not on the html part being transformed because they do that themselves then. So maybe that two-stage process could be loosened so that we can order 1/2 like we want, and the preprocessing pipeline detects if this is a AST->AST or string->string preprocessor and prepares the input accordingly (transform AST back into string for example). | 
|  | ||
| ```typescript | ||
| { | ||
| name: 'a string', // sometimes it's good to know from a preprocessor library persepective what preprocessor you are dealing with | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| name: 'a string', // sometimes it's good to know from a preprocessor library persepective what preprocessor you are dealing with | |
| name: 'a string', // sometimes it's good to know from a preprocessor library perspective what preprocessor you are dealing with | 
| This might also be a good opportunity to include an API for preprocessors to contribute to the warnings and errors similar to those output by svelte.compile. This allows tooling (e.g. VS Code) to pick them up accordingly. Integration with the  | 
| 
 We probably don't want to be transforming back-and-forth more than necessary. Maybe having various hooks at various stages of the pipeline like Rollup would help? E.g.  | 
| I like this but I think we can go simpler — if we exposed a proper parser and require the input to that parser to adhere to certain rules, then I don't think we need  import MagicString from 'magic-string';
import { parse } from 'svelte/compiler';
const preprocessor = {
  name: 'foo-to-bar',
  preprocess: ({ code, filename }) => {
    const ast = parse(code);
    const result = new MagicString(code);
    // this is all we need to get top-level scripts. no faffing
    const top_level_scripts = ast.body.filter((element) => element.name === 'script');
    for (script of top_level_scripts) {
      const { start, end, value } = script.children[0]; // <script> will only ever have one child, a text node
      for (const match of value.matchAll('foo')) {
        result.overwrite(
          start + match.index,
          start + match.index + match[0].length,
          'bar',
          { storeName: true }
        );
      }
    }
    return {
      code: result.toString(),
      map: result.generateMap()
    };
  }
}Requirements for using  
 (I think there's an interesting conversation to be had separately about whether we want to support TypeScript syntax in curlies natively, but I don't know that we need to resolve that here.) | 
| sveltejs/svelte#6611 is related to that as it parses the mustache tags accordingly. Does your proposal mean that we just make the existing  | 
| 
 It's honestly quite problematic that the AST isn't considered public API. I think Svelte 4 would be a good opportunity to correct that. A noteworthy thing here of course is that  We would also have to be prescriptive about control flow — in order to support things like a preprocessor that converted  {#if error}
  <p>oh no!</p>
{:else if vibes === "cool"}
  <p>everything is cool</p>
{:else}
  <p>no error, but the vibes are off</p>
{/if}
{#each things as thing, i (thing.id)}
  <p>{thing.name}</p>
{:else}
  <p>no things</p>
{/each}
{#switch color}
  <p>default case</p>
{:case "red"}
  <p>red</p>
{:case "blue"}
  <p>blue</p>
{:case "green"}
  <p>green</p>
{/switch}...might be represented thusly: [
  {
    type: 'block',
    kind: 'if',
    branches: [
      {
        value: 'error',
        children: [...]
      },
      {
        value: 'else if vibes === "cool"',
        children: [...]
      },
      {
        value: 'else',
        children: [...]
      }
    ]
  },
  {
    type: 'block',
    kind: 'each',
    branches: [
      {
        value: 'things as thing, i (thing.id)',
        children: [...]
      },
      {
        value: 'else',
        children: [...]
      }
    ]
  },
  {
    type: 'block',
    kind: 'switch',
    branches: [
      {
        value: 'color',
        children: [...]
      },
      {
        value: 'case "red"',
        children: [...]
      },
      {
        value: 'case "blue"',
        children: [...]
      },
      {
        value: 'case "green"',
        children: [...]
      }
    ]
  }
]In other words each  (Speaking of correctness: should the parser be fault-tolerant? Is it beneficial for things like editor integrations if we're able to generate an AST from partial content?) 
 Exactly — if the contract is as simple as  | 
| Love the idea of simplifying down to just a one-stage process as I've also run into some pain with the current three-stage process. 
 This is fine, but could you include a small example in the future docs of how to target these blocks using regex (location, attributes, and content) since you've already been through the ins and outs of such. In large apps, the speed difference between AST and regex is significant, and I think you would save a lot of users a lot of time by giving some examples (including something like the AST example shown above as well). | 
| 
 Yes it would be very much benefitial for editor integrations since right now every syntax error makes the whole file completely different in the underlying TS representation. But I don't think we need to solve this as part of the preprocessor rework. In general I feel like the comments go away from just the preprocess rework towards a parser rework which is more involved and, if not behind some flag or through a different parser method, would be a breaking change. Right now  From what you're getting at it sounds like you want something similar to what @pngwn did with the svast specification and its implementation. | 
| 
 I was wondering if we should implement the parser using something like Lezer, since we'd then get error tolerance for free | 
| Lezer docs talk about its limitations/tradeoffs: 
 I'm not familiar about with the details but my impression is that Lezer is not meant for much beyond syntax highlighting. | 
| 
 My fear would be that this results in slower parsing, would be nice if it would be the other way. @pngwn's work AFAIK was much faster (sacrificing readability for it because it's all one big file). | 
| On the topic of error-tolerance/Lezer, you could have detailed error tolerance (i.e custom recovery heuristics specific to each error). This allows for faster recovery (recovers fewer tokens away), better user errors and enables things like autofix and VS Code code-actions but at the expense of memory, speed, bundle-size etc. On the other hand, simple heuristics can get you pretty far. For example, I implemented a Svelte parser that has error recovery for some cases, usually resuming parsing at the next  About the Svelte parserer I started (but haven't finished): I've been prototyping a state machine library (https://github.com/kwangure/hine) and to test it, I built a Svelte parser (https://github.com/kwangure/parserer). It resumes parsing at the next  {
    "html": {
        "start": 0,
        "end": 13,
        "type": "Fragment",
        "children": [
            {
                "error": {
                    "code": "invalid_attribute_name",
                    "message": "Expected a valid attribute character but instead found '+'",
                    "start": 6,
                    "end": 13,
                    "type": "Invalid"
                },
                "start": 0,
                "end": 13,
                "type": "Element",
                "name": "div",
                "attributes": [],
                "children": []
            }
        ]
    }
}I just made the Parserer repo public for the purpose of this comment but it was never meant for public consumption so it has rough edges and YMMV if you choose to run it. Neither Hine or Parserer is primetime-ready. Also, state machines are good. | 
| We want to tackle the new preprocessor API for Svelte 4. This would probably only include the  One major advantage I just realized now with the existing API is that it handles merging of source maps etc for you. It takes care of extracting the source map and transforming it so that you don't need to wrestle with all that stuff yourself. For example if you run TypeScript on the script code, you get back something like export const foo;
// #sourceMappingURL=..and Svelte's  So maybe the existing API isn't that bad overall? Maybe just the order needs to be switched to run in sequence; and you can't have multiple anymore and chose between either  One thing I don't see how it's possible to do yet through that API is, once we allow TypeScript inside mustache tags, you have the problem of all the source mapping again - at which point some kind of convenience method is needed to achieve this. So maybe the solution is to implement the new API but with a function which takes care at least of that? If we decide to depracate existing API and go with the new API, we will add more documentation around creating a preprocessor with some code snippets including the regex approach to ease onboarding. The old API will keep functioning but a warning is emitted that it will stop doing so in Svelte 5. | 
| Removing PreprocessorGroup makes it a bit harder for preprocessors like vitePreprocess to be easily added in one line. To avoid users having to add multiple imported parts, it would help to allow returning arrays that get flattened, similar to how vite handles plugins. Handling typescript inside template strings is a tricky situation. It needs to be done in a single modification + sourcemap so that we can continue with other preprocessors and merge everything at the end. How do we avoid frequent string operations on the whole code? Having separate script and style preprocessors is also very convenient for preprocessor development and allows more performance optimizations like running script and style preprocessors in parallel. I'd love if there was a way we can keep a "phases" approach, maybe limit it to markup and ast phases and try to come up with a way to pass along ast and maybe magicstrings to avoid having to redo a lot of work. On that note, what about making returning sourcemaps and dependencies mandatory? | 
| Could you elaborate on that performance optimization thing? How would that help? Right now everything's run sequentially and things are re-parsed on every preprocessor step. Are you suggesting we change that to speed things up? How would that look like? Would a sub process or thread really make things faster here, assuming you generate one for every file? I agree with you that  | 
Rendered
@kaisermann @arxpoetica @happycollision I'd like to hear your thoughts on this. Surely I missed some other pain points that need to be adressed.