Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bun runtime plugin onResolve doesn't filter non-file: protocol imports #9863

Open
TomasHubelbauer opened this issue Apr 2, 2024 · 33 comments
Labels
bug Something isn't working

Comments

@TomasHubelbauer
Copy link

What version of Bun is running?

1.1.0+5903a6141

What platform is your computer?

Darwin 23.4.0 arm64 arm

What steps can reproduce the bug?

See my repro repo: https://github.com/TomasHubelbauer/bun-runtime-plugin-onResolve-custom-protocol

Make a runtime plugin with an onResolve hook:

import { plugin } from "bun";

await plugin({
  name: "test",
  async setup(build) {
    console.log("Plugin 1 loaded!", build);

    build.onResolve({ filter: /\.demo$/ }, (args) => {
      console.log("onResolve1", args);

      return {
        path: './demo1.ts'
      };
    });
  },
});

Register it in bunfig.toml: preload = ["./plugin.ts"].

Run a script with an import that should get captured by the plugin: bun index.ts:

import demo1 from '1.demo';

console.log('demo 1', demo1);

So far everything works as expected:

Plugin 1 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
onResolve1 {
  path: "1.demo",
  importer: "/Users/tom/Desktop/bun-runtime-plugin-onResolve/index.ts",
}
demo 1 Hello, 1!

Change the plugin to filter for a custom protocol:

import { plugin } from "bun";

await plugin({
  name: "test",
  async setup(build) {
    console.log("Plugin 2 loaded!", build);

    build.onResolve({ filter: /^demo:/ }, (args) => {
      console.log("onResolve2", args);

      return {
        path: './demo2.ts'
      };
    });
  },
});

Change the import and run the script again:

import demo2 from 'demo:2';

console.log('demo 2', demo2);

Now we see a problem:

Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
error: Cannot find package "demo:2" from "/…/index.ts"

Notice that onResolve2 was never even called.

This might be a design decision, not a real bug, if I learn as much, I will contribute a documentation change explaining this.

What is the expected behavior?

Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
onResolve2 {
  path: "demo:2",
  importer: "/Users/tom/Desktop/bun-runtime-plugin-onResolve/index.ts",
}
demo 2 Hello, 2!

What do you see instead?

A runtime error in module resolution not preceded by the custom onResolve hook call.

Additional information

This might not be a bug but a design decision. I ended up trying this, because build.module doesn't accept a regex for the module name (so that I could virtualize all modules on a custom protocol) so I ended up looking at onResolve. Ultimately I would prefer a change to build.module that allows for regexes, but this might also be worth supporting IMO.

@TomasHubelbauer TomasHubelbauer added the bug Something isn't working label Apr 2, 2024
@guest271314
Copy link
Contributor

In theory this could be resolved by supporting import maps. Then we wouldn't need to use plugins.

@TomasHubelbauer
Copy link
Author

That's interesting. I found #4458 talking about adding import map support to the bundler. But I wonder how one would define import maps for the runtime? Bunfig?

@TomasHubelbauer
Copy link
Author

Ultimately my interest lies in being able to make up a module from thin air when using a custom protocol in the import. Ideally build.module would accept a regex not just a string which would allow me to do that. I ended up looking at onResolve and onBuild as a hacky combination to achieve the same. My logic was that onResolve could catch these imports and drop a temporary file and redirect to it under a different extension and onLoad would pick up that and return the file as { contents } and delete it (I don't want it to hand around on the disk - this would just be a hack in lieu of build.module regex support) so import maps wouldn't help in this specific case.

@guest271314
Copy link
Contributor

Like this https://github.com/guest271314/webbundle/blob/main/deno.json

{
  "imports": {
    "base32-encode": "https://esm.sh/base32-encode@2.0.0",
    "cborg": "https://esm.sh/cborg@1.10.2",
    "commander": "https://esm.sh/commander@4.1.1",
    "mime": "https://esm.sh/mime@2.6.0",
    "to-data-view": "https://esm.sh/to-data-view@2.0.0",
    "wbn": "https://esm.sh/wbn@0.0.9",
    "wbn-sign-webcrypto": "https://raw.githubusercontent.com/guest271314/wbn-sign-webcrypto/main/lib/wbn-sign.js",
    "zod": "https://esm.sh/zod@3.22.4"
  }
}

Then bun build --import-map="./deno.json".

The idea for me to get away from runtime-specific inventions, so we can run the same code in bun, deno, and node, et al.

@TomasHubelbauer
Copy link
Author

TomasHubelbauer commented Apr 3, 2024

This is related to bun build, but what about the bun index.ts case - no build step, just Bun running a file at runtime with some plugins in the Bunfig preload. I would expect import map support would include the ability to specify (and mutate) the import map at runtime the same way that is possible in the browser.

Edit: turns out mutating import maps at runtime is not possible in browsers, I mis-remembered: WICG/import-maps#92.

@guest271314
Copy link
Contributor

Ultimately my interest lies in being able to make up a module from thin air when using a custom protocol in the import.

For your use case you should be able to use a Data URL or a Bob URL without having to create a custom protocol. See Deno dynamic import("./exports") throws module not found for "exports.js" dynamically created in the script

  const script = `export default 1;`;

then https://gist.github.com/guest271314/4637bb1288321256d7c14c72ebc81137#file-dynamic_import_always_throws-js-L24-L26

 // bun
  if (runtime.includes("Bun")) {
    await Bun.write("exports.js", encoder.encode(script));
  }

then use dynamic import() https://gist.github.com/guest271314/4637bb1288321256d7c14c72ebc81137#file-dynamic_import_always_throws-js-L27-L29

  const { default: module } = await import("./exports.js"); // Raw string specifier
  console.log({ module });
  console.log({ runtime });

Deno's dynamic import() ain't dynamic. So far Bun's is.

@TomasHubelbauer
Copy link
Author

I am not attempting to create a custom protocol, I have an existing custom protocol which I am looking to bridge Bun into supporting. It is possible that runtime plugins are intentionally locked into the file: protocol only, but if that design decision is not strongly held, I would definitely like runtime plugins to support any protocol / filter against any import specifier, not just the ones with implicit file: protocol.

Node has experimental support for custom loaders which I believe are roughly equivalent to Bun's runtime plugins and AFAICT it should be possible to create a custom loader which matches the URL by protocol, here demonstrated on a rudimentary HTTP/HTTPS loader:
https://nodejs.org/api/module.html#import-from-https

Whether it is by build.module module virtualization with regex module name matching or extending onResolve and onLoad to support non-file protocol, this would enable something similar to what's possible there.

@guest271314
Copy link
Contributor

Why is the custom protocol necessary? Can't you just do

import(`data:text/javascript,export default 1`)

?

@TomasHubelbauer
Copy link
Author

Whatever is on the other side of the import is dynamic, I can't bake it in or use a static data: protocol import.

@guest271314
Copy link
Contributor

Reproduced on Bun version 1.0.36 (bun-linux-x64-baseline) re onResolve.

I was able to get the expected work using import.meta.url, new URL(), and build.module()

import { plugin } from "bun";

await plugin({
  name: "test2",
  async setup(build) {
    console.log("Plugin 2 loaded!", build);
    build.module(
      // The specifier, which can be any string - except a built-in, such as "buffer"
      "demo:2",
      async () => {
        const metaURL = new URL(await import.meta.resolve("./demo2.ts"), import.meta.url);
        return {
          contents: await (await fetch(metaURL.href)).text(),
          loader: "ts",
        };
      }
    )
/*
    build.onResolve({ filter: /^demo:/ }, (args) => {
      console.log("onResolve2", args);

      return {
        path: './demo2.ts'
      };
    });
*/
  },
});

@guest271314
Copy link
Contributor

This alone works too

const metaURL = new URL("./demo2.ts", import.meta.url);

@guest271314
Copy link
Contributor

Edit: turns out mutating import maps at runtime is not possible in browsers, I mis-remembered: WICG/import-maps#92.

I'm not sure how that issue is related. Bun doesn't implement import maps so we can't test import maps at all in bun.

Technically this is possible bun and deno, where we can fetch() file: protocol using absolute path:

const script = await (await fetch("file:///absolute/path/to/local/file")).text();
const imports = await import(new Blob([script], {type:"text/javascript"}));

And possible for node which does not support file: protocol for the Undici fetch() implementation (see Implementing file: protocol support for WHATWG fetch() in Node.js without fs or import) you can modify Undici/lib/web/fetch/index.js#L129 to include

  const p = createDeferredPromise()
  const url = new URL(input);
  if (url.protocol === "file:") {
    return import("node:fs").then((fs) => {
      p.resolve(new Response(fs.readFileSync(url)));
      return p.promise;
    })       
  }

for the capability to to use file: protocol with fetch() using node.

Then you can use dynamic import() for any script you create in the script or not. For deno with the caveat that bare string specifers behave differently than bun or deno dynamic_import_always_throws.js.

@guest271314
Copy link
Contributor

Should be

const imports = await import(URL.createObjectURL(new Blob([script], {type:"text/javascript"})));

to create a Blob URL. Which also gets around denos bare string specifier behaviour. Then you can iport anything from anywhere, particularly using import assertions with type set to "json" and data serialized to a Uint8Array spread to an Array and stringified to JSON.

@guest271314
Copy link
Contributor

This is hownode works

// https://github.com/nodejs/node/blob/main/doc/api/module.md#import-from-https
// https-hooks.mjs
import { get } from "node:https";

export function load(url, context, nextLoad) {
  // For JavaScript to be loaded over the network, we need to fetch and
  // return it.
  if (url.startsWith("https://")) {
    return new Promise((resolve, reject) => {
      get(url, (res) => {
        let data = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => data += chunk);
        res.on("end", () =>
          resolve({
            // This example assumes all network-provided JavaScript is ES module
            // code.
            format: "module",
            shortCircuit: true,
            source: data,
          }));
      }).on("error", (err) => reject(err));
    });
  }

  // Let Node.js handle all other URLs.
  return nextLoad(url);
}
node --experimental-default-type=module --experimental-network-imports --loader=./https-hooks.js ./test-network-imports.js

or

node --experimental-default-type=module --experimental-network-imports --import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register(pathToFileURL("./https-hooks.js"));' ./test-network-imports.js

See

feross/buffer bundled to Ecmascript Module.

@guest271314
Copy link
Contributor

Using deno with an import map deno run --import-map=import-map.json index.ts

import-map.json

{
  "imports": {
    "1.demo": "./demo1.ts",
    "demo:2": "./demo2.ts"
  }
}

@TomasHubelbauer
Copy link
Author

Thanks for the PR, I noted in the repro repo README and in my initial post in this issue that I can't use build.module because it doesn't take a regex. demo:2 is not meant to be a module name, the plugin is meant to be able to handle any resolution starting with demo:.

As far as import maps go, I was just pondering how it could look, I am aware Bun doesn't support import maps at the moment, but should it, I would hope for an implementation where the import map can be passed to bun index.ts via the CLI or Bunfig or some other means, not only baked into a build via bun build.

I am aware of the Node loader support, I've linked it previously. This is something I am hoping to do an equivalent of in Bun, because it seems like it can support custom protocols as the HTTP/HTTPS loader demonstrates. I believe if build.module took a regex as an option in addition to a string, it would be possible to add this type of support to Bun as well.

I've been able to build Bun locally and while I am not at al proficient in Zig, I think I've gathered enough context by now to maybe be able to contribute regex support to build.module. I'm not very good at debugging Zig yet, though, so it's a lot of wading through log statements.

@guest271314
Copy link
Contributor

I can't use build.module because it doesn't take a regex

Sure you can use build.module. RegExp is not special. Just write out all the "demo:*" specifiers you are expecting.

The Node loader and Bun plugin systems are roughly the same from my perspective. Import maps are by far the least amount of code and less cumbersome.

@TomasHubelbauer
Copy link
Author

I cannot write out all the specifiers I am expecting as the specifiers are dynamic, not known at compile time.

@guest271314
Copy link
Contributor

You have to be able to write them out at some point in the process, else you will be getting "Module not found" errors.

@TomasHubelbauer
Copy link
Author

With build.module being able to return virtual modules, there is no need to map every module specifier to a path on the disk, especially if the module isn't backed by a physical path. The Node HTTP loader is a good example of this.

https://bun.sh/docs/runtime/plugins#virtual-modules

@guest271314
Copy link
Contributor

I am expecting as the specifiers are dynamic, not known at compile time.

there is no need to map every module specifier to a path on the disk

Yes, there is. How else are you going to resolve some arbitrary specifier to a real path?

That doesn't matter. There MUST be a mapping between the specifiers you are expecting and URL's. You MUST already know what arbitrary specifiers map to which local or remote paths. There is no escaping these technical facts.

From my perspective both Node.js loader and Bun plugin systems are clucky and less ideal than import maps - but that's all node and bun have since neither has import maps. Node.js loader system doesn't offer anything novel in this regard cf. Bun' plugin interface.

@guest271314
Copy link
Contributor

I am expecting as the specifiers are dynamic, not known at compile time.

Then how are you going to write a RegExp for the specifier? You can't in this case.

@TomasHubelbauer
Copy link
Author

Here's the simplest case I can think of to demonstrate build.module with a regex:

import { plugin } from "bun";

await plugin({
  name: "test",
  async setup(build) {
    console.log("Plugin loaded!", build);

    build.build(/^demo:/, (specifier) => {
      return {
        contents: `export default "Hi there, I am the '${specifier}' import!";`,
        loader: 'ts'
      };
    });
  },
});
console.log(await import('demo:alice')); // "Hi there, I am the 'alice' import!"
console.log(await import('demo:bob')); // "Hi there, I am the 'bob' import!"

@guest271314
Copy link
Contributor

As I suggested above you can use a Blob URL or Data URL for that.

Even simpler is to just write out the "demo:alice" and "demo:bob" specifiers.

@guest271314
Copy link
Contributor

I see what you are trying to do now. You are correct, Bun doesn't have the specifier parameter string as a parameter exposed in the function that returns content and loader. That's a reasonable feature request.

The RegExp parts doesn't do anything special though. We could just as well use string methods to to match and do stuff. What you want is that specifier in the parameter passed to the function that returns content and loaders type.

@guest271314
Copy link
Contributor

I'll try t figure out a way or two to achieve what you are tryiing to do with what we already have exposed.

@guest271314
Copy link
Contributor

Here's one way to achieve the requirement, including still using bun.plugin(). We parse the file that is going to be passed to bun run in the preloaded script, before we call plugin(). That is, we do what we want in the preload script outside of just relying on bun.plugin() to do the work. We use RegExp to match specifiers begining an arbitrary specifier: specifier, replace the exported content.

We can do this without bun.plugin(), too.

const url = new URL("./index.ts", import.meta.url);
const script = await (await fetch(url.href)).text();
const regexp = /(?<=['"])demo:\w+(?=['"])/g;
const matches = script.match(regexp);

import { plugin } from "bun";

for (const specifier of matches) {
  await plugin({
    name: specifier,
    async setup(build) {
      console.log("Plugin 2 loaded!", build);
      build.module(
        // The specifier, which can be any string - except a built-in, such as "buffer"
        specifier,
        async () => {
          console.log({ specifier });
          return {
            contents: `export default "Hi there, I am the '${specifier}' import!";`,
            loader: "ts",
          };
        },
      );
    },
  });
}

@guest271314
Copy link
Contributor

The script file name passed to bun is dynamically retrievable using Bun.argv

plugin2.ts

const [, file] = Bun.argv;
const url = new URL(file, import.meta.url);
// ...

@guest271314
Copy link
Contributor

Substituting Bun.file() for fetch() in plugin2.ts

const url = await import.meta.resolve(file);
const script = await Bun.file(url).text();

then we can do bun run index.ts which contains

import demo1 from '1.demo';

console.log('demo 1', demo1);

import demo2 from 'demo:2';

console.log('demo 2', demo2);

console.log(await import('demo:alice')); // "Hi there, I am the 'alice' import!"
console.log(await import('demo:bob')); // "Hi there, I am the 'bob' import!"

and get this result

Plugin 1 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
[ "/home/user/bin/bun", "/home/user/bun-runtime-plugin-onResolve-custom-protocol/index.ts"
]
Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
onResolve1 {
  path: "1.demo",
  importer: "/home/user/bun-runtime-plugin-onResolve-custom-protocol/index.ts",
}
{
  specifier: "demo:2",
}
demo 1 Hello, 1!
demo 2 Hi there, I am the 'demo:2' import!
{
  specifier: "demo:alice",
}
Module {
  default: "Hi there, I am the 'demo:alice' import!",
}
{
  specifier: "demo:bob",
}
Module {
  default: "Hi there, I am the 'demo:bob' import!",
}

@guest271314
Copy link
Contributor

To not hard code any specifier we can use RegExp, too, to match static import ... or dynamic import( ... )

const regexp = /(?<=['"]|import\s\w+\sfrom(?=\s|)|import\((?=\s|))[\w\d+]+:[\w\d]+(?=['"])/g;

@guest271314
Copy link
Contributor

Then you can include something like this in the script

const r = "0:1";
console.log(await import(r)); // "Hi there, I am the random dynamic import!"

import demo3 from'x0000:2';
console.log('demo 3', demo3);

@guest271314
Copy link
Contributor

I don't think node has a way to handle static import and dynamic import() the way we have here for dynamic, arbitrary specifiers with either the loaders https://github.com/nodejs/loaders or vm module https://nodejs.org/api/vm.html.

@guest271314
Copy link
Contributor

There is a way to do this using node with vm module

node --experimental-default-type=module --experimental-vm-modules --no-warnings node-vm.js
import vm from "node:vm";
import { readFileSync } from "node:fs";

const contextifiedObject = vm.createContext({
  print: console.log,
});

// Step 1
//
// Create a Module by constructing a new `vm.SourceTextModule` object. This
// parses the provided source text, throwing a `SyntaxError` if anything goes
// wrong. By default, a Module is created in the top context. But here, we
// specify `contextifiedObject` as the context this Module belongs to.
//
// Here, we attempt to obtain the default export from the module "foo", and
// put it into local binding "secret".

const script = readFileSync(
  new URL(import.meta.resolve("./index.js", import.meta.url)),
);
const scriptText = `${script}`.replace(/console.log/g, "print");

const bar = new vm.SourceTextModule(scriptText, {
  context: contextifiedObject,
  importModuleDynamically: async function importModuleDynamically(
    specifier,
    referrer,
    importAttributes,
  ) {
    console.log({ specifier });
    const m = new vm.SyntheticModule(["default"], () => {});
    await m.link(() => {});
    m.setExport("default", `Hi there, I am the '${specifier}' import!`);
    return m;
  },
});

// Step 2
//
// "Link" the imported dependencies of this Module to it.
//
// The provided linking callback (the "linker") accepts two arguments: the
// parent module (`bar` in this case) and the string that is the specifier of
// the imported module. The callback is expected to return a Module that
// corresponds to the provided specifier, with certain requirements documented
// in `module.link()`.
//
// If linking has not started for the returned Module, the same linker
// callback will be called on the returned Module.
//
// Even top-level Modules without dependencies must be explicitly linked. The
// callback provided would never be called, however.
//
// The link() method returns a Promise that will be resolved when all the
// Promises returned by the linker resolve.
//
// Note: This is a contrived example in that the linker function creates a new
// "foo" module every time it is called. In a full-fledged module system, a
// cache would probably be used to avoid duplicated modules.

async function linker(specifier, referencingModule) {
  console.log({ specifier });
  return new vm.SourceTextModule(
    `
      // The "secret" variable refers to the global variable we added to
      // "contextifiedObject" when creating the context.
      export default "Hi there, I am the '${specifier}' import!";
    `,
    {
      context: referencingModule.context,
    },
  );

  // Using `contextifiedObject` instead of `referencingModule.context`
  // here would work as well.
  throw new Error(`Unable to resolve dependency: ${specifier}`);
}
await bar.link(linker);

// Step 3
//
// Evaluate the Module. The evaluate() method returns a promise which will
// resolve after the module has finished evaluating.
await bar.evaluate();

prints

{ specifier: '1.demo' }
{ specifier: 'demo:2' }
{ specifier: 'x0000:2' }
demo 1 Hi there, I am the '1.demo' import!
demo 2 Hi there, I am the 'demo:2' import!
{ specifier: 'demo:alice' }
Hi there, I am the 'demo:alice' import!
{ specifier: 'demo:bob' }
Hi there, I am the 'demo:bob' import!
{ specifier: '0:1' }
Hi there, I am the '0:1' import!
demo 3 Hi there, I am the 'x0000:2' import!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants