Trial details page empty #5793

Interfish · 2024-06-17T02:06:31Z

Describe the issue:
The trial details page is empty, while the overview page and the other pages seem okay. I am using the Edge browser. After clicking the button, the details page is completely blank:

[screenshot: blank trial details page]

The overview page is okay:

[screenshot: overview page]

Environment:

NNI version: 3.0
Training service (local|remote|pai|aml|etc): local
Client OS: Ubuntu 22.04 docker container
Server OS (for remote mode only): NaN
Python version: 3.10.12
PyTorch/TensorFlow version: PyTorch 2.3.1+cu121
Is conda/virtualenv/venv used?: No
Is running in Docker?: Yes

Configuration:

Experiment config (remember to remove secrets!):

{
  "params": {
    "experimentType": "hpo",
    "trialCommand": "python dl_run.py --use_nni --config /life_changer/experiments/ws_related/train/nni/v25/default.yaml",
    "trialCodeDirectory": "/life_changer/experiments/ws_related/train",
    "trialConcurrency": 1,
    "maxTrialDuration": "1h",
    "useAnnotation": false,
    "debug": false,
    "logLevel": "info",
    "experimentWorkingDirectory": "/life_changer/experiments/ws_related/train/nni/experiments",
    "tuner": {
      "name": "GridSearch"
    },
    "trainingService": {
      "platform": "local",
      "trialCommand": "python dl_run.py --use_nni --config /life_changer/experiments/ws_related/train/nni/v25/default.yaml",
      "trialCodeDirectory": "/life_changer/experiments/ws_related/train",
      "debug": false,
      "maxTrialNumberPerGpu": 1,
      "reuseMode": false
    }
  },
  "execDuration": "1m 40s",
  "nextSequenceId": 2,
  "revision": 14
}

Search space:

{
  "model.embedding.mark_unknown_as_padding": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.embedding.init_weight_by_numerical_intensity": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.cross_layer.sub_net_dims": {
    "_type": "choice",
    "_value": [4, 16]
  },
  "model.cross_layer.score_fn": {
    "_type": "choice",
    "_value": [
      {
        "softplus": {"use": true, "beta": 5, "threshold": 100}
      },
      {
        "softmax": {"use": true}
      },
      {
        "silu": {"use": true}
      }
    ]
  },
  "model.cross_layer.use_lhuc": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.cross_layer.global_score_fn": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.output_layer.moe.topk": {
    "_type": "choice",
    "_value": [
      null,
      2
    ]
  },
  "model.output_layer.moe.num_experts": {
    "_type": "choice",
    "_value": [8, 32]
  },
  "model.output_layer.moe.experts_share_input": {
    "_type": "choice",
    "_value": [true, false]
  },
  "model.output_layer.moe.gating_input": {
    "_type": "choice",
    "_value": ["odds_team_ha_emb", "same_as_expert"]
  },
  "model.loss": {
    "_type": "choice",
    "_value": [
      {
        "focal": {"use": true, "gamma": 2}
      },
      {
        "ghmc": {"use": true, "bins": 10}
      },
      {
        "wdl_prob_rank": {"use": true, "top_k_frac": 0.1, "top_k_weight"": 10}
      },
      {
        "odds_weighted": {"use": true, "pow_of_n": 2}
      },
      {
        "pred_odds_topk": {
          "use": true,
          "top_k_frac": 0.1,
          "top_k_weight": 10,
          "weight_by_odds": false
        }
      },
      {
        "pred_odds_topk": {
          "use": true,
          "top_k_frac": 0.1,
          "top_k_weight": null,
          "weight_by_odds": true
        }
      },
      {
        "pred_odds_topk": {
          "use": true,
          "top_k_frac": 0.2,
          "top_k_weight": 5,
          "weight_by_odds": false
        }
      },
      {
        "preds_correct_odds": {
          "use": true,
          "threshold": 3,
          "weight": 10,
          "weight_by_odds": false
        }
      },
      {
        "preds_correct_odds": {
          "use": true,
          "threshold": 3,
          "weight": null,
          "weight_by_odds": true
        }
      }
    ]
  },
  "model.degree_1.last_n": {
    "_type": "choice",
    "_value": [1, 3]
  },
  "model.degree_1.use_lhuc": {
    "_type": "choice",
    "_value": [true, false]
  },
  "optimizer": {
    "_type": "choice",
    "_value": [
      {
        "_name": "RAdam",
        "weight_decay": {
          "_type": "choice",
          "_value": [0.0001, 0.001]
        }
      },
      {
        "_name": "Ranger",
        "weight_decay": {
          "_type": "choice",
          "_value": [0.0001, 0.001]
        }
      }
    ]
  }
}

Log message:

nnimanager.log:

[2024-06-17 01:58:52] INFO (main) Start NNI manager
[2024-06-17 01:58:52] INFO (RestServer) Starting REST server at port 8888, URL prefix: "/"
[2024-06-17 01:58:52] INFO (RestServer) REST server started.
[2024-06-17 01:58:53] INFO (NNIDataStore) Datastore initialization done
[2024-06-17 01:58:54] INFO (NNIManager) Starting experiment: l2yv4dn1
[2024-06-17 01:58:54] INFO (NNIManager) Setup training service...
[2024-06-17 01:58:54] INFO (NNIManager) Setup tuner...
[2024-06-17 01:58:54] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2024-06-17 01:58:54] INFO (NNIManager) Add event listeners
[2024-06-17 01:58:54] INFO (LocalV3.local) Start
[2024-06-17 01:58:54] INFO (NNIManager) NNIManager received command from dispatcher: ID, 
[2024-06-17 01:58:54] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.0001}}, "parameter_index": 0}
[2024-06-17 01:58:55] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 0,
  hyperParameters: {
    value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.0001}}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2024-06-17 01:58:55] INFO (LocalV3.local) Register directory trial_code = /life_changer/experiments/ws_related/train
[2024-06-17 01:58:55] INFO (LocalV3.local) Created trial FOUTA
[2024-06-17 01:58:58] INFO (LocalV3.local) Trial parameter: FOUTA {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.0001}}, "parameter_index": 0}
[2024-06-17 02:00:08] INFO (NNIManager) Trial job FOUTA status changed from RUNNING to SUCCEEDED
[2024-06-17 02:00:09] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.001}}, "parameter_index": 0}
[2024-06-17 02:00:09] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 1,
  hyperParameters: {
    value: '{"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.001}}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2024-06-17 02:00:09] INFO (LocalV3.local) Created trial sCcur
[2024-06-17 02:00:12] INFO (LocalV3.local) Trial parameter: sCcur {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "RAdam", "weight_decay": 0.001}}, "parameter_index": 0}
[2024-06-17 02:01:20] INFO (NNIManager) Trial job sCcur status changed from RUNNING to SUCCEEDED
[2024-06-17 02:01:20] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "Ranger", "weight_decay": 0.0001}}, "parameter_index": 0}
[2024-06-17 02:01:20] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 2,
  hyperParameters: {
    value: '{"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "Ranger", "weight_decay": 0.0001}}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2024-06-17 02:01:20] INFO (LocalV3.local) Created trial y553t
[2024-06-17 02:01:23] INFO (LocalV3.local) Trial parameter: y553t {"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"model.embedding.mark_unknown_as_padding": true, "model.embedding.init_weight_by_numerical_intensity": true, "model.cross_layer.sub_net_dims": 4, "model.cross_layer.score_fn": {"softplus": {"use": true, "beta": 5, "threshold": 100}}, "model.cross_layer.use_lhuc": true, "model.cross_layer.global_score_fn": true, "model.output_layer.moe.topk": null, "model.output_layer.moe.num_experts": 8, "model.output_layer.moe.experts_share_input": true, "model.output_layer.moe.gating_input": "odds_team_ha_emb", "model.loss": {"focal": {"use": true, "gamma": 2}}, "model.degree_1.last_n": 1, "model.degree_1.use_lhuc": true, "optimizer": {"_name": "Ranger", "weight_decay": 0.0001}}, "parameter_index": 0}

dispatcher.log:

[2024-06-17 01:58:54] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
[2024-06-17 01:58:54] INFO (nni.runtime.msg_dispatcher/Thread-1 (command_queue_worker)) Initial search space: {'model.embedding.mark_unknown_as_padding': {'_type': 'choice', '_value': [True, False]}, 'model.embedding.init_weight_by_numerical_intensity': {'_type': 'choice', '_value': [True, False]}, 'model.cross_layer.sub_net_dims': {'_type': 'choice', '_value': [4, 16]}, 'model.cross_layer.score_fn': {'_type': 'choice', '_value': [{'softplus': {'use': True, 'beta': 5, 'threshold': 100}}, {'softmax': {'use': True}}, {'silu': {'use': True}}]}, 'model.cross_layer.use_lhuc': {'_type': 'choice', '_value': [True, False]}, 'model.cross_layer.global_score_fn': {'_type': 'choice', '_value': [True, False]}, 'model.output_layer.moe.topk': {'_type': 'choice', '_value': [None, 2]}, 'model.output_layer.moe.num_experts': {'_type': 'choice', '_value': [8, 32]}, 'model.output_layer.moe.experts_share_input': {'_type': 'choice', '_value': [True, False]}, 'model.output_layer.moe.gating_input': {'_type': 'choice', '_value': ['odds_team_ha_emb', 'same_as_expert']}, 'model.loss': {'_type': 'choice', '_value': [{'focal': {'use': True, 'gamma': 2}}, {'ghmc': {'use': True, 'bins': 10}}, {'wdl_prob_rank': {'use': True, 'top_k_frac': 0.1, 'top_k_weight"': 10}}, {'odds_weighted': {'use': True, 'pow_of_n': 2}}, {'pred_odds_topk': {'use': True, 'top_k_frac': 0.1, 'top_k_weight': 10, 'weight_by_odds': False}}, {'pred_odds_topk': {'use': True, 'top_k_frac': 0.1, 'top_k_weight': None, 'weight_by_odds': True}}, {'pred_odds_topk': {'use': True, 'top_k_frac': 0.2, 'top_k_weight': 5, 'weight_by_odds': False}}, {'preds_correct_odds': {'use': True, 'threshold': 3, 'weight': 10, 'weight_by_odds': False}}, {'preds_correct_odds': {'use': True, 'threshold': 3, 'weight': None, 'weight_by_odds': True}}]}, 'model.degree_1.last_n': {'_type': 'choice', '_value': [1, 3]}, 'model.degree_1.use_lhuc': {'_type': 'choice', '_value': [True, False]}, 'optimizer': {'_type': 'choice', '_value': [{'_name': 'RAdam', 
'weight_decay': {'_type': 'choice', '_value': [0.0001, 0.001]}}, {'_name': 'Ranger', 'weight_decay': {'_type': 'choice', '_value': [0.0001, 0.001]}}]}}
[2024-06-17 01:58:54] INFO (nni.tuner.gridsearch/Thread-1 (command_queue_worker)) Grid initialized, size: (2×2×2×3×2×2×2×2×2×2×9×2×2×2×2×2) = 442368
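
As a quick sanity check on the grid size reported in the line above, the factors can be multiplied directly. This is only a sketch of the arithmetic, not of how the GridSearch tuner computes it internally. The first 14 factors appear to line up with the 14 top-level keys of the search space, and the last two with the nested `weight_decay` choices under `optimizer`; that mapping is my reading of the log, not something the log states.

```python
import math

# Factors copied verbatim from the "Grid initialized" line in dispatcher.log.
factors = [2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 2, 2]

# Fourteen factors of 2 give 2**14 = 16384; the remaining 3 and 9 give 27.
grid_size = math.prod(factors)
print(grid_size)
```

The product agrees with the 442368 in the log, so the search space was parsed as written, including the entries that contain `null`.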

nnictl stdout and stderr:

--------------------------------------------------------------------------------
Experiment l2yv4dn1 start: 2024-06-17 01:58:52.009267
--------------------------------------------------------------------------------

How to reproduce it?:
This may be hard to reproduce because the exact environment on my machine is complicated.

Interfish · 2024-06-17T15:32:24Z

After some experiments of my own, I found that the root cause is the "null" values in my experiment config yaml file. If I replace them with other values such as "True" or "1", the details page displays normally. Since this repo is no longer maintained, I am leaving this here as a record for those who are still using NNI.
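
For anyone applying the workaround described above, here is a minimal sketch that finds every `null` in a search space and swaps it for a placeholder. It assumes the search space is available as JSON. The `SENTINEL` value and the helper names `find_nulls`, `replace_nulls`, and `restore_nulls` are hypothetical and not part of NNI. Whether a string placeholder avoids the blank page has not been verified here; the comment above only confirms values such as "True" or "1".

```python
import json
from typing import Any, Iterator

# Hypothetical placeholder; any non-null value the trial code can recognize would do.
SENTINEL = "__none__"


def find_nulls(node: Any, path: str = "$") -> Iterator[str]:
    """Yield the location of every null (None) inside a parsed search space."""
    if node is None:
        yield path
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_nulls(value, f"{path}[{key!r}]")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_nulls(value, f"{path}[{index}]")


def replace_nulls(node: Any, sentinel: str = SENTINEL) -> Any:
    """Return a copy of the structure with every null replaced by the sentinel."""
    if node is None:
        return sentinel
    if isinstance(node, dict):
        return {key: replace_nulls(value, sentinel) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_nulls(value, sentinel) for value in node]
    return node


def restore_nulls(node: Any, sentinel: str = SENTINEL) -> Any:
    """Inverse of replace_nulls, meant for the trial side: sentinel back to None."""
    if isinstance(node, str) and node == sentinel:
        return None
    if isinstance(node, dict):
        return {key: restore_nulls(value, sentinel) for key, value in node.items()}
    if isinstance(node, list):
        return [restore_nulls(value, sentinel) for value in node]
    return node


# Excerpt of the search space posted in this issue.
SEARCH_SPACE_EXCERPT = json.loads("""
{
  "model.output_layer.moe.topk": {"_type": "choice", "_value": [null, 2]},
  "model.output_layer.moe.num_experts": {"_type": "choice", "_value": [8, 32]}
}
""")

if __name__ == "__main__":
    for location in find_nulls(SEARCH_SPACE_EXCERPT):
        print("null found at", location)
    print(json.dumps(replace_nulls(SEARCH_SPACE_EXCERPT), indent=2))
```

On the trial side, the parameters returned by `nni.get_next_parameter()` could be passed through `restore_nulls` so that the training code still sees `None` where the original search space had `null`.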
