/data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/onnx/utils.py:609: UserWarning: ONNX Preprocess - Removing mutation from node aten::masked_fill_ on block input: 'src_key_padding_mask'. This changes graph semantics. (Triggered internally at /home/runner/.termux-build/python-torch/src/torch/csrc/jit/passes/onnx/remove_inplace_ops_for_onnx.cpp:353.)
  _C._jit_pass_onnx_remove_inplace_ops_for_onnx(graph, module)
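The warning above comes from the ONNX exporter's pass that strips in-place ops such as `aten::masked_fill_` before conversion. The graph that follows suggests where this happens. PyTorch's own code converts the boolean `src_key_padding_mask` into an additive float mask inside the encoder, using `aten::zeros_like` followed by `masked_fill` with `-inf`, attributed to `torch/nn/functional.py:5967`.

One possible workaround is to do that conversion yourself with the out-of-place `masked_fill` inside the export wrapper (the trace calls it `EncoderWrapper`), then pass the resulting float mask to the encoder. `nn.TransformerEncoder` passes an already-float padding mask through without converting it again. This is a sketch, and it has not been verified to silence the warning on this particular Termux PyTorch build.

The helper name `padding_mask_to_additive` and the standalone encoder below are hypothetical. The encoder is sized from the parameter shapes in the dump: d_model 256, 8 heads of width 32, a 1024-wide feed-forward layer, 6 layers, and apparently batch-first inputs.

```python
import torch
import torch.nn as nn


def padding_mask_to_additive(mask: torch.Tensor, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Turn a boolean key-padding mask (True = padded) into an additive float mask.

    Hypothetical helper. It mirrors the bool -> float conversion visible in the trace
    (zeros_like, then masked_fill with -inf), but uses the out-of-place masked_fill,
    so tracing records no in-place op on the mask.
    """
    return torch.zeros_like(mask, dtype=dtype).masked_fill(mask, float("-inf"))


# Hypothetical stand-in encoder whose sizes match the parameter shapes in the dump.
# dropout=0.0 keeps both forward passes below deterministic.
torch.manual_seed(0)
layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, dim_feedforward=1024,
                                   dropout=0.0, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)

src = torch.randn(1, 150, 256)                 # (batch, seq_len, d_model), as in the trace
bool_mask = torch.zeros(1, 150, dtype=torch.bool)
bool_mask[:, 100:] = True                      # pretend the last 50 trajectory points are padding

float_mask = padding_mask_to_additive(bool_mask)
out_bool = encoder(src, src_key_padding_mask=bool_mask)    # PyTorch converts the mask internally
out_float = encoder(src, src_key_padding_mask=float_mask)  # mask already float; passed through
print(torch.allclose(out_bool, out_float))
```

In the real wrapper, the equivalent change is to call the helper on the incoming boolean mask inside `forward` and hand the float result to `self.model.encoder`. The exported ONNX input can stay boolean.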
Torch IR graph at exception: graph(%traj_features : Float(*, 150, 6, strides=[900, 6, 1], requires_grad=0, device=cpu),
      %input.1 : Long(*, 150, 3, strides=[450, 3, 1], requires_grad=0, device=cpu),
      %src_key_padding_mask : Bool(*, 150, strides=[150, 1], requires_grad=0, device=cpu),
      %model.pe : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=0, device=cpu),
      %model.traj_proj.weight : Float(128, 6, strides=[6, 1], requires_grad=1, device=cpu),
      %model.traj_proj.bias : Float(128, strides=[1], requires_grad=1, device=cpu),
      %model.kb_embedding.weight : Float(30, 128, strides=[128, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.0.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.1.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.2.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.3.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.4.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder.layers.5.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder_norm.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.encoder_norm.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.char_embedding.weight : Float(30, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.multihead_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.multihead_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.multihead_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.multihead_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.norm3.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.0.norm3.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.multihead_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.multihead_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.multihead_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.multihead_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.norm3.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.1.norm3.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.multihead_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.multihead_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.multihead_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.multihead_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.norm3.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.2.norm3.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.self_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.self_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.self_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.self_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.multihead_attn.in_proj_weight : Float(768, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.multihead_attn.in_proj_bias : Float(768, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.multihead_attn.out_proj.weight : Float(256, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.multihead_attn.out_proj.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.linear1.weight : Float(1024, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.linear1.bias : Float(1024, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.linear2.weight : Float(256, 1024, strides=[1024, 1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.linear2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.norm1.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.norm1.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.norm2.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.norm2.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.norm3.weight : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.decoder.layers.3.norm3.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %model.output_proj.weight : Float(30, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %model.output_proj.bias : Float(30, strides=[1], requires_grad=1, device=cpu)):
  %7470 : NoneType = prim::Constant()
  %7471 : Bool(1, 150, strides=[150, 1], requires_grad=0, device=cpu) = aten::clone(%src_key_padding_mask, %7470), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5967:0
  %7473 : Long(device=cpu) = prim::Constant[value={1}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::
  %5939 : Long(device=cpu) = aten::size(%traj_features, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper:: # /data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py:164:0
  %5945 : Float(1, 150, 128, strides=[19200, 128, 1], requires_grad=1, device=cpu) = aten::linear(%traj_features, %model.traj_proj.weight, %model.traj_proj.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.linear.Linear::traj_proj # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %7474 : Double(device=cpu) = prim::Constant[value={16}]()
  %7457 : Float(1, 150, 128, strides=[19200, 128, 1], requires_grad=1, device=cpu) = aten::mul(%5945, %7474), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper:: # /data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py:167:0
  %7475 : Long(device=cpu) = prim::Constant[value={-1}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.sparse.Embedding::kb_embedding
  %7476 : Bool(device=cpu) = prim::Constant[value={0}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.sparse.Embedding::kb_embedding
  %5951 : Float(1, 150, 3, 128, strides=[57600, 384, 128, 1], requires_grad=1, device=cpu) = aten::embedding(%model.kb_embedding.weight, %input.1, %7475, %7476, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.sparse.Embedding::kb_embedding # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2551:0
  %7430 : int[] = prim::Constant[value=[2]]()
  %5955 : NoneType = prim::Constant(), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::
  %5956 : Float(1, 150, 128, strides=[19200, 128, 1], requires_grad=1, device=cpu) = aten::mean(%5951, %7430, %7476, %5955), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper:: # /data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py:172:0
  %5957 : Tensor[] = prim::ListConstruct(%7457, %5956), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::
  %5959 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::cat(%5957, %7475), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper:: # /data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py:175:0
  %7477 : Long(device=cpu) = prim::Constant[value={0}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::
  %7478 : Long(device=cpu) = prim::Constant[value={9223372036854775807}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::
  %5964 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=0, device=cpu) = aten::slice(%model.pe, %7477, %7477, %7478, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper:: # /data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py:178:0
  %5968 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=0, device=cpu) = aten::slice(%5964, %7473, %7477, %5939, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper:: # /data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py:178:0
  %7479 : Long(device=cpu) = prim::Constant[value={2}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::
  %5973 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=0, device=cpu) = aten::slice(%5968, %7479, %7477, %7478, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper:: # /data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py:178:0
  %src : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%5959, %5973, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper:: # /data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py:178:0
  %7480 : Long(device=cpu) = prim::Constant[value={6}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder
  %7481 : Double(device=cpu) = prim::Constant[value={-inf}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder
  %5981 : Float(1, 150, strides=[150, 1], requires_grad=0, device=cpu) = aten::zeros_like(%src_key_padding_mask, %7480, %5955, %5955, %7476, %5955), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5967:0
  %7472 : Float(1, 150, strides=[150, 1], requires_grad=0, device=cpu) = aten::masked_fill(%5981, %7471, %7481), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5967:0
  %query.1 : Float(150, 1, 256, strides=[256, 38400, 1], requires_grad=1, device=cpu) = aten::transpose(%src, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1339:0
  %6031 : Long(device=cpu) = aten::size(%query.1, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6037 : Long(device=cpu) = aten::size(%query.1, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6046 : Long(device=cpu) = aten::size(%query.1, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6060 : Long(requires_grad=0, device=cpu) = prim::Constant[value={8}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6207:0
  %6061 : str = prim::Constant[value="trunc"](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6207:0
  %6062 : Long(requires_grad=0, device=cpu) = aten::div(%6046, %6060, %6061), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6207:0
  %6094 : Long(device=cpu) = aten::size(%query.1, %7475), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5610:0
  %proj.1 : Float(150, 1, 768, strides=[768, 768, 1], requires_grad=1, device=cpu) = aten::linear(%query.1, %model.encoder.layers.0.self_attn.in_proj_weight, %model.encoder.layers.0.self_attn.in_proj_bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5614:0
  %7482 : Long(device=cpu) = prim::Constant[value={3}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6100 : int[] = prim::ListConstruct(%7482, %6094), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6101 : Float(150, 1, 3, 256, strides=[768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unflatten(%proj.1, %7475, %6100), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/_tensor.py:1421:0
  %6103 : Float(1, 150, 1, 3, 256, strides=[115200, 768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unsqueeze(%6101, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5618:0
  %7483 : Long(device=cpu) = prim::Constant[value={-2}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6106 : Float(3, 150, 1, 1, 256, strides=[256, 768, 768, 115200, 1], requires_grad=1, device=cpu) = aten::transpose(%6103, %7477, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5619:0
  %6108 : Float(3, 150, 1, 256, strides=[256, 768, 768, 1], requires_grad=1, device=cpu) = aten::squeeze(%6106, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5620:0
  %6110 : Float(3, 150, 1, 256, strides=[38400, 256, 256, 1], requires_grad=1, device=cpu) = aten::contiguous(%6108, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5621:0
  %6113 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6110, %7477, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %6116 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6110, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %6119 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6110, %7477, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7484 : Long(device=cpu) = prim::Constant[value={8}]()
  %7459 : Long(requires_grad=0, device=cpu) = aten::mul(%6037, %7484), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6123 : int[] = prim::ListConstruct(%6031, %7459, %6062), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6124 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6113, %6123), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6127 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6124, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6129 : Long(device=cpu) = aten::size(%6116, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6141 : int[] = prim::ListConstruct(%6129, %7459, %6062), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6142 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6116, %6141), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6145 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6142, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6147 : Long(device=cpu) = aten::size(%6119, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6159 : int[] = prim::ListConstruct(%6147, %7459, %6062), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6160 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6119, %6159), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6163 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6160, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6165 : Long(device=cpu) = aten::size(%6145, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6335:0
  %6174 : int[] = prim::ListConstruct(%6037, %7473, %7473, %6165), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6175 : Float(1, 1, 1, 150, strides=[150, 150, 150, 1], requires_grad=0, device=cpu) = aten::view(%7472, %6174), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6343:0
  %7431 : int[] = prim::Constant[value=[-1, 8, -1, -1]]()
  %6182 : Float(1, 8, 1, 150, strides=[150, 0, 150, 1], requires_grad=0, device=cpu) = aten::expand(%6175, %7431, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6344:0
  %6187 : int[] = prim::ListConstruct(%7459, %7473, %6165), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6188 : Float(8, 1, 150, strides=[0, 150, 1], requires_grad=0, device=cpu) = aten::reshape(%6182, %6187), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6345:0
  %7485 : Long(device=cpu) = prim::Constant[value={8}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6196 : int[] = prim::ListConstruct(%6037, %7485, %7475, %6165), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6197 : Float(1, 8, 1, 150, strides=[0, 0, 150, 1], requires_grad=0, device=cpu) = aten::view(%6188, %6196), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6404:0
  %6199 : int[] = prim::ListConstruct(%6037, %7485, %6031, %6062), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6200 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6127, %6199), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6406:0
  %6202 : int[] = prim::ListConstruct(%6037, %7485, %6165, %6062), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6203 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6145, %6202), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6407:0
  %6206 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6163, %6202), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6408:0
  %7486 : Double(device=cpu) = prim::Constant[value={0}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6211 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::scaled_dot_product_attention(%6200, %6203, %6206, %6197, %7486, %7476, %5955, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6410:0
  %7432 : int[] = prim::Constant[value=[2, 0, 1, 3]]()
  %6217 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::permute(%6211, %7432), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6219 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::contiguous(%6217, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6220 : Long(requires_grad=0, device=cpu) = aten::mul(%6037, %6031), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6222 : int[] = prim::ListConstruct(%6220, %6046), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6223 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::view(%6219, %6222), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6224 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::linear(%6223, %model.encoder.layers.0.self_attn.out_proj.weight, %model.encoder.layers.0.self_attn.out_proj.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6417:0
  %6226 : Long(device=cpu) = aten::size(%6224, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %6229 : int[] = prim::ListConstruct(%6031, %6037, %6226), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6230 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::view(%6224, %6229), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %input.3 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6230, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1395:0
  %6236 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.3, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.dropout.Dropout::dropout1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.5 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%src, %6236, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:919:0
  %7433 : int[] = prim::Constant[value=[256]]()
  %7487 : Double(device=cpu) = prim::Constant[value={1e-05}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.normalization.LayerNorm::norm1
  %7488 : Bool(device=cpu) = prim::Constant[value={1}](), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.normalization.LayerNorm::norm1
  %6243 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.5, %7433, %model.encoder.layers.0.norm1.weight, %model.encoder.layers.0.norm1.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.normalization.LayerNorm::norm1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %6244 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::linear(%6243, %model.encoder.layers.0.linear1.weight, %model.encoder.layers.0.linear1.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.linear.Linear::linear1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %input.7 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::relu(%6244), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1704:0
  %6248 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::dropout(%input.7, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.dropout.Dropout::dropout # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.9 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::linear(%6248, %model.encoder.layers.0.linear2.weight, %model.encoder.layers.0.linear2.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.linear.Linear::linear2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %6252 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.9, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.dropout.Dropout::dropout2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.11 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%6243, %6252, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:922:0
  %6259 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.11, %7433, %model.encoder.layers.0.norm2.weight, %model.encoder.layers.0.norm2.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.0/torch.nn.modules.normalization.LayerNorm::norm2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %query.3 : Float(150, 1, 256, strides=[256, 38400, 1], requires_grad=1, device=cpu) = aten::transpose(%6259, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1339:0
  %6264 : Long(device=cpu) = aten::size(%query.3, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6270 : Long(device=cpu) = aten::size(%query.3, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6279 : Long(device=cpu) = aten::size(%query.3, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6295 : Long(requires_grad=0, device=cpu) = aten::div(%6279, %6060, %6061), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6207:0
  %6327 : Long(device=cpu) = aten::size(%query.3, %7475), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5610:0
  %proj.3 : Float(150, 1, 768, strides=[768, 768, 1], requires_grad=1, device=cpu) = aten::linear(%query.3, %model.encoder.layers.1.self_attn.in_proj_weight, %model.encoder.layers.1.self_attn.in_proj_bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5614:0
  %6333 : int[] = prim::ListConstruct(%7482, %6327), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6334 : Float(150, 1, 3, 256, strides=[768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unflatten(%proj.3, %7475, %6333), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/_tensor.py:1421:0
  %6336 : Float(1, 150, 1, 3, 256, strides=[115200, 768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unsqueeze(%6334, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5618:0
  %6339 : Float(3, 150, 1, 1, 256, strides=[256, 768, 768, 115200, 1], requires_grad=1, device=cpu) = aten::transpose(%6336, %7477, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5619:0
  %6341 : Float(3, 150, 1, 256, strides=[256, 768, 768, 1], requires_grad=1, device=cpu) = aten::squeeze(%6339, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5620:0
  %6343 : Float(3, 150, 1, 256, strides=[38400, 256, 256, 1], requires_grad=1, device=cpu) = aten::contiguous(%6341, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5621:0
  %6346 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6343, %7477, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %6349 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6343, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %6352 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6343, %7477, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7489 : Long(device=cpu) = prim::Constant[value={8}]()
  %7461 : Long(requires_grad=0, device=cpu) = aten::mul(%6270, %7489), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6356 : int[] = prim::ListConstruct(%6264, %7461, %6295), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6357 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6346, %6356), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6360 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6357, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6362 : Long(device=cpu) = aten::size(%6349, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6374 : int[] = prim::ListConstruct(%6362, %7461, %6295), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6375 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6349, %6374), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6378 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6375, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6380 : Long(device=cpu) = aten::size(%6352, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6392 : int[] = prim::ListConstruct(%6380, %7461, %6295), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6393 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6352, %6392), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6396 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6393, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6398 : Long(device=cpu) = aten::size(%6378, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6335:0
  %6407 : int[] = prim::ListConstruct(%6270, %7473, %7473, %6398), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6408 : Float(1, 1, 1, 150, strides=[150, 150, 150, 1], requires_grad=0, device=cpu) = aten::view(%7472, %6407), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6343:0
  %6415 : Float(1, 8, 1, 150, strides=[150, 0, 150, 1], requires_grad=0, device=cpu) = aten::expand(%6408, %7431, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6344:0
  %6420 : int[] = prim::ListConstruct(%7461, %7473, %6398), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6421 : Float(8, 1, 150, strides=[0, 150, 1], requires_grad=0, device=cpu) = aten::reshape(%6415, %6420), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6345:0
  %6429 : int[] = prim::ListConstruct(%6270, %7485, %7475, %6398), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6430 : Float(1, 8, 1, 150, strides=[0, 0, 150, 1], requires_grad=0, device=cpu) = aten::view(%6421, %6429), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6404:0
  %6432 : int[] = prim::ListConstruct(%6270, %7485, %6264, %6295), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6433 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6360, %6432), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6406:0
  %6435 : int[] = prim::ListConstruct(%6270, %7485, %6398, %6295), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6436 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6378, %6435), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6407:0
  %6439 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6396, %6435), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6408:0
  %6444 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::scaled_dot_product_attention(%6433, %6436, %6439, %6430, %7486, %7476, %5955, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6410:0
  %6450 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::permute(%6444, %7432), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6452 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::contiguous(%6450, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6453 : Long(requires_grad=0, device=cpu) = aten::mul(%6270, %6264), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6455 : int[] = prim::ListConstruct(%6453, %6279), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6456 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::view(%6452, %6455), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6457 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::linear(%6456, %model.encoder.layers.1.self_attn.out_proj.weight, %model.encoder.layers.1.self_attn.out_proj.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6417:0
  %6459 : Long(device=cpu) = aten::size(%6457, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %6462 : int[] = prim::ListConstruct(%6264, %6270, %6459), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6463 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::view(%6457, %6462), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %input.13 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6463, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1395:0
  %6469 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.13, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.dropout.Dropout::dropout1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.15 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%6259, %6469, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:919:0
  %6476 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.15, %7433, %model.encoder.layers.1.norm1.weight, %model.encoder.layers.1.norm1.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.normalization.LayerNorm::norm1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %6477 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::linear(%6476, %model.encoder.layers.1.linear1.weight, %model.encoder.layers.1.linear1.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.linear.Linear::linear1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %input.17 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::relu(%6477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1704:0
  %6481 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::dropout(%input.17, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.dropout.Dropout::dropout # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.19 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::linear(%6481, %model.encoder.layers.1.linear2.weight, %model.encoder.layers.1.linear2.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.linear.Linear::linear2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %6485 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.19, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.dropout.Dropout::dropout2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.21 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%6476, %6485, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:922:0
  %6492 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.21, %7433, %model.encoder.layers.1.norm2.weight, %model.encoder.layers.1.norm2.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.1/torch.nn.modules.normalization.LayerNorm::norm2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %query.5 : Float(150, 1, 256, strides=[256, 38400, 1], requires_grad=1, device=cpu) = aten::transpose(%6492, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1339:0
  %6497 : Long(device=cpu) = aten::size(%query.5, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6503 : Long(device=cpu) = aten::size(%query.5, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6512 : Long(device=cpu) = aten::size(%query.5, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6528 : Long(requires_grad=0, device=cpu) = aten::div(%6512, %6060, %6061), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6207:0
  %6560 : Long(device=cpu) = aten::size(%query.5, %7475), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5610:0
  %proj.5 : Float(150, 1, 768, strides=[768, 768, 1], requires_grad=1, device=cpu) = aten::linear(%query.5, %model.encoder.layers.2.self_attn.in_proj_weight, %model.encoder.layers.2.self_attn.in_proj_bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5614:0
  %6566 : int[] = prim::ListConstruct(%7482, %6560), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6567 : Float(150, 1, 3, 256, strides=[768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unflatten(%proj.5, %7475, %6566), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/_tensor.py:1421:0
  %6569 : Float(1, 150, 1, 3, 256, strides=[115200, 768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unsqueeze(%6567, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5618:0
  %6572 : Float(3, 150, 1, 1, 256, strides=[256, 768, 768, 115200, 1], requires_grad=1, device=cpu) = aten::transpose(%6569, %7477, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5619:0
  %6574 : Float(3, 150, 1, 256, strides=[256, 768, 768, 1], requires_grad=1, device=cpu) = aten::squeeze(%6572, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5620:0
  %6576 : Float(3, 150, 1, 256, strides=[38400, 256, 256, 1], requires_grad=1, device=cpu) = aten::contiguous(%6574, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5621:0
  %6579 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6576, %7477, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %6582 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6576, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %6585 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6576, %7477, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7490 : Long(device=cpu) = prim::Constant[value={8}]()
  %7463 : Long(requires_grad=0, device=cpu) = aten::mul(%6503, %7490), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6589 : int[] = prim::ListConstruct(%6497, %7463, %6528), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6590 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6579, %6589), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6593 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6590, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6595 : Long(device=cpu) = aten::size(%6582, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6607 : int[] = prim::ListConstruct(%6595, %7463, %6528), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6608 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6582, %6607), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6611 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6608, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6613 : Long(device=cpu) = aten::size(%6585, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6625 : int[] = prim::ListConstruct(%6613, %7463, %6528), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6626 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6585, %6625), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6629 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6626, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6631 : Long(device=cpu) = aten::size(%6611, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6335:0
  %6640 : int[] = prim::ListConstruct(%6503, %7473, %7473, %6631), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6641 : Float(1, 1, 1, 150, strides=[150, 150, 150, 1], requires_grad=0, device=cpu) = aten::view(%7472, %6640), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6343:0
  %6648 : Float(1, 8, 1, 150, strides=[150, 0, 150, 1], requires_grad=0, device=cpu) = aten::expand(%6641, %7431, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6344:0
  %6653 : int[] = prim::ListConstruct(%7463, %7473, %6631), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6654 : Float(8, 1, 150, strides=[0, 150, 1], requires_grad=0, device=cpu) = aten::reshape(%6648, %6653), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6345:0
  %6662 : int[] = prim::ListConstruct(%6503, %7485, %7475, %6631), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6663 : Float(1, 8, 1, 150, strides=[0, 0, 150, 1], requires_grad=0, device=cpu) = aten::view(%6654, %6662), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6404:0
  %6665 : int[] = prim::ListConstruct(%6503, %7485, %6497, %6528), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6666 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6593, %6665), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6406:0
  %6668 : int[] = prim::ListConstruct(%6503, %7485, %6631, %6528), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6669 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6611, %6668), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6407:0
  %6672 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6629, %6668), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6408:0
  %6677 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::scaled_dot_product_attention(%6666, %6669, %6672, %6663, %7486, %7476, %5955, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6410:0
  %6683 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::permute(%6677, %7432), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6685 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::contiguous(%6683, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6686 : Long(requires_grad=0, device=cpu) = aten::mul(%6503, %6497), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6688 : int[] = prim::ListConstruct(%6686, %6512), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6689 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::view(%6685, %6688), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6690 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::linear(%6689, %model.encoder.layers.2.self_attn.out_proj.weight, %model.encoder.layers.2.self_attn.out_proj.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6417:0
  %6692 : Long(device=cpu) = aten::size(%6690, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %6695 : int[] = prim::ListConstruct(%6497, %6503, %6692), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6696 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::view(%6690, %6695), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %input.23 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6696, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1395:0
  %6702 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.23, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.dropout.Dropout::dropout1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.25 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%6492, %6702, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:919:0
  %6709 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.25, %7433, %model.encoder.layers.2.norm1.weight, %model.encoder.layers.2.norm1.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.normalization.LayerNorm::norm1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %6710 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::linear(%6709, %model.encoder.layers.2.linear1.weight, %model.encoder.layers.2.linear1.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.linear.Linear::linear1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %input.27 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::relu(%6710), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1704:0
  %6714 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::dropout(%input.27, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.dropout.Dropout::dropout # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.29 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::linear(%6714, %model.encoder.layers.2.linear2.weight, %model.encoder.layers.2.linear2.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.linear.Linear::linear2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %6718 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.29, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.dropout.Dropout::dropout2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.31 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%6709, %6718, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:922:0
  %6725 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.31, %7433, %model.encoder.layers.2.norm2.weight, %model.encoder.layers.2.norm2.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.2/torch.nn.modules.normalization.LayerNorm::norm2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %query.7 : Float(150, 1, 256, strides=[256, 38400, 1], requires_grad=1, device=cpu) = aten::transpose(%6725, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1339:0
  %6730 : Long(device=cpu) = aten::size(%query.7, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6736 : Long(device=cpu) = aten::size(%query.7, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6745 : Long(device=cpu) = aten::size(%query.7, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6761 : Long(requires_grad=0, device=cpu) = aten::div(%6745, %6060, %6061), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6207:0
  %6793 : Long(device=cpu) = aten::size(%query.7, %7475), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5610:0
  %proj.7 : Float(150, 1, 768, strides=[768, 768, 1], requires_grad=1, device=cpu) = aten::linear(%query.7, %model.encoder.layers.3.self_attn.in_proj_weight, %model.encoder.layers.3.self_attn.in_proj_bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5614:0
  %6799 : int[] = prim::ListConstruct(%7482, %6793), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6800 : Float(150, 1, 3, 256, strides=[768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unflatten(%proj.7, %7475, %6799), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/_tensor.py:1421:0
  %6802 : Float(1, 150, 1, 3, 256, strides=[115200, 768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unsqueeze(%6800, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5618:0
  %6805 : Float(3, 150, 1, 1, 256, strides=[256, 768, 768, 115200, 1], requires_grad=1, device=cpu) = aten::transpose(%6802, %7477, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5619:0
  %6807 : Float(3, 150, 1, 256, strides=[256, 768, 768, 1], requires_grad=1, device=cpu) = aten::squeeze(%6805, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5620:0
  %6809 : Float(3, 150, 1, 256, strides=[38400, 256, 256, 1], requires_grad=1, device=cpu) = aten::contiguous(%6807, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5621:0
  %6812 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6809, %7477, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %6815 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6809, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %6818 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%6809, %7477, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7491 : Long(device=cpu) = prim::Constant[value={8}]()
  %7465 : Long(requires_grad=0, device=cpu) = aten::mul(%6736, %7491), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6822 : int[] = prim::ListConstruct(%6730, %7465, %6761), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6823 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6812, %6822), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6826 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6823, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %6828 : Long(device=cpu) = aten::size(%6815, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6840 : int[] = prim::ListConstruct(%6828, %7465, %6761), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6841 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6815, %6840), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6844 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6841, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %6846 : Long(device=cpu) = aten::size(%6818, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6858 : int[] = prim::ListConstruct(%6846, %7465, %6761), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6859 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%6818, %6858), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6862 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6859, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %6864 : Long(device=cpu) = aten::size(%6844, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6335:0
  %6873 : int[] = prim::ListConstruct(%6736, %7473, %7473, %6864), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6874 : Float(1, 1, 1, 150, strides=[150, 150, 150, 1], requires_grad=0, device=cpu) = aten::view(%7472, %6873), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6343:0
  %6881 : Float(1, 8, 1, 150, strides=[150, 0, 150, 1], requires_grad=0, device=cpu) = aten::expand(%6874, %7431, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6344:0
  %6886 : int[] = prim::ListConstruct(%7465, %7473, %6864), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6887 : Float(8, 1, 150, strides=[0, 150, 1], requires_grad=0, device=cpu) = aten::reshape(%6881, %6886), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6345:0
  %6895 : int[] = prim::ListConstruct(%6736, %7485, %7475, %6864), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6896 : Float(1, 8, 1, 150, strides=[0, 0, 150, 1], requires_grad=0, device=cpu) = aten::view(%6887, %6895), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6404:0
  %6898 : int[] = prim::ListConstruct(%6736, %7485, %6730, %6761), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6899 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6826, %6898), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6406:0
  %6901 : int[] = prim::ListConstruct(%6736, %7485, %6864, %6761), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6902 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6844, %6901), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6407:0
  %6905 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%6862, %6901), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6408:0
  %6910 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::scaled_dot_product_attention(%6899, %6902, %6905, %6896, %7486, %7476, %5955, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6410:0
  %6916 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::permute(%6910, %7432), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6918 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::contiguous(%6916, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6919 : Long(requires_grad=0, device=cpu) = aten::mul(%6736, %6730), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6921 : int[] = prim::ListConstruct(%6919, %6745), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6922 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::view(%6918, %6921), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %6923 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::linear(%6922, %model.encoder.layers.3.self_attn.out_proj.weight, %model.encoder.layers.3.self_attn.out_proj.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6417:0
  %6925 : Long(device=cpu) = aten::size(%6923, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %6928 : int[] = prim::ListConstruct(%6730, %6736, %6925), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn
  %6929 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::view(%6923, %6928), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %input.33 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%6929, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1395:0
  %6935 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.33, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.dropout.Dropout::dropout1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.35 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%6725, %6935, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:919:0
  %6942 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.35, %7433, %model.encoder.layers.3.norm1.weight, %model.encoder.layers.3.norm1.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.normalization.LayerNorm::norm1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %6943 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::linear(%6942, %model.encoder.layers.3.linear1.weight, %model.encoder.layers.3.linear1.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.linear.Linear::linear1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %input.37 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::relu(%6943), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1704:0
  %6947 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::dropout(%input.37, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.dropout.Dropout::dropout # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.39 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::linear(%6947, %model.encoder.layers.3.linear2.weight, %model.encoder.layers.3.linear2.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.linear.Linear::linear2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %6951 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.39, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.dropout.Dropout::dropout2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.41 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%6942, %6951, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:922:0
  %6958 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.41, %7433, %model.encoder.layers.3.norm2.weight, %model.encoder.layers.3.norm2.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.3/torch.nn.modules.normalization.LayerNorm::norm2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %query.9 : Float(150, 1, 256, strides=[256, 38400, 1], requires_grad=1, device=cpu) = aten::transpose(%6958, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1339:0
  %6963 : Long(device=cpu) = aten::size(%query.9, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6969 : Long(device=cpu) = aten::size(%query.9, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6978 : Long(device=cpu) = aten::size(%query.9, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %6994 : Long(requires_grad=0, device=cpu) = aten::div(%6978, %6060, %6061), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6207:0
  %7026 : Long(device=cpu) = aten::size(%query.9, %7475), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5610:0
  %proj.9 : Float(150, 1, 768, strides=[768, 768, 1], requires_grad=1, device=cpu) = aten::linear(%query.9, %model.encoder.layers.4.self_attn.in_proj_weight, %model.encoder.layers.4.self_attn.in_proj_bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5614:0
  %7032 : int[] = prim::ListConstruct(%7482, %7026), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7033 : Float(150, 1, 3, 256, strides=[768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unflatten(%proj.9, %7475, %7032), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/_tensor.py:1421:0
  %7035 : Float(1, 150, 1, 3, 256, strides=[115200, 768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unsqueeze(%7033, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5618:0
  %7038 : Float(3, 150, 1, 1, 256, strides=[256, 768, 768, 115200, 1], requires_grad=1, device=cpu) = aten::transpose(%7035, %7477, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5619:0
  %7040 : Float(3, 150, 1, 256, strides=[256, 768, 768, 1], requires_grad=1, device=cpu) = aten::squeeze(%7038, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5620:0
  %7042 : Float(3, 150, 1, 256, strides=[38400, 256, 256, 1], requires_grad=1, device=cpu) = aten::contiguous(%7040, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5621:0
  %7045 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%7042, %7477, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7048 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%7042, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7051 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%7042, %7477, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7492 : Long(device=cpu) = prim::Constant[value={8}]()
  %7467 : Long(requires_grad=0, device=cpu) = aten::mul(%6969, %7492), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %7055 : int[] = prim::ListConstruct(%6963, %7467, %6994), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7056 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%7045, %7055), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %7059 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%7056, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %7061 : Long(device=cpu) = aten::size(%7048, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %7073 : int[] = prim::ListConstruct(%7061, %7467, %6994), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7074 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%7048, %7073), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %7077 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%7074, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %7079 : Long(device=cpu) = aten::size(%7051, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %7091 : int[] = prim::ListConstruct(%7079, %7467, %6994), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7092 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%7051, %7091), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %7095 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%7092, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %7097 : Long(device=cpu) = aten::size(%7077, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6335:0
  %7106 : int[] = prim::ListConstruct(%6969, %7473, %7473, %7097), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7107 : Float(1, 1, 1, 150, strides=[150, 150, 150, 1], requires_grad=0, device=cpu) = aten::view(%7472, %7106), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6343:0
  %7114 : Float(1, 8, 1, 150, strides=[150, 0, 150, 1], requires_grad=0, device=cpu) = aten::expand(%7107, %7431, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6344:0
  %7119 : int[] = prim::ListConstruct(%7467, %7473, %7097), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7120 : Float(8, 1, 150, strides=[0, 150, 1], requires_grad=0, device=cpu) = aten::reshape(%7114, %7119), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6345:0
  %7128 : int[] = prim::ListConstruct(%6969, %7485, %7475, %7097), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7129 : Float(1, 8, 1, 150, strides=[0, 0, 150, 1], requires_grad=0, device=cpu) = aten::view(%7120, %7128), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6404:0
  %7131 : int[] = prim::ListConstruct(%6969, %7485, %6963, %6994), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7132 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%7059, %7131), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6406:0
  %7134 : int[] = prim::ListConstruct(%6969, %7485, %7097, %6994), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7135 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%7077, %7134), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6407:0
  %7138 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%7095, %7134), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6408:0
  %7143 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::scaled_dot_product_attention(%7132, %7135, %7138, %7129, %7486, %7476, %5955, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6410:0
  %7149 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::permute(%7143, %7432), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %7151 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::contiguous(%7149, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %7152 : Long(requires_grad=0, device=cpu) = aten::mul(%6969, %6963), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %7154 : int[] = prim::ListConstruct(%7152, %6978), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7155 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::view(%7151, %7154), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %7156 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::linear(%7155, %model.encoder.layers.4.self_attn.out_proj.weight, %model.encoder.layers.4.self_attn.out_proj.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6417:0
  %7158 : Long(device=cpu) = aten::size(%7156, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %7161 : int[] = prim::ListConstruct(%6963, %6969, %7158), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7162 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::view(%7156, %7161), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %input.43 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%7162, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1395:0
  %7168 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.43, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.dropout.Dropout::dropout1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.45 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%6958, %7168, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:919:0
  %7175 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.45, %7433, %model.encoder.layers.4.norm1.weight, %model.encoder.layers.4.norm1.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.normalization.LayerNorm::norm1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %7176 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::linear(%7175, %model.encoder.layers.4.linear1.weight, %model.encoder.layers.4.linear1.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.linear.Linear::linear1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %input.47 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::relu(%7176), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1704:0
  %7180 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::dropout(%input.47, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.dropout.Dropout::dropout # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.49 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::linear(%7180, %model.encoder.layers.4.linear2.weight, %model.encoder.layers.4.linear2.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.linear.Linear::linear2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %7184 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.49, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.dropout.Dropout::dropout2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.51 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%7175, %7184, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:922:0
  %7191 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.51, %7433, %model.encoder.layers.4.norm2.weight, %model.encoder.layers.4.norm2.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.4/torch.nn.modules.normalization.LayerNorm::norm2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %query : Float(150, 1, 256, strides=[256, 38400, 1], requires_grad=1, device=cpu) = aten::transpose(%7191, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1339:0
  %7196 : Long(device=cpu) = aten::size(%query, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %7202 : Long(device=cpu) = aten::size(%query, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %7211 : Long(device=cpu) = aten::size(%query, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6163:0
  %7227 : Long(requires_grad=0, device=cpu) = aten::div(%7211, %6060, %6061), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6207:0
  %7259 : Long(device=cpu) = aten::size(%query, %7475), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5610:0
  %proj : Float(150, 1, 768, strides=[768, 768, 1], requires_grad=1, device=cpu) = aten::linear(%query, %model.encoder.layers.5.self_attn.in_proj_weight, %model.encoder.layers.5.self_attn.in_proj_bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5614:0
  %7265 : int[] = prim::ListConstruct(%7482, %7259), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7266 : Float(150, 1, 3, 256, strides=[768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unflatten(%proj, %7475, %7265), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/_tensor.py:1421:0
  %7268 : Float(1, 150, 1, 3, 256, strides=[115200, 768, 768, 256, 1], requires_grad=1, device=cpu) = aten::unsqueeze(%7266, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5618:0
  %7271 : Float(3, 150, 1, 1, 256, strides=[256, 768, 768, 115200, 1], requires_grad=1, device=cpu) = aten::transpose(%7268, %7477, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5619:0
  %7273 : Float(3, 150, 1, 256, strides=[256, 768, 768, 1], requires_grad=1, device=cpu) = aten::squeeze(%7271, %7483), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5620:0
  %7275 : Float(3, 150, 1, 256, strides=[38400, 256, 256, 1], requires_grad=1, device=cpu) = aten::contiguous(%7273, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5621:0
  %7278 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%7275, %7477, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7281 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%7275, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7284 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::select(%7275, %7477, %7479), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:5623:0
  %7493 : Long(device=cpu) = prim::Constant[value={8}]()
  %7469 : Long(requires_grad=0, device=cpu) = aten::mul(%7202, %7493), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %7288 : int[] = prim::ListConstruct(%7196, %7469, %7227), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7289 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%7278, %7288), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %7292 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%7289, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6296:0
  %7294 : Long(device=cpu) = aten::size(%7281, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %7306 : int[] = prim::ListConstruct(%7294, %7469, %7227), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7307 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%7281, %7306), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %7310 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%7307, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6298:0
  %7312 : Long(device=cpu) = aten::size(%7284, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %7324 : int[] = prim::ListConstruct(%7312, %7469, %7227), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7325 : Float(150, 8, 32, strides=[256, 32, 1], requires_grad=1, device=cpu) = aten::view(%7284, %7324), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %7328 : Float(8, 150, 32, strides=[32, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%7325, %7477, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6309:0
  %7330 : Long(device=cpu) = aten::size(%7310, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6335:0
  %7339 : int[] = prim::ListConstruct(%7202, %7473, %7473, %7330), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7340 : Float(1, 1, 1, 150, strides=[150, 150, 150, 1], requires_grad=0, device=cpu) = aten::view(%7472, %7339), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6343:0
  %7347 : Float(1, 8, 1, 150, strides=[150, 0, 150, 1], requires_grad=0, device=cpu) = aten::expand(%7340, %7431, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6344:0
  %7352 : int[] = prim::ListConstruct(%7469, %7473, %7330), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7353 : Float(8, 1, 150, strides=[0, 150, 1], requires_grad=0, device=cpu) = aten::reshape(%7347, %7352), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6345:0
  %7361 : int[] = prim::ListConstruct(%7202, %7485, %7475, %7330), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7362 : Float(1, 8, 1, 150, strides=[0, 0, 150, 1], requires_grad=0, device=cpu) = aten::view(%7353, %7361), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6404:0
  %7364 : int[] = prim::ListConstruct(%7202, %7485, %7196, %7227), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7365 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%7292, %7364), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6406:0
  %7367 : int[] = prim::ListConstruct(%7202, %7485, %7330, %7227), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7368 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%7310, %7367), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6407:0
  %7371 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::view(%7328, %7367), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6408:0
  %7376 : Float(1, 8, 150, 32, strides=[256, 32, 256, 1], requires_grad=1, device=cpu) = aten::scaled_dot_product_attention(%7365, %7368, %7371, %7362, %7486, %7476, %5955, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6410:0
  %7382 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::permute(%7376, %7432), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %7384 : Float(150, 1, 8, 32, strides=[256, 256, 32, 1], requires_grad=1, device=cpu) = aten::contiguous(%7382, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %7385 : Long(requires_grad=0, device=cpu) = aten::mul(%7202, %7196), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %7387 : int[] = prim::ListConstruct(%7385, %7211), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7388 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::view(%7384, %7387), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6414:0
  %7389 : Float(150, 256, strides=[256, 1], requires_grad=1, device=cpu) = aten::linear(%7388, %model.encoder.layers.5.self_attn.out_proj.weight, %model.encoder.layers.5.self_attn.out_proj.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6417:0
  %7391 : Long(device=cpu) = aten::size(%7389, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %7394 : int[] = prim::ListConstruct(%7196, %7202, %7391), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn
  %7395 : Float(150, 1, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::view(%7389, %7394), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:6418:0
  %input.53 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::transpose(%7395, %7473, %7477), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.activation.MultiheadAttention::self_attn # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/activation.py:1395:0
  %7401 : Float(1, 150, 256, strides=[256, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.53, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.dropout.Dropout::dropout1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.55 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%7191, %7401, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:919:0
  %7408 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.55, %7433, %model.encoder.layers.5.norm1.weight, %model.encoder.layers.5.norm1.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.normalization.LayerNorm::norm1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %7409 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::linear(%7408, %model.encoder.layers.5.linear1.weight, %model.encoder.layers.5.linear1.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.linear.Linear::linear1 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %input.57 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::relu(%7409), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1704:0
  %7413 : Float(1, 150, 1024, strides=[153600, 1024, 1], requires_grad=1, device=cpu) = aten::dropout(%input.57, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.dropout.Dropout::dropout # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.59 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::linear(%7413, %model.encoder.layers.5.linear2.weight, %model.encoder.layers.5.linear2.bias), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.linear.Linear::linear2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/linear.py:125:0
  %7417 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::dropout(%input.59, %7486, %7476), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.dropout.Dropout::dropout2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:1425:0
  %input.61 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::add(%7408, %7417, %7473), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/modules/transformer.py:922:0
  %input : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input.61, %7433, %model.encoder.layers.5.norm2.weight, %model.encoder.layers.5.norm2.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.transformer.TransformerEncoder::encoder/torch.nn.modules.transformer.TransformerEncoderLayer::layers.5/torch.nn.modules.normalization.LayerNorm::norm2 # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  %7429 : Float(1, 150, 256, strides=[38400, 256, 1], requires_grad=1, device=cpu) = aten::layer_norm(%input, %7433, %model.encoder_norm.weight, %model.encoder_norm.bias, %7487, %7488), scope: __main__.export_encoder_onnx.<locals>.EncoderWrapper::/torch.nn.modules.normalization.LayerNorm::encoder_norm # /data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/nn/functional.py:2910:0
  return (%7429)

Warning: onnx package not available, skipping model validation
======================================================================
ONNX Export with 3D nearest_keys Tensor
======================================================================
Loading checkpoint: /data/data/com.termux/files/home/git/swype/cleverkeys/model/full-model-49-0.795.ckpt
Model loaded: 79.5% word accuracy

=== Exporting Encoder ===

Traceback (most recent call last):
  File "/data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py", line 621, in <module>
    main()
  File "/data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py", line 605, in main
    export_encoder_onnx(model, encoder_path)
  File "/data/data/com.termux/files/home/git/swype/cleverkeys/model/export_onnx_3d.py", line 317, in export_encoder_onnx
    torch.onnx.export(
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/onnx/__init__.py", line 383, in export
    export(
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/onnx/utils.py", line 495, in export
    _export(
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/onnx/utils.py", line 1428, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/onnx/utils.py", line 1057, in _model_to_graph
    graph = _optimize_graph(
            ^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/onnx/utils.py", line 632, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/onnx/utils.py", line 1695, in _run_symbolic_function
    k: symbolic_helper._node_get(node, k) for k in node.attributeNames()
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.12/site-packages/torch/onnx/symbolic_helper.py", line 120, in _node_get
    return getattr(node, sel)(key)
           ^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: required keyword attribute 'value' has the wrong type
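
For reference, the `self_attn` nodes of `layers.5` in the graph above (the fused in-projection `unflatten` into q/k/v, the `view`/`transpose` into 8 heads of 32 dims, the padding-mask `view`/`expand`/`reshape`, `scaled_dot_product_attention`, the head merge, `out_proj`, and the final batch-first `transpose`) follow the shape bookkeeping sketched below. This is a minimal NumPy re-implementation written for illustration only, not PyTorch's code. It assumes batch-first input, a boolean key padding mask where True marks padding, and at least one unpadded key per sequence; the function name `mha_self_attention` and its parameters are hypothetical.

```python
import numpy as np


def mha_self_attention(x, w_in, b_in, w_out, b_out, num_heads, key_padding_mask=None):
    """Hypothetical batch-first self-attention mirroring the traced reshapes.

    x: (bsz, seq_len, embed_dim); w_in: (3*embed_dim, embed_dim); b_in: (3*embed_dim,)
    key_padding_mask: (bsz, seq_len) bool, True = padded key (assumed convention).
    """
    bsz, seq_len, embed_dim = x.shape
    head_dim = embed_dim // num_heads  # 256 // 8 = 32 in the trace

    # Batch-first input is handled as (seq_len, bsz, embed_dim) internally.
    x_t = x.transpose(1, 0, 2)
    # Fused in-projection, then split the last axis into q, k, v.
    q, k, v = np.split(x_t @ w_in.T + b_in, 3, axis=-1)

    def split_heads(t):
        # (seq, bsz, embed) -> (seq, bsz*heads, head_dim) -> (bsz*heads, seq, head_dim)
        t = t.reshape(seq_len, bsz * num_heads, head_dim).transpose(1, 0, 2)
        # -> (bsz, heads, seq, head_dim), the layout passed to attention
        return t.reshape(bsz, num_heads, seq_len, head_dim)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)  # (bsz, heads, seq, seq)
    if key_padding_mask is not None:
        # (bsz, seq) broadcast as (bsz, 1, 1, seq) over heads and query positions.
        scores = np.where(key_padding_mask[:, None, None, :], -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v  # (bsz, heads, seq, head_dim)

    # Merge heads: permute to (seq, bsz, heads, head_dim), flatten to (seq*bsz, embed).
    out = out.transpose(2, 0, 1, 3).reshape(seq_len * bsz, embed_dim)
    out = out @ w_out.T + b_out
    # Back to (seq, bsz, embed), then batch-first (bsz, seq, embed).
    return out.reshape(seq_len, bsz, embed_dim).transpose(1, 0, 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bsz, seq_len, embed_dim, num_heads = 1, 150, 256, 8
    x = rng.standard_normal((bsz, seq_len, embed_dim))
    w_in = rng.standard_normal((3 * embed_dim, embed_dim)) * 0.05
    w_out = rng.standard_normal((embed_dim, embed_dim)) * 0.05
    mask = np.zeros((bsz, seq_len), dtype=bool)
    mask[:, 100:] = True  # pretend positions 100..149 are padding
    y = mha_self_attention(x, w_in, np.zeros(3 * embed_dim), w_out,
                           np.zeros(embed_dim), num_heads, mask)
    print(y.shape)  # (1, 150, 256)
```

Because padded keys receive zero attention weight, outputs at unpadded positions do not depend on the values stored at padded positions, which is the behavior `src_key_padding_mask` is meant to provide.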
