- 26 Mar, 2020 1 commit

Zeeshan Siddiqui authored

- 18 Mar, 2020 2 commits

Zeeshan Siddiqui authored

Zeeshan Siddiqui authored

- 17 Mar, 2020 2 commits

Zeeshan Siddiqui authored

Zeeshan Siddiqui authored

- 16 Mar, 2020 1 commit

Sherlock authored

- 14 Mar, 2020 2 commits

Jesse Benson authored

Jesse Benson authored

- 12 Mar, 2020 5 commits

Edward Chen authored

Zeeshan Siddiqui authored
We want to implement the SoftmaxCrossEntropyLoss and NegativeLogLikelihoodLoss forward training ops for opset-12, which requires the ONNX submodule to point to the latest commit so we pick up the latest ONNX spec.
- Reverse-integrate changes from the *.in.proto files in the GitHub ONNX repo.
- Regenerate csharp/test/Microsoft.ML.OnnxRuntime.Tests/OnnxMl.cs.
- Disable ONNX tests for ops not yet implemented in the latest opset.
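
The forward computation of the SoftmaxCrossEntropyLoss op mentioned above can be sketched in plain Python for a single sample (a hedged illustration of the formula only; the helper name is not from the commit):

```python
import math

def softmax_cross_entropy_loss(scores, target):
    # loss = -log(softmax(scores)[target]), computed with the usual
    # max-subtraction trick for numerical stability.
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]
```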

Ethan Tao authored
1. Fix a misaligned address in atomic_add().
2. Change GatherNDGradKernel to use atomic_add.
3. Enable/add unit tests for GatherNDGrad and reduction_ops using half. __CUDA_ARCH__ has no effect in .cc code, so leverage HasCudaEnvironment() instead.
4. Verified the convergence graph and perf test: P100 is much slower than V100 on fp16; fp16/128 needs the batch size reduced from 66 to 64 to avoid an OOM issue.
5. Verified the convergence test on Dev3/V100.
TBD: broken unit tests related to MatmulIntegerOpTest (they work on V100/Windows, though).

Ke Deng authored
This is a draft of graph cut and Wait/Record to demonstrate the cut and Wait/Record design. You can find the sub-models and profiling JSON under onnxruntime/test if you run "onnxruntime_test_all --gtest_filter=GradientGraphBuilderTest.TrainingSession_WithPipeline".

edgchen1 authored

- 11 Mar, 2020 4 commits

Edward Chen authored

Edward Chen authored

Edward Chen authored

Hariharan Seshadri authored
* Initial commit
* Minor nit
* Comment
* Fix build
* Fix build

- 10 Mar, 2020 5 commits

Hariharan Seshadri authored

Yufeng Li authored

Tianlei Wu authored
Update the script to make it work on fine-tuned BERT models exported by keras2onnx.

Yufeng Li authored

Scott McKay authored

- 09 Mar, 2020 2 commits

Dmitri Smirnov authored
Add a package download step before publishing.

Changming Sun authored
As discussed with Faith: because the data size is very small and changes are gradual, there is no need to delete the old data. We want to keep all the history.

- 06 Mar, 2020 5 commits

Tiago Koji Castro Shibata authored
* Publish release symbols
* Publish symbols if IsReleaseBuild

Andrew Kane authored

pranavm-nvidia authored
Fixes a typo in the help output for `symbolic_shape_infer`.

Tianlei Wu authored
* Update GeluFusion to support the pattern from PyTorch 1.4.
* Fix a bug where the check for an edge between mul2 and root was missing.
* Update the script to fuse GELU from PyTorch 1.4.
* Add a test for the Python optimizer.
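
For context, the erf-based GELU pattern that GeluFusion collapses into a single node computes x * 0.5 * (1 + erf(x / sqrt(2))); a minimal sketch of that formula (the helper name is illustrative, not from the commit):

```python
import math

def gelu(x: float) -> float:
    # Exact (erf-based) GELU, the multi-node subgraph pattern that
    # GeluFusion replaces with one fused Gelu node.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```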

Dmitri Smirnov authored
Add an env var with the package name.

- 05 Mar, 2020 3 commits

Yufeng Li authored

KeDengMS authored
This change fixes #3129. When running onnxruntime as a DLL on Windows, CUDA performs some internal cleanup when the process exits; after that, any call into CUDA crashes. Delay-loading makes the thread_local destructor run after the CUDA cleanup, hence the crash.

Dmitri Smirnov authored

- 04 Mar, 2020 6 commits

Yufeng Li authored
* Implement QuantizeLinear and DequantizeLinear
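
The ONNX QuantizeLinear and DequantizeLinear ops named above follow a simple affine quantization scheme; a minimal plain-Python sketch of those semantics (the helper names are illustrative, not the actual kernel code):

```python
def quantize_linear(x: float, scale: float, zero_point: int = 0) -> int:
    # ONNX QuantizeLinear: y = saturate(round(x / scale) + zero_point);
    # saturating to the uint8 range [0, 255] here for illustration.
    y = round(x / scale) + zero_point
    return max(0, min(255, y))

def dequantize_linear(q: int, scale: float, zero_point: int = 0) -> float:
    # ONNX DequantizeLinear: x = (q - zero_point) * scale
    return (q - zero_point) * scale
```

Dequantization only recovers the original value up to quantization error; quantize_linear(1.0, 0.5, 128) gives 130, and dequantize_linear(130, 0.5, 128) happens to give back exactly 1.0 because 1.0 is a multiple of the scale.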

take-cheeze authored

Dmitri Smirnov authored
Override the native package name. Keep the managed package name the same. Specify the package name for validation purposes. Fix up the validation package name parameter.

Prabhat authored

Tianlei Wu authored
(1) Add a performance test tool for BERT models.
(2) Add an accuracy test tool to compare inference results of the original and optimized BERT models.
(3) Add a test data generator tool to create test data for onnxruntime_perf_test.exe.
(4) Improve the BERT optimization script: verify the model producer for model_type; warn if the model is not fully optimized.
(5) Add a shape optimizer tool to assist in developing the optimization script.
(6) Update the readme.

Yufeng Li authored
* Implement MatmulInteger
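
For reference, the op named above (ONNX MatMulInteger) multiplies zero-point-adjusted 8-bit inputs with 32-bit accumulation; a minimal plain-Python sketch of the semantics (the helper name is illustrative, not the actual kernel):

```python
def matmul_integer(a, b, a_zero_point=0, b_zero_point=0):
    # ONNX MatMulInteger: Y = (A - a_zero_point) @ (B - b_zero_point),
    # accumulated in int32. a is MxK and b is KxN, as lists of lists.
    m, k, n = len(a), len(b), len(b[0])
    return [
        [sum((a[i][t] - a_zero_point) * (b[t][j] - b_zero_point)
             for t in range(k))
         for j in range(n)]
        for i in range(m)
    ]
```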

- 03 Mar, 2020 2 commits

Changming Sun authored
Previously, we put the "bin" folder of all the CUDA versions in the system PATH, with 10.2 at the front. It was a mess, so I removed all of them from the system PATH env, but I need to add one of them back through the build scripts. (The problem only affects the C# tests, not the C/C++ tests that are forked from build.py.)

smk2007 authored
* Add DML GPU pipelines
* Add x86 to the GPU DML dev build pipeline
* Enable DML x86 builds
* Fix uint64_t -> size_t warning
* Fix warnings
* Enable DML on x86 CI builds
* OperatorHelper 773 error: uint32_t vs uint64_t
* OperatorHelper 773 error: uint32_t vs uint64_t
* Make the x86 pipeline use the GPU pool
* More warnings
* Fix the x86 DirectML path
* Make the DML nuget package
* Disable tf_pnasnet_large
* Disable zfnet512
* Make validation use wildcards
* Disable x86 DML GPU tests
* Add args
* Update gpu.yml
* Change the nupkg wildcard
* Add debug statements
* Package the x86 DML nupkg
* Don't drop the managed nuget again from the DML pipeline build
* Add the DML EULA
* Rename the DirectML license so it doesn't clobber the existing license
* Fix casing on the DML package
* {} to ()
* Fix the license name
* Disable DML from x86 CI
* Typo and CR feedback
* Remove featurizers
* Ship the DML PDB as well