- 
                Notifications
    You must be signed in to change notification settings 
- Fork 579
RFC: TensorFlow Build Improvements #179
Conversation
        
          
                rfcs/20191122-tf-build-tlc.md
              
                Outdated
          
        
      | @@ -0,0 +1,213 @@ | |||
| # Tensorflow Build TLC | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Tensorflow Build TLC | |
| # Tensorflow Build Improvements | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
        
          
                rfcs/20191122-tf-build-tlc.md
              
                Outdated
          
        
      | Simplify Tensorflow’s build so that | ||
| 1. Builds are more granular | ||
| 2. Builds are easier to reason about | ||
| 3. TF’s build is well positioned to immediately benefit from [Bazel shared library support](https://docs.google.com/document/d/1qYHGcxP9BEi7t9b53r8STr9G3XVtvwy9V-EXqr54Xxg/edit#) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please link to a public document on this feature, or remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this is the one https://docs.google.com/document/d/13nolQXvXDsJ4yjWP1FRd-WscP2-rHZcR3Zb__js6mLA/edit#heading=h.7795zogmlqtv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (although the original doc link was a public doc).
Didn't mean to approve before feedback period.
| It's great to see this proposal. For DeepSpeech we have similar constraints as Android, we want to keep code size down, so for a while now we've been using very fine grained deps, and for one case we had to break up the TensorFlow target by hand to avoid a large binary size increase from a seemingly small dependency. That change in particular leads to several megabytes difference in our final binary size compared to simply depending on  | 
| 
 Thank you! We hope this will help make Tensorflow's build more approachable and usable in other projects! | 
… is now guaranteed to have cc_libraries foo_proto_cc_headers_only and foo_proto_cc_impl. The name foo_proto_cc aliases to either foo_proto_cc_headers_only or foo_proto_cc_impl depending on whether make_default_target_header_only is True. This also fixes a bug where header only tf_proto_libraries did not declare dependencies on the header-only tf_proto_libraries that they depend on. This is needed as part of the Tensorflow Build Improvement RFC, described here: tensorflow/community#179 PiperOrigin-RevId: 282674607 Change-Id: I408ef6a9dd841d2c3779c7d526c4b83d75040b53
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much needed proposal, thanks! Looking forward to the lowered barrier
        
          
                rfcs/20191122-tf-build-tlc.md
              
                Outdated
          
        
      |  | ||
| When multiple build targets contain the same source files, they form parallel build graphs. This is currently the case with the build targets [//tensorflow/core:mobile_srcs_no_runtime](https://github.com/tensorflow/tensorflow/blob/500438a24419f8f0921bc8bb52d1951e3ee4235e/tensorflow/core/BUILD#L1572-L1582) and [//tensorflow/core:framework_internal_impl](https://github.com/tensorflow/tensorflow/blob/500438a24419f8f0921bc8bb52d1951e3ee4235e/tensorflow/core/BUILD#L2654-L2666) with respect to any C++ files in tensorflow/core/util/*. Any build target that accidentally includes both of these targets in its dependencies will have [ODR violations](https://en.wikipedia.org/wiki/One_Definition_Rule). | ||
|  | ||
| This parallel build graph exists because Android would like to produce a minimal Tensorflow shared object by pulling in only a subset of Tensorflow's codebase. The reason why Android couldn't simply use the existing Tensorflow Build targets is because they are [too large](https://docs.google.com/document/d/1kg_tVB1g3c5isPaP1OXW_KZwF_hcTiKz5afLkKjiSz8/edit#heading=h.8tfaww3ddb83). Therefore, if we address [problem 1](https://docs.google.com/document/d/1kg_tVB1g3c5isPaP1OXW_KZwF_hcTiKz5afLkKjiSz8/edit#heading=h.ytms1uc3twy), we should be able to refactor the Android build to re-use these new smaller targets. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were these meant to link to a google doc instead of headers in this doc? If so they are private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch! Yes these were supposed to links back to headers in the same markdown file. Just fixed, thanks!
This change is part of the refactoring described in Tensorflow Build Improvements RFC: tensorflow/community#179 PiperOrigin-RevId: 282970277 Change-Id: I1430d0beea8917bd4fb167f30c740e2e9329b3cc
This change is part of the refactoring described in Tensorflow Build Improvements RFC: tensorflow/community#179 PiperOrigin-RevId: 283675621 Change-Id: I355d72b476c3f4222dd3e83768a374bf3cc8beb3
…ules. This allows tensorflow to start experimenting with rules_cc's cc_shared_library implementation. This change is part of the Tensorflow Build Improvements RFC: tensorflow/community#179 PiperOrigin-RevId: 283908432 Change-Id: Id497dfee6211650f0e10ccb1332ac625af2f23fd
        
          
                rfcs/20191122-tf-build-tlc.md
              
                Outdated
          
        
      |  | ||
| For tensorflow/core/BUILD, we will introduce BUILD files in: | ||
|  | ||
| - [tensorflow/core/common_runtime](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are MKL components in several of these folders but default builds don't include them and won't detect errors during the refactoring process. How can we help make this smoother?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Clayne,
That's a great point; so far the changes we've made don't touch anything MKL specific, but as we start refactoring core/common_runtime we will need to be careful not to break MKL integration.
My understanding is that Tensorflow builds use MKL by default via Eigen: https://github.com/tensorflow/tensorflow/blob/d26ba01420ed9e4e1a55f22ca0e3034fdd3d3671/tensorflow/core/kernels/BUILD#L756-L758, meaning hopefully our CI will start yelling at us if a breakage occurs; but I'll double check with @gunan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, most of MKL is not covered under the presubmits we are running. By default, only tensor contraction ops use MKL-DNN.
I think the majority of MKL code is under tensorflow/core/kernels, as MKL kernel implementations. The requirement there is, nothing outside tensorflow/core/kernels should define kernels as dependencies (except for binaries, or pip package).
I think MKL also has some code outside kernels, and for these, it depends on where exactly the code is. As Brian suggested, we will  start from the bottom layer of the code, and slowly build up our work.
But if you could refactor your code to minimize number of dependencies, break any include cycles and build a clear hierarchy within your code/files, that would help us greatly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can look at that. Most of the MKL code is in mkl_* files and/or surrounded by #ifdef INTEL_MKL_DNN code blocks so should be fairly easy to identify when restructuring BUILDs
…d wires it into tensorflow/core/BUILD. This change is part of the refactoring described in Tensorflow Build Improvements RFC: tensorflow/community#179 PiperOrigin-RevId: 284279060 Change-Id: I1ac535f3f811643dceccd8caafa4c4dcfdfd3b88
…rm:stacktrace_handler into tensorflow/core/BUILD. This change also removes tf_additional_lib_srcs, which after bazel starlark expansion amounts to just stacktrace_handler. This change is part of the refactoring described in the Tensorflow Build Improvements RFC: tensorflow/community#179 PiperOrigin-RevId: 284385812 Change-Id: I2ad704e196cf354008a3f7e5132d6bdd9f4569d9
This change also removes the glob tensorflow/core/platform:legacy_lib_internal_srcs, and replaces it with the leftover from the glob's expansion. This change is part of the Tensorflow Build Improvements RFC: tensorflow/community#179 PiperOrigin-RevId: 284386433 Change-Id: I052f3381c76fbaea52f3299618c6961ec506fbcc
…latform-specific headers explicit, and also helps make checking in BUILD files in subdirectories of tf/core/platform easier. This change is part of the Tensorflow Build Improvements RFC: tensorflow/community#179 PiperOrigin-RevId: 284392848 Change-Id: I635d2051bc064d88dbdd8fb146636887adc19dbb
…e_google, and extract their common files into a single filegroup. This change helps prepare for a subsequent change adding BUILD files to tf/core/platform/default and tf/core/platform/windows. This change is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179 PiperOrigin-RevId: 284436373 Change-Id: I695ef8042388ad2bc17c763b87f4f7d0e189cd74
…tform/windows/BUILD. This is part of the refactoring described in the Tensorflow Build Improvements RFC: tensorflow/community#179 Subsequent changes will migrate targets from build_refactor.bzl into the new BUILD files. PiperOrigin-RevId: 284712709 Change-Id: I650eb200ba0ea87e95b15263bad53b0243732ef5
…/build_refactor.bzl into tf/core/platform/windows/BUILD. Subsequent changes will do the same for tf/core/platform/default/BUILD, and remove build_refactor.bzl. This refactoring is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179 PiperOrigin-RevId: 284858768 Change-Id: I60dc0b81c6d5e7c81722e33c4d8ce290499e9b54
…/build_refactor.bzl into tf/core/platform/default/BUILD. Subsequent changes will change tensorflow/core/platform/BUILD to use these targets, and remove build_refactor.bzl. This refactoring is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179 PiperOrigin-RevId: 284885829 Change-Id: I80b0722f11d89b4cb0bef628beedd24d882ac1b4
…BUILD and tf/core/platform/windows/BUILD, allowing us to finally remove build_refactor.bzl. This refactoring is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179 PiperOrigin-RevId: 284910748 Change-Id: I6bd2447a22d3f5c81988959a7fece803ba34bc70
…BUILD and tf/core/platform/windows/BUILD, allowing us to finally remove build_refactor.bzl. This refactoring is part of the Tensorflow Build Improvements RFC described here: tensorflow/community#179 PiperOrigin-RevId: 284913256 Change-Id: I46b7bd50f2d2056ac54f7adf49a7a3ecc5ccbe86
This is part of the refactoring described in tensorflow/community#179 PiperOrigin-RevId: 286819947 Change-Id: Iff7841d87244a10ab4554e61b2731350020ff236
…build_config:protos_cc, and tensorflow/core/platform/default/build_config:protos_cc_impl. This is part of the refactoring described in tensorflow/community#179 PiperOrigin-RevId: 286821940 Change-Id: I7295a47489788f7ad52dad639ea8db60b100c6a6
…ild_config/BUILD. This helps pave the way for removing the build_config folder. This is part of the refactoring described in tensorflow/community#179 PiperOrigin-RevId: 286822930 Change-Id: Ic9d0d064ed14c5c9bfb1822ae958b402ac2581bd
…ve custom macros defined in tf/core/platform/default/build_config.bzl, and remove the build_config folder. This is part of the refactoring described in tensorflow/community#179 PiperOrigin-RevId: 286827875 Change-Id: Ied11f461cba3a18b48669be4853be8fd9a1f0f72
| As mentioned in SIG Build, here's a link to the master spreadsheet of the current status of the refactoring so far: | 
This is necessary to start using bazel's experimental cc_shared_library support. This change is part of the refactoring described in tensorflow/community#179 PiperOrigin-RevId: 288808507 Change-Id: Ie78fb4ff8dad128ebef280037cce4d3c4f42addc
…d moving existing usages of cc_library, cc_test, and cc_binary to rules_cc's version for a subset of the build known to be part of libtensorflow_framework. We will migrate further subdirectories of tf/core as we go along. This is part of Tensorflow's build refactoring, described in: tensorflow/community#179 PiperOrigin-RevId: 289161138 Change-Id: Ic28a5b032a44315ea0528ad8c6737b36eb1d27a6
…ibrary to tf/.bazelrc after patching the iOS build failure. This basically is e635ec0 with an additional patch to rules_swift. This change is part of the build refactoring described in tensorflow/community#179 PiperOrigin-RevId: 289776116 Change-Id: I7f29e0e0b4447334a334ad888f464e12fbe29485
…ibrary to tf/.bazelrc after patching the iOS build failure. This basically is e635ec0 with an additional patch to rules_swift. This change is part of the build refactoring described in tensorflow/community#179 PiperOrigin-RevId: 289909322 Change-Id: I06f10d811f0ca598047e837acc0230afbf290e6f
…on of cc build rules. This is part of the build refactoring described in tensorflow/community#179 PiperOrigin-RevId: 290466858 Change-Id: I9b38b1b7f44f1defea9be6ffb9e5da0c5ca99fb5
…factoring described in tensorflow/community#179. PiperOrigin-RevId: 290757344 Change-Id: I2c4214ad7b34372c1ff1fbdd94baf5d3721c2a5e
…refactoring described in tensorflow/community#179. PiperOrigin-RevId: 291839268 Change-Id: I4ace5769ae4a41753c56114bbb9d841a0035d9e8
This is necessary to start using bazel's experimental cc_shared_library support. This change is part of the refactoring described in tensorflow/community#179 PiperOrigin-RevId: 294736824 Change-Id: I8d13c71545cf2f2edd47818b40fa413e3e9c3254
| 
 Can we extend the deadline? Is this closed for comment? | 
| Hi, @bhack | 
| @mihaimaruseac Without any other label feedback I don't understand the status of an open PR after the public RFC deadline. Can you clarify this a little bit related to the point 5 of our RFC governance at https://github.com/tensorflow/community/blob/master/governance/TF-RFCs.md? | 
| @ematejska can you check on status of this RFC and make sure it's in the right bucket? | 
| Oops as I was updating my fork, I accidentally rewrote this commit, which automatically closed this PR; unfortunately, it looks like re-pushing will create a new PR. | 
This RFC will be open for comment until Friday, December 13th, 2019.
cc @gunan @martinwicke
Tensorflow Build TLC
Objective
Simplify Tensorflow’s build so that