
Conversation

soumith
Member

@soumith soumith commented Jul 24, 2017

[DO NOT MERGE], only for contbuild

pietern and others added 30 commits February 9, 2017 12:33
Summary:
In the GitHub repository this directory will be mirrored similarly to
folly, such that the repository has a single top level directory
called "gloo". This allows for versioning or renaming of the
project root, without having to mangle the include paths; they will
always use the "gloo" prefix.

fbshipit-source-id: 24502e4185fc7cbe19b5249f83609e2b8118e9d7
Summary:
Testing pull request again.
Closes pytorch/gloo#2

Reviewed By: pietern

Differential Revision: D4542327

Pulled By: Yangqing

fbshipit-source-id: 5bd66c32c7249f1327225117815bef64b8708722
Summary:
The CUDA benchmark suite will be a separate build target, so the
runner should be reused.

Reviewed By: Yangqing

Differential Revision: D4545092

fbshipit-source-id: 6ccf2d30f5d35c74fc59851b25416bfe6863d62c
Summary:
This CUDA-aware ring allreduce is based on the regular ring allreduce.
It runs the reduction algorithm on the CPU and is therefore most
suited for smaller buffers.

Both the device-to-host memcpy's at the start of the algorithm and the
host-to-device memcpy's at the end of the algorithm are kicked off
asynchronously in an attempt to parallelize as much as possible.

Reviewed By: Yangqing

Differential Revision: D4542816

fbshipit-source-id: 101dfad276ca79703e37ff93fb1b6d467295f66b
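
A minimal sketch of the copy overlap described in this commit, assuming one caller-visible CUDA stream per input buffer (names here are illustrative, not gloo's actual API):

```
#include <cuda_runtime.h>
#include <vector>

// Kick off device-to-host copies for all input buffers at once, then wait
// for all of them before the CPU reduction starts. The same pattern runs
// in reverse for the host-to-device copies at the end of the algorithm.
void copyToHostAsync(const std::vector<float*>& devicePtrs,
                     const std::vector<float*>& hostPtrs,
                     const std::vector<cudaStream_t>& streams,
                     size_t count) {
  for (size_t i = 0; i < devicePtrs.size(); i++) {
    cudaMemcpyAsync(hostPtrs[i], devicePtrs[i], count * sizeof(float),
                    cudaMemcpyDeviceToHost, streams[i]);
  }
  for (auto stream : streams) {
    cudaStreamSynchronize(stream);
  }
}
```
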
Summary: TSIA

Reviewed By: plapukhov

Differential Revision: D4549105

fbshipit-source-id: 61c8966e429e0701677f441aeaaf27fdc5e669e7
Summary:
Separate benchmark build target for CUDA-aware algorithms.

This is needed to keep CUDA an optional dependency.

Differential Revision: D4546932

fbshipit-source-id: b73176ae9067233f883d51ba3ab4efbb13a6f86f
Summary:
Implement CUDA BroadcastOneToAll algorithm for GPU addresses. Refactor cuda.h into cuda_private.h to allow inclusion of <cuda.h> in public headers without polluting the namespace.

Port broadcast tests to GPU variants.

* this revision is based on Peter's revision D4546932

Differential Revision: D4547382

fbshipit-source-id: 3d294ad8862b04fb783ba22e5c925b8d7cbc8a8d
Summary:
In synchronous mode, it is not the device thread that is responsible
for handling I/O, but the user thread itself. Calling waitRecv on a
buffer will trigger the read function on the pair to be called. This
eliminates the context switch necessary if the device thread is
handling all I/O. For benchmarks with small numbers of elements this
reduces latency by as much as 20%.

Reviewed By: plapukhov

Differential Revision: D4549998

fbshipit-source-id: ab718ba090c06d7c7aa4065cc9f92bd96b9e4a35
Summary:
The CudaDevicePointer optionally takes an existing stream on
which it runs any operation associated with the pointer (for now just
memcpy's, but this will likely include kernel execution in the
future).

Differential Revision: D4574035

fbshipit-source-id: ddd7972a3874012059f1fde1b341fd6edd69102d
Summary:
Latency optimization is going well and I've seen the odd case of <10us
measurements. This option makes the benchmark tool display nanos
instead.

Differential Revision: D4575925

fbshipit-source-id: 98dbd3b39e31cbcdd4c146613f6630e721187e1e
Summary: Ideally we would want the driver to busy-poll for us. In absence of driver support, spinning with MSG_DONTWAIT flag seems to be helping a lot too. Of course, we pay the price of burning one core for polling. Sigh.

Reviewed By: pietern

Differential Revision: D4576242

fbshipit-source-id: 85d9e1b786fbb6053864fba80f3e5ecc80fe221d
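
The busy-poll idea sketched, assuming a connected socket and leaving real error handling to the caller (illustrative, not the actual transport code):

```
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

// Busy-poll a socket instead of sleeping in poll/epoll. EAGAIN/EWOULDBLOCK
// just means "nothing yet", so spin and try again; this trades one core
// for lower wakeup latency.
ssize_t spinRecv(int fd, void* buf, size_t len) {
  for (;;) {
    ssize_t rv = recv(fd, buf, len, MSG_DONTWAIT);
    if (rv >= 0) {
      return rv;
    }
    if (errno != EAGAIN && errno != EWOULDBLOCK) {
      return -1;  // real error; caller decides what to do
    }
  }
}
```
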
Summary:
First pass at a CUDA-aware allreduce chunked implementation. For now the algorithm runs on the CPU and is mostly copy/paste from allreduce_ring.h. A subsequent pass will offload to the GPU.

Serialize cuda test to avoid intermittent failures due to memory contention.

Reviewed By: pietern

Differential Revision: D4576959

fbshipit-source-id: e1f292a05b88ff24c33e549d4a52e770a21f85d2
Summary: I was mistakenly calling the non-chunked algorithm for the chunked test.

Reviewed By: pietern

Differential Revision: D4580160

fbshipit-source-id: 9d62a68e9e86cc6e596d90ff8854c585a0e8855c
Summary:
Work may be queued on CUDA streams for asynchronous execution. The
memory backed by pointers passed to any algorithm can therefore be
mutated after constructing an algorithm instance. By also passing in
the streams these mutations happen on, the algorithms can synchronize
with these mutations to ensure no invalid data is used.

By passing in these streams, any work done by these algorithms will
*also* be queued on them, which effectively removes a single
synchronization step from any algorithm run.

Differential Revision: D4589394

fbshipit-source-id: 0c8cd6ba9c9018f33d6f4c55a037083fc4164acb
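
One way to picture the stream handoff, as a sketch rather than gloo's actual interface: the algorithm orders its internal stream after whatever the caller already queued, so no host-side synchronization is needed up front.

```
#include <cuda_runtime.h>

// Make the algorithm's internal stream wait for pending work on the
// caller's stream, without blocking the host thread.
void sequenceAfterCaller(cudaStream_t callerStream, cudaStream_t algoStream) {
  cudaEvent_t event;
  cudaEventCreateWithFlags(&event, cudaEventDisableTiming);
  cudaEventRecord(event, callerStream);       // marks the caller's pending work
  cudaStreamWaitEvent(algoStream, event, 0);  // the wait happens on the GPU
  cudaEventDestroy(event);  // safe: resources are released once the event completes
}
```
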
Summary: TSIA

Differential Revision: D4591755

fbshipit-source-id: fa435f4ad6b97453c3c9516b4bfc9f8f0fb2e4f1
Summary: Adds script to populate third-party directory.

Differential Revision: D4591509

fbshipit-source-id: 28934feb536a9f3a066d8c40988337f3dddffaed
Summary: The AllReduceChunked algorithm currently performs the local reduce/broadcast of local device buffers in host memory. This diff updates the algorithm to execute the local reduce/broadcast steps using NCCL operations before copying a single device buffer to/from host memory.

Reviewed By: pietern

Differential Revision: D4587441

fbshipit-source-id: 4de689f59a6cf898b8eecd3c3b9f57f77124c0e3
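
Roughly, the local step becomes an NCCL reduce into one device buffer followed by a single copy to the host. The sketch below assumes one NCCL communicator and stream per local device, set up elsewhere, and NCCL 2-style grouped calls:

```
#include <cstddef>
#include <cuda_runtime.h>
#include <nccl.h>

// Reduce N local device buffers into the first one with NCCL, then copy
// just that buffer to host memory (instead of N copies plus a CPU reduce).
void localReduceThenCopy(float** deviceBufs, float* hostBuf, size_t count,
                         int numDevices, ncclComm_t* comms,
                         cudaStream_t* streams) {
  ncclGroupStart();
  for (int i = 0; i < numDevices; i++) {
    cudaSetDevice(i);
    ncclReduce(deviceBufs[i], deviceBufs[0], count, ncclFloat, ncclSum,
               /*root=*/0, comms[i], streams[i]);
  }
  ncclGroupEnd();
  cudaSetDevice(0);
  cudaMemcpyAsync(hostBuf, deviceBufs[0], count * sizeof(float),
                  cudaMemcpyDeviceToHost, streams[0]);
  cudaStreamSynchronize(streams[0]);
}
```
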
Summary: Allow gloo consumers to assign a mutex to synchronize CUDA malloc/free and NCCL operations.

Reviewed By: pietern

Differential Revision: D4622135

fbshipit-source-id: 60acd7c01a677a0df5415fe38e6ef5a2e7c8606a
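
What the hook enables, in sketch form (the mutex accessor here is hypothetical; gloo exposes its own way to install the mutex): every CUDA allocation/free and every NCCL launch takes the same lock, so the two never interleave across threads.

```
#include <cuda_runtime.h>
#include <mutex>

// Process-wide mutex shared by code that allocates/frees CUDA memory and
// code that launches NCCL collectives.
std::mutex& cudaNcclMutex() {
  static std::mutex m;
  return m;
}

float* guardedAlloc(size_t count) {
  std::lock_guard<std::mutex> guard(cudaNcclMutex());
  float* ptr = nullptr;
  cudaMalloc(reinterpret_cast<void**>(&ptr), count * sizeof(float));
  return ptr;
}

// NCCL launches take the same lock before calling into the library:
//   std::lock_guard<std::mutex> guard(cudaNcclMutex());
//   ncclAllReduce(...);
```
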
Summary: std::atomic was not defined for cuda.cu.

Reviewed By: andrewwdye

Differential Revision: D4624611

fbshipit-source-id: 973bba10026e065667d6a576055d00505ee02d62
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4626965

fbshipit-source-id: 2d32b07182202f65e673795aefacc6cc991d3c7c
Summary:
All pairs created by a device would use the same completion queue.
Supporting sync mode that way is difficult, as there is no way to
filter completions for a particular pair. This change refactors this
to use a single completion queue per pair so that this is no longer an
issue. This change is a preparation for supporting synchronous mode
(where the calling thread itself will poll the ibv library for
completions instead of the device thread).

This change also includes a refactoring of the way transient memory
regions are handled so that they are properly deregistered and
deallocated when no longer needed.

Reviewed By: andrewwdye

Differential Revision: D4625146

fbshipit-source-id: 21bf5ab321534fbd5c03f12049c10fc67da68944
Summary:
Synchronous mode means using the calling thread instead of the device
thread for completion handling. Since this saves a context switch in
the critical path, this is very beneficial for low latency algorithms.

For example: the p99 of a 4-way barrier drops from 17us to 4us.

Reviewed By: andrewwdye

Differential Revision: D4626948

fbshipit-source-id: 013b1680497589fe5ad0bca38600bce6a410200b
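
Sync mode roughly amounts to the calling thread polling the pair's completion queue itself, along these lines (a sketch; the real code also handles errors and rearming):

```
#include <infiniband/verbs.h>

// Poll the pair's completion queue from the calling thread until a work
// completion shows up, instead of waiting for the device thread to be
// scheduled and hand it over.
ibv_wc pollOneCompletion(ibv_cq* cq) {
  ibv_wc wc;
  for (;;) {
    int n = ibv_poll_cq(cq, 1, &wc);
    if (n > 0) {
      return wc;  // caller inspects wc.status / wc.wr_id
    }
    // n == 0: nothing yet, keep spinning; n < 0 would be an error (omitted)
  }
}
```
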
Summary: CUDA documentation detailing high-level support for CUDA in gloo algorithms, usage of streams, and synchronizing memory management.

Reviewed By: pietern

Differential Revision: D4633120

fbshipit-source-id: d88e230c8dc82fe48cda0f401b61758fa4f07f2e
Summary:
With this change, every buffer gets assigned a different
value at every index. This means reordering of segments (e.g. in the
chunked algorithm) would surface as test errors.

Reviewed By: andrewwdye

Differential Revision: D4636368

fbshipit-source-id: 464eb1515d1590e12481961d427a92e2ebb3be82
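
For example, a fixture along these lines gives every (buffer, index) pair a unique value, so a reordered segment cannot produce the same reduced result (hypothetical helper, not the actual test code):

```
#include <vector>

// Fill each input buffer so the value depends on both the buffer and the
// index; any reordering of segments then changes the reduced result.
std::vector<std::vector<float>> makeInputs(int numBuffers, int count) {
  std::vector<std::vector<float>> bufs(numBuffers);
  for (int b = 0; b < numBuffers; b++) {
    bufs[b].resize(count);
    for (int i = 0; i < count; i++) {
      bufs[b][i] = static_cast<float>(b * count + i);
    }
  }
  return bufs;
}
```
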
…ssed in

Summary: Cuda algorithms take an optional set of device streams to sequence operations. If streams are provided, the algorithms should enqueue final output buffer operations on the associated stream and return asynchronously. Destructors that allocate streams/events should synchronize before tearing down.

Reviewed By: pietern

Differential Revision: D4636447

fbshipit-source-id: 32ec2adc214c83b0b4bc0fff8993ab196459117b
Summary:
The NCCL code used in CUDA-aware allreduce does local reduction of N
buffers prior to putting anything on the wire. This change adds support
for it in the benchmark tool to measure the impact under various
configurations.

Other minor tweaks in this change:
* Specify sub-second iteration time
* Templatize allreduce benchmarks (the algorithms share a constructor
  prototype)

Reviewed By: andrewwdye

Differential Revision: D4639517

fbshipit-source-id: f7417d3e9f79278a3b1eca48d779f48b77e5260c
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4644734

fbshipit-source-id: 50f5fadd2c5cd04e06a025f5538187ed852e669a
Summary: Remove underscores from public fields in NCCLContext

Reviewed By: pietern

Differential Revision: D4645857

fbshipit-source-id: 2c28a1c23d31097d685c0768dad9b99bbef7b171
Summary:
The fields are public so their names should not end with an
underscore.

Reviewed By: andrewwdye

Differential Revision: D4645038

fbshipit-source-id: c12b47affbe511383a4722717a06abb61918473b
Summary: TSIA

Reviewed By: andrewwdye

Differential Revision: D4647587

fbshipit-source-id: a804e7479e6e2f511bfa59712b4b4a88bdf657e3
pietern and others added 26 commits May 31, 2017 19:50
Summary: TSIA

Reviewed By: romain-intel

Differential Revision: D5158642

fbshipit-source-id: 6e55a69a140c1f5f6e4ce6262afaf5014c412414
Summary: Machines may not create their Gloo pairs at the same time, due to earlier variable-time work. Increase the timeout used to establish the initial TCP connection to accommodate this, without sacrificing the shorter default timeout for outstanding reads/writes. No related change is required for ibverbs, as there is no communication on init.

Reviewed By: akyrola

Differential Revision: D5184518

fbshipit-source-id: 0e6c9704a2d2f1406b3927f75887f0a42199450b
Summary:
While debugging #43 I found common/common.h missing some headers as well.

Fixes #43.
Closes pytorch/gloo#44

Differential Revision: D5194970

Pulled By: pietern

fbshipit-source-id: 4861cd04c56931d4759f5bc050816788252003ee
Summary: Replace call to function that is only supported in CUDA 8.0 with one that has been supported in previous releases.

Reviewed By: pietern

Differential Revision: D5231755

fbshipit-source-id: d72aec2a4a1c511064a65142887f8a05b51dad55
Summary:
\cc pietern
Minimal changes to allow gloo to compile and run with NCCL 2.0
Closes pytorch/gloo#46

Differential Revision: D5268074

Pulled By: pietern

fbshipit-source-id: 58d625d57b31cfc932f3dbbdd7a4b83d9a2e60a8
Summary:
This change prepares for having a separate set of collectives that
use native CUDA calls instead of NCCL. This is needed to work around
the issue where NCCL deadlocks when it is interleaved with CUDA memory
management operations in other processes on the same machine.

Includes a modification to the host reduction functions to bring them
up to parity with the NCCL reduction functions (they now incorporate
offset/counter arguments).

Reviewed By: wesolwsk

Differential Revision: D5276291

fbshipit-source-id: 8844731760d2c48577d207c026ce0cd641f2fc6d
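
The offset/count parity mentioned above means the host reduction helpers take a shape roughly like the following (a sketch; the real signatures may differ):

```
#include <cstddef>

// Reduce a sub-range of src into dst, mirroring the NCCL-style
// (offset, count) arguments so callers can reduce partial segments.
template <typename T>
void sum(T* dst, const T* src, size_t offset, size_t count) {
  for (size_t i = 0; i < count; i++) {
    dst[offset + i] += src[offset + i];
  }
}
```
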
Summary:
Previously, `gloo/math.h` inlined methods which use AVX builtins,
which required propagating the `-mavx` flag.
This diff moves these definitions out of the header and into a source
file to avoid this.

Reviewed By: pixelb

Differential Revision: D5271043

fbshipit-source-id: dde4dc560dfb557b46d1a582a8b38e7cb8eb0c37
Summary:
Code in tcp/transport tries to find the network interface a socket was
bound to when creating a TCP device context. Per getifaddrs(3), it is
possible for the ifa_addr field to be NULL (supposedly when an
interface doesn't have an address). Ignore such entries.

Thanks to slayton58 for reporting this.

Reviewed By: wesolwsk

Differential Revision: D5279376

fbshipit-source-id: 039380b95ba4d6d94942c30581e0b230a060870c
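
The defensive check amounts to skipping NULL ifa_addr entries while walking the interface list, roughly (illustrative; the real code also matches the socket's bound address):

```
#include <ifaddrs.h>
#include <string>

// Walk all interfaces and skip entries whose ifa_addr is NULL, which
// getifaddrs(3) allows for interfaces without an assigned address.
std::string firstInterfaceWithAddress() {
  struct ifaddrs* list = nullptr;
  if (getifaddrs(&list) != 0) {
    return "";
  }
  std::string name;
  for (struct ifaddrs* ifa = list; ifa != nullptr; ifa = ifa->ifa_next) {
    if (ifa->ifa_addr == nullptr) {
      continue;  // interface without an address; ignore it
    }
    name = ifa->ifa_name;
    break;
  }
  freeifaddrs(list);
  return name;
}
```
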
Summary:
Adds a separate set of CUDA collectives that run on device as an
alternative to NCCL. Use these collectives as default on-device
collectives instead of NCCL.

Whenever multiple processes on the same machine use Gloo with NCCL and
end up doing concurrent CUDA memory allocations and algorithm
execution, we risk deadlock. A follow up change will enable opt-in
usage of NCCL (e.g. through environment variable).

Benchmark output below with varying number of elements. It shows a
minor improvement over using NCCL for local reduction and broadcast.

Number of elements equal to on-device threshold (256K):

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       2685       2907       3035       3215        562
(after)   262144       2682       2874       3013       3395        577

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring_chunked
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       2045       2133       2325       2643        725
(after)   262144       1533       1673       1834       2048        800

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_halving_doubling
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before)  262144       1580       1640       1718       2069        893
(after)   262144       1371       1446       1539       1748       1125
```

Larger number of elements (4M):

```
Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      55543      58058      60103      62659         32
(after)  4194304      54490      57923      60893      66058         33

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_ring_chunked
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      18049      22820      24997      26634        105
(after)  4194304      18356      20463      21695      22589         99

Device:      tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm:   cuda_allreduce_halving_doubling
Options:     processes=2, inputs=8, gpudirect=no

        elements   min (us)   p50 (us)   p99 (us)   max (us)    samples
(before) 4194304      18584      24345      27809      29722         95
(after)  4194304      19541      22718      25408      26688         88
```

Reviewed By: akyrola

Differential Revision: D5278192

fbshipit-source-id: 53f09e404663ddc8bb46d06ac87afd8ee3ffc3a2
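
The NCCL-free local broadcast can be as simple as device-to-device copies queued on per-buffer streams, sketched below under the assumption that the buffers live on the same device or are peer-accessible:

```
#include <cuda_runtime.h>
#include <vector>

// Broadcast the contents of bufs[0] to every other local device buffer
// using plain CUDA copies, avoiding NCCL kernels entirely.
void localBroadcast(const std::vector<float*>& bufs,
                    const std::vector<cudaStream_t>& streams,
                    size_t count) {
  for (size_t i = 1; i < bufs.size(); i++) {
    cudaMemcpyAsync(bufs[i], bufs[0], count * sizeof(float),
                    cudaMemcpyDeviceToDevice, streams[i]);
  }
  for (size_t i = 1; i < bufs.size(); i++) {
    cudaStreamSynchronize(streams[i]);
  }
}
```
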
Summary: Closes pytorch/gloo#47

Differential Revision: D5283752

Pulled By: pietern

fbshipit-source-id: 8ad3353b3455c5416e31e75b46755e2f7fcaad52
Summary:
Adds basic CUDA 9 support, including adding Volta arch, and making appropriate modifications for half precision datatype changes
Closes pytorch/gloo#49

Differential Revision: D5315336

Pulled By: pietern

fbshipit-source-id: 6468b0f357206d604bdcfec69ba82509a2c91407
Summary: A simple benchmark to determine network bandwidth for pairwise communication.

Reviewed By: plapukhov

Differential Revision: D5159607

fbshipit-source-id: d16c3ed3a0c2ae182138df91bdae821f5508c6ac
Summary: Use the CreateCommonWorld timeout for the storehandler as well, not just the device connect.

Reviewed By: andrewwdye

Differential Revision: D5425923

fbshipit-source-id: 936d2129e2db3bfed8759ca097b75843d3931d5f
Summary:
CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.

The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.

Reviewed By: Orvid

Differential Revision: D5432398

Tags: codemod, codemod-opensource

fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
Summary: To reduce round trips with store handlers, it is better to store all addresses in one key instead of one address per pair. This change implements that.

Reviewed By: andrewwdye

Differential Revision: D5435893

fbshipit-source-id: 2d3ea3a2822c3b934ff2578d44a262e7bfbde6d0
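
Conceptually, the batching packs every pair's address into one length-prefixed blob stored under a single key (hypothetical helper, not gloo's actual wire format):

```
#include <cstdint>
#include <vector>

// Pack N per-pair addresses into one value: [len0][bytes0][len1][bytes1]...
// so a single store set/get replaces one round trip per pair.
std::vector<char> packAddresses(const std::vector<std::vector<char>>& addrs) {
  std::vector<char> blob;
  for (const auto& addr : addrs) {
    uint32_t len = static_cast<uint32_t>(addr.size());
    const char* lenBytes = reinterpret_cast<const char*>(&len);
    blob.insert(blob.end(), lenBytes, lenBytes + sizeof(len));
    blob.insert(blob.end(), addr.begin(), addr.end());
  }
  return blob;
}
```
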
Summary: When compiling with -Werror=shadow-compatible-local, a variable name cannot be reused. This passed our tests, but some people compile with stronger settings.

Differential Revision: D5440805

fbshipit-source-id: a246af748717fb7e0e7a321e1ac4ddfef68ae524
…igned buffers

Summary: When performing reductions on fp16 buffers, gloo assumed that both buffers were either aligned to 32 bytes or misaligned by the same offset. This may not hold in intermediate steps of halving-doubling allreduce, when the reduction is performed on some offset within the receive buffer. The fix is to use intrinsic instructions that work with unaligned pointers.

Reviewed By: akyrola

Differential Revision: D5450103

fbshipit-source-id: 9a1c8f8c34d2e62223f6d5c21573ea1cfad6537f
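
The fix boils down to using the unaligned load/store intrinsics, roughly as below (requires F16C/AVX; illustrative, not the exact gloo kernel):

```
#include <cstdint>
#include <immintrin.h>

// Sum 8 half-precision values from y into x using unaligned loads/stores,
// so neither pointer has to be 16/32-byte aligned.
void halfSum8(uint16_t* x, const uint16_t* y) {
  __m128i xh = _mm_loadu_si128(reinterpret_cast<const __m128i*>(x));
  __m128i yh = _mm_loadu_si128(reinterpret_cast<const __m128i*>(y));
  __m256 xs = _mm256_cvtph_ps(xh);  // fp16 -> fp32
  __m256 ys = _mm256_cvtph_ps(yh);
  __m256 sum = _mm256_add_ps(xs, ys);
  __m128i out = _mm256_cvtps_ph(sum, _MM_FROUND_TO_NEAREST_INT);
  _mm_storeu_si128(reinterpret_cast<__m128i*>(x), out);
}
```
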
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
…cedecc'

git-subtree-dir: torch/lib/gloo
git-subtree-mainline: 4a4d884
git-subtree-split: 1978bba
@soumith soumith closed this Jul 25, 2017
@soumith soumith deleted the gloocontbuild branch July 25, 2017 02:04
houseroad added a commit to houseroad/pytorch that referenced this pull request Jul 25, 2019
…9a6052

Summary:
Previous import was 707064980b9825b8705b9d1c9aad34d8b022d5dd

Included changes:
- **[28ca699b](onnx/onnx@28ca699b)**: Member Company logo guidelines (pytorch#2196) <Prasanth Pulavarthi>
- **[47acb06a](onnx/onnx@47acb06a)**: remove link to outdated issue for contributions wanted (pytorch#2186) <Prasanth Pulavarthi>
- **[168519f6](onnx/onnx@168519f6)**: Create sigs.md (pytorch#2103) <Prasanth Pulavarthi>
- **[b9320746](onnx/onnx@b9320746)**: mintor format update (pytorch#2180) <Prasanth Pulavarthi>
- **[65b8e0f9](onnx/onnx@65b8e0f9)**: add more types support for Equal op (pytorch#2176) <Ke Zhang>
- **[dc5e62a9](onnx/onnx@dc5e62a9)**: Update AddNewOP document. (pytorch#2172) <Emad Barsoum>
- **[bae8b530](onnx/onnx@bae8b530)**: Add missing space (pytorch#2150) <Takeshi Watanabe>
- **[5952b7f5](onnx/onnx@5952b7f5)**: python api example typo fix (pytorch#2155) <LeicongLi>
- **[904cb842](onnx/onnx@904cb842)**: Fix errors in RoiAlign shape inference code (pytorch#2167) <G. Ramalingam>

Differential Revision: D16502373

fbshipit-source-id: 68b9479a30fc330d876947cb4ea8227848f576e3