Thoughts on Deedy Das's Photonics Post on LinkedIn
Fri 29 November 2024 by Andrew Athan. No part of this is AI generated.
Please Support My Work
I would appreciate a like, follow, or connection request on my LinkedIn. Also, please consider leaving a thumbs up and/or a comment at the link to this post: LinkedIn Post.
@DeedyDas on LinkedIn makes an interesting post about photonics, and I think the topic deserves some clarification for the uninitiated. Below, I write for a general audience who may not be familiar with things like O() notation or analog vs digital computing, and I try to address some potential misunderstandings that could arise from the viral video that was linked.
First, it is very important to distinguish between:
- the speedups (and power/entropy/etc gains) that come from the use of light (photons) vs electricity (electrons) to propagate information either at distance (WAN) or on-die (e.g., for chip interconnects) in the digital domain, and
- the speedups that come from adopting a completely different computational model.
The O(1) part of his post is about the latter, while most of the rest of what’s referenced in the post is about the former. In the comments, @DeedyDas does link to a technology related to changing the computational model (see Neuromorphic Silicon Photonics).
In the digital domain, photonics has been used for decades now. Fiber optic telecommunications, which is what the Intel 400G silicon photonics transceiver is for, is about transmitting digital signals. Here, there is direct relevance both to the speed at which light travels over long distances and to the speed at which transitions in the physical properties of light (phase, spin, etc) are possible. These transitions are generally much faster and require much less energy than electrical (i.e., electron-based) approaches, and as a result of that speed we get a concomitant increase in the available bandwidth of telecommunications equipment based on photonics relative to electronics. When used for computation, similar improvements in speed and power may be achieved. Faster, cooler, smaller compute. Nice.
Being able to move 1s and 0s around faster means we can multiply digital numbers faster and send them around digital architectures faster, and as a result we can do more compute in less time. However, that has no bearing on the O() complexity of the computation. A computation that is O(n^2) will still be O(n^2) on a digital photonic CPU that implements photonic binary logic gates. That is, the photonics make the thing faster by some constant multiplicative factor, but do not change the “shape” of the complexity curve. That shape is what O() notation is all about. Say you are running inference on a 400B parameter model; if you now want to run an 800B model, the amount of work will still grow as n^2 (or whatever function is inside the O() notation). The photonics will simply make whatever that n^2 number of operations is complete in less time. This is similar to putting a bigger engine in a car. The wheels turn faster, but you still have to solve the problem the same way: getting from point A to point B via the same route, taking the same turns, and paying the same tolls.
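To make the "constant factor vs shape" point concrete, here is a minimal Python sketch. The function names and the two operation rates are made up purely for illustration; they are not measurements of any real electronic or photonic hardware.

```python
# Hypothetical operation rates, chosen only for illustration.
ELECTRONIC_RATE = 1e9   # operations per second (assumed)
PHOTONIC_RATE = 1e11    # assumed 100x faster digital photonic logic


def runtime_seconds(n, ops_per_second):
    """Time for a hypothetical O(n^2) task: n*n operations at a fixed rate."""
    return (n * n) / ops_per_second


def growth_when_doubling(n, ops_per_second):
    """How many times longer the task takes when the input size doubles."""
    return runtime_seconds(2 * n, ops_per_second) / runtime_seconds(n, ops_per_second)


if __name__ == "__main__":
    n = 1000
    # The faster hardware wins by a constant factor at any fixed n...
    print("speedup at fixed n:",
          runtime_seconds(n, ELECTRONIC_RATE) / runtime_seconds(n, PHOTONIC_RATE))
    # ...but doubling n costs about 4x on BOTH kinds of hardware.
    print("growth on electronic:", growth_when_doubling(n, ELECTRONIC_RATE))
    print("growth on photonic:  ", growth_when_doubling(n, PHOTONIC_RATE))
```

Whatever rate you plug in, the growth factor when doubling n stays at 4. That is the sense in which faster digital gates change the constant but not the curve.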
Separately, we have technologies which involve changing the way we do the computation itself. Instead of representing neurons digitally, or computing attention (as in a transformer) digitally, we could make those computations by measuring the physical properties of some physical thing after pushing it around in a way that causes it to mimic the computation we need. For example, if we attach a 4 prong gear to an 8 prong gear, and we turn the 8 prong gear once around, we will have computed 2x1 by measuring how many times the 4 prong gear turned. This is an analog computer. In the 50s, people used slide rules to do multiplication and we had analog computers utilized in defense and other verticals. The O(1) speedups about which @DeedyDas writes are in this analog domain. Those photonics are a different animal than the Intel 400G photonics he also references. Therefore, I am taking the time to write this article in order to tease these concepts apart from each other.
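The gear example can be written down as a tiny model. This is only a sketch of an ideal, no-slip gear pair; the function name driven_gear_turns is my own and is there just to show that the "answer" is read off from how far the second gear turned.

```python
from fractions import Fraction


def driven_gear_turns(driver_teeth, driven_teeth, driver_turns):
    """Turns made by the driven gear in an ideal (no-slip) meshed pair.

    The tooth ratio is fixed by the hardware; the input is how far we turn
    the driver; the output is "measured" as the driven gear's rotation.
    """
    return Fraction(driver_teeth, driven_teeth) * driver_turns


if __name__ == "__main__":
    # 8-prong gear turned once drives the 4-prong gear twice: 2 x 1.
    print(driven_gear_turns(8, 4, 1))
    # Turning the input three times "computes" 2 x 3.
    print(driven_gear_turns(8, 4, 3))
```

Note that the multiplier (2) lives in the shape of the machine, not in a program. Changing the multiplier means changing the gears.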
Yes, analog computing using photons is feasible at the small physical scales that are necessary in order to fit enough “gears” (as in my example above) onto a chip practicably; and some of the techniques for doing so are shared by the digital photonics (and photonic digital to analog transceivers in telecoms). Note that we could perform analog computations the same way using electronics. Analog computing is not unique to photonics. In fact, there are plenty of examples of analog electronic (vs photonic) computers. A ubiquitous example still in use today is the analog audio synthesizer. Moog (and other analog) synths are still used in music production to “compute” waveforms that are pleasant to our ears.
The O(1) speedups that would come from commercialized large-volume but micro or nano-scale photonic analog computers would come not (solely or primarily) from the speed of light propagation or speed of digital state changes that it enables, but rather from the idea of building bespoke photonic circuits that, in one step, “output” the result of a computation that would otherwise have required billions or trillions of digital computations. One step means O(1) vs O(f(N)). In such an analog computer, we put input light at level X and we get output/answer light at level Y in the time it takes the light to propagate through the circuit and be measured (“immediately”).
To do matrix multiplication (MatMul), which is the basis of most current neural networks (NNs), we would build a bespoke photonic circuit that yields the MatMul result in one step (see link above). Again, this could indeed be done in electronics vs photonics, but not practicably; at least not at the nano/micro scales necessary to represent a large enough MatMul in a small enough package, with anywhere near competitive performance. The linked paper says a lot about this issue.
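As a rough illustration of the "one step vs many steps" contrast, the sketch below counts the multiply-accumulate operations a digital processor performs for a matrix-vector product, and compares that with an idealized analog circuit. The function idealized_analog_steps is a deliberate simplification of my own: it assumes the weights are already fixed in the hardware and counts one propagate-and-measure pass, ignoring the cost of loading inputs and reading outputs. It does not model any real photonic device.

```python
def matvec_with_count(weights, x):
    """Digital matrix-vector product that also counts multiply-accumulates."""
    macs = 0
    result = []
    for row in weights:
        total = 0.0
        for w, xi in zip(row, x):
            total += w * xi   # one multiply-accumulate per weight
            macs += 1
        result.append(total)
    return result, macs


def idealized_analog_steps(weights, x):
    """ASSUMPTION: an ideal analog circuit with the weights built into it.

    Counts a single propagate-and-measure pass regardless of matrix size.
    """
    return 1


if __name__ == "__main__":
    for n in (2, 4, 8):
        weights = [[1.0] * n for _ in range(n)]
        x = [1.0] * n
        _, macs = matvec_with_count(weights, x)
        print(n, "x", n, "digital MACs:", macs,
              "| idealized analog passes:", idealized_analog_steps(weights, x))
```

The digital count grows as n^2 while the idealized analog count stays at 1, which is the O(1) claim in miniature. How closely a real device approaches that ideal is exactly the kind of question the linked paper addresses.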
Anyway, it’s not clear to me that photonic approaches to AI speedups would be via implementing MatMul except as a way of representing arbitrary NNs on a general purpose photonic die vs freezing a specific (single purpose) NN onto a die. This would be analogous to how an FPGA is a general purpose electronic substrate onto which you can express arbitrary electronic computations by changing the 0s and 1s of its matrix.
So again, the O(1) speedup discussed by @DeedyDas has AFAIK almost nothing to do with most of the rest of the photonics in his post, other than the industry’s overall direction towards using light vs electrons. However, I make this statement somewhat flippantly; maybe there is some deep connection in the physics and fabrication.
I’m only posting here to discuss and reach a correct understanding. His post is inspirational and indeed points to many coming advances in the field. This is also the point in my writeup where I want to say something about the viral video: it is both visually striking and a great metaphor, but it is not likely to generate quite the right understanding about analog photonics. The “scanning” in the video seems to be aimed at creating some notion of computation or transmission occurring due to the scan itself. I think that’d be the wrong analogy or metaphor.
As the light hits a given crystal during the scan, it scatters across other crystals, and we get some pattern of light across the substrate. Of course the pattern changes as the scan proceeds. This illustrates that if the crystals were an analog computer, the outputs would change based on differing inputs. If we were to pause the video at the instant the flashlight gets pointed at a particular crystal, this would be an aspirational representation of an analog photonic computer doing a single computation “instantly” (in O(1)). The “input” is the light going into the crystal; the algorithm of the computation is embodied in the shape of the crystals and their position relative to each other, as well as the composition of each crystal and how they scatter light. The computation itself (as in, the answer, not the act of computing the answer), i.e. the “output,” would presumably be read off by measuring the properties of the light at various points in the scattered light pattern. In other words, what is potentially misleading about the video to non-technical folks is the scanning of the light across those crystals. Pointing the light source differently might represent changing the inputs to the system, but it should not be interpreted as anything more than that (such as an operational aspect of a putative photonic circuit).
Let me know your thoughts. I’m eager to learn about photonics in both digital and analog computing domains, particularly where the technology may lead to significant power savings and speedups.
How To Reach Me
Please feel free to reach me in the comments of this LinkedIn post: LinkedIn post
Your comments are welcome!
© 2024 Andrew Athan