In addition to posting my more developed ideas in the form of (highly informal) research proposals, I’m working on documenting my completed projects as though they were (heavily abbreviated) scholarly papers. I find that it helps me keep focus. And it doesn’t hurt that I dream of returning to academia at some point.
So I’ll start with this project, my Color Compass.
Chances are I’ll update this post with more detail. But I’d also like to prepare a build log or detailed instructions so that you can make your own.
Many animals and even plants exhibit magnetoception, an ability to sense magnetic fields, though it has been reasonably well established experimentally that humans don’t. It is believed that, in at least some cases, magnetoception manifests within the organism’s visual field. If magnetoception is in fact visual in nature, then it should be easy enough to augment the human visual sense with the capability of perceiving magnetism as well. Rather than building a whole mixed reality rig to overlay a grid or provide a heads-up display on the visual field, Terminator style, I propose to provide that sense by simply presenting the user a color that corresponds to compass heading.
The KidRobot Munny is a do-it-yourself vinyl figurine for making one of a kind art toys. Most of the time, people color them with markers or paint. Sometimes they augment the figures with polymer clay, building ears, tails, and weapons. But most of these customizations have been along the same lines—making a figurine that sits on a shelf—impressive though they may be.
But, as an engineer, I thought “how can I make these interactive?”
Meanwhile, I read that some scientists believe that pigeons may see magnetic fields, possibly overlaid in their visual field. One theory is that a light-sensitive protein, called cryptochrome, found in the retinas of many animals is actually responsive to magnetic fields. Interestingly, it takes extremely strong magnetic fields to effect a change in a “normal” chemical reaction (fields much stronger than that of the Earth). However, there is the case of something called a radical pair, wherein the fates of two electrons are entangled (yes, we’re talking about quantum mechanics here), resulting in magnetic field-dependent outcomes of reactions involving these radical pairs.
That led to some daydreaming on my part about how that would work for a human. In the case of birds, though no one has been able to see it for themselves yet, we have an idea of how the magnetic field might look.
Lastly, at more or less the same time, it occurred to me that hue in the HSL (hue, saturation, and level) color model, represented as an angle, is analogous to compass heading which is also represented as an angle. Why not display heading as a color on the color wheel?
On a more conceptual level, I’ve been interested for some time in using our existing senses at the edge of conscious awareness such that they provide a new form of sensory perception. In other words, though the device described in this paper uses one’s sense of sight, the user really gets an extra sense: when we use an existing sense but are no longer consciously aware of using it, we have effectively gained a new one. In a way, this is not really a new concept; we’ve grown used to watches providing us with a more precise sense of time than we have biologically. As technology makes seamless human-machine interfaces easier and less expensive, we can expect these kinds of extrasensory perception to become more like senses and less like carrying a gadget around and looking at it from time to time.
Those of us who use computers for drawing likely know that colors can be represented in quite a few ways, from a mix of red, green, and blue to an angle around the color wheel. The latter, often referred to as HSV (hue, saturation, and value) or HSL (hue, saturation, and luminance or level), represents colors as polar coordinates in a three-dimensional space. But if you just look at it in one plane, from above, you see the same old color wheel from elementary school art class, with red on top, or zero degrees; green on the lower right, or 120 degrees; blue on the lower left, or 240 degrees; and blending back to red again on top at 360 degrees. (For the pickier, more technically minded of you out there, you might know that the artists’ color wheel and the HSV color wheel are different, but for the purposes of this project those differences are small enough to be irrelevant.)
Now, if you’ve ever watched a movie where pilots or ship captains bark out headings, you will have noted that they issue commands in degrees around the compass rose, where zero degrees is north, ninety degrees is east, 180 degrees is south, and so on.
Why not, then, make a color compass? If you’ve been maintaining mental images of the last two paragraphs, and can lay them on top of each other, you’ll see that red is north, yellow-green is east, aqua is south, indigo is west, easing through violet as we get back to north.
Note that in mathematics and engineering, angles are usually represented as increasing in a counterclockwise direction from the 3:00 position, while compass headings increase clockwise from north. In any case, if you line them up and flip them around, this is what you get.
From that point it was almost trivial getting the compass heading and converting it to an RGB combination to send to the LED. You can check out the code at the project’s GitHub repository and follow along with the explanation.
I have not provided this device to anyone other than myself to use for an extended period of time. So I do not have a lot of data on how people end up using it. That said, this is only a proof-of-concept; hence the cute appearance. An experimental setup would place the “guts” of a similar device in a wearable of some type, with the indication just visible within the user’s field of view.
I’ve found that simply varying red, green, and blue proportionally around the compass rose does not result in a perceptually even range of colors. It seems that we perceive some colors as farther apart from, or closer to, each other than a simple recipe of red, green, and blue proportions would suggest.
There is also the critical concept of gamma, which relates the perceived brightness of a light source to the actual quantity of light emitted (or, more accurately, landing on your retina). This is not yet implemented in the Color Compass. Because I don’t have a nice mathematics renderer installed on this blog, here is how the relationship might look in a C-like language:
brightness = pow(luminance, 1.0 / gamma); /* 1.0, not 1, to avoid integer division */
A typical value for gamma is two, which is handy for quick calculations because raising luminance to the 1/2 power is just sqrt(luminance).
“Quick” is a relative term, of course, and your microcontroller might not be able to calculate powers and roots in real time, in which case you would simply use a lookup table. (Lookup tables are almost always the right answer for low power embedded applications.) I should also note that working with gamma is a little different when we’re talking about full RGB instead of just the brightness of a single light source. Again, a lookup table is ultimately the way to go in most cases, and can be fine-tuned on a computer and copied and pasted into an embedded program quickly.
Another option might be to slide the hue of the user’s entire field of view in the direction of the color associated with the heading; that is, to provide a color cast over the image. We know that people can handle this sort of thing without losing their ability to see what color something actually is. For example, we deal with color casts from incident light while maintaining a pretty good idea of what color things actually are (unless, presumably, it’s a black and blue—not white and gold—striped dress). The user could even turn her head a little (presumably instinctively) to see how much of the color seen is due to the direction, averaging out the color cast.
Using color to convey heading is not really useful in terms of communicating orientation precisely. But when used in conjunction with all the other senses, at a minimum it can make our sense of direction that much better.
So, to bring it back to the little fella in the pictures, the cute vinyl toy is not exactly at the edge of conscious awareness and hence doesn’t really work to augment the senses as I’ve described above. It’s really just a proof of concept (and why not throw in a bit of whimsy to make it interesting). But, imagine if the RGB LED could sit just outside your field of vision, casting a faint glow. If tuned right, it’s not hard to see that glow falling below your threshold of conscious awareness while still providing you with valuable information. That is the future of human-machine interface.
Welp, I suppose I should be a little more proud of what I do on the nine-to-five in addition to the six-to-eleven. So check out this video:
No, I’m not a drone pilot. I designed the electrical and control system for that bridge. It’s a pretty typical project, of which we do several a year. (Well, this particular one is not entirely typical because most of our bridges are bascules—seesaw types—instead of swings as you see here. But close enough.)
I’ve always been interested in new technology that could have existed years ago. For example, the household TV could have been used since at least the 1980s to display family pictures. Imagine a service that takes your photos, scans them, and sends you back a VHS tape to play over and over again.
I don’t recall ever seeing such a service, though I was too young to have been in the market for it anyway. But that idea really makes the most sense in the context of the modern day, where we can buy digital picture frames on impulse at the drug store.
And in that context the idea of using an analog TV as a picture frame is ironic. Or amusing. Humor is a big factor in many of my projects. So fast forward to now and, partly as a proof of concept and partly just to amuse myself, I built this:
Ahh, gotta love how it warms up like that before you see the snow. And snow! Analog noise seems so nice (visually and aurally) when compared to today’s digital glitches. I wonder if in a few years we will look back longingly at digital glitches.
Anyway, I made this particular implementation around a readily available single board computer about the size of a credit card, the Raspberry Pi, and a few other odds and ends to tie it all together. The system wirelessly downloads new photos from an album on my (or any) Flickr account and displays them on the screen in a random order.
You might have noticed that pictures of the actual device are conspicuously absent. I threw the thing together pretty quickly, but I really need to document the build, and I hope to update this post (or make another) soon. In any case, the business end resides in a bundle about the size of two packs of playing cards, along with another bundle of wall-wart power supplies, strapped together along the power cable. In a future implementation, such as for a gallery setting, I’ll place the entire assemblage directly inside the TV. I expect to have to remove the individual components from their enclosures and wire them directly together, wrapping them in insulating tape or potting them in epoxy so they can sit safely among the high voltage components in the TV.
A friend of mine is about to open a candy store and I offered to take some pictures for marketing and whatnot. So to practice and to give him an idea of what I have in mind, I took a few pictures and made this:
I plan to make some adjustments in a final version, including redoing my lighting scheme to make sharper (but smaller) shadows. I’ll obviously use a tripod for the “real” version as well, to get a little more sharpness overall. I’ll also be more precise with the hyperfocal distance so I can get exactly what I want in focus. Switching to a more telephoto lens will also likely help if the proportions work out otherwise. In a pinch I might have to do some focus stacking, but I’m not a fan of too much post-processing so we’ll see about that.
By the way, I used Bamboo Paper on my tablet for the lettering. Bamboo Paper is by far my favorite drawing app, at least on the Surface with the Surface Pen, because of the feel. Everyone should use Bamboo Paper’s technology for inking in their apps.
This post has been a good while in the making. For maybe a year or so, I’ve been mulling an idea for a novel application of augmented reality systems. (I’ve actually been mulling several ideas, but this one is perhaps the most exciting to me at the moment.) After a good bit of research, it’s pretty clear to me that the programming required is beyond my ability to perform on evenings and weekends, at least those free of other obligations. So I’m going to present my research to date here, summarized loosely in the form of a (very informal) research proposal.
I’m kind of excited now to develop some ideas further and to make more exhibits for people to better see what I’m talking about. But for now this will do.
And over the next few weeks and months I’ll free other ideas from the confines of my notebooks in similar form.
Synesthesia, the crossing in perception of one sense with another (think scents having color or sounds manifesting with a tangible texture), is a trait shared by perhaps four to five percent of people. In popular literature, synesthesia is often anecdotally considered beneficial to creativity, priming the mind with hard-wired metaphor. If synesthesia is in fact beneficial to creativity, an augmented- or mixed-reality system designed to mix senses could provide non-synesthetes with nearly instinctive capabilities for metaphor and all the creative benefits that arise therein. In this proposal I focus on the design of a system for simulating grapheme-color synesthesia. However, there are many other forms of synesthesia equally amenable, or perhaps more amenable, to implementation through an augmented reality system. In the most general terms, what I propose is simply a novel extension of human senses, as machines have been providing us for many years.
Augmented reality (AR) and now mixed reality (MR) are the new cool thing, as those of us in the tech world have no doubt noticed. No one can disagree that this technology will be useful for all the obvious things, like heads-up displays of contextually useful information. But to me that has always seemed boring, for lack of a better term. Perhaps it’s because it’s obvious: of course you would display relevant facts next to what you are looking at through your MR headset. That’s what they’ve been doing in every cyborg movie for who knows how long. For that matter, Thad Starner pioneered this in real life with the predecessor to Google Glass during his time at the MIT Media Lab back in 1993.
To me the real neat stuff is when you come up with an entirely new use case. I think I have thought of one.
Synesthesia is a neurological phenomenon wherein the stimulation of one sense (or, more generally, one cognitive pathway) stimulates another. Imagine seeing the letter ‘S’ with an iridescent, swirly blue tint perceived simultaneously with whatever color the letter was actually printed in. In other cases, synesthetes (that is the name for people who exhibit synesthesia) may hear a pitch or timbre in music and feel a very real sense of its texture: perhaps prickly, or soft like blades of grass under one’s hand.
The locations of the brain corresponding to the perception of the senses involved in a particular case of synesthesia are typically physically adjacent. Researchers therefore theorize that synesthesia may be as simple as brain activity spilling out of one sensory region and into adjacent regions.
There is a seemingly high concentration of synesthetes among the world’s most creative people. I’ve heard synesthesia described as an instinctive form of metaphor. Perhaps what a “normal” person thinks is an incredibly imaginative metaphor is simply the way the writer sees the world on a neurological level. Many believe that synesthesia is also common in children, though children often lack the faculties to communicate it or the mentorship to nurture it. If synesthesia is in fact common in children, it may, at least in part, explain the nature of their highly active imaginations. In any case, examples of synesthesia exist in our daily language, even among non-synesthetes, such as the timbre of a musical instrument being dark or bright, or even things like sharp cheddar cheese.
While reading about Vladimir Nabokov’s particular case of grapheme-color synesthesia—the condition in which one sees letters with color—it seemed to me like it should be pretty straightforward to build an augmented reality app to simulate it for those of us not fortunate enough to have been endowed with such a gift: Run the video stream of a cell phone camera through an OCR (optical character recognition) program to find text within the field of view, outline each letter, and fill those outlines with the color called for on a lookup table.
It turns out that implementing this is not as straightforward as I thought. In basic AR applications the developer trains a computer vision (CV) library such as OpenCV with an image called a “fiducial marker.” OpenCV then looks for this image in the video stream and can overlay something on top. This fiducial marker often has a geometric pattern such that its orientation in space is easily determined, allowing the overlay to make sense in three-dimensions. There are several toys and video games that make use of this, with three-dimensional characters rendered on a playing field.
I had hoped to more or less replace the fiducial marker used in AR applications with the output of an OCR library, and I suspect that might still be possible. But, as far as I can tell at this point, the gold-standard open source OCR library, Tesseract, doesn’t work like that. Rather, it just outputs the characters corresponding to the text it finds in the image.
Update: I’ve come across the Google Cloud Vision API which might be helpful with its TEXT_DETECTION functionality, though perhaps slow being cloud-based.
Another idea was to train the CV library with hundreds or thousands of fiducial markers: one for every letter in every common typeface. However, that would likely be very computationally intensive. And it seems outright inelegant when we know that OCR software exists.
In terms of hardware, as noted above, in my original concept the device would simply be a cell phone running an AR app developed in Unity or something similar, but with an OCR library providing the fiducial marker in real time instead of OpenCV finding it in the video stream. For a more immersive experience, MR hardware like Microsoft’s HoloLens or the Magic Leap device are particularly exciting. The fact that grapheme-color synesthetes see both the visual color of the text and the synesthetic color of the text is perfect for the MR devices. On those systems the user sees the projected images as slightly translucent, with the real world still there in the background. (I should note that “projected image” is probably not the correct term to use for these light-field based systems.)
How the System Works
On the input end, a computer receives a video stream of the scene. This is processed as a series of images and, in the case of a virtual reality (VR) or cell phone based system, passed through to the output screen.
That video stream is also passed to the OCR subsystem, which analyzes each frame of video (or every nth frame, as needed to accommodate the algorithm and the hardware) for text characters anywhere in the field of view. When the OCR subsystem locates letters, it sends closed vector outlines of those letters, along with information as to which letter is represented, to something which will overlay the video. For the most authentic experience, the OCR subsystem should process frames at a rate comparable to that at which a person can recognize a letter in her visual field.
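The input side of that flow might be sketched like this. Every name here, the stub subsystems, and the decimation factor are illustrative placeholders, since the real OCR interface is still an open question:

```c
/* Sketch of the input side of the pipeline: every frame passes through to
 * the display, while only every Nth frame goes to the (stubbed) OCR
 * subsystem. All names and the decimation factor are placeholders. */
#define OCR_EVERY_N_FRAMES 5

typedef struct { int id; } Frame;

static int ocr_frames_seen = 0; /* counts frames the OCR stub received */

static void display(const Frame *f) { (void)f; /* pass through to screen */ }
static void run_ocr(const Frame *f) { (void)f; ocr_frames_seen++; }

void process_stream(int n_frames)
{
    for (int i = 0; i < n_frames; i++) {
        Frame f = { .id = i };
        display(&f);                     /* video always passes through */
        if (i % OCR_EVERY_N_FRAMES == 0) /* decimate for the OCR stage  */
            run_ocr(&f);
    }
}
```

The decimation factor would be tuned so that recognized letters still appear about as quickly as a person would recognize them herself.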
The video overlay subsystem takes the vector outlines of the letters and fills them with the appropriate color based on the attached letter information compared against the lookup table. Speaking of lookup tables, here is one I compiled from Nabokov’s description of his grapheme-color synesthesia in his autobiography:
- a: “weathered wood.” I take this to mean the æ sound like “at” or “that.” Nabokov says this is in the black group.
- a (French): “polished ebony.” The French a, which I assume is the phonetic a.
- d: “creamy.” In the yellow group.
- g (hard): “vulcanized rubber.” Specifically a hard g.
- g (soft): “rich rubbery tone.” This is the soft g, and is in the brown group.
- h: “drab shoelace.” Also in the brown group. I guess it’s a dirty shoelace.
- j: “rich rubbery tone.” Similar to the soft g, but paler.
- m: “a fold of pink flannel.”
- o: “ivory-backed hand mirror.”
- on: “the brimming tension-surface of alcohol in a small glass.” The French on, which I assume is nasalized.
- q: “browner than k.” This must be blue, still, but somehow “browner.”
- r: “sooty rag being ripped.” This will be interesting to convey simply as a color.
- s: “a curious mixture of azure and mother-of-pearl.” This is partly to show that shape matters in color in addition to sound.
- u: “brassy with an olive sheen.” Also in the yellow group.
- w: “dull green, combined somehow with violet.”
* I am working on selecting RGB values and may ultimately select a different method to store the color. This is particularly important for the colors that Nabokov describes with motion or texture, such as those for ‘s’ and ‘a.’
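To make the lookup concrete, here is a sketch of how the overlay subsystem might consult such a table for a few of the entries above. The RGB triplets are my own placeholder guesses; as the footnote says, the real values (and perhaps a richer representation than plain RGB) are still to be selected:

```c
#include <stddef.h>

/* Placeholder grapheme-to-color lookup built from a few rows of the table
 * above. The RGB triplets are illustrative guesses, not final values. */
typedef struct { unsigned char r, g, b; } Rgb;

typedef struct {
    char grapheme;         /* letter reported by the OCR subsystem    */
    Rgb color;             /* fill color for the letter's outline     */
    const char *described; /* Nabokov's description, for reference    */
} GraphemeColor;

static const GraphemeColor table[] = {
    { 'm', { 244, 194, 194 }, "a fold of pink flannel" },
    { 'o', { 255, 255, 240 }, "ivory-backed hand mirror" },
    { 'u', { 181, 166,  66 }, "brassy with an olive sheen" },
    { 'w', { 107, 142,  35 }, "dull green, combined somehow with violet" },
};

/* Return the overlay color for a recognized grapheme, or NULL when there
 * is no entry (in which case the letter is simply left un-colored). */
const GraphemeColor *lookup_grapheme(char g)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].grapheme == g)
            return &table[i];
    return NULL;
}
```

Returning NULL for unknown graphemes gives the overlay subsystem a natural way to skip letters we have no data for.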
Finally, the overlay is mixed back into the video stream and displayed on the screen. Even though we are talking about video of the “real world,” you can see from the signal flow described above that this is all still a two-dimensional system (at least when implemented as a simple cell phone based AR app).
In the case of the MR systems, the signal flow must account for the third dimension so that the letter’s color overlay shows up on top of the letter in space and not just as paint on a screen. This is obviously very important for MR systems and is implemented in the Microsoft HoloLens with something they call “environment understanding cameras” providing spatial mapping. In this case I can use the spatial mapping to tell my program where in three dimensions a piece of text is, and more importantly the orientation of the plane that text is written on. The color fills can be drawn in the same plane.
Software Implementation and Problems
Given the problems discussed in the Background section, based on my current research it appears that I will have to explore the innards of the Tesseract OCR library to extract the needed outline information. It is my understanding that Tesseract finds outlines—or uses outlines—to make a judgment as to what text is represented. But it is not clear that that process is made transparent enough to reach in and play with intermediate steps.
I suppose another problem will be determining if something is in fact text. Otherwise, my program might just end up demanding that Tesseract make a guess as to what text an outline represents, whether it actually is text or not. That would lead to all sorts of funny results, with random color areas all over the view. That is not necessarily a bad thing, and could provide the user with something like artificial imagination as a row of light poles starts to look like a string of capital ‘I’s or lowercase ‘L’s. Hallucination is only a step further, as some research suggests that a mind trying too hard to make sense of its sensory input produces spurious perceptions. In fact, I believe this idea warrants further study as a separate concept.
But if we want to stick with the clean artificial synesthesia implementation, we need to remove the noise. If Tesseract provides a certainty of its output (another pending research question), we can use that to determine whether we should in fact apply a color overlay at a particular location. If not, we will again have to reach in to the library and see if we can find some variable somewhere that we can use to make some sort of judgment on the quality of the character recognition.
Aside from the character recognition problems, the 3D programming aspect of the system, though difficult, seems at this point to be “straightforward” in that what we want to do, essentially just drawing planes in three dimensions, is not new or unusual.
Taking a look at the grapheme-color table a few paragraphs above, it looks like I might need to take a little artistic license in the implementation. Nabokov certainly took some artistic license, piling synesthetic descriptions on top of synesthesia (see on in particular).
Also note that many of these letters rely on the sound, or the phoneme, and not just the shape of the letter (the grapheme). (I believe they should call this phoneme-color synesthesia instead of grapheme-color, but that’s neither here nor there.) So to really get this right, the OCR algorithm would also need to look at the word, extract the sounds, and map them back to the letters. But for a proof-of-concept we don’t need to do that right now.
In future implementations, the color overlay step of image processing could include some more “life,” perhaps animating a swirly, colored cloud within the outlines to really get the user to see things vividly like the original synesthete.
Speaking of “original synesthete,” imagine being able to load up some particular famous synesthete’s “program” in your MR headset and see the world through their eyes. When you’re done with Nabokov, just load up the program for Nikola Tesla, or Wassily Kandinsky, or whoever (to the extent that we have enough information to recreate their synesthesia). One could even license her synesthesia description, providing artists a form of income.
Of course grapheme-color synesthesia is not the only type. Chromesthesia, when sounds trigger a color sensation, could be relatively easily implemented in an MR headset. In fact, I probably should have started with that one since many chromesthetes (I just made that word up, I think) report that the visual phenomena show up as though projected on a screen. Notably, many, or perhaps all, MR systems have a sense of the user’s gaze, so they can tone down or relocate the visual manifestations away from where the user is looking. This ensures that the chromesthesia does not interfere with normal visual function.
Some other interesting synesthesiae (I think I made that word up, too) to explore include auditory-tactile and visual-tactile synesthesia, where the user can have a tactile generator, such as a fabric with embedded vibrating motors or something like a braille display with pins of various materials, activated by sound or visual cues extracted by a CV algorithm. Smell-color synesthesia could have some very practical applications if amplified to give users a heightened sense of olfaction, or chemical (explosives, poison) detection, while still allowing the user to be present for other experiences. There are of course many more types, and given that this is all done in software, we can easily invent yet more.
Lastly, as mentioned above in the Problems section, glitches in the practical implementation of these artificial synesthesia systems could actually be used as something like artificially augmented imagination. Recall the case of the OCR system mistaking light poles for letters ‘I’ and ‘l,’ wheels for ‘o’s, etc., and coloring them all in accordingly. That sounds to me like a very interesting psychedelic world, and I’d love to experience it. From there it is not a stretch to imagine purposefully coding an algorithm for a sort of generative reality-bending system, drug-free. Leave it to Adult Swim to be on the cutting edge.