
Connecting Dots

I love the broad pattern-recognitions of the ol’ gray matter – the simple connections made between thises and thats, spreading inexorably from sites of stimulation and parallel processing, intersecting with ever-larger patterns to create surprising and enriching tangents and leading to great “Ah-HA!” moments.

I had such a moment, of a remarkable nature (which is why I’m remarking on it I suppose) last week while I was reading through a Matter article on “The Charisma Coach” (I highly recommend checking out the magazine in general – the long-form articles have not yet failed to be intriguing, and I’m happy to have been in the kickstarting crowd).  Anyway, the topic of learning charisma and people skills is interesting in its own right, and follows a significant trend in moving the concept of successful execution from the extraordinary skills of an individual to the attitudes of connected groups (“Team Geek“, “How NASA Builds Teams“, “Delivering Happiness“, “Tribal Leadership“, etc.).  That’s not particularly new – this is a trend that’s been evolving for a while, and on which I’ve been keeping a keen eye even when struggling with it myself.  I readily admit that I’m not the most social person, and have primarily attributed/excused these tendencies by way of being deeply introverted (“people time” exhausts me) and dealing with chronic pain (which lent me an attitude that life is too short to put up with some kinds of crap, such as disingenuousness or celebrated mediocrity, especially when there are under-appreciated park benches out there what need sitting on) – which are weak, selfish excuses that short-change others, but I’m working on it.

Anyway, I got to the part about the puppy: a cognitive shortcut recommended by the profiled coach to switch expression and communication modes to something warmer and friendlier by someone otherwise feeling impatient and condescending – in this case, by lending him a puppy.  This serves as a means of engaging different patterns of relating and fulfillment, and making it easier to incorporate the target frame of mind into the desired context, modifying the outward behavior as a result.

That’s not new either – this is a form of Cognitive Behavioral Therapy (CBT), which frequently uses functional exercises as a means of shifting context to emphasize preferred behaviors and perspectives.  But something about it clicked this time, intersecting with tangents from other reading I’ve done over the last couple of years (“Willpower“, “Thinking Fast & Slow” – this latter one being my favorite of all the cited works thus far).  These books introduce the concept of cognitive energy – and not in a metaphysical woo-crap kind of way, but in terms of literal metabolic respiration, fatigue, and refractory periods.  “Willpower” especially, based on its (sound, well-executed) studies, determined that there is no great reservoir of human resolve that can be deepened through exercise, strengthened through exposure, or is particularly inherent to character.  This finding runs contrary to commonly inferred attributes of the American Success narrative, wherein one can overcome all obstacles through herculean application of self, and that character is the greatest personal attribute (or collection of attributes).  It’s a nice idea because it underscores the idea of being in control of one’s own destiny, but it’s a fiction (which is not to say that we don’t have control [of a sort – luck still probably plays the biggest role], but that the means by which we do so differ).

Instead, those “characters” of seemingly endless resolve have simply created sets of cognitive/behavioral short-cuts that fire automatically: rather than engaging with a situation directly, reasoning, rationalizing, and struggling through it, it’s delegated to a pre-established set of mental patterns and tools which can do so with little oversight (a subconscious, or “System 1” behavior in the terms of “Thinking Fast & Slow”).  They retain the precious and scarce resource of cognitive energy for other things – ideally, for creating new automatic behaviors in a virtuous cycle of reinforcement (note however that this is where bad habits come from too – caveat empty[sic] and all that).

Right, so, the epiphany?  I’ve seen lots of CBT-ish approaches to situations: visualization exercises, statements of affirmation, personal ritual, etc.  Often used by well-meaning individuals, probably more often used by sales-dudes and self-help gurus, and most irritatingly used by manipulative bosses and woo-peddlers.  In most cases that I’ve encountered them I felt immediately uncomfortable: it was obvious that the exercise in question was meant to manipulate context, and more often than not was meant to justify (or remove perceived excuses from engaging in) activities or behaviors which could not possibly be sustained for very long.  Not only living like there were no tomorrow, or whipping up crowd energy and excitement, but creating excuses for pushing too hard – I had a boss once who likened our situation of prepping for a marketing summit to reacting to an incoming ICBM: and in that circumstance, everything other than averting disaster was a secondary and expendable consideration (family, sleep, health, etc.).  Only, of course, there was no missile, and conflating the completion of brochures and organization of media materials with horribly explosive death doesn’t exactly square up (or if it does, I would argue that your priorities are probably off the mark as far as functional members of society go).  Unsurprisingly I left shortly thereafter – I wasn’t a fan of all the impending apocalypses or what they (unsustainably) required of me (including the marginalization of family – which is just not allowed).

The puppy exercise cued something for me – the concept of the context shift toward desired behaviors that were already established in a low energy mode, thereby preserving rather than extracting energy.  The realization was probably due to this being the exercise and perspective of a fellow introvert, rather than of a crowd-pleasing extrovert, and with the shift-mechanism being a recognizable pattern for me; extroverts I’ve seen were more likely to introduce shortcuts that worked for them but which were foreign to me, and therefore cognitively expensive and unrealistic.  It sounds straightforward, especially written out, but this created a mental pivot that opened up (and/or connected) several new lines of thinking that are making accessible previously unavailable considerations.

As a similar tangent this made me think of the high frequency of different mental routes from one place to another.  Or perhaps I should say “nervous” routes, since neural networks (physical ones) act more like consolidating grids than strict tree hierarchies, and there are usually several different ways for A to get to B, whether it be in sensory networks or all the way up to the neocortex.  Specifically I was reminded of the experience of Scott Adams, the Dilbert cartoonist, in his struggle with spasmodic dysphonia, and how engaging different modes of thought would periodically restore his ability to speak: by finding different routes from speech center to basal ganglia which were not compromised by the neural dysfunction at that moment (the higher-order rationalization of the re-routing into “rule” sets is fascinating too).

What kinds of pockets of low-cost, context-appropriate functionality are out there to be tapped into?  This is largely the strategy of hypnosis and neurolinguistics, so you’d think I would have figured this out earlier, but I always thought about it more in the abstract terms of establishing desired patterns rather than finding and interlinking existing ones (though I can see some of this in the underlying Ericksonian methodologies).  Thinking about it more at the practical and functional level is fascinating.

Maybe now I can figure out how to manage some “people time” without feeling like I’ve had to compromise on protection of personal identity (and sweet, sweet, brain juice reserves).  Perhaps I can find ways of dealing with others whose value systems run more contrary to my own without feeling like I’ve been made to endorse lukewarm insight as genius, or play cloak-and-dagger politics.

Sorry if I’ve been a jerk – per the above, it’s not you, it’s me (unless it is you, in which case knock it off already).

Oculus Rift: First Thoughts

I recently had a chance to play around with the Oculus Rift development kit, and came away thoroughly impressed.  I’ll side with Cliff Bleszinski’s comments from SxSW: “There are two types of people when it comes to the Oculus Rift – there are those who haven’t seen it, and those who have seen it and believe.”

The experience of wearing the Rift is transportative, if not outright transformative.  The bulk and weight feel natural in terms of fit and distribution on the head, and the sense of the screen itself quickly fades from view to be replaced by whatever it’s showcasing – it gets out of the way as a platform and lets the content speak for the capabilities.  Its strength as a portal to the work of other creators and designers will hopefully cement its place as a mainstream product and bestow major success and market-share on the Oculus team (and their “just happy to be here, folks!” founder Palmer Luckey, who comes across as a really nice guy who loves what he does – I can’t help but wish him success).  Deep pockets can sink early ventures pretty easily, in the event that Sony, Microsoft, or others want to set their sights on the emerging VR market after the hard work of helping it onto its feet has already been done.

Back to the device though: the dev kit has a lower resolution and higher bulk than the target specs for the consumer release, but was still impressive in its own right until you wanted to read something (more on that later).  The stereoscopy is exceptional and provides not only depth but a serious demonstration of the scale of things as well.  A typical trope of first-person PoV in video games relies on making game worlds absolutely massive in order to create a sense of scope and scale to impress (and then let the character run through it at a bazillion miles an hour on indefatigable and inhuman legs capable of bounding over improbable obstacles so as to make it still navigable instead of tiresome).  The Tuscany demo provided by the Oculus team, however, manages to sit right between “comfortable” and “spacious” with a model and world that in any other mainstream title would actually come across as “quaint”, a piece of set dressing hardly worth exploration.

This is helped along considerably by the proprioceptive projection invoked by the medium – that is, the sense of the environment meshes so well with the brain’s expectations of how “the I that is me” relates to the world, that it (the brain) slips easily into the sense of reality we usually construct from our physical surroundings.  Standing on the balcony overlooking the courtyard and out to the sea, I wanted to crouch down and inspect the stonework of the banister.  Not only did it immediately relate to a concept of my own scale in that environment, but my brain craved additional subtle details it expected but found absent: I wanted my voice to echo off the wall, to feel shifting air and patterns of temperature as I moved from sun to shade or turned relative to wind, even humidity and smells.  I wanted to touch things not only to measure their position relative to myself, but also to become aware of their texture, solidity, and age.

The fact that I jumped so quickly into a realm of subtlety is a credit to the visual experience.  Human eyes are constantly making minute adjustments to compensate for how we bounce ourselves around, even for tiny head shakes resulting from speech.  The very sensitive 120 Hz positional sampling in the Rift caught and balanced these perfectly, providing a sense of stability and responsiveness unrivaled by any kind of 3D VR experience I’ve had before this, managing to simply disappear.

In fact, the last time I remember anything close to this was the first time I played Doom (when it first came out in 1993, a full 20 years ago), and felt an emotional response to hearing the grunt of an unvanquished Imp somewhere in the level.  In that case the gameplay was engrossing, though not entirely immersive, until I reflected back on it later that first evening: my memory was not of the keyboard, screen, and speakers, but of the environment itself – it had provided enough detail for my brain to fill in the rest and appreciate it the same way it did with its other concepts of space.  The Rift does that up front, so that upon reflection the rest of the subtlety comes into play.  In fact, I found myself referring to my physical presence as “the real world” as distinct from the world I was experiencing and inhabiting (as opposed to “physical” and “virtual”).  My sense of orientation relative to the desk and my developer friend whose kit I was using, even with his voice providing some orientation, became completely un-grounded – that’s just not the world my brain was in at the time.  This young lady’s reaction is quite illustrative.

Now to be fair, it does have its limitations.  Content developers are going to have to think a lot about interface – movement is very different when you add head tracking as an additional means of orientation.  The Team Fortress 2 demo does a good job of this, and separates the aiming reticle from the viewport (typically in first-person shooters it’s embedded in the center of the screen, and aiming the camera and the weapon are one and the same).  The combination of look + mouse + keyboard for movement was quickly natural, and will hopefully be used as a starting point or template for others.

Other details will also need to be worked out:

  • Movement matters – a walking view should feel like walking, complete with subtle shifts in position or bounce as weight changes feet, so one is not impossibly glide-stepping or rolling around the virtual environment in a wheelie chair.
  • Image textures are not enough – if a section of wall is simply painted to look like a grate, a 2D interface might let you get away with it.  But in full 3D, the eyes immediately register it as an utterly flat plane, making that image (no matter how nice) look like cheap wallpaper.  Bump mapping will help, but people are also going to want to stick their noses a lot of places they haven’t previously, so as much as a person will ever be able to peer through or around will need some life breathed into it.  For that matter, UV registration (the process by which shape and image are matched up) will need a lot of precision work as well – the corner of that brick had better match up with the corner of the picture.
  • Structural integrity needs to be considered – in virtual space an object need not be 2-sided, or even 3-dimensional.  A pane of glass (or a railing, for that matter) can be depthless, and just because a box has 3 sides doesn’t mean the others are complete.  But unless the point of the virtual world is to explore Klein-style mathematical constructs, maintaining Euclidean geometry and physics is important for preserving the illusion.
  • Eliminate lag at all costs – if no other subtlety can be preserved, make sure that the physical head turn indicators in the headset translate seamlessly into the virtual representation.  Stutter or lag in visual perception is more 4th-wall shattering than anything else, in addition to being more nausea-inducing than awkward and/or rapid movement.
  • Reading is right out – the TF2 HUD was a wonderful experience, to have it really appear as though it were floating on top of the rest of the fluid environment, but in order to be viable as an interface it needs large, high-contrast text near the center of the field of view. It’s like going back to the barely post-DOS games of yore, and means that textual interaction has to be kept to a minimum and rely on other cues (such as color coding, simple distinct glyphs, etc.).  This will probably be worked out in successive iterations with higher resolution, but for now is a distinct limitation.

I’m excited for what this can do to expand the possibilities of experiencing virtual worlds, and not just for gaming.  I’ve recently begun to do my sculpting digitally, trading in polymer clays for a pen and tablet – way less mess, no set-up and clean-up, and I don’t have to bother with planning out all my internal support structures in advance (letting me stay spontaneous throughout the course of the entire project).  An infinite level of detail, independent object addressing, layers, even “undo” are giving me as much freedom in a computer as I experienced when moving photography into Photoshop.  To combine that with more natural modes of manipulation (still waiting on my twice-delayed Leap Motion controller) and perception will further decrease the barriers between imagination and creation.

Navigating infoscapes is another big one I’m looking forward to, and will have another write-up in that regard soon.

But really? One of the biggest reasons I’m excited for this is due to McArdle’s disease: when I talk about inhuman and indefatigable feats in navigating virtual worlds, that goes doubly for me.  Even with good physical therapy and conditioning there’s stuff I just can’t do anymore, and being able to strap on a different set of eyes and overcome physical limitations is thoroughly enticing.

Musical Seeds

I can’t begin to count the number of musical ideas that have come and gone over the years.  Some of them I manage to commit to memory, some actually get written down at some point, and a very select few have actually been turned into pieces of music.  The criterion for that selection is usually “what sticks in my head” rather than “this is a worthy component”, which is unfair to a good many that were lost merely because my memory is bad.

I’ll eventually counter this by setting up a good studio or getting good enough at notation that I can do it while I’m at the keyboard instead of trying to deconstruct things later.  In the meantime, I tossed a recorder into the mix so I don’t permanently lose this one that I was toying around with this morning:

This is a small musical meditation – but also the germ from which larger music can be constructed and derived.  It’s the basic element which inspires connections to other components and themes that eventually get turned into music – I can sit down with this in my head and play lots of other tangents, weed through them and find those that are cohesive to the feeling to be captured, and then develop that into a song draft.  It’s always a very organic process.

Most importantly, it’s a process I haven’t attempted in earnest for several years – glad to see that some of it has matured in the meantime.  This piece is best listened to with headphones, and is intended to be soft and subtle in some passages.  The timing’s not nailed down yet, but the subdivision of the rhythmic back-bone has a side effect of slowing down the listener’s breathing – it’s a nice mellow tune in that regard.


Triangulation Lookup Table as a Simple Solution for Time-to-Arrival Multilateration

Working with my buddy Brad we found out we were both contemplating the same problem, each having arrived there through different means (my own as a contemplation for isolating point sources for sound in a noisy environment and automatically canceling the ambience with 3 or 4 microphones instead of hundreds).  Simply put, we needed to find out, based on nothing more than multiple audio channels, where a sound was coming from within a grid (assuming the audio channels are being produced by microphones placed in 4 corners).

As we dissected the problem, we found 3 things:

  1. This is a fun real-world problem to chew on.
  2. It is far more complex than it appears on the surface.
  3. There are very good, well-documented solutions to this problem – if you’re a mathematician (there’s lots of free math out there, but not a good public library of implemented multilateration code).

Seeing as how neither Brad nor I are super big on the mathematics (that’s what computers are for, after all), we decided to flip the problem on its head: instead of trying to find the intersection of hyperbolas in three dimensions, why not pre-calculate the anticipated signal fingerprints in terms of sets of time-to-arrival differences for each of the sensor locations, and then do a nearest-neighbor calculation in the resulting lookup table?  I’ll go through the problem deconstruction and solution in steps so it’s easier to see what we were trying to do.

Imagine a grid, say 80′ by 60′, with one microphone placed in each of the 4 corners A, B, C, and D:

Figure 1: A basic 80 by 60 unit grid with corners labeled counter-clockwise from top left A, B, C, D

An “audio event” (say, a sneeze, a clap, a code word, etc.) takes place at a random location within our grid, and the sound begins to propagate outward in all directions (rebounding echoes are not shown here, since they will always reach any sensor after the initial sound wave – obstruction based on the direction of the emitter [e.g., which way our sneezer is facing] is also not directly factored in, but has less of a dampening effect than you might think – at least for sounds above a certain threshold):

Figure 2: Propagation of sound waves from a random location in the grid toward all 4 corners/microphones

It’s easy to see the distance from this randomly selected point to all 4 corners just by counting rings, but ring counting is not available to us, so we instead use some basic functions and turn the whole thing into right triangles and hypotenuses, so that we may infer from the hypotenuse what the relative lengths are of the other sides and thus our (x,y) coordinates:

Figure 3: The sound event assigned as point E with direct lines drawn to all 4 corners creating line segments AE, BE, CE, and DE
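The right-triangle step from Figure 3 is simple to sketch in code.  (Python here rather than our original Perl/PHP, and the corner coordinates are my assumed reading of Figure 1 – A top-left, labeled counter-clockwise, with y increasing upward.)

```python
import math

# Assumed layout per Figure 1: an 80x60-unit grid with a microphone
# in each corner, labeled counter-clockwise from top left: A, B, C, D.
W, H = 80, 60
CORNERS = {"A": (0, H), "B": (0, 0), "C": (W, 0), "D": (W, H)}

def distances_to_corners(x, y):
    """Hypotenuse (straight-line distance) from event point (x, y) to each corner."""
    return {name: math.hypot(x - cx, y - cy) for name, (cx, cy) in CORNERS.items()}
```

An event sitting right on corner A, for example, is 0 units from A, 60 from B, 80 from D, and 100 from C (the grid’s own 60-80-100 right triangle).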

We have now assigned our audio event the label E and assigned the intermediate intersections a, b, c, and d (instead of sticking with just x and y because I want to be able to describe intermediate line segments without confusion).  There’s just one problem with this approach – we don’t actually know the distance of any line segment to E!  We do know what some of them are relative to each other, though, as illustrated in the following figure – but since we don’t know exactly when the sound was first emitted we have to start our counting when the sound reaches the first microphone:

Figure 4: Timing of signal arrival from our random point to each of the 4 microphone locations

Our signal starts at -30.494ms relative to A, but we don’t know that – all we know is that it was the first to receive the signal, as the closest microphone, and can then count upward from there to the other sensors – which I’ve listed here in sensor order, rather than detection order.  Going counter-clockwise like this means that, from the first microphone starting at 0, we will always be seeing adjacent-opposite-adjacent corners (with opposite always the highest number as well).

The equation shown next to D is the crux of all this: the value that we record there relative to A is the same as the DE hypotenuse minus the AE hypotenuse as drawn in Figure 3.  This relative value is consistent, and follows a predictable curve (shown here from the perspective of time-of-arrival offset at D):

Figure 4.1: 3-dimensional plot of signal difference between A and D for each value x,y

Switching to a dark background here to make the plot more visible.  The peak and valley are the bounds of the grid – the fixed distance of the microphones.  The slope directly between those 2 points is far more linear, and actually crosses 0 (since AE is going to be greater than DE 50% of the time, some of the values on this graph will be negative, unless you wrap it in an abs function) at the same x on all values, with 0 representing the moment at which the signal will reach both points simultaneously – right smack in the middle.  It’s easier to see this from the side:

Figure 4.2: DE minus AE plot rotated to be seen from the Y axis
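That mid-line zero crossing is easy to verify by evaluating the DE − AE surface directly.  A quick Python sketch (the corner layout is my assumed reading of Figure 1, the grid units are taken as feet, and ~1125 ft/s is a nominal speed of sound – all assumptions, not values from our code):

```python
import math

W, H = 80, 60
A, D = (0, H), (W, H)   # assumed top-left and top-right corners
SPEED_FT_S = 1125.0     # nominal speed of sound in feet per second

def offset_at_d_ms(x, y):
    """Arrival-time offset at D relative to A, in milliseconds: (DE - AE) / speed."""
    de = math.hypot(x - D[0], y - D[1])
    ae = math.hypot(x - A[0], y - A[1])
    return (de - ae) / SPEED_FT_S * 1000.0
```

Any point on the vertical center line (x = 40) gives exactly 0; points nearer A give positive offsets and points nearer D give negative ones – the surface seen edge-on in Figure 4.2.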

So we know there’s a plot function for creating this kind of value, but inverting that into something like f(80,18.804) = ED is a little trickier than I’m up for.  This is where multilateration comes in, looking for the intersection of 2 such measurements, which requires a lot of propagation along vectors to identify.  This is where Brad and I started to cheat: if we can compute what the signal difference would be relative to all 4 microphones for any x,y point on the grid, why not do that in advance at an acceptable resolution, and then look for nearest-neighbor matches to do a look-up instead?  With the microphone nearest the event always registering 0 (the starting point for counting), you’re left with the difference of the hypotenuses for the 3 other corners BE, CE, and DE, which produce a nice x,y,z set of coordinates that can be measured against each other.
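Setting the quadrant optimization aside for a moment, the precompute-and-lookup core fits in a few lines.  (A Python sketch, not the actual sono.phps code; corner layout assumed as in Figure 1, with fingerprints stored in raw distance units – multiply measured time differences by the speed of sound before looking them up.)

```python
import math

W, H = 80, 60
CORNERS = [(0, H), (0, 0), (W, 0), (W, H)]  # assumed A, B, C, D counter-clockwise

def fingerprint(x, y):
    """Per-corner distance differences relative to the nearest corner (which reads 0)."""
    d = [math.hypot(x - cx, y - cy) for cx, cy in CORNERS]
    nearest = min(d)
    return tuple(di - nearest for di in d)

# Precompute one fingerprint per grid point at 1-unit resolution.
TABLE = {(x, y): fingerprint(x, y) for x in range(W + 1) for y in range(H + 1)}

def locate(observed):
    """Brute-force nearest neighbor over the table (an octree would cut this down)."""
    return min(TABLE, key=lambda p: sum((a - b) ** 2 for a, b in zip(TABLE[p], observed)))
```

Feeding a point’s own fingerprint back through `locate` recovers that grid point, and a measured fingerprint lands on whichever tabulated point matches best.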

In order to optimize our solution, we take our cue from the fact that the nearest microphone always registering 0 means that we only have to compute one quadrant of our grid, which can then be rotated (technically we would only need to calculate half of the quadrant, split diagonally, but computers like grids instead of triangles so we stuck with that):

Figure 5: Plotting the x,y offsets from the perspective of the A quadrant

By setting our mapping interval to 1′, that’s the resolution we’re limited to, but for this application it is sufficient.  The equivalency of the grid is shown through the following 2 figures demonstrating rotation of the event plot:

Figure 6.1: rotating the hypotenuses from the event to alternate equivalent grid points

Figure 6.2: showing all 4 equivalent grid point calculation rotations

Given the equivalency, we only need to calculate the relative values to its peers from the perspective of a single designated 0 corner, and then compensate for rotation (which includes swapping adjacent corners if we’ve rotated an interval of 90° [but not 180°]).  The real value of this optimization is that it cuts the amount of data we need to scan by 75%.  Another optimization is that, when searching row by row through each column, once the coordinate set we’ve evaluated begins to increase in distance the search within that column can be aborted.  The optimal solution would probably be an octree implementation for the coordinate scanning, which would significantly cut the number of neighbor comparisons that need to be made, especially on larger data sets – but we didn’t bother going that far.
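As a variation on the rotation trick: reflections about the grid’s center lines give the same 75% reduction and stay exact even when the axes aren’t equal.  A Python sketch (corner layout assumed as before, and a different mechanism than what we actually implemented) showing that a mirrored point’s fingerprint is just the original’s with the corners relabeled:

```python
import math

W, H = 80, 60
# Assumed corners counter-clockwise from top left: A, B, C, D.
CORNERS = {"A": (0, H), "B": (0, 0), "C": (W, 0), "D": (W, H)}

def fingerprint(x, y):
    """Per-corner distance differences relative to the nearest corner."""
    d = {n: math.hypot(x - cx, y - cy) for n, (cx, cy) in CORNERS.items()}
    nearest = min(d.values())
    return {n: v - nearest for n, v in d.items()}

def mirror_to_a_quadrant(x, y):
    """Reflect (x, y) into the quadrant nearest A, tracking how the corners swap."""
    swaps = {"A": "A", "B": "B", "C": "C", "D": "D"}
    if x > W / 2:  # reflect across the vertical center line: A<->D, B<->C
        x = W - x
        swaps = {"A": "D", "B": "C", "C": "B", "D": "A"}
    if y < H / 2:  # reflect across the horizontal center line: A<->B, C<->D
        y = H - y
        flip = {"A": "B", "B": "A", "C": "D", "D": "C"}
        swaps = {k: flip[v] for k, v in swaps.items()}
    return (x, y), swaps
```

Since fingerprint(x, y)[n] equals fingerprint(mirrored)[swaps[n]], only the A quadrant ever needs to be tabulated.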

One problem with this approach though, is that when you don’t have an equal number of units on both x and y axes, you lose a little precision when doing 90° interval rotation by a factor of x:y. Not terrible, but between the trade-offs in resolution and precision, this solution, simple though it may be, is not for everyone.

Crude though the code may be (I hacked it together in Perl initially, then put it into PHP as a lingua franca for Brad – all while commuting on the train) I offer it for your inspection and deconstruction.  Bon appétit!

  • sono.phps: class file containing logic – look at the static test method for hints on usage.
  • ping.phps: implementing file showing the most basic functions and routines, requires the sono class.
