Kuchipaku

Every sound is a shape. Some shapes are on your lips, some are hidden on your tongue. Here are both, for a Japanese line.

Words from イノチミジカシコイセヨオトメ · CreepHyp. Front mouths are Google's pronunciation visemes. Side views are the CC0 midsagittal set by Wright and McCloy.

The T you cannot see from the front

English and Japanese spell it the same. The tongue does not agree.

Words from the song


How this works

Speech is animated with visemes and articulations, the handful of distinct poses the mouth makes. Many sounds share one pose, so the set is small. This tool maps each Japanese mora to a quick consonant pose followed by a held vowel pose, then plays the sequence.
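As a rough illustration of that mapping, here is a minimal Python sketch. Everything in it is an assumption made for the example: the pose names, the two durations, the small kana table and the function name `pose_sequence` are hypothetical, not taken from the tool itself. It also reads one kana per mora, so it ignores digraphs such as キャ and special moras such as ン and ッ.

```python
# Hypothetical sketch: pose names and timings are illustrative, not the tool's own.
CONSONANT_MS = 60   # assumed length of the quick consonant pose
VOWEL_MS = 180      # assumed length of the held vowel pose

# Tiny illustrative table: kana -> (consonant, vowel); "" means no consonant.
MORA = {
    "イ": ("", "i"), "ノ": ("n", "o"), "チ": ("ch", "i"),
    "ミ": ("m", "i"), "ジ": ("j", "i"), "カ": ("k", "a"), "シ": ("sh", "i"),
}

def pose_sequence(word):
    """Return a list of (pose, duration_ms) frames for a katakana word."""
    frames = []
    for kana in word:
        consonant, vowel = MORA[kana]
        if consonant:
            # Quick consonant pose first, when the mora has one.
            frames.append((consonant, CONSONANT_MS))
        # Every mora ends on a held vowel pose.
        frames.append((vowel, VOWEL_MS))
    return frames

print(pose_sequence("イノチ"))
# [('i', 180), ('n', 60), ('o', 180), ('ch', 60), ('i', 180)]
```

Because many consonants share a pose, a fuller table would point several kana rows at the same pose name, which is what keeps the set small.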

The front view uses Google's own pronunciation mouth images, recolored. The side view uses a public-domain midsagittal set: the vocal tract sliced down the middle so the tongue, palate and teeth show. Switch views and play the same word to see the lips and the tongue tell two halves of one story.

Where the side view is honest about guessing

The midsagittal set has one drawing per International Phonetic Alphabet symbol, not per Japanese sound, so a few mappings are close cousins, not exact:

お (o) has no diagram in the set, so it borrows the u tongue shape.
え (e) uses the nearest front vowel, ɛ.
ら-row (r) is a quick tap in Japanese; it borrows the n tongue position as the closest still frame.
ふ (fu) is really made with both lips; the side view shows the lip-to-teeth f instead.
The American vs Japanese T above uses the one alveolar t drawing, with the contact point moved forward to mark the Japanese position.

None of this is your mouth doing it wrong. It is one diagram set stretched over a different language.
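One way to keep those substitutions visible is a small lookup that returns the borrowed diagram along with the reason for borrowing it. The Python sketch below is only an illustration: the keys, the diagram labels and the function name `side_view_diagram` are hypothetical, and it assumes every other sound has an exact drawing. The T contrast is left out because it reuses one drawing with a moved contact mark, not a substitute.

```python
# Hypothetical lookup: keys and diagram labels are illustrative only.
SIDE_VIEW_FALLBACK = {
    "o": ("u", "no o diagram in the set; borrows the u tongue shape"),
    "e": ("ɛ", "nearest front vowel"),
    "r": ("n", "Japanese r is a quick tap; n is the closest still frame"),
    "f": ("f", "made with both lips in Japanese; drawing shows lip-to-teeth f"),
}

def side_view_diagram(sound):
    """Return (diagram_symbol, note); note is None when the match is exact."""
    if sound in SIDE_VIEW_FALLBACK:
        return SIDE_VIEW_FALLBACK[sound]
    # Assumed default: the set has a drawing for this sound as-is.
    return (sound, None)

symbol, note = side_view_diagram("r")
print(symbol, "|", note)
```

Keeping the note beside the symbol lets a viewer flag a borrowed frame on screen so it is not passed off as exact.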

Side-view set: Wright and McCloy, CC0 · the T contrast follows Dogen.

The lip shapes, by color

In the front view, each mora's underline marks its mouth-posture family.