Why We Built Our Own Audio Firewall Format (wav9), and What It Actually Does
By the KAVANA engineering team — June 2026
The broadcast content review process at a Chinese county-level station typically involves three human approval stages, a scheduling review, and a compliance sign-off before a segment is cleared for air. That review takes time and involves real people making real judgments. When it works correctly, by the time a piece of content reaches the playout queue it has been examined more carefully than most of the published content that audiences ever encounter.
What nobody in that review chain is thinking about is what happens to the audio file between the moment it is written to the playout server and the moment it goes to air. That gap — which might be minutes or might be hours, depending on the station's workflow — is where we found a class of problems that content review cannot address, because content review is not running then.
This post explains what those problems are, how we built a format-level response to them, and why the approach is different from what the broadcast industry has historically used for content authentication.
The Gap Between Approval and Transmission
When a piece of content clears three-tier review and is committed to the broadcast schedule, it is written to storage as an audio file with associated metadata: air time, segment type, duration, editorial category. The system that reads that file at air time is the playout engine. In between, several things happen.
In a simple installation there is one playout PC and one storage location and the gap between write and play is a few milliseconds. In a more typical installation there is a content management system, a network-attached storage device, a playout server that syncs from the NAS on a schedule, and sometimes an intermediate transcode step that converts the production format to the broadcast delivery format. Each of those steps involves a system boundary. Each system boundary is a potential point where the file can be replaced, corrupted, or misrouted.
We are not primarily describing deliberate attacks here, though deliberate attacks are a real risk in broadcast environments where the output goes to a transmitter with significant reach. We are describing the broader category: the downstream IT bug that writes a stale version of a file over the approved version, the operator who accidentally overwrites the wrong segment during a manual edit, the transcode pipeline that silently produces a file with incorrect gain levels or a wrong sample rate because an upstream format change broke an assumption. These are ordinary operational failures. They happen. And when they happen, the content review record is irrelevant — the content that aired was not the content that was reviewed.
Three Threat Models We Actually Observed
Before designing anything, we catalogued what had actually happened at the stations we work with over a roughly five-year period. The incidents fell into three categories that shaped the design requirements.
Replay substitution. A segment from an earlier broadcast — sometimes days earlier — is played in place of the scheduled current segment. The most common cause is a file synchronization race condition: the playout server pulls from storage on a schedule, a network interruption mid-sync leaves the playout server with a partial file list, and when the expected file is not found the playout engine falls back to a previously cached version of a similarly named segment. The result is a segment that sounds superficially correct (it is from the same station, in the right format, at roughly the right length) but is not the content that was scheduled and reviewed for that slot. A second incident type within this category is more deliberate: a scheduled file is replaced with an archived copy of a different segment that passes format checks because it is a real file from the same system.
Format tampering. The approved audio file is modified after review — additional audio is concatenated, a portion is edited out, gain levels are changed, or in the most serious cases recorded-over content is inserted at an internal splice point. This category includes the advertising injection attack that concerns many broadcast engineers: an approved segment that has had a short commercial clip inserted into a natural pause point using standard audio editing tools. The resulting file has the same filename, approximately the same duration, and passes a basic format check. Its content is different from what was reviewed.
Silent degradation. Not a substitution or an edit, but a pipeline failure that produces a technically playable file with wrong audio characteristics — a sample rate mismatch that causes the playout engine to play the content at the wrong speed, a stereo encoding error that collapses to mono with phase cancellation, a gain normalization failure that delivers content 18 dB below target. These failures do not cause dead air; they cause degraded output that may go undetected for the entire broadcast window, particularly overnight when no operator is monitoring.
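To give a sense of scale for these failures, the arithmetic can be worked through in a few lines. This is an illustrative sketch only: the 44.1 kHz and 48 kHz rates are assumed example values rather than figures from the incidents, and the function names are ours.

```python
# Illustrative arithmetic only; the two sample rates are assumed example values.

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level change in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)

def playback_speed_factor(file_rate_hz: int, playout_rate_hz: int) -> float:
    """Speed factor when samples recorded at file_rate_hz are clocked out
    at playout_rate_hz without resampling."""
    return playout_rate_hz / file_rate_hz

# Content delivered 18 dB below target has roughly one eighth of the amplitude.
gain_ratio = db_to_amplitude_ratio(-18)

# A 44.1 kHz file clocked out at 48 kHz runs about 9% fast and sounds sharp.
speed = playback_speed_factor(44_100, 48_000)
```

Neither failure produces silence, which is why a dead-air detector does not see them.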
Why Existing Authentication Approaches Did Not Satisfy Us
The standard approach to audio file authentication in broadcast infrastructure is a hash-and-sign workflow at the point of ingest: the approved file is hashed (SHA-256 is common), the hash is signed with an operator key, and the signature is stored alongside the file. At playout time, the playout engine computes the hash of the file it is about to play and verifies the stored signature against that hash.
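The workflow can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: HMAC-SHA256 from the standard library stands in for an operator's asymmetric signature, and the key and function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical operator key; a real deployment would use an asymmetric key pair.
OPERATOR_KEY = b"example-operator-key"

def sign_at_ingest(audio_bytes: bytes, key: bytes = OPERATOR_KEY) -> bytes:
    """Hash the approved file and sign the digest (HMAC stands in for a signature)."""
    digest = hashlib.sha256(audio_bytes).digest()
    return hmac.new(key, digest, hashlib.sha256).digest()

def verify_at_playout(audio_bytes: bytes, stored_signature: bytes,
                      key: bytes = OPERATOR_KEY) -> bool:
    """Recompute the digest of the file about to air and check the stored signature."""
    digest = hashlib.sha256(audio_bytes).digest()
    expected = hmac.new(key, digest, hashlib.sha256).digest()
    return hmac.compare_digest(expected, stored_signature)

approved = b"approved audio bytes"
sidecar_signature = sign_at_ingest(approved)  # stored alongside the file
```

Note that the signature lives outside the file here, which is the sidecar arrangement this section goes on to criticize.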
This approach is sound for the tampering threat model. A file that has been modified after the signature was generated will fail the hash check. For the replay substitution threat model it is partially effective: if each version of a segment has a unique signature, playing an old version with the signature of the old version will fail the check against the current schedule's expected signature. But this depends on the schedule-to-signature binding being implemented correctly, which in practice it often is not: many implementations simply check whether the file has a valid signature, not whether the signature corresponds to the currently scheduled item.
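The difference between the two kinds of check is easy to show in code. The sketch below is an assumption-laden illustration: HMAC again stands in for a real signature, and `weak_check` and `bound_check` are hypothetical names for the two behaviours described above.

```python
import hashlib
import hmac

KEY = b"example-operator-key"  # hypothetical key for illustration

def sign(audio: bytes) -> bytes:
    digest = hashlib.sha256(audio).digest()
    return hmac.new(KEY, digest, hashlib.sha256).digest()

def weak_check(audio: bytes, signature: bytes) -> bool:
    """Only asks: does this file carry a valid signature?"""
    return hmac.compare_digest(sign(audio), signature)

def bound_check(audio: bytes, signature: bytes, expected_signature: bytes) -> bool:
    """Also asks: is this the signature the schedule expects for this slot?"""
    return weak_check(audio, signature) and hmac.compare_digest(signature, expected_signature)

old_segment = b"monday bulletin"
new_segment = b"tuesday bulletin"
old_sig, new_sig = sign(old_segment), sign(new_segment)

# The schedule entry for Tuesday's slot records new_sig.
replay_passes_weak = weak_check(old_segment, old_sig)             # replay is accepted
replay_passes_bound = bound_check(old_segment, old_sig, new_sig)  # replay is rejected
```

The old segment is a genuine, correctly signed file, so only the check that consults the schedule rejects it.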
For the silent degradation threat model, hash authentication does nothing. When the pipeline includes a transcode step, the hash taken at ingest can only be verified upstream of that step, because a legitimate transcode changes the file's bytes. A file that has been correctly signed and is then degraded downstream of that check goes to air unexamined: the authentication system certifies the pre-pipeline version of the file, not the version that is about to go to air.
More fundamentally, the hash-and-sign approach treats authentication as a property of the file object rather than a property of the content. This is a design choice with operational consequences: the authentication record lives outside the file, in a database or sidecar. If the sidecar is lost, corrupted, or not transferred correctly along with the file, the authentication chain breaks. In the kind of multi-system broadcast infrastructure we described — content management, NAS, playout server — the sidecar travels a separate path from the file and can be separated from it.
The wav9 Design: Authentication Embedded in the File
The wav9 format embeds the authentication record inside the audio container rather than as a sidecar. The format is not a new audio codec; it is a container extension to the standard RIFF/WAV structure that adds a reserved chunk at the beginning of the file. The wav9 specification describes the chunk layout in detail. The short version: the chunk contains a segment identity descriptor, a review authorization record, and a content integrity hash computed over the audio data portion of the file.
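For readers unfamiliar with RIFF, the sketch below shows how an extra chunk sits in a WAV file next to the standard `fmt ` and `data` chunks. The chunk ID `wav9` and its payload are placeholders we made up for illustration; the actual chunk identifier and layout are defined in the specification, not here.

```python
import struct

def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """RIFF chunk: 4-byte ID, 4-byte little-endian size, payload, pad to even length."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

def iter_chunks(riff: bytes):
    """Yield (chunk_id, payload) for each top-level chunk in a RIFF/WAVE file."""
    if riff[:4] != b"RIFF" or riff[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(riff):
        chunk_id = riff[pos:pos + 4]
        (size,) = struct.unpack("<I", riff[pos + 4:pos + 8])
        yield chunk_id, riff[pos + 8:pos + 8 + size]
        pos += 8 + size + (size % 2)  # skip the pad byte after odd-sized payloads

# PCM, mono, 48 kHz, 16-bit format descriptor.
fmt = struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16)
body = (make_chunk(b"wav9", b"hypothetical-record")  # placeholder ID and payload
        + make_chunk(b"fmt ", fmt)
        + make_chunk(b"data", b"\x00\x00" * 4))
riff = b"RIFF" + struct.pack("<I", 4 + len(body)) + b"WAVE" + body

chunk_ids = [cid for cid, _ in iter_chunks(riff)]
```

Because the record travels as a chunk inside the container, copying the file copies the record with it.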
The review authorization record is the part that connects the format to the three-tier review process. When a segment clears third-tier review in KAVANA's review system, the authorization is written into the wav9 chunk of the finalized audio file. The chunk contains the review tier identifiers, the authorization timestamps, and a signature over both the audio hash and the review metadata. The signed structure binds the audio content to the review record: you cannot change one without invalidating the other.
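The binding property can be illustrated with a small sketch. Everything specific here is an assumption: the field names, the JSON serialization, and the use of HMAC in place of the signature scheme that the specification defines.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"example-review-key"  # hypothetical key for illustration

def sign_record(audio_samples: bytes, review: dict) -> dict:
    """Sign the audio hash and the review metadata together as one payload."""
    audio_hash = hashlib.sha256(audio_samples).hexdigest()
    payload = json.dumps({"audio_hash": audio_hash, "review": review},
                         sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"audio_hash": audio_hash, "review": review, "signature": signature}

def record_is_valid(audio_samples: bytes, record: dict) -> bool:
    """Re-derive the signature from the current audio and the stored review data."""
    expected = sign_record(audio_samples, record["review"])
    return hmac.compare_digest(expected["signature"], record["signature"])

samples = b"\x00\x01" * 100
review = {"tier_ids": ["T1", "T2", "T3"], "authorized_at": "2026-06-01T09:30:00Z"}
record = sign_record(samples, review)
```

Since one signature covers both parts, editing the audio or editing the review metadata each produces a record that no longer verifies.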
At playout time, the KAVANA-DOG watchdog process performs wav9 verification before the playout engine is given access to the file. The verification checks: is the wav9 chunk present and structurally valid; does the audio hash match the current audio data; is the review authorization signature valid; do the scheduled air time and segment identity match what the schedule says should be playing right now. If any check fails, the file is quarantined and the playout engine does not receive it. DOG logs the failure, attempts to fall back to an alternate segment, and generates an alert.
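The gating behaviour described here (run the checks in order, release the file only if all pass) can be sketched as follows. This is not the DOG implementation: the check names and the `release` and `quarantine` callbacks are hypothetical, and the individual checks are stubbed with lambdas.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

class Verdict(NamedTuple):
    cleared: bool
    failed_check: Optional[str]

def run_checks(checks: List[Tuple[str, Callable[[], bool]]]) -> Verdict:
    """Run checks in order and stop at the first failure (fail closed)."""
    for name, check in checks:
        try:
            passed = check()
        except Exception:
            passed = False  # a check that crashes counts as a failure
        if not passed:
            return Verdict(False, name)
    return Verdict(True, None)

def gate_playout(checks, release: Callable[[], None],
                 quarantine: Callable[[str], None]) -> Verdict:
    """Hand the file to the playout engine only if every check passed."""
    verdict = run_checks(checks)
    if verdict.cleared:
        release()
    else:
        quarantine(verdict.failed_check)
    return verdict

events = []
checks = [
    ("chunk_valid", lambda: True),
    ("audio_hash", lambda: False),   # simulate a modified file
    ("signature", lambda: True),
    ("schedule_binding", lambda: True),
]
verdict = gate_playout(checks,
                       release=lambda: events.append("released"),
                       quarantine=lambda why: events.append("quarantined:" + why))
```

Treating an exception inside a check as a failure keeps the gate closed when the verifier itself misbehaves.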
Handling the Three Threat Models
Against replay substitution, the schedule-binding check in the wav9 verification is the operative control. The authorization record in the wav9 chunk specifies the segment identity and the authorized air window. An old segment with a valid wav9 record from a previous air date will fail the schedule-binding check — its authorized window has expired. This check works even if the file is otherwise valid and signed, because the binding is to the schedule entry, not just to the file identity.
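A minimal sketch of a schedule-binding check, assuming the authorization record carries a segment identifier and an authorized window. The field names and the dictionary representation are ours, not the specification's.

```python
from datetime import datetime, timedelta, timezone

def schedule_binding_ok(record: dict, slot: dict, now: datetime) -> bool:
    """True only if the record names this slot's segment and its window is current."""
    same_segment = record["segment_id"] == slot["segment_id"]
    slot_in_window = record["window_start"] <= slot["air_time"] <= record["window_end"]
    now_in_window = record["window_start"] <= now <= record["window_end"]
    return same_segment and slot_in_window and now_in_window

now = datetime(2026, 6, 2, 8, 0, tzinfo=timezone.utc)
slot = {"segment_id": "morning-news", "air_time": now}

current = {
    "segment_id": "morning-news",
    "window_start": now - timedelta(minutes=5),
    "window_end": now + timedelta(minutes=30),
}
# Yesterday's edition of the same segment: valid record, expired window.
yesterday = {
    "segment_id": "morning-news",
    "window_start": current["window_start"] - timedelta(days=1),
    "window_end": current["window_end"] - timedelta(days=1),
}
```

Here `schedule_binding_ok(yesterday, slot, now)` is false even though the segment identifier matches, because the authorized window ended a day earlier.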
Against format tampering, the audio hash check is the operative control. Any modification to the audio data portion of the file (concatenation, insertion, splice, gain adjustment) will change the hash and fail verification. The hash is computed over the raw sample data rather than over the container bytes, which means that rewriting the container while leaving the sample data unchanged does not affect the hash, whereas any re-encoding that touches the sample data does.
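The distinction between hashing the sample data and hashing the whole file can be demonstrated with Python's standard `wave` module. This is a sketch under assumptions: the helper names are ours, and the trailing `LIST` chunk simply stands in for any container-level change.

```python
import hashlib
import io
import struct
import wave

def make_wav(frames: bytes, rate: int = 48000) -> bytes:
    """Write 16-bit mono PCM frames into an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()

def sample_hash(wav_bytes: bytes) -> str:
    """SHA-256 over the raw sample data only, ignoring the container."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return hashlib.sha256(r.readframes(r.getnframes())).hexdigest()

def add_trailing_chunk(wav_bytes: bytes, chunk_id: bytes, payload: bytes) -> bytes:
    """Append a chunk (even-length payload assumed) and patch the RIFF size field."""
    chunk = chunk_id + struct.pack("<I", len(payload)) + payload
    new_size = struct.unpack("<I", wav_bytes[4:8])[0] + len(chunk)
    return wav_bytes[:4] + struct.pack("<I", new_size) + wav_bytes[8:] + chunk

original = make_wav(struct.pack("<4h", 0, 1000, -1000, 0))
rewrapped = add_trailing_chunk(original, b"LIST", b"INFO")   # container change only
gain_edited = make_wav(struct.pack("<4h", 0, 500, -500, 0))  # samples changed
```

The rewrapped file has different bytes on disk but the same sample hash as the original, while the gain-edited file has a different sample hash.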
Against silent degradation, the verification layer adds one check that hash authentication alone does not provide: a technical validity scan of the audio parameters before the hash comparison. The scanner checks sample rate, bit depth, channel count, and duration against the expected parameters from the schedule metadata. A file that passes hash verification but has been through a pipeline that corrupted its sample rate will fail the parameter check. This is not authentication in the cryptographic sense; it is a format-level sanity check. But it catches the silent degradation failures that cryptographic authentication misses.
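A parameter check of this kind is straightforward to sketch with the standard `wave` module. The expected-parameter dictionary, the tolerance value, and the function names below are assumptions made for the example; the real scanner reads its expectations from the schedule metadata.

```python
import io
import wave

def make_wav(n_frames: int, rate: int, channels: int = 1, sampwidth: int = 2) -> bytes:
    """Create an in-memory WAV of silence with the given header parameters."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (n_frames * channels * sampwidth))
    return buf.getvalue()

def read_params(wav_bytes: bytes) -> dict:
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return {
            "sample_rate": r.getframerate(),
            "bit_depth": 8 * r.getsampwidth(),
            "channels": r.getnchannels(),
            "duration_s": r.getnframes() / r.getframerate(),
        }

def parameter_mismatches(actual: dict, expected: dict,
                         duration_tolerance_s: float = 0.05) -> list:
    """Return the names of parameters that differ from what the schedule expects."""
    bad = [k for k in ("sample_rate", "bit_depth", "channels")
           if actual[k] != expected[k]]
    if abs(actual["duration_s"] - expected["duration_s"]) > duration_tolerance_s:
        bad.append("duration_s")
    return bad

expected = {"sample_rate": 48000, "bit_depth": 16, "channels": 1, "duration_s": 1.0}
good = make_wav(48000, 48000)
mislabeled = make_wav(48000, 44100)  # same samples, wrong rate in the header
```

The mislabeled file contains exactly the same sample data as the good one, so a sample-data hash would match; only the parameter comparison flags it.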
What the Open Specification Is For
The wav9 specification is published as an open standard at github.com/kavanafm/wav9-spec. We made this decision after some internal debate about whether to keep it proprietary.
The proprietary option would have created a lock-in moat: if wav9 verification is required for files to play in our system, and only our system can produce wav9-signed files, stations that adopt our content pipeline become dependent on us for the authentication chain. That is an attractive commercial position.
It is also exactly the wrong incentive structure for a broadcast security tool. Authentication mechanisms that are proprietary create two problems. First, stations that use other content management systems or playout engines cannot participate in the authentication chain, which means the chain has gaps at every system boundary where a non-KAVANA component touches the file. An authentication chain with unacknowledged gaps is worse than one whose gaps are clearly understood, because hidden gaps create false confidence. Second, a proprietary format cannot be independently audited for security properties. The broadcast industry has enough experience with proprietary security claims that turned out to have implementation problems that an independently auditable specification is worth considerably more than a closed one.
We want wav9 to be adopted by other broadcast technology vendors, which means it has to be a specification that anyone can implement. If the standard is eventually maintained by a broadcast industry body rather than by us, that is a better outcome than a world where file-level content authentication requires vendor lock-in.
Known Limitations
The wav9 authentication chain is only as strong as the key management infrastructure that issues and verifies the authorization signatures. If an attacker can obtain a signing key, they can produce wav9-valid files with fabricated authorization records. The specification defines a key hierarchy and revocation mechanism, but implementing that hierarchy correctly in a real broadcast environment requires operational discipline: key rotation, secure key storage, and revocation processes that work in environments where the IT sophistication level varies considerably between stations.
We have found that in practice, many stations are not ready to operate a complete key management infrastructure on day one. We support a degraded mode where the wav9 verification performs format validation and schedule-binding checks without cryptographic signature verification. This catches silent degradation and some substitution attacks but does not catch an attacker with access to an unsigned file and the ability to create a valid-looking wav9 record. The tradeoff is explicitly documented in the deployment guide: degraded mode is better than no wav9 at all, but it is not a substitute for proper key management.
The format also does not address attacks that happen before the authorization record is written — during the production and review phase rather than the post-approval storage and delivery phase. If content is tampered with before final review, the wav9 record reflects the tampered content. This is a known scope boundary: wav9 is a post-approval integrity mechanism, not a production security system.
Why This Matters More as AI Content Volume Increases
When human producers create broadcast content, the production and review cycles are slow enough that there are natural checkpoints where a human eye and ear can confirm that what is queued for air is what was produced. When AI-generated content is introduced at volume — an AI host producing material for multiple slots per day, an AI news synthesis pipeline refreshing content every hour — the production rate outpaces the human review bandwidth. The scheduled content queue can contain dozens of AI-generated segments that will air before anyone has time to manually verify each one.
In this environment, the file-level authentication that wav9 provides becomes a load-bearing component of the content pipeline rather than an add-on assurance. The AI Three Gods review system handles the review process; wav9 handles the delivery chain integrity. The two components together provide a chain from AI generation through human review to transmission where each stage can verify the previous one.
This combination also addresses a regulatory question that is increasingly common in AI broadcast content discussions: can you demonstrate that the content that aired was the content that was reviewed, and not a version that was modified or substituted after review? With wav9 verification in the playout chain and the review authorization embedded in the file, the answer is yes, and the demonstration is mechanical rather than testimonial.
KAVANA is developed by Hunan ShengGuang Technology Co., Ltd. (湖南声广科技有限公司), incorporated 2012, team active since 2005. We hold a broadcast production and distribution license (湘字第00565号) and operate under Chinese cybersecurity Level 3 certification. Technical documentation and open specifications: github.com/kavanafm.