The central issue is that the audio in video files is recorded at widely different volume levels. Basic recorders may have no recording volume control, only some sort of automatic leveling that prevents clipping, and that can result in flat-sounding audio. Even TV programs vary hugely by program and station. Video from different sources is therefore highly variable in both volume and content. How software tries to reach a standard level is easiest to explain in terms of mp3 music files and normalizer software.
In any music track, you may have quiet passages, loud drumming, overdriven guitar that produces annoying odd harmonics, and so on. Somehow you have to bring it all to an acceptable level without flattening it into a monotone or leaving parts inaudible. An mp3 normalizer works by analyzing the volume levels across the entire track and applying some formula to work out how much to adjust it to reach "normal". This is the "track gain". To make an entire set of mp3 tracks sound OK together, it then has to compare the new "normal" levels of each track and adjust them relative to each other. This is the so-called "album gain". To get decent normalization, the software has to go through all the mp3 tracks first. Ultimately the result is a subjective "best guess".
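To make the track gain / album gain idea concrete, here is a minimal Python sketch. It is not what any particular normalizer does: real tools use a perceptual loudness measure, whereas this uses plain RMS level, and the function names and the -20 dB target are assumptions chosen for illustration. It also takes one common reading of "album gain", namely a single gain shared by all tracks so that their relative loudness is kept.

```python
import math

def rms_db(samples):
    """RMS level in dB relative to full scale (samples in -1.0..1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_square, 1e-12))  # floor avoids log(0)

def track_gain_db(samples, target_db=-20.0):
    """Gain that brings this one track to the (assumed) target level."""
    return target_db - rms_db(samples)

def album_gain_db(tracks, target_db=-20.0):
    """One shared gain for all tracks, measured over the whole set."""
    all_samples = [s for track in tracks for s in track]
    return target_db - rms_db(all_samples)

def apply_gain(samples, gain_db):
    """Scale samples by gain_db, hard-limiting at full scale."""
    factor = 10 ** (gain_db / 20)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]

quiet = [0.1, -0.1] * 100   # stand-in for a quiet track
loud = [0.5, -0.5] * 100    # stand-in for a loud track
print(track_gain_db(quiet), track_gain_db(loud))
print(album_gain_db([quiet, loud]))
```

Note that both functions need every sample before they can return a number, which is why the software has to read through all the tracks first.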
If normalization is attempted only at playback time, the results will not be as good as in the mp3 case described above, because the player can only work with the video currently playing and has nothing else to compare it against. Some players have a normalization feature, see:
VLC player normalization
Such software/hardware must either analyze the whole track before playing it, or use a variable gain that progressively reduces the volume as the level approaches 100% and raises it as the level approaches 0% (which makes the audio sound flatter). I can't speak for how it works in VLC player. These days we can at least be thankful that most systems have a remote control!
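For the curious, the variable-gain approach can be sketched roughly as follows. This is only an illustration of the general idea and not how VLC or any other player implements it; the function name `live_leveler` and all parameter values are assumptions. The gain is updated one block of samples at a time, dropping quickly when the audio is loud and rising slowly when it is quiet.

```python
def live_leveler(blocks, target_peak=0.5, max_gain=4.0,
                 attack=0.5, release=0.05):
    """Yield gain-adjusted blocks using only the audio seen so far.

    blocks: iterable of lists of samples in -1.0..1.0.
    attack/release: fraction of the gap to the desired gain closed per
    block when turning down (fast) or turning up (slow).
    """
    gain = 1.0
    for block in blocks:
        peak = max((abs(s) for s in block), default=0.0)
        if peak > 0.0:
            # Gain that would put this block's peak on target, capped so
            # that near-silence is not boosted without limit.
            desired = min(target_peak / peak, max_gain)
            rate = attack if desired < gain else release
            gain += rate * (desired - gain)
        # Hard limit as a last resort against clipping.
        yield [max(-1.0, min(1.0, s * gain)) for s in block]

loud_out = list(live_leveler([[0.9, -0.9]] * 50))
quiet_out = list(live_leveler([[0.05, -0.05]] * 200))
print(loud_out[-1], quiet_out[-1])
```

Because loud and quiet passages are both pulled toward the same level, the dynamics are squashed, which is the flatter sound mentioned above.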