猫科研究所 - x264-changelog-jp r1100-r1199

Everything on this page may be wrong. Please read 「x264関連の記事に関して」 (About the x264-related articles).

x264-changelog-jp r1100-r1199

A Japanese translation of the changelog for r1100-r1199. For other revisions and general caveats, see x264-changelog-jp.

Previous: x264-changelog-jp r1000-r1099 - Next: x264-changelog-jp r1200-r1299

x264r1199

git-id : cb0b71fb981a3c6020d628dc7a41855b81df8d54
Author : Jason Garrett-Glaser
Date: Fri Aug 7 10:31:16 2009 -0700
Fix delay calculation with multiple threads
Delay frames for threading don't actually count as part of lookahead.

マルチスレッドでのdelay（ディレイ・遅延）の計算を修正。
スレッドに対するdelayフレームは本来はlookaheadの一部としてカウントしない。

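The wording is hard to follow, but the point is this: the delay derived from bframes has to be compared with the delay caused by lookahead, and the thread delay that had wrongly been included on the bframes side of that comparison was taken out. The thread delay is now added after the comparison.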
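
The note above can be sketched in C. This is only an illustration under assumptions, not x264's actual code: the function names are hypothetical, and the thread delay is assumed to be threads - 1 frames.

```c
/* Hypothetical helpers for illustration only (not taken from x264). */
static int max_int(int a, int b) { return a > b ? a : b; }

/* Before the fix, as described in the note: the thread delay was folded
 * into the bframes side before the comparison with lookahead. */
static int delay_before(int bframes, int lookahead, int threads)
{
    return max_int(bframes + (threads - 1), lookahead);
}

/* After the fix: compare bframes and lookahead first, then add the
 * thread delay on top. */
static int delay_after(int bframes, int lookahead, int threads)
{
    return max_int(bframes, lookahead) + (threads - 1);
}
```

With bframes=3, lookahead=40 and 4 threads, the first form ignores the thread delay entirely because lookahead wins the comparison, while the second form adds it.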

x264r1198

git-id : a1ed468f67476fbbe49e1fbfe1a567be0c052d44
Author : Jason Garrett-Glaser
Date: Thu Aug 6 23:09:46 2009 -0700
Add "veryslow" preset
Apparently some people are actually *using* placebo, so I've added this preset to bridge the gap.

"veryslow"プリセットを追加。
どうやら一部の人々が実際にplaceboを*使っている*ようなので、そのギャップを埋めるためにこのプリセットを追加した。

He presumably thought "but it's a placebo...". The preset is equivalent to me=umh, subme=10, merange=24, ref=16, b-adapt=2, direct=auto, partitions+=p4x4, trellis=2, bframes=8, rc-lookahead=60.

x264r1197

git-id : bb66c482242a0747823661b212114c1a2f015fe3
Author : Jason Garrett-Glaser
Date: Tue Aug 4 17:46:33 2009 -0700
Macroblock-tree ratecontrol
On by default; can be turned off with --no-mbtree.
Uses a large lookahead to track temporal propagation of data and weight quality accordingly.
Requires a very large separate statsfile (2 bytes per macroblock) in multi-pass mode.
Doesn't work with b-pyramid yet.
Note that MB-tree inherently measures quality different from the standard qcomp method, so bitrates produced by CRF may change somewhat.
This makes the "medium" preset a bit slower. Accordingly, make "fast" slower as well, and introduce a new preset "faster" between "fast" and "veryfast".
All presets "fast" and above will have MB-tree on.
Add a new option, --rc-lookahead, to control the distance MB tree looks ahead to perform propagation analysis.
Default is 40; larger values will be slower and require more memory but give more accurate results.
This value will be used in the future to control ratecontrol lookahead (VBV).
Add a new option, --no-psy, to disable all psy optimizations that don't improve PSNR or SSIM.
This disables psy-RD/trellis, but also other more subtle internal psy optimizations that can't be controlled directly via external parameters.
Quality improvement from MB-tree is about 2-70% depending on content.
Strength of MB-tree adjustments can be tweaked using qcompress; higher values mean lower MB-tree strength.
Note that MB-tree may perform slightly suboptimally on fades; this will be fixed by weighted prediction, which is coming soon.

Macroblock-treeレートコントロール。
デフォルトでON；--no-mbtreeでOFFにできる。
時間軸でのデータのpropagation（伝播）を追跡し、質に適宜重み付けをするため、大きなlookahead（先読み）を使う。
マルチパスでは非常に大きな別途のstatsファイル（マクロブロックごとに2バイト）を要求する。
まだb-pyramidとは一緒に動作しない。
MB-treeは本質的に、標準のqcompの手法とは異なる品質測定を行うため、CRFによるビットレートはいくらか変わりうることに注意。
これは"medium"プリセットを若干遅くする。それに合わせて"fast"も遅くし、"fast"と"veryfast"の間に新たなプリセット"faster"を導入した。
"fast"以上のプリセットはMB-treeがONである。
MB treeが先読みしpropagation解析を行う距離（訳注：フレーム数）をコントロールするための新オプションとして、--rc-lookaheadを追加。デフォルトは40；大きな値はより遅く、よりメモリを要求するがより精密な結果をもたらす。
この値は将来的にレートコントロールの先読み(VBV)をコントロールするために使用される。
PSNRやSSIMを向上させないpsy（心理的）最適化を全て無効にする新オプションとして、--no-psyを追加。
これはpsy-RD/trellisを無効にするが、同時に外部のパラメータでは直接コントロールできないような、その他の内部の細かなpsy最適化をも無効にする。
MB-treeによる質の向上は（映像の）内容により約2-70%である。
MB-treeのstrength（強度）の調整はqcompress(qcomp)により行える；高い値は、低いMB-tree strengthを意味する。
MB-treeはフェードにおいて若干最適ではないかもしれない事に注意；これは近々予定しているweighted prediction（重み付き予測）により修正されるだろう。

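Very roughly speaking, mbtree does something like CRF at the macroblock level and produces output that is better optimized visually. Depending on the content it is very powerful, and reasonable quality can be kept even at quite low bitrates. One example is reportedly just 67 kbps. For 猫科研究所's own detailed explanation, see x264(mbtree).

Note that "faster" is actually the same as the old "fast", and the new "fast" is an entirely new preset with mixed-refs=ON, ref=2, subme=6, mbtree=ON.

As the sheer size of the changelog entry suggests, this is a large change, and the test period cannot be called long, so I suspect bugs may still remain.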

x264r1196

git-id : f21e71a04ba65aff9b5a4bfa8a73fd86c463f4ee
Author : Jason Garrett-Glaser
Date: Mon Aug 3 20:52:30 2009 -0700
Various 1-pass VBV tweaks
Make predictors have an offset in addition to a multiplier.
This primarily fixes issues in sources with lots of extremely static scenes, such as anime and CGI.
We tried linear regressions, but they were very unreliable as predictors.
Also allow VBV to be slightly more aggressive in raising QPs to avoid not having enough bits left in some situations.
Up to 1db improvement on some clips.

様々な1-pass VBVの調整。
乗数に加え、オフセットをpredictorに持たせた。
これは主に、アニメやCGIのような極端に静的なシーン（場面）を多く含むソースでの問題を修正する。
線形回帰も試したが、predictorとしては非常に信頼性が低かった。
また、幾つかのシチュエーションで十分な残りビットを持たないことを避けるため、VBVが少し積極的にQPを上げることを可能に。
幾つかのクリップ（映像）で最大1dbの改善。

Perhaps this is best seen as a somewhat more fundamental fix for what r1091 addressed.

x264r1195

git-id : 5d75a9bd5b942392c4ab64156a266eed64c0793f
Author : Jason Garrett-Glaser
Date: Tue Jul 28 20:41:27 2009 -0700
Fix another 10L in QPRD
An entry in subpel_iterations was missing.
I have no idea how QPRD was working at all without this change.

QPRDのまた別の10Lを修正。
subpel_iterations内のエントリが足りなかった。
これ無しでQPRDがどう動いていたのかさっぱり分からない。

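The "missing entry" is a row of a table (a two-dimensional array): what should have been a table of a rows x 4 columns had become (a - 1) rows x 4 columns. If the code was reading past the end of the table, it was using indeterminate values, so indeed there is no telling what would happen.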

x264r1194

git-id : 97ed27054005a85e7c49209c20fd0b280917ac02
Author : Jason Garrett-Glaser
Date: Tue Jul 28 01:16:23 2009 -0700
Update help and cleanup in ratecontrol.c
Deal with some out-of-date information.

ヘルプのアップデートとratecontrol.cの整理。
古くなった情報を片付けた。

x264r1193

git-id : 0538c56a95cdc41a42a206079276a57d5d76b5a5
Author : Loren Merritt
Date: Tue Jul 28 07:16:31 2009 +0000
15% faster refine_bidir_satd, 10% faster refine_bidir_rd (or less with trellis=2)
re-roll a loop (saves 44KB code size, which is the cause of most of this speed gain)
don't re-mc mvs that haven't changed

refine_bidir_satdを15%高速化、refine_bidir_rdを10%高速化（trellis=2では若干低い）。
ループを再ロール（44KBのコードサイズを節約、この高速化の大部分はこれにより生じる）。
変更されてないmvの再mcを行わないように。

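A speed-up technique called loop unrolling had been taken too far, so part of the code was turned back into an ordinary loop.

Loop unrolling is an optimization aimed at the CPU's internals. Depending on the code being unrolled, it has effects such as the following.

Fewer pipeline stalls (and better use of speculative mechanisms such as instruction prefetch).
Higher utilization of the execution units on superscalar CPUs.
Better use of spare registers (often a small gain on x86, which has few registers).

Thanks to these effects it is not unusual for unrolled code to run several times faster than the rolled version, but the drawback is larger code. If the instruction cache can no longer be reused and more memory accesses are needed just to fetch the code, the end result is slower.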
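
As a concrete illustration of the trade-off described above, here is a minimal C sketch of the same reduction written rolled and unrolled by four. The function names are made up for this example and have nothing to do with the refine_bidir code in x264.

```c
#include <stddef.h>

/* Rolled: compact code, one loop-condition check per element. */
static int sum_rolled(const int *v, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by 4: fewer branches and four independent accumulators that a
 * superscalar CPU can update in parallel, at the cost of more code. */
static int sum_unrolled4(const int *v, size_t n)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++) /* leftover elements when n is not a multiple of 4 */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value; which one is faster depends on the CPU and, as this commit shows, on how much the extra code size hurts the instruction cache.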

x264r1192

git-id : 306c3ee4b1c3cae804185597305725d2484f21b9
Author : Jason Garrett-Glaser
Date: Mon Jul 27 21:03:00 2009 -0700
Faster bidir_rd plus some bugfixes
Cache chroma MC during refine_bidir_rd and use both the luma and chroma caches to skip MC in macroblock_encode.
Fix incorrect call to rd_cost_part; refine_bidir_rd output was incorrect for i8>0.
Remove some redundant clips.
~12% faster refine_bidir_rd.

bidir_rdの高速化と幾つかのバグフィックス。
refine_bidir_rd中でchromaのMCをキャッシュし、lumaとchromaの両方のキャッシュをmacroblock_encode中のMCをスキップするために使用。
誤ったrd_cost_partの呼び出しを修正；refine_bidir_rdの出力はi8>0で不正だった。
幾つかの冗長なクリップ（値の範囲制限）処理を削除。
refine_bidir_rdが～12%高速化。

x264r1191

git-id : d6eed014d0af8f87045d6d5daf3376c486efdea7
Author : Jason Garrett-Glaser
Date: Mon Jul 27 04:45:03 2009 -0700
Add "fastdecode" tune option
It does what it says it does.

"fastdecode"のtuneオプション追加。
その名の通りの事を行う。

This turns deblocking OFF, CABAC OFF (= CAVLC), and weightb OFF.

x264r1190

git-id : 43773d27a6dd74c62b6d29d0ae0a80397469bfbf
Author : Jason Garrett-Glaser
Date: Sun Jul 26 12:20:09 2009 -0700
Fix two bugs in QPRD
fprofile settings now actually fprofile QPRD.
Don't use i_mbrd before initializing it.

QPRDのバグを2つ修正。
fprofile設定が実際にQPRDをプロファイルするようにした。
初期化前にi_mbrdを使用しないようにした。

x264r1189

git-id : 9d0b5e95bbf3f9077806927264add762952f77ad
Author : Jason Garrett-Glaser
Date: Sun Jul 26 03:03:12 2009 -0700
Fix 10l in QPRD
Trellis used wrong lambda with trellis=1

QPRDでの10Lを修正。
trellis=1でTrellisが不正なラムダを使用していた。

For what 10L means, see the earlier changelog pages.

x264r1188

git-id : 4074956df13e058421fb5ba89b872be143742ffd
Author : Jason Garrett-Glaser
Date: Sat Jul 25 22:31:06 2009 -0700
Fix a nondeterminism with threads and subme>7
Also add a few more checks to eliminate the need for spel_border.

subme>7でスレッドを使用した場合の非決定性を修正。
また、spel_borderの必要性を除去する幾つかのチェックを追加。

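Determinism can also be described as reproducibility. Here it means that the output does not change depending on the timing of the threads. If the same input cannot reproduce the same output every time, results can no longer be compared, so the nondeterminism that appeared when threads were used has been fixed.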

x264r1187

git-id : 7733721e410acb96fdf740ca95d2a394b2a2b713
Author : Jason Garrett-Glaser
Date: Thu Jul 23 12:20:39 2009 -0700
Add QPRD support as subme=10
Refactor trellis lambda selection to be done in analyse_init instead of in trellis.
This will allow for more easy adaption of lambda later on; for now it allows constant lambda across variable QPs.
QPRD is only available with adaptive quantization enabled and generally improves SSIM and visual quality.
Additionally, weight the SSD values from RD based on the relative QP offset for chroma; helps visually at high QPs where chroma has a lower QP than luma.
This fixes some visual artifacts created by QPRD at high QPs.
Note that this generally hurts PSNR and SSIM, and so is only on when psy-RD is on.

subme=10としてQPRDのサポートを追加。
trellisラムダ選択をtrellis内に代わってanalyse_initで行われるようにリファクタリング。
これにより、後々ラムダをより容易に適応させられるようになる；ひとまず現在では、変動するQPに跨る固定のラムダを可能にする。
QPRDはAQ（適応的量子化）が有効にされている場合にのみ使用可能で、一般的にSSIMや視覚的な質を向上する。
加えて、RDから得られるSSD値を、chromaの相対的(relative)なQPオフセットに基づいて重み付けする；chromaがlumaよりも低いQPを持つ高QP帯で視覚的に効果がある。
これは高QP帯でQPRDにより作成される視覚的なアーティファクトをいくらか修正する。
これは一般にPSNRとSSIMには有害であり、そのためpsy-RDがONの場合にのみONとなることに注意。

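subme has finally reached double digits. QPRD will probably show its real worth once the trellis lambda can be chosen and adapted more freely. Note that QPRD (subme=10) requires trellis=2 and aq-mode>0; if these are not met, subme=9 is used instead.

QPRD apparently stands for "Rate-distortion optimal QP selection", that is, choosing the QP that is optimal in the RD sense. If that is unclear, think of it as an RD step one level above RD and RD refinement. The source code describes the current behaviour of QPRD as follows.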

/* Rate-distortion optimal QP selection.
* FIXME: More than half of the benefit of this function seems to be
* in the way it improves the coding of chroma DC (by decimating or
* finding a better way to code a single DC coefficient.)
* There must be a more efficient way to get that portion of the benefit
* without doing full QP-RD, but RD-decimation doesn't seem to do the
* trick. */

レート・歪み最適なQP選択。
要修正：この関数の利得の半分以上は、chroma DCの符号化の向上（単独DC係数のよりよい符号化方法を見つけるとかdecimateするとか）によるもののようだ。
完全なQP-RDを行わずとも利得の恩恵を受ける、より効率的な方法があるに違いないが、RD-decimationは上手くやってはくれないようだ。

It seems there is still room for improvement.

x264r1186

git-id : f5e6980b3eb34ed610f5fc36a4378a0ed4277753
Author : Jason Garrett-Glaser
Date: Tue Jul 21 19:56:21 2009 -0700
SSSE3 cachesplit workaround for avg2_w16
Palignr-based solution for the most commonly used qpel function.
1-1.5% faster overall on Core 2 chips.

avg2_w16に対するSSSE3キャッシュ分割のワークアラウンド（次善策）。
最も一般的に使用されるqpel関数に対するPALIGNRベースの解決法。
Core2チップ上の全体で1-1.5%高速化。

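The changelog occasionally mentions an SSSE3 instruction called PALIGNR. The name stands for Packed Align Right: it concatenates the values of two registers and shifts the result right. As a mnemonic it takes three operands, in the form "palignr dst, src, shift". For example, "palignr xmm0, xmm1, 4" internally forms 256 bits of data from xmm0 and xmm1 (xmm0 in the upper half), shifts it right by 4 bytes, and stores the lower 128 bits in xmm0. The benefit is this: most SIMD instructions need aligned data, but when, say, RGB data is laid out in units of 8 bit * 3 = 24 bit, PALIGNR lets you correct the alignment while bringing the data into registers.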
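
To make the byte movement concrete, here is a scalar C model of the instruction on 16-byte registers. It is only a sketch for illustration: palignr_model is a made-up name, and real code would use the instruction itself (or a compiler intrinsic) rather than a loop.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of "palignr dst, src, shift" for 128-bit registers.
 * dst and src are 16-byte arrays with byte 0 as the least significant byte. */
static void palignr_model(uint8_t dst[16], const uint8_t src[16], unsigned shift)
{
    uint8_t tmp[32];
    memcpy(tmp, src, 16);      /* lower 128 bits: src */
    memcpy(tmp + 16, dst, 16); /* upper 128 bits: dst */
    for (unsigned i = 0; i < 16; i++)
        /* shifting right by `shift` bytes means result byte i is byte i+shift */
        dst[i] = (shift < 32 && i + shift < 32) ? tmp[i + shift] : 0;
}
```

With shift=4, the result holds bytes 4..15 of src followed by bytes 0..3 of the old dst, which is exactly a 16-byte window that starts 4 bytes into src and runs on into dst.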

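Jason Garrett-Glaser (Dark_Shikari) has said in the past that shufps is something of a favourite instruction of his, and combined with its relative pshufb (the SSSE3 counterpart) it is very useful for things like colorspace conversion. pshufb stands for Packed Shuffle Bytes, and it should also benefit from the Super Shuffle Engine that I wrote about in an earlier entry. Seen this way, it is clear that the work follows the areas he is interested in.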

x264r1185

git-id : 29569051505a78db9dbbc8fda53ab11e7e08b994
Author : Loren Merritt
Date: Wed Jul 22 20:20:52 2009 +0000
shut up valgrind warnings in trellis

trellisでのvalgrindの警告を黙らせた。

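valgrind is a memory debugging tool, mainly for Linux, that detects memory leaks and similar problems. There is no Windows version. Like fixing compiler warnings, this change does not matter to ordinary users.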

x264r1184

git-id : 88b35c2d3bd86b42059e27db365752da9f2cd032
Author : Anton Mitrofanov
Date: Sat Jul 18 16:30:18 2009 -0700
New AQ algorithm option
"Auto-variance" uses log(var)^2 instead of log(var) and attempts to adapt strength per-frame.
Generates significantly better SSIM; on by default with --tune ssim.
Whether it generates visually better quality is still up for debate.
Available as --aq-mode 2.

新しいAQアルゴリズムオプション。
"自動分散"はlog(var)の代わりにlog(var)^2を使用しstrength（強度）をフレームごとに適応させようと試みる。
顕著に良好なSSIMを生成する；--tune ssimではデフォルトで有効。
視覚的に良好な質を生成するかは議論の余地がある。
--aq-mode 2として使用可能。

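AutoVAQ, the long-awaited new algorithm, has been introduced. The amount of code is small considering how much the output changes.

Looking at the code, the change in this algorithm is, as the message above says, that the quantity the QP variation is based on went from log2(ac_energy) to (log2(ac_energy))^2. The expected difference in output is stronger contrast in how much the QP varies. The old QP offset was linear in log2(ac_energy), whereas the new algorithm squares it and is therefore quadratic.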
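
The linear versus quadratic behaviour can be sketched as follows. This is a hypothetical illustration only: the function names and the strength and center parameters are assumptions for this example, not the constants or code used by x264.

```c
/* Old behaviour as described above: offset is linear in log2(ac_energy).
 * `strength` and `center` are illustrative parameters (assumptions). */
static double qp_offset_linear(double log2_energy, double strength, double center)
{
    return strength * (log2_energy - center);
}

/* New behaviour as described above: log2(ac_energy) is squared first, so
 * the offset grows quadratically and the QP swing is larger. */
static double qp_offset_squared(double log2_energy, double strength, double center)
{
    return strength * (log2_energy * log2_energy - center);
}
```

For example, moving log2(ac_energy) from 8 to 10 changes the linear term by 2 but the squared term by 36, which is the "stronger contrast" mentioned above.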

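(2009/08/10: deleted) The description below is very likely incorrect, so most of it is being withdrawn. My apologies. You could also say "this was meant as a memo among friends, then it became a little well known and I got scared"... Once I can write it more accurately I will move it to a separate article and delete these paragraphs.

Put in user terms: in parts with more "change" (MBs that became intra, or P/B MBs with large differences) the QP gets higher (lower quality), and in parts with less "change" a lower QP (higher quality) is kept, so the swing of the QP under AQ becomes larger. Despite the name AutoVAQ, there is no element that automatically optimizes or decides anything in particular; it should be seen purely as a change in the calculation.

~~This was my first look at the AQ code, and I got a rough understanding of where and how AQ is adaptive.~~ First, AQ adaptively varies the QP per MB within a single frame. If crf is a temporal process that varies the QP from frame to frame, AQ is a spatial one. The ac_energy value that drives the QP variation is computed from the AC components of the coefficients after motion compensation and the orthogonal transform, and measures how much the MB differs from the reference picture, i.e. how high its entropy is (although it is not entropy itself). When this value is high, the QP is raised to degrade the MB more, i.e. to cut entropy. In other words, x264 judges that an MB with high ac_energy is a region of heavy change where removing information should have less visual impact. This makes it possible to encode the heavily changing parts of a picture coarsely and the quieter parts finely. The present change still uses ac_energy as the basis, but changes how that value is evaluated.

Note that ac_energy is obtained by squaring, adding/subtracting and accumulating the individual coefficient values, so it too is a measure with an exponential/logarithmic flavour. For the details of the computation, see the PIXEL_VAR_C macro in pixel.c.

TODO: Why does only the AutoVAQ path take an average and keep the center? Is that perhaps what "Auto" refers to?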

x264r1183

git-id : f21daff3dc11cf5881f1727c3c9d505f0810d20b
Author : Jason Garrett-Glaser
Date: Wed Jul 15 12:43:35 2009 -0700
Cacheline-split SSSE3 chroma MC
~70% faster chroma MC on 32-bit Conroe
Also slightly faster SSSE3 intra_sad_8x8c

キャッシュライン分割版SSSE3 chroma MC。
32-bitのConroeでchroma MCを～70%高速化。
同時に、SSSE3のintra_sad_8x8cを僅かに高速化。

x264r1182

git-id : 6c13403195d42b2c0ee707e9f2a6e9f9cd81afd6
Author : Jason Garrett-Glaser
Date: Sun Jul 12 12:07:01 2009 -0700
Improve documentation of qp/crf options

qp/crfオプションのドキュメントを改善。

This only improves the help text, so encoder output is unaffected.

x264r1181

git-id : 49bf7673b2f52b2cd8e9c10d8d6e9cbbb5422cf7
Author : Jason Garrett-Glaser
Date: Thu Jul 9 19:02:57 2009 -0700
Merge array_non_zero into zigzag_sub
Faster lossless, cleaner code.
SSSE3 version of zigzag_sub_4x4_field, faster lossless interlaced coding.

array_non_zeroをzigzag_subにマージ（統合）。
ロスレスを高速化、コードを整理。
SSSE3バージョンのzigzag_sub_4x4_field、ロスレスインターレース符号化を高速化。

x264r1180

git-id : b63f5919e3f5367a0df3dbf218d5a94d2fdba5fb
Author : James Darnley
Date: Thu Jul 9 11:25:55 2009 -0700
Fix bug in reference frame autoadjustment
For some types of input file, x264 did the adjustment before width/height were known.

参照フレームの自動調整のバグを修正。
あるタイプの入力ファイルで、x264がwidth/heightを知る前に調整を行っていた。

x264r1179

git-id : 96e2229e96d65420d491596affa9aaa068d718d6
Author : Jason Garrett-Glaser
Date: Tue Jul 7 11:13:39 2009 -0700
Fix fprofile settings to match changes in defaults
Also add b-adapt 2 to fprofile.

デフォルト値の変更に適合するようfprofile設定を修正。
ついでにb-adapt 2をfprofileに追加。

x264r1178

git-id : 3f6713d5c794d4fbfd3131985e33a822a40cb870
Author : Jason Garrett-Glaser
Date: Fri Jul 3 02:33:44 2009 -0700
Slightly faster dequant_flat assembly
Eliminate some redundant shifts.

dequant_flatアセンブリを僅かに高速化。
冗長なシフト演算を除去。

x264r1177

git-id : af2a4ecd7bcefc97c8aa83913c9a2980206f9cd0
Author : Jason Garrett-Glaser
Date: Wed Jul 1 21:14:57 2009 -0700
Totally new preset system for x264.c (not libx264), new defaults
Other new features include "tune" and "profile" settings; see --help for more details.
Unlike most other settings, "preset" and "tune" act before all other options.
However, "profile" acts afterwards, overriding all other options.
Our defaults have also changed: new defaults are --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress.
Users will hopefully find these changes to greatly improve usability.

x264.c（libx264ではない）に対する総合的に新しいプリセットシステムと新しいデフォルト値。
"tune"と"profile"を含むその他の新規機能。より詳細には--helpを見ること。
殆どの他の設定と異なり、"preset"と"tune"は他のオプションの前に働く。
逆に"profile"は事後的に働き、他のオプションをオーバーライド（上書き）する。
デフォルト値も変更した：新デフォルト値は --subme 7 --bframes 3 --8x8dct --no-psnr --no-ssim --threads auto --ref 3 --mixed-refs --trellis 1 --weightb --crf 23 --progress である。
願わくば、ユーザの皆さんがこれらの変更をユーザビリティの大きな改良と受け取ってくれることを。

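A certain huge bulletin board already has an explanation, so only briefly here.

With the changed defaults, B-frames are used (bframes) with the high profile (8x8dct), among other things, so I think x264 now shows a reasonable part of its real ability out of the box.

--profile (high|main|baseline) was added to force a profile. It is evaluated after all other options, and any option that does not fit the specified profile is forcibly changed.

Having presets is reassuring even for someone starting from zero, and being able to specify a profile as a safety device is good too.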

x264r1176

git-id : 72534d466a6bd99b9cbf32c74e667bea608c6dee
Author : Jason Garrett-Glaser
Date: Wed Jul 1 16:33:12 2009 -0700
Update Gabriel's email address in AUTHORS

著者のGabrielのメールアドレスを更新。

No effect on the binary.

x264r1175

git-id : f4dac817e45b967572f5c4e4af4644dc7d263512
Author : Jason Garrett-Glaser
Date: Tue Jun 30 15:20:32 2009 -0700
Early termination for chroma encoding
Faster chroma encoding by terminating early if heuristics indicate that the block will be DC-only.
This works because the vast majority of inter chroma blocks have no coefficients at all, and those that do are almost always DC-only.
Add two new helper DSP functions for this: dct_dc_8x8 and var2_8x8. mmx/sse2/ssse3 versions of each.
Early termination is disabled at very low QPs due to it not being useful there.
Performance increase is ~1-2% without trellis, up to 5-6% with trellis=2.
Increase is greater with lower bitrates.

chromaエンコーディングの早期終了。
ヒューリスティックが対象のブロックはDC（直流）のみであると示す場合には、早期に終了することでchromaエンコーディングを高速化。
inter chromaブロックの広く大多数が全く係数を持たず、持つ場合には殆ど常にDCのみであるため、これが上手く行く。
このために2つのヘルパーDSP関数を追加：dct_dc_8x8とvar2_8x8。それぞれのmmx/sse2/ssse3バージョン。
早期終了は、非常に低いQPでは有用ではないため、無効にされる。
パフォーマンスの向上率はtrellisなしで～1-2%、trellis=2で最大5-6%。
低ビットレートで向上率が大きい。

x264r1174

git-id : 7fd6a9099f18ec028d6c73890258280e6f8a6c02
Author : David Conrad
Date: Fri Jun 26 13:09:44 2009 -0700
Fix bug in checkasm
frame_init_lowres_core check didn't check the C plane.
However, all x86 and PPC assembly was correct regardless of the unit test being incorrect.

checkasmのバグ修正。
frame_init_lowres_coreのチェックはCプレーンをチェックしていなかった。
しかしながら、ユニットテストが不正であったにもかかわらず、全てのx86とPPCのアセンブリは正しかった。

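There is a C version as the reference code and assembly versions of it, so perhaps this means the C version that should be the reference was wrong and the assembly was right? (That said, "the C plane" may instead refer to one of the output planes of frame_init_lowres_core that the test failed to compare, rather than to the C-language version.)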

x264r1173

git-id : f6d31669a2547110b9c1323aa51437296f2f3506
Author : Jason Garrett-Glaser
Date: Wed Jun 24 14:39:15 2009 -0700
Add subpartition cost for sub-8x8 blocks
Improves sub-p8x8 mode decision.

sub-8x8ブロックに対しサブパーティションコストを追加。
sub-8x8のモード決定を向上。

x264r1172

git-id : b484fe1bff3cb68b3325a9b77d802789cf77e600
Author : Jason Garrett-Glaser
Date: Wed Jun 24 13:24:18 2009 -0700
Yet more CABAC and CAVLC optimizations
Also clean up a lot of pointless code duplication in CAVLC MV coding.

更なるCABACとCAVLCの最適化。
CAVLCのMV符号化で意味無く大量にコードが重複していた部分も整理。

x264r1171

git-id : 2c7cb4c3f111b49d3d961366ed338f38a0555716
Author : Jason Garrett-Glaser
Date: Fri Jun 19 18:49:55 2009 -0700
Various CABAC optimizations and cleanups
Faster CABAC CBF context calculation for inter blocks.
Add x264_constant_p(), will probably be useful in the future as well.
Simpler subpartition functions.
Clean up and optimize mvd_cpn a bit more.
Various other minor optimizations.

様々なCABACの最適化と整理。
interブロックに対するCABACのCBFコンテキスト計算を高速化。
恐らく将来的にも有用であろう、x264_constant_p()を追加。
サブパーティション関数をシンプル化。
mvd_cpnをさらに少し整理し最適化。
その他、様々な小規模の最適化。

x264r1170

git-id : 364d7dff8dd96f71465bed10594b9f1e78fe6139
Author : David Wolstencroft
Date: Sat Jun 20 21:42:55 2009 +0200
AltiVec version of frame_init_lowres_core. 22.4x faster than C on PPC7450 and 25x on PPC970MP.

frame_init_lowres_coreのAltiVecバージョン。CよりPPC7450上で22.4倍、PPC970MP上で25倍高速化。

Affects PowerPC only.

x264r1169

git-id : ab85c9b0ae08a237472bfd14558353d5ecb92b3d
Author : Jason Garrett-Glaser
Date: Fri Jun 19 16:03:18 2009 -0700
MMX CABAC mvd sum calculation
Faster CABAC mvd coding.

MMXによるCABACのmvdのsum計算。
CABACのmvd符号化を高速化。

x264r1168

git-id : 803c9d94641e57544932114f61f523e19bba6b4d
Author : Jason Garrett-Glaser
Date: Fri Jun 19 16:02:39 2009 -0700
Faster MV prediction
Smaller code size, plus I get to use goto.

MV予測を高速化。
より小さなコードサイズで、加えてgotoを使うように。

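Personally I don't think this is a good use of goto. Like the shufpd in r1166, it feels less like a practical decision and more like "because I wanted to try it". As someone who writes code, I do understand the feeling of getting caught up in that sort of fixation.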

x264r1167

git-id : 6199685d989facfd6105c30b50e30615e784fba3
Author : Jason Garrett-Glaser
Date: Wed Jun 10 10:37:01 2009 -0700
Fix potential crash in checkasm
ssim_end4_sse2 requires aligned sums

checkasmで潜在的にクラッシュする可能性を修正。
ssim_end4_sse2はアラインされたsums変数を要求する。

x264r1166

git-id : b555e3f90b2060542d44bcb1a254d5a7bfc5d23a
Author : Jason Garrett-Glaser
Date: Wed Jun 10 10:11:00 2009 -0700
SSSE3, faster SSE2/MMX integral_init4v
The real reason I wrote this was an excuse to use shufpd.

SSSE3とより高速なSSE2/MMXのintegral_init4v。
実のところshufpdを使う口実にこれを書いた。

This presumably follows on from r1140, where he said that shufps is underrated.

x264r1165

git-id : 6841c5e2407e60de92c46bc6c649cd2fc4a13d75
Author : Mike Frysinger
Date: Thu Jun 11 08:29:27 2009 +0000
configure check for uclinux

configureのチェックでuclinuxに対応。

Affects the build only.

Since "linux*" was changed to "*linux*", targets other than uclinux whose names carry a prefix are matched as well.

x264r1164

git-id : a9526973579d7d48bb2730b64a547ad10b7ef6ef
Author : Loren Merritt
Date: Thu Jun 11 08:27:46 2009 +0000
fix a crash on frame width <= 48 pixels

フレームの幅が48ピクセル以下の場合のクラッシュを修正。

x264r1163

git-id : 3f56e271ac3d8a0e054b8b18e63886a6070ef05e
Author : Loren Merritt
Date: Wed May 27 20:47:18 2009 +0000
configure check for cc, rather than reporting lack of compiler as an asm error.
configure check for -mno-cygwin, since it's removed from gcc4.

アセンブラのエラーとしてコンパイラの欠如を報告するのではなく、configureでccをチェック。
-mno-cygwinはgcc4から削除されたのでconfigureでチェック。

Affects the build only.

-mno-cygwin is now added automatically on environments where using it does not cause an error.

x264r1162

git-id : f7bfcfaa36ac3f04447f668f94db73801ca86e4d
Author : Loren Merritt
Date: Sun May 24 05:01:26 2009 +0000
a better way to keep track of mv candidates.
2-4% faster dia, hex, and umh.

動きベクタ候補情報の保持によりよい方法を採用。
dia, hex, umhが2-4%高速化。

x264r1161

git-id : 8572d517a52cb46692d2fcc0723fdbd01d1878b6
Author : Loren Merritt
Date: Sun May 24 05:01:19 2009 +0000
reorder some motion estimation patterns.
this change is useless on its own, but segregates the bitstream-changing part out of my next optimization.

幾つかの動き見積もりパターンの順番を変更。
この変更は、それ自体は無意味だが、次の最適化にかかるビットストリーム変更部分を分離する。

x264r1160

git-id : ea224ee82377496cd356999b5deadd03a2001514
Author : Loren Merritt
Date: Mon May 25 19:16:05 2009 -0400
Fix VBV warning broken in r915
x264 will now correctly warn about maxrate specified without bufsize even when a level is not set.

r915で壊していたVBVの警告を修正。
これによりx264は、例えレベルが設定されていない場合でも、bufsizeの指定がないmaxrateに関して正しく警告する。

x264r1159

git-id : 3da3f957c096ad7885312b58d7ab12a7fab10111
Author : Loren Merritt
Date: Mon May 25 07:03:10 2009 +0000
configure check for ssse3-capable binutils

SSSE3が使用可能なbinutilsであるかをconfigureでチェック。

Affects the build only.

configure now detects whether binutils can handle SSSE3 (in practice, version 2.17 or later).

x264r1158

git-id : a2c01c124e4ee4b9d71a4b2a20ed572d439aea7b
Author : Jason Garrett-Glaser
Date: Sun May 24 16:58:08 2009 -0400
Fix 10L in r1155
Broke --me esa/tesa due to forgetting to add handling for x264_cost_mv_fpel.

r1155の10Lを修正。
x264_cost_mv_fpelの取り扱いを追加することを忘れていたため--me esa/tesaが壊れていた。

I'm tired of it by now, so I won't explain 10L again.

x264r1157

git-id : 37241b84054d5d012994c4567270c1b32dd9a038
Author : Jason Garrett-Glaser
Date: Fri May 22 21:28:15 2009 -0700
Fix bug where satd was incorrectly used with subme<=1
Faster subme<=1 with i4x4 enabled.

subme<=1でsatdが不正に使用されていたバグを修正。
i4x4が有効にされている場合のsubme<=1を高速化。

x264r1156

git-id : be17f0306ec42c2fc6e572a50d16c3717c7991ce
Author : Jason Garrett-Glaser
Date: Fri May 22 20:40:27 2009 -0700
Remove some pointless error handling code in cabac/cavlc

cabac/cavlcで的外れなエラーハンドリングをするコードを削除。

x264r1155

git-id : b6470f07f02342d1abf960b1482e3e9e835fbc5d
Author : Jason Garrett-Glaser
Date: Fri May 22 18:40:12 2009 -0700
Save some memory on mv cost arrays
Have quantizers that use the same lambda share the same cost array.

mv（動きベクタ）コストの配列でメモリをいくらか節約。
同じラムダを使うquantizer（量子化器）が同じコストの配列を共有するように。

x264r1154

git-id : 83d2c126382fb5f012e979b1a80f92fd49f35771
Author : Jason Garrett-Glaser
Date: Fri May 22 16:57:33 2009 -0700
Various CABAC and CAVLC optimizations
Backport CAVLC partial-inlining early termination to CABAC (~2-4% faster CABAC residual coding)

CABACとCAVLCの様々な最適化。
CAVLCの部分的インライン版早期終了をCABACにバックポート（CABACのresidual codingを約2～4%高速化）。

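So far I have written "residual coding" together with the gloss 「（残余コーディング）」, but that may not be accurate, so from now on I may just write "residual coding".

I cannot find a good reference online that translates "residual", but it most likely means "the difference between the reference pixel area obtained by motion compensation and the current MB". Computing the difference is essentially a subtraction, and the motion vector search looks for the reference area that brings the result of that subtraction close to 0 (zero). I understand "residual" to be whatever is left over, not reduced to 0, after that subtraction, but I am not fully confident, so I think leaving the original term as is would be better.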
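
The subtraction described above can be written down in a few lines of C. This is a minimal sketch with made-up function names, following the interpretation in the note rather than any particular x264 routine.

```c
#include <stdint.h>

/* residual[i] = current[i] - prediction[i] for a block of n pixels.
 * The prediction is the reference area found by motion compensation. */
static void compute_residual(const uint8_t *cur, const uint8_t *pred,
                             int16_t *res, int n)
{
    for (int i = 0; i < n; i++)
        res[i] = (int16_t)(cur[i] - pred[i]);
}

/* Sum of absolute residuals: the closer to 0, the better the prediction
 * and the less there is left over to encode. */
static int residual_sad(const int16_t *res, int n)
{
    int sad = 0;
    for (int i = 0; i < n; i++)
        sad += res[i] < 0 ? -res[i] : res[i];
    return sad;
}
```

A perfect prediction gives an all-zero residual; anything that does not cancel is what "residual coding" then has to write to the bitstream.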

x264r1153

git-id : 7b6ce6a005f791f1d82c369d2f1e1c30b9ccbe80
Author : Loren Merritt
Date: Tue May 19 02:47:15 2009 +0000
fix a race condition at the end of thread_input

thread_inputの終端部の条件競合を修正。

I'd have to look more closely to be sure, but I wasn't aware of any actual problem here until now...?

x264r1152

git-id : fb4e2845497f77e2dae3262e1414de2cdf327ab5
Author : Jason Garrett-Glaser
Date: Mon May 18 22:40:45 2009 -0400
Various trellis speed optimizations

trellisの様々なスピード最適化。

x264r1151

git-id : e86a1734047fb3e3af8dbe17a0e29479e839635b
Author : Jason Garrett-Glaser
Date: Sat May 16 12:16:34 2009 -0700
Make i686 the default arch on x86_32
Disabling asm will default to a generic arch.
Also fix configure for gcc 4.4.

x86_32でi686をデフォルトに。
アセンブラを無効にすると一般アーキテクチャ（訳注：x86一般）をデフォルトにする。
gcc4.4向けにconfigureを修正。

x264r1150

git-id : 0e050663d16cd5b095723bd00136873aa461d5cd
Author : Jason Garrett-Glaser
Date: Fri May 15 20:07:59 2009 -0700
Faster signed golomb coding
3% faster CAVLC RDO and bitstream writing.

符号付きゴロム符号化を高速化。
CAVLC RDOとビットストリームの書き込みが3%高速化。

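At the risk of being misunderstood and speaking very roughly, Golomb coding is a coding scheme that, like Huffman coding, can perform entropy compression. Its compression ratio is not very high, but the processing is simple, so H.264 uses it in some places such as preliminary stages. The impression is similar to how LHA, although basically LZ77 + Huffman, contains run-length-like processing in small details. (Corrected 2009/05/27)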
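
For reference, the variant used in H.264 is the Exp-Golomb code. The sketch below shows how a signed value is mapped to an unsigned code number and how long the resulting code is. The function names are made up for this example; x264's own bitstream writer is implemented differently and far more efficiently.

```c
#include <stdint.h>

/* Map a signed value to its code number:
 * 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ... */
static uint32_t se_to_code_num(int32_t v)
{
    return v > 0 ? 2u * (uint32_t)v - 1u
                 : 2u * (uint32_t)(-(int64_t)v);
}

/* Length in bits of the unsigned Exp-Golomb code for code_num:
 * floor(log2(code_num + 1)) leading zeros, then the same number of bits
 * plus one for the value itself. Assumes code_num < UINT32_MAX. */
static int ue_bit_length(uint32_t code_num)
{
    uint32_t x = code_num + 1;
    int log2x = 0;
    while (x >>= 1)
        log2x++;
    return 2 * log2x + 1;
}

static int se_bit_length(int32_t v)
{
    return ue_bit_length(se_to_code_num(v));
}
```

Small magnitudes get short codes (0 costs a single bit), which is why this simple scheme works well for values that are usually close to zero.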

x264r1149

git-id : 3b80f2321937ed4962c6e2fcc679765bbf729723
Author : Jason Garrett-Glaser
Date: Thu May 14 04:11:15 2009 -0700
Faster spatial direct MV prediction
unroll/tweak col_zero_flag

spatial direct MV予測（空間軸ダイレクト動きベクタ予測）を高速化。
col_zero_flagにアンロール（訳注：ループアンロールという高速化手法）と調整。

x264r1148

git-id : 400740ba1a8a7a6001aa9b01e9be0b3d0905856e
Author : Jason Garrett-Glaser
Date: Mon May 4 04:19:28 2009 -0700
More CABAC and CAVLC optimizations
Simplified function calling for block_residual_write_(cabac|cavlc) and improved sigmap coding.
Tried making 0/1-bit specific versions of CABAC asm, but benefit was minimal under GCC 4.3.
Helped a decent bit under 3.4, but you shouldn't be using such old versions anyways.

CABACとCAVLCの更なる最適化。
block_residual_write_(cabac|cavlc)の関数呼び出しをシンプルにしsigmap codingを改善。
CABACアセンブラの0/1ビット専用バージョンを作成しようと試みたものの、GCC4.3では利得は小さい。
3.4ではいくらか効果ありだが、いずれにしてもそんな古いバージョンは使用すべきでない。

x264r1147

git-id : f526f244cbc21c245064a2a9c5611840dd3b5203
Author : Jason Garrett-Glaser
Date: Wed Apr 29 22:54:52 2009 -0700
Various optimizations in frametype lookahead

フレームタイプのlookahead（先読み）に様々な最適化。

x264r1146

git-id : ecae94a0ee6689bec9d32d64cce64cab7790a984
Author : Jason Garrett-Glaser
Date: Sun Apr 26 22:13:17 2009 -0700
Some cosmetics/cleanup
Move some macros to x86util.asm that should have been there to begin with.
Fix a typo that didn't cause any issues.

いくつかのコスメティックスと整理。
初めからx86util.asmに置くべきだったいくつかのマクロを移動。
特に問題ないtypoを修正。

The message says it caused no issues, but

-    if( !h->param.b_cabac );
+    if( !h->param.b_cabac )
         x264_init_vlc_tables();

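isn't this actually a problem...? With the stray semicolon the if had an empty body, so x264_init_vlc_tables() ran unconditionally; presumably that only cost some unnecessary initialization when CABAC was in use, which would explain why nothing visibly broke.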

x264r1145

git-id : d2e1e1c35c43ea9c90c9211be6202143b69b35b9
Author : Guillaume Poirier
Date: Tue Apr 21 21:18:44 2009 +0000
fix "incompatible types in initialization" compilation issues with GCC 4.3 (which is stricter than previous compiler version)

GCC4.3でコンパイル時の"incompatible types in initialization"（初期化で互換性のない型が使用されている）という問題を修正。

Affects PowerPC only.

Once again a fix by casting.

x264r1144

git-id : 4fc8c03ad568efe3dd2f57db33b0863d29cb63a0
Author : Guillaume Poirier
Date: Tue Apr 21 17:32:21 2009 +0200
fix conversions between vectors with differing element types or numbers of subparts errors

基本的な型、又は構成数が違う旨のエラーになるベクタ間の変換を修正。

Affects PowerPC only.

The translated message is hard to follow, but in short: it caused a compile error, so casts were added.

x264r1143

git-id : 755470584932877c4d5ea1b51c2cc2dbd044b7ca
Author : Jason Garrett-Glaser
Date: Sat Apr 18 16:07:53 2009 -0700
Add "coded blocks" stat to output information.
This measures the total percentage of blocks, intra and inter, which have nonzero coefficients.
"y,uvAC,uvDC" refers to luma, chroma DC, and chroma AC blocks.
Note that skip blocks are included in this stat.

"coded blocks"（符号化されたブロック）の統計を出力情報に追加。
これは非ゼロ係数を持つintra/interブロックの総計パーセンテージを測定する。
"y,uvAC,uvDC"はluma, chroma DC, chroma ACブロックをそれぞれ示す。
この統計にはスキップブロックが含まれている事に注意。

x264r1142

git-id : a908c88f0bbd4250afbd16b6c55b3c9f613af96d
Author : Jason Garrett-Glaser
Date: Fri Apr 17 23:38:29 2009 -0700
Enable asm predict_8x8_filter
I'm not entirely sure how this snuck its way out of holger's intra pred patch.

アセンブラのpredict_8x8_filterを有効に。
これがどうしてholgerのintra予測パッチから抜け落ちてしまったのか、自分でもよく分からない。

x264r1141

git-id : c74726e1ed0312e2073f0ea8c45804a870512579
Author : Jason Garrett-Glaser
Date: Fri Apr 17 06:00:39 2009 -0700
Remove various bits of dead code found by CLANG.

CLANGが見つけた様々なデッドコード（訳注：過去に使用されていたが今は無意味なコード）を削除。

x264r1140

git-id : 37424f87a43a2dcecba61bc24b46506ced32307c
Author : Jason Garrett-Glaser
Date: Tue Apr 14 14:47:02 2009 -0700
Slightly faster SSE4 SA8D, SSE4 Hadamard_AC, SSE2 SSIM
shufps is the most underrated SSE instruction on x86.

SSE4 SA8D, SSE4 Hadamard_AC, SSE2 SSIMを僅かに高速化。
shufpsはx86で最も過小評価されているSSE命令だ。

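The word Hadamard appears often in the changelog; it refers to the Hadamard transform. The Hadamard transform is a kind of orthogonal transform and, very roughly speaking, a relative of the DCT that is very cheap to compute. In video compression that reduces entropy by "motion compensation + DCT + quantization" (which is most video compression today, not only H.264), one wants "the motion vector for which the coefficients left after DCT + quantization end up small". Using the DCT itself during motion search would be very expensive, so the Hadamard transform, which behaves similarly and is cheap, is used in place of the DCT to find a suitable motion vector efficiently. It should be what is used in the SATD of subme 2-5 and in tesa.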
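
As an illustration of the idea, here is a plain C sketch of a 4x4 SATD (sum of absolute transformed differences) built on the Hadamard transform. It is written for clarity with explicit matrix products; satd_4x4 and H4 are names chosen for this example, the result is left unnormalized, and x264's real routines use add/subtract butterflies in SIMD assembly.

```c
#include <stdint.h>
#include <stdlib.h>

/* 4x4 Hadamard matrix: every entry is +1 or -1, so applying it needs only
 * additions and subtractions. This particular matrix is symmetric. */
static const int H4[4][4] = {
    { 1,  1,  1,  1 },
    { 1,  1, -1, -1 },
    { 1, -1, -1,  1 },
    { 1, -1,  1, -1 },
};

/* Unnormalized SATD of two 4x4 blocks stored row by row:
 * sum of |H * (cur - ref) * H^T|. */
static int satd_4x4(const uint8_t cur[16], const uint8_t ref[16])
{
    int d[4][4], t[4][4], sum = 0;

    for (int i = 0; i < 16; i++)
        d[i / 4][i % 4] = cur[i] - ref[i];

    /* t = H * d (transform the columns) */
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++) {
            t[r][c] = 0;
            for (int k = 0; k < 4; k++)
                t[r][c] += H4[r][k] * d[k][c];
        }

    /* y = t * H^T (transform the rows), accumulating |y| */
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++) {
            int y = 0;
            for (int k = 0; k < 4; k++)
                y += t[r][k] * H4[c][k];
            sum += abs(y);
        }
    return sum;
}
```

A motion search would evaluate this cost for each candidate reference block and keep the candidate with the smallest value.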

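Incidentally, the name Hadamard comes from a person, Jacques Salomon Hadamard, just as the Fourier transform is named after a person. In Japanese it is read アダマール (Adamar) rather than ハダマード because he was French: French does not pronounce the h, nor the final consonant.

As for shufps being underrated... I feel that can't be helped. It is an awkward instruction to use, and not especially fast either.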

x264r1139

git-id : 1024283b0321e53a3e08fddb1411429330bf1731
Author : Jason Garrett-Glaser
Date: Thu Apr 9 02:14:41 2009 -0700
Various CABAC optimizations
Move calculation of b_intra out of the core residual loop and hardcode it where applicable.
Inlining cabac_mb_mvd was unnecessary and wasted tremendous amounts of code size. Inlining only cache_mvd is faster and significantly smaller.

様々なCABACの最適化。
b_intraの計算をresidual（残余）ループのコアから外に出し、適切な場所にハードコーディング。
cabac_mb_mvdのインライン化は不要で、酷いコードサイズの無駄だった。cache_mvdのみのインライン化が速く、有意に小さい。

x264r1138

git-id : fe11a6f39e4e8235d685591ec9c0ec86eca4fee9
Author : Jason Garrett-Glaser
Date: Wed Apr 8 05:45:03 2009 -0700
CAVLC optimizations
faster bs_write_te, port CABAC context selection optimization to CAVLC.

CAVLCの最適化。
bs_write_teを高速化、CABACのコンテキスト選択の最適化をCAVLCに移植。

x264r1137

git-id : 1fda88277f6b2eda27a0f741d58b31532ad0664d
Author : Jason Garrett-Glaser
Date: Sun Apr 5 13:01:42 2009 -0700
Faster CABAC RDO
Since the bypass case is quite unlikely, especially when doing merged sigmap/level coding,
it's faster to use a branch than a cmov.

CABAC RDOを高速化。
バイパスするケースは発生しづらく、統合されたsigmap/levelコーディングを行う際は特にそうであるため、cmovよりも分岐を使用する方が高速である。

x264r1136

git-id : 3f9ba82b9788231c879d31d9c4c8ebf4518d07fe
Author : Jason Garrett-Glaser
Date: Tue Mar 31 10:36:57 2009 -0700
Activate intra_sad_x3_8x8c in lookahead

lookahead（先読み）でintra_sad_x3_8x8cを有効に。

x264r1135

git-id : 9d0c378b235182341edd2e95f01d4fd25132ad50
Author : Jason Garrett-Glaser
Date: Tue Mar 31 10:34:35 2009 -0700
MBAFF interlaced coding is not allowed in baseline profile

ベースラインプロファイルではMBAFFインターレース符号化は不可。

Specifying --interlaced automatically forces the main profile.

x264r1134

git-id : b8808bf0a35b83c2fac05e07a504ad164c931960
Author : Jason Garrett-Glaser
Date: Mon Mar 30 19:30:59 2009 -0700
intra_sad_x3_8x8 assembly

intra_sad_x3_8x8のアセンブラ化。

I haven't checked carefully, but the series of changes from r1132 onward all concern intra_sad, so they probably only affect subme=0-1.

x264r1133

git-id : 8df40a4c41eb5e79d0055f2e5af4b214285b9c8c
Author : Jason Garrett-Glaser
Date: Mon Mar 30 16:37:46 2009 -0700
intra_sad_x3_4x4 assembly

intra_sad_x3_4x4のアセンブラ化。

x264r1132

git-id : d39f8ae1730b07286f1bb281a22d8cd57d0f90b9
Author : Jason Garrett-Glaser
Date: Mon Mar 30 04:07:50 2009 -0700
intra_sad_x3_8x8c assembly
Also fix intra_sad_x3_16x16's use of "n" as a loop variable (broke SWAP)

intra_sad_x3_8x8cのアセンブラ化。
また、intra_sad_x3_16x16でループ変数に"n"を使用（SWAPを破壊）していたのを修正。

x264r1131

git-id : d3ca4647a247186c2df7760be2a9c649efe34815
Author : Jason Garrett-Glaser
Date: Sun Mar 29 18:27:32 2009 -0700
Shave one instruction off CABAC encode_decision
range_lps>>6 ranges from 4-7, so (range_lps>>6)-4 == (range_lps>>6) & 3

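Shaved one instruction off CABAC encode_decision.
range_lps>>6 ranges from 4 to 7, so (range_lps>>6)-4 == (range_lps>>6) & 3.

The operation that extracts the low 2 bits is done by subtraction instead of a bitwise AND. This is easy to see in binary: 4 to 7 is 100b to 111b, so subtracting 4 clears bit 2 exactly as masking with 3 does.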
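The identity in the commit message can be checked exhaustively. The following is a minimal sketch, assuming the CABAC range is a 9-bit value between 256 and 511 (which is what makes the shifted value fall in 4 to 7); the function name is hypothetical and is not part of x264.

```c
#include <stdint.h>

/* Returns 1 if (r >> 6) - 4 == (r >> 6) & 3 for every r in [256, 511].
 * The 256..511 interval is an assumption taken from the commit message's
 * statement that the shifted value ranges from 4 to 7. */
static int lps_index_identity_holds(void)
{
    for (uint32_t r = 256; r <= 511; r++) {
        uint32_t q = r >> 6;            /* always 4, 5, 6 or 7 here */
        if (q - 4 != (q & 3))
            return 0;
    }
    return 1;
}
```

Because both forms give the same table index, whichever one is cheaper in the surrounding assembly can be used.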

x264r1130

git-id : 847597773aff56a5612a9edbebb40b350c637edf
Author : Jason Garrett-Glaser
Date: Thu Mar 26 22:22:23 2009 -0700
Faster probe_skip
Add a second chroma threshold after the DC transform.

Faster probe_skip.
Added a second chroma threshold after the DC transform.

x264r1129

git-id : c109c8e7db67df7194f2f913a7a4d65217caee26
Author : Jason Garrett-Glaser
Date: Thu Mar 19 12:28:21 2009 -0700
Add missing "static" qualifier to two arrays
Should slightly improve performance.

Added the missing static qualifier to two arrays.
Performance should improve slightly.

x264r1128

git-id : 682b54d6175f98dfa14fec4d951f4b3b6e686b95
Author : Jason Garrett-Glaser
Date: Tue Mar 17 11:01:57 2009 -0700
SSE2 zigzag_interleave
Replace PHADD with FastShuffle (more accurate naming).
This flag represents asm functions that rely on fast SSE2 shuffle units, and thus are only faster on Phenom, Nehalem, and Penryn CPUs.

SSE2 zigzag_interleave.
Replaced PHADD with FastShuffle (a more accurate name).
This flag represents asm functions that rely on fast SSE2 shuffle units and are therefore only faster on Phenom, Nehalem, and Penryn CPUs.

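Inside x264, the results of CPU detection are stored as flags. Just as there is a flag meaning "SSE is available", there was a flag meaning "the PHADD instruction is fast". This commit replaces it with a flag meaning "FastShuffle is available", which describes its actual meaning more accurately.

FastShuffle (SSE2 shuffle units) presumably refers to the Super Shuffle Engine. This is neither a new x86 instruction nor an improvement to one particular instruction. It is an efficiency improvement in the internal μOPs (uOPs, micro-operations) that speeds up a range of existing x86 instructions by reducing the number of clock cycles they take to execute.

On IA-32 processors since the Pentium Pro (P6) generation, x86 instructions are effectively virtualized at the CPU level. Seen from outside, the CPU executes x86 instructions, but internally they are decomposed into a different instruction set, the μOPs, before execution. The Super Shuffle Engine speeds up the shuffle μOP, so every x86 instruction that uses this internal shuffle operation gets faster. In practice it is the SIMD instructions from SSE2 onward that benefit, which is why the commit message is worded the way it is.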
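The flag mechanism described above can be sketched as a bitmask plus a dispatch function. This is only an illustration: the flag names, the enum, and pick_zigzag_impl are hypothetical and are not x264's actual identifiers.

```c
#include <stdint.h>

/* Hypothetical capability flags produced by CPU detection. */
enum {
    CPU_FLAG_SSE2         = 1u << 0,  /* SSE2 instructions are available  */
    CPU_FLAG_FAST_SHUFFLE = 1u << 1   /* SSE2 shuffle units are fast      */
};

/* Hypothetical identifiers for the available implementations. */
typedef enum { IMPL_C, IMPL_SSE2, IMPL_SSE2_SHUFFLE } impl_t;

/* Pick the shuffle-heavy version only when the CPU both supports SSE2
 * and is flagged as having fast shuffles; otherwise fall back. */
static impl_t pick_zigzag_impl(uint32_t cpu_flags)
{
    if ((cpu_flags & CPU_FLAG_SSE2) && (cpu_flags & CPU_FLAG_FAST_SHUFFLE))
        return IMPL_SSE2_SHUFFLE;
    if (cpu_flags & CPU_FLAG_SSE2)
        return IMPL_SSE2;
    return IMPL_C;
}
```

Naming the flag after the property the code depends on (fast shuffles) rather than after one instruction (PHADD) lets any shuffle-heavy routine reuse it.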

x264r1127

git-id : 8d82fecc3377b3052279f038f2273ade3a5b65cc
Author : Jason Garrett-Glaser
Date: Mon Mar 9 23:37:53 2009 -0700
Faster integral_init
palignr to avoid unaligned loads is worth it in inith, but not initv.

Faster integral_init.
Using palignr to avoid unaligned loads is worth it in inith, but not in initv.

x264r1126

git-id : 96733ab692b4a268685d65070d6977964a466c91
Author : Holger Lubitz
Date: Mon Mar 9 14:05:16 2009 -0700
Faster SSSE3 hpel_filter_v
~10% faster hpel_filter on 64-bit Penryn.
32-bit version by Jason Garrett-Glaser.

Faster SSSE3 hpel_filter_v.
hpel_filter is ~10% faster on 64-bit Penryn.
The 32-bit version is by Jason Garrett-Glaser.

x264r1125

git-id : 10d6ef07409ebe38b5f1e8e4516155a2fe66d4c6
Author : Jason Garrett-Glaser
Date: Sat Mar 7 16:43:09 2009 -0800
Faster SSE2 pixel_var
Optimized using the DEINTB method from r1122. ~32% faster var_16x16 on Conroe.

Faster SSE2 pixel_var.
Optimized using the DEINTB method from r1122; var_16x16 is ~32% faster on Conroe.
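For readers unfamiliar with what a block variance routine computes, here is a plain C sketch. It is not the SSE2 code from this commit, and the function name and the choice to return 256 times the variance are assumptions made for illustration.

```c
#include <stdint.h>

/* Returns 256 * variance of a 16x16 block of 8-bit pixels, computed from
 * the pixel sum and the sum of squares (hypothetical reference version). */
static uint32_t block_var_16x16(const uint8_t *pix, int stride)
{
    uint32_t sum = 0, sqr = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            uint32_t p = pix[y * stride + x];
            sum += p;          /* at most 256 * 255           */
            sqr += p * p;      /* at most 256 * 255 * 255     */
        }
    }
    /* sum(p^2) - (sum(p))^2 / 256, using a 64-bit product for safety */
    return sqr - (uint32_t)(((uint64_t)sum * sum) >> 8);
}
```

A flat block gives 0, and the value grows with the amount of texture in the block.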

x264r1124

git-id : e1c7f12cb828072b2b5b096195ff3004b83c7785
Author : Jason Garrett-Glaser
Date: Sat Mar 7 00:27:27 2009 -0800
SSSE3 hpel_filter_v
Optimized using the same method as in r1122. Patch partially by Holger.
~8% faster hpel filter on 64-bit Nehalem

SSSE3 version of hpel_filter_v.
Optimized using the same method as in r1122. The patch is partly by Holger.
The hpel filter is ~8% faster on 64-bit Nehalem.

x264r1123

git-id : 3d780622c1ff19bc3d6a522a65879779a7ddb3dd
Author : Jason Garrett-Glaser
Date: Fri Mar 6 18:57:15 2009 -0800
Update some asm copyright headers

Updated the copyright headers of some asm files.

This only updates copyright comments in the source, so the binary is not affected at all.

x264r1122

git-id : 2dca5f5413051a26cbba4e20f3c77ff69b694ba3
Author : Holger Lubitz
Date: Fri Mar 6 18:16:30 2009 -0800
Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
16x16 SATD: +18% speed on K8(64bit), +22% on K10(32bit), +42% on Penryn(64bit), +44% on Nehalem(64bit), +50% on P4(32bit), +98% on Conroe(64bit)
Similar performance boosts in SATD-like functions (SA8D, hadamard_ac) and somewhat less in DCT/IDCT/SSD.
Overall performance boost is up to ~15% on 64-bit Conroe.

Vastly faster SATD/SA8D/Hadamard_AC/SSD/DCT/IDCT.
Heavily optimized for Core 2 and Nehalem, but performance should improve on all modern x86 CPUs.
16x16 SATD: +18% speed on K8 (64-bit), +22% on K10 (32-bit), +42% on Penryn (64-bit), +44% on Nehalem (64-bit), +50% on P4 (32-bit), +98% on Conroe (64-bit).
Similar gains apply to SATD-like functions (SA8D, hadamard_ac), and somewhat smaller gains to DCT/IDCT/SSD.
The overall performance gain is up to ~15% on 64-bit Conroe.
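As background for the figures above, SATD is the sum of absolute values of the Hadamard-transformed difference between two blocks. The following plain C sketch of a 4x4 SATD is a reference illustration only, not the optimized assembly from this commit; the function names and the final halving are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* One 4-point Hadamard butterfly: in[0..3] -> out[0..3]. */
static void hadamard4(const int in[4], int out[4])
{
    int s01 = in[0] + in[1], d01 = in[0] - in[1];
    int s23 = in[2] + in[3], d23 = in[2] - in[3];
    out[0] = s01 + s23;
    out[1] = s01 - s23;
    out[2] = d01 + d23;
    out[3] = d01 - d23;
}

/* Hypothetical reference 4x4 SATD between blocks a and b. */
static int satd_4x4_ref(const uint8_t *a, int stride_a,
                        const uint8_t *b, int stride_b)
{
    int diff[4][4], tmp[4][4], sum = 0;

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            diff[y][x] = a[y * stride_a + x] - b[y * stride_b + x];
        hadamard4(diff[y], tmp[y]);          /* transform each row */
    }
    for (int x = 0; x < 4; x++) {
        int col[4] = { tmp[0][x], tmp[1][x], tmp[2][x], tmp[3][x] };
        int out[4];
        hadamard4(col, out);                 /* transform each column */
        for (int i = 0; i < 4; i++)
            sum += abs(out[i]);
    }
    return sum >> 1;                         /* assumed normalization */
}
```

The transform consists only of additions and subtractions, which is why it maps well onto SIMD instructions.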

x264r1121

git-id : f3872178768cca2973f759c479e26f3ac35e55fe
Author : Jason Garrett-Glaser
Date: Fri Mar 6 15:28:47 2009 -0800
Update x264 copyright date

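Updated the x264 copyright date.

This merely changes "2003-2008" to "2003-2009". Note, however, that the change is in x264_sei_version_write, so the string in question is the one written into the stream's version SEI.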

x264r1120

git-id : 8544346a43456720f07e6a438cfbb0d84b39779a
Author : Jason Garrett-Glaser
Date: Wed Mar 4 03:16:06 2009 -0800
Remove pre-scenecut from fprofile commands as well
Also add psy-trellis to fprofile

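Removed pre-scenecut from the fprofile commands as well.
Also added psy-trellis to fprofile.

This changes the test targets (the settings used to measure execution time), so it does not concern ordinary users.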

x264r1119

git-id : 6f0b2a9b18f3af3fd7e495640756e1d5e43343e1
Author : Jason Garrett-Glaser
Date: Tue Mar 3 16:21:52 2009 -0800
Slightly faster 8x16 SAD on Penryn Core 2
Same as MMX 8x16 cacheline SAD, but calls SSE2 8x16 SAD in non-cacheline case.
Only Nehalem benefits from sizes smaller than 8x16, and Nehalem doesn't use cacheline functions, so no smaller versions are included.

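Slightly faster 8x16 SAD on Penryn Core 2.
Same as the MMX 8x16 cacheline SAD, except that it calls the SSE2 8x16 SAD in the non-cacheline case.
Only Nehalem benefits from sizes smaller than 8x16, and Nehalem does not use the cacheline functions, so no smaller versions are included.

TODO: the description is somewhat ambiguous, so check the diff later.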

x264r1118

git-id : 1cc16dcee61496a5fb4da80d7605c1c88e2d371d
Author : Jason Garrett-Glaser
Date: Thu Feb 26 19:50:09 2009 -0800
Fix scenecut and VBV with videos of width/height <= 32
Also remove an unused variable

Fixed scenecut and VBV for videos with width/height <= 32.
Also removed an unused variable.

This handles the case where the width or height is 32 px or less, which almost never occurs.

x264r1117

git-id : 3e4946f305317856ed79e0898f25f10859df22ed
Author : Jason Garrett-Glaser
Date: Thu Feb 26 14:29:50 2009 -0800
Remove non-pre scenecut
Add support for no-b-adapt + pre-scenecut (patch by BugMaster)
Pre-scenecut was generally better than regular scenecut in terms of accuracy and regular scenecut didn't work in threaded mode anyways.
Add no-scenecut option (scenecut=0 is now no scenecut; previously it was -1)
Fix an incorrect bias towards P-frames near scenecuts with B-adapt 2.
Simplify pre-scenecut code.

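Removed non-pre scenecut.
Added support for no-b-adapt + pre-scenecut (patch by BugMaster).
Pre-scenecut was generally more accurate than regular scenecut, and regular scenecut did not work in threaded mode anyway.
Added the no-scenecut option (scenecut=0 now means no scenecut; previously it was -1).
Fixed an incorrect bias towards P-frames near scenecuts with B-adapt 2.
Simplified the pre-scenecut code.

I have not looked closely, but it appears that the previously separate pre-scenecut and regular scenecut have been unified into pre-scenecut. The regular scenecut code (the part that ran only when pre was not specified) has been removed wholesale, and the code that used to run only when pre was specified now serves as the regular scenecut.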

x264r1116

git-id : 7ddb2c7da0621bb853b6702e6f59619c2d1c6a08
Author : Guillaume Poirier
Date: Tue Mar 3 07:44:18 2009 -0800
Add AltiVec version of hadamard_ac. 2.4x faster than the C version.
Note this this implementation is pretty naive and should be improved
by implementing what's discussed in this ML thread:
date: Mon, Feb 2, 2009 at 6:58 PM
subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines

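Added an AltiVec version of hadamard_ac. 2.4x faster than the C version.
Note that this implementation is fairly naive and should be improved by implementing what is discussed in the following ML thread: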
date: Mon, Feb 2, 2009 at 6:58 PM
subject: Re: [x264-devel] [PATCH] AltiVec implementation of hadamard_ac routines

Affects PowerPC only.

The ML thread in question.

x264r1115

git-id : 11863ace50e918ec75f7c8e22907ebf1000820e1
Date: Thu Feb 26 12:07:56 2009 -0800
Fix regression in r1085
Deblocking was very slightly incorrect with partitions=all.
Bug found by BugMaster.

Fixed a regression in r1085 (translator's note: a bug introduced by a change).
Deblocking was very slightly incorrect with partitions=all.
The bug was found by BugMaster.

x264r1114

git-id : a933a3e6a6be918a2ae56e3d94ecea29143b9ea5
Author : Jason Garrett-Glaser
Date: Mon Feb 16 05:56:12 2009 -0800
Optimize neighbor CBP calculation and fix related regression
r1105 introduced array overflow in cbp handling

Optimized the neighbor CBP calculation and fixed a related regression.
r1105 had introduced an array overflow in the CBP handling.

x264r1113

git-id : cc4f807796220a042e386da259ddc38e6ca8e43b
Date: Fri Feb 13 16:30:14 2009 -0800
Show FPS when importing a raw YUV file

Show the FPS when importing a raw YUV file.

x264r1112

git-id : f43e22a7873ea3811bbc15e30d67681f23249087
Date: Wed Feb 11 10:38:56 2009 -0800
Windows 64-bit support
A "make distclean" is probably required after updating to this revision.

Windows 64-bit support.
A "make distclean" is probably required after updating to this revision.

32-bit builds are not necessarily unaffected, so take care.

x264r1111

git-id : d56e13f9016b898a3bd4043b26c2e70c2bb9f6c4
Date: Wed Feb 11 10:35:56 2009 -0800
Minor fixes and cosmetics
Suppress a GCC warning, fix a non-problematic array overflow, one REP->REP_RET.

Minor fixes and cosmetics.
Suppressed a GCC warning, fixed a harmless array overflow, and changed one REP to REP_RET.

x264r1110

git-id : 05afd8e02bd83c6c9eba4d41b8d829a383689117
Date: Tue Feb 10 12:06:47 2009 -0800
fix 10l in 75b495f2723fcb77f
Original thread:
date: Mon, Feb 9, 2009 at 9:37 PM
subject: [x264-devel] commit: Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors . (Guillaume Poirier )

Fixed a 10l in 75b495f2723fcb77f.
Original thread:
date: Mon, Feb 9, 2009 at 9:37 PM
subject: [x264-devel] commit: Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors . (Guillaume Poirier )

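This fixes a silly mistake (an introduced bug) in r1109. 10L needs no further explanation by now, right? The "original thread" is the mailing-list thread of the commit notification mail sent from git.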

x264r1109

git-id : 75b495f2723fcb77fe7d5c92511136d3fea4cf13
Date: Mon Feb 9 21:17:33 2009 +0100
Spare a vec_perm and a vec_mergeh though using a LUT of permutation vectors.

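Use a LUT of permutation vectors to spare a vec_perm and a vec_mergeh.

Affects PowerPC only.

What is actually no longer used is vec_lvsl and vec_mergeh. vec_perm is merely no longer executed in separate steps and is still used in the end.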

x264r1108

git-id : 37f98cb85024d288eab5508a3b04ca1324335693
Date: Mon Feb 9 21:12:23 2009 +0100
Promote chroma planes to 16 byte alignment.
This will allow simplifying vectors loads that can only load 16-bytes
aligned data (such as AltiVec).

Promoted the chroma planes to 16-byte alignment.
This allows simplifying vector loads that can only load 16-byte-aligned data (such as AltiVec).
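The idea of 16-byte alignment can be illustrated with two small helpers. This is a generic sketch, not code from x264; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a size or stride up to the next multiple of 16 bytes. */
static size_t align_up_16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Check whether a pointer sits on a 16-byte boundary, as required by
 * vector loads that only accept aligned addresses. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}
```

For example, a chroma row of 180 bytes would be padded to a stride of 192 so that every row starts on an aligned address, provided the plane itself starts on one.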

x264r1107

git-id : b5b9728b9b1cf9e4e54092515fd9fa86cd9023a4
Date: Mon Feb 9 11:30:54 2009 -0800
Fix 10L in intra pred
Forgetting a %define resulted in SIGILL on 32-bit systems without SSE (e.g. Athlon XP).

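Fixed a 10L in intra prediction.
A forgotten %define resulted in SIGILL (the illegal-instruction signal) on 32-bit systems without SSE (e.g. Athlon XP).

This is a fix for r1103. SIGILL generally means a crash. It appears the wrong register was being used as well.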

x264r1106

git-id : 0ee50db35db3e5af0d40936dbc5ff7e2478b1a2c
Date: Sun Feb 8 23:36:40 2009 -0800
Add decimation in i16x16 blocks
Up to +0.04db with CAVLC, generally a lot less with CABAC.

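Added decimation in i16x16 blocks.
Up to +0.04 dB with CAVLC, generally a lot less (gain) with CABAC.

The source carries the comment "Writing the 16 CBFs in an i16x16 block is quite costly, so decimation can save many bits." It is still only decimation, so even if PSNR goes up it is doubtful that perceived visual quality always improves, particularly for live-action content with fine detail.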

x264r1105

git-id : 9bf45f6d397559486b5fe038c3847b0d35c61728
Date: Sat Feb 7 02:27:16 2009 -0800
Much faster CABAC residual context selection
Up to ~17% faster CABAC RDO, ~36% faster intra-only CABAC RDO.
Up to 7% faster overall in extreme cases.

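Much faster CABAC residual context selection.
Up to ~17% faster CABAC RDO and ~36% faster intra-only CABAC RDO.
Up to 7% faster overall in extreme cases.

cabac_mb_cbf_ctxidxinc is much cleaner now. Many conditional branches were removed, so it should flow through the pipeline well.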

x264r1104

git-id : 3927938121ef63d72d9fd429c25202ebd65dd208
Date: Sat Feb 7 01:57:43 2009 -0800
Faster coeff_last64 on 32-bit

Faster coeff_last64 on 32-bit.

x264r1103

git-id : 32615747d3ab5648b666b4f55531f47b3c075521
Date: Fri Feb 6 02:59:36 2009 -0800
More intra pred asm optimizations
SSSE3 version of predict_8x8_hu
SSE2 version of predict_8x8c_p
SSSE3 versions of both planar prediction functions
Optimizations to predict_16x16_p_sse2
Some unnecessary REP_RETs -> RETs.
SSE2 version of predict_8x8_vr by Holger.
SSE2 version of predict_8x8_hd.
Don't compile MMX versions of some of the pred functions on x86_64.
Remove now-useless x86_64 C versions of 4x4 pred functions.
Rewrite some of the x86_64-only C functions in asm.

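More intra prediction asm optimizations.
SSSE3 version of predict_8x8_hu.
SSE2 version of predict_8x8c_p.
SSSE3 versions of both planar prediction functions.
Optimizations to predict_16x16_p_sse2.
Some unnecessary REP_RETs changed to RETs.
SSE2 version of predict_8x8_vr by Holger.
SSE2 version of predict_8x8_hd.
Do not compile the MMX versions of some of the pred functions on x86_64.
Removed the now-useless x86_64 C versions of the 4x4 pred functions.
Rewrote some of the x86_64-only C functions in asm.

"Both" in "both planar prediction functions" presumably means the 8x8 and 16x16 ones.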

x264r1102

git-id : 711e6e87967aa3813a894fdfcd1e2b7eb48328a6
Date: Sun Feb 8 21:35:51 2009 +0100
Speed-up mc_chroma_altivec by using vec_mladd cleverly, and unrolling.
Also put width == 2 variant in its own scalar function because it's faster
than a vectorized one.

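Sped up mc_chroma_altivec by using vec_mladd cleverly and by unrolling (translator's note: loop unrolling, an optimization technique).
Also put the width == 2 variant in its own scalar function, because it is faster than a vectorized one.

Affects PowerPC only.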

x264r1101

git-id : b69548aa3a0218ba1d4f934edcf8942f2b1682f5
Date: Wed Feb 4 12:46:17 2009 -0800
Merging Holger's GSOC branch part 2: intra prediction
Assembly versions of most remaining 4x4 and 8x8 intra pred functions.
Assembly version of predict_8x8_filter.
A few other optimizations.
Primarily Core 2-optimized.

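Merging Holger's GSOC branch, part 2: intra prediction.
Assembly versions of most of the remaining 4x4 and 8x8 intra pred functions.
Assembly version of predict_8x8_filter.
A few other optimizations.
Primarily Core 2-optimized.

GSOC presumably stands for Google Summer of Code. The diff is fairly large.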

x264r1100

git-id : 122a54a0dbbfd5a8b649a6ca7eb0b7d3c42f89aa
Date: Wed Feb 4 10:04:55 2009 +0000
10l: fix compilation with GCC 4.3+

10l: fixed compilation with GCC 4.3+.

Affects PowerPC only.

Like r1042, this fixes a silly careless mistake made in r1097. Labelling such fixes "10L" seems to be in fashion.

Last updated: 2009-10-24 03:08:36
