Everything on this page may be wrong. Please read [[x264関連の記事に関して]].

!!!x264-changelog-jp r1000-r1099

Translation of the x264 changelog for r1000-r1099, with notes. For other revisions and general caveats, see [[x264-changelog-jp]].

Next: [[x264-changelog-jp r1100-r1199]]

!x264r1099
{{bq
git-id : c0be8106d40b2ccbfec37229afaecf236b03762c
Date: Sat Jan 31 05:00:39 2009 -0800

Faster 8x8dct+CAVLC interleave
Integrate array_non_zero with the CAVLC 8x8dct interleave function.
Roughly 1.5-2x faster than the original separate array_non_zero method.
}}

{{bq
Faster 8x8dct + CAVLC interleave.
array_non_zero is integrated into the CAVLC 8x8dct interleave function.
Roughly 1.5-2x faster than the original approach of running array_non_zero separately.
}}

!x264r1098
{{bq
git-id : 91bffdcc24e59620cc09aa4288f7e0c0b74c8891
Date: Sat Jan 31 01:00:26 2009 -0800

Measure CBP cost in i8x8 RD refinement
~0.02-0.05db PSNR gain at high quants in intra-only encoding, pretty small otherwise.
Allows a small optimization in i8x8 encoding.
}}

{{bq
Measure the CBP (Coded Block Pattern) cost in i8x8 RD refinement.
About 0.02-0.05 dB PSNR gain at high quantizers in intra-only encoding; quite small otherwise.
Allows a small optimization in i8x8 encoding.
}}

!x264r1097
{{bq
git-id : e404f350afc3c52a99d13d22f862cd2ede1438b0
Date: Sun Feb 1 20:58:00 2009 +0100

Take advantage of saturated signed horizontal sum instructions in
the variance computation epilogue since there won't be any overflow
triggering an overflow.
Suggested by Loren Merritt
}}

{{bq
In the epilogue of the variance computation, take advantage of the saturating signed horizontal sum instructions, since no overflow can occur there.
Suggested by Loren Merritt.
}}
Affects PowerPC only.

!x264r1096
{{bq
git-id : 4c171c3b1c803a173ace823275882b3c9a2ecd24
Date: Fri Jan 30 03:40:54 2009 -0800

Massive overhaul of nnz/cbp calculation
Modify quantization to also calculate array_non_zero.
PPC assembly changes by gpoirior.
New quant asm includes some small tweaks to quant and SSE4 versions using ptest for the array_non_zero.
Use this new feature of quant to merge nnz/cbp calculation directly with encoding and avoid many unnecessary calls to dequant/zigzag/decimate/etc.
Also add new i16x16 DC-only iDCT with asm.
Since intra encoding now directly calculates nnz, skip_intra now backs up nnz/cbp as well.
Output should be equivalent except when using p4x4+RDO because of a subtlety involving old nnz values lying around.
Performance increase in macroblock_encode: ~18% with dct-decimate, 30% without at CRF 25.
Overall performance increase 0-6% depending on encoding settings.
}}

{{bq
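Massive overhaul of the nnz/cbp calculation.
Quantization is modified to also calculate array_non_zero.
PowerPC assembly changes by gpoirior (a person's name).
The new quant asm includes some small tweaks to quant, plus SSE4 versions that use ptest for array_non_zero.
This new quant feature is used to merge the nnz/cbp calculation directly into encoding and to avoid many unnecessary calls to dequant/zigzag/decimate/etc.
Also adds a new i16x16 DC-only iDCT (inverse DCT) with asm.
Since intra encoding now calculates nnz directly, skip_intra now backs up nnz/cbp as well.
Output should be equivalent, except with p4x4+RDO, where a subtlety involving old nnz values left lying around changes it.
Speedup in macroblock_encode at CRF 25: about 18% with dct-decimate, 30% without.
Overall speedup is 0-6%, depending on encoding settings.
}}
The diff is too large to follow in full, but as stated above, macroblock encoding and the quantization code changed substantially.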
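The idea of having quantization report array_non_zero itself can be sketched in C as follows. This is a minimal illustration, not x264's code: the function name quant_4x4_nz, the 16-bit shift, and the mf/bias parameters are assumptions chosen for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified quantizer (not x264's actual quant code).
 * Quantizes 16 coefficients in place and returns 1 if any quantized
 * coefficient is nonzero, so no separate array_non_zero pass is needed. */
static int quant_4x4_nz(int16_t dct[16], uint16_t mf, uint32_t bias)
{
    int nz = 0;
    for (int i = 0; i < 16; i++) {
        uint32_t mag = (uint32_t)abs(dct[i]);            /* magnitude of the coefficient */
        int16_t q = (int16_t)((mag * mf + bias) >> 16);  /* scaled, biased, truncated */
        dct[i] = dct[i] < 0 ? (int16_t)-q : q;           /* restore the sign */
        nz |= dct[i];                                    /* accumulate any nonzero bits */
    }
    return nz != 0;
}
```

A caller can use the return value directly as the block's nonzero flag instead of scanning the coefficients a second time.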

!x264r1095
{{bq
git-id : 8ae672fe2e0f779e57abe560bafdf41ec1fea533
Date: Thu Jan 29 01:28:12 2009 -0800

Add PowerPC support for "checkasm --bench", reading the time base register.
This isn't ideal since the `time base' register is running at a fraction
of the processor cycle speed, so the measurement isn't as precise as x86's
rdtsc.
It's better than nothing though...
}}

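{{bq
Add PowerPC support to "checkasm --bench" by reading the time base register.
This is not ideal: the 'time base' register runs at a fraction of the processor clock, so the measurement is less precise than x86's rdtsc.
Still, it is better than nothing.
}}
Affects PowerPC only.

Execution time is measured with the PowerPC [mftb instruction|http://www.ibm.com/developerworks/jp/linux/library/pa-timebase/index.html]. x86's rdtsc ticks 1:1 with the clock, whereas mftb ticks at an N:1 ratio (the "fraction") that the OS configures. x264 optimizes at the level of single clock cycles, so the commit message is saying that this precision is not sufficient. mftb (like rdtsc) yields a 64-bit value, so even at 1:1 it could count roughly 195 years at 3 GHz, but the ratio is OS-dependent, so nothing can be done about that.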
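The arithmetic behind these figures can be checked with a short C sketch. The function names and the fixed cycles-per-tick ratio are hypothetical; the real ratio depends on the platform.

```c
#include <stdint.h>

/* Years until a 64-bit tick counter wraps when it advances at `hz` ticks per second. */
static double years_until_wrap(double hz)
{
    const double ticks = 18446744073709551616.0;          /* 2^64 */
    const double seconds_per_year = 365.25 * 24 * 60 * 60;
    return ticks / hz / seconds_per_year;
}

/* Approximate CPU cycles for a time base delta, given an assumed ratio of
 * CPU cycles per time base tick. The result is only a multiple of that ratio. */
static uint64_t timebase_to_cycles(uint64_t tb_delta, uint32_t cycles_per_tick)
{
    return tb_delta * cycles_per_tick;
}
```

With a ratio of N, any interval measured through the time base is only known to within about N CPU cycles, which is the loss of precision the commit message refers to. At 3 GHz, years_until_wrap gives a little under 195 years.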

!x264r1094
{{bq
git-id : c9095eb2792b18a264569d46a3d923eafe333f45
Date: Thu Jan 29 04:35:34 2009 +0000

fix detection of pthread and isfinite on OpenBSD
}}

{{bq
Fix detection of pthread and isfinite on OpenBSD.
}}
Affects OpenBSD only.

!x264r1093
{{bq
git-id : 1df50b9287c83d5443d19482345b6842b78081c3
Date: Tue Jan 27 05:42:51 2009 +0000

remove $ECHON kludge, which broke on SunOS. bring back `gcc -MT`.
remove auto-reconfigure on svn update, which has done nothing since we stopped using svn.
fix $AS on sparc (was disabled by mmx check).
fix --extra-asflags (was ignored).
mark bash scripts as bash, not sh

patch partly by Greg Robinson and Jugdish.
}}

{{bq
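Remove the $ECHON kludge, which broke on SunOS. Bring back 'gcc -MT'.
Remove auto-reconfigure on svn update, which has done nothing since svn was dropped.
Fix $AS on sparc (it was disabled by the mmx check).
Fix --extra-asflags (it was ignored).
Mark bash scripts as bash, not sh.

Patch partly by Greg Robinson and Jugdish.
}}
Switching to 'gcc -MT' may make it impossible to build with gcc 2.9x. 猫科研究所 has only ever built with gcc 3.x or later, so it is unclear whether that was possible before.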

!x264r1092
{{bq
git-id : 60f4cd8936af4997cfbd9650ea27152df00c5669
Date: Mon Jan 26 14:28:48 2009 +0000

1.6x faster satd_c (and sa8d and hadamard_ac) with pseudo-simd.
60KB smaller binary.
}}

{{bq
satd_c (and sa8d and hadamard_ac) made 1.6x faster with pseudo-SIMD.
Binary is 60KB smaller.
}}
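"Pseudo-SIMD" here presumably means packing several small values into one ordinary integer register so that a single scalar operation processes all of them. The following C sketch shows only that basic idea with two unsigned 16-bit lanes; the helper names are made up, and x264's real satd_c, which also has to handle signed differences, is more involved.

```c
#include <stdint.h>

/* Pack two unsigned 16-bit lanes into one 32-bit word. */
static uint32_t pack2(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

static uint16_t lane_lo(uint32_t v) { return (uint16_t)(v & 0xFFFF); }
static uint16_t lane_hi(uint32_t v) { return (uint16_t)(v >> 16); }

/* One 32-bit add performs two 16-bit adds at once. This is only valid while
 * the low lane's sum stays below 65536, so that no carry leaks into the high lane. */
static uint32_t add2(uint32_t a, uint32_t b)
{
    return a + b;
}
```

For example, add2(pack2(3, 5), pack2(10, 20)) holds 13 in the low lane and 25 in the high lane.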

!x264r1091
{{bq
git-id : 71c5a8dca6e5f7bf2330d028989eeeab27701151
Date: Tue Jan 27 23:27:56 2009 -0800

Hack around a potential failure point in VBV
pred_b_from_p can become absurdly large in static scenes, leading to rare collapses of quality with VBV+B-frames+threads.
This isn't a final fix, but should resolve the problem in most cases in the meantime.
}}

{{bq
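Hack around a potential failure point in VBV.
pred_b_from_p can become absurdly large in static scenes, which in rare cases leads to a collapse of quality with VBV + B-frames + threads.
This is not a final fix, but it should resolve the problem in most cases for the time being.
}}
Regarding "static scenes": the source comment reads "In some cases, such as completely blank scenes", and the change caps pred_b_from_p. It may therefore help not only scenes with little motion but also, for example, very dark scenes.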

!x264r1090
{{bq
git-id : 8e2c4b76dd49c44cedf46343e46b292d5a0ca39e
Date: Mon Jan 26 23:43:25 2009 -0800

Much faster chroma encoding and other opts
~15% faster chroma encode by reorganizing CBP calculation and adding special-case idct_dc function, since most coded chroma blocks are DC-only.
Small optimization in cache_save (skip_bp)
Fix array_non_zero to not violate strict aliasing (should eliminate miscompilation issues in the future)
Add in automatic substitutions for some asm instructions that have an equivalent smaller representation.
}}

{{bq
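Much faster chroma encoding, plus other optimizations.
About 15% faster chroma encoding by reorganizing the CBP calculation and adding a special-case idct_dc function, since most coded chroma blocks are DC-only.
Small optimization in cache_save (skip_bp).
Fix array_non_zero so that it does not violate strict aliasing (this should eliminate miscompilation issues in the future).
Add automatic substitutions for some asm instructions that have an equivalent, smaller representation.
}}
Strict aliasing means the code is strict about ''never'' accessing an object through an alias of an incompatible type, for example manipulating an int variable through a short*. Put differently, the code does not rely on cast-based, brute-force memory manipulation. gcc 3.x and later assume this at -O2 and above and use it for optimization. Code that cannot comply declares itself "badly behaved" with -fno-strict-aliasing.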
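As an illustration, here is a sketch of an array_non_zero-style check written so that it does not depend on type-punned pointer casts. It is not x264's implementation; the name any_nonzero and the fixed 16-coefficient size are assumptions. memcpy is one standard-conforming way to reinterpret bytes.

```c
#include <stdint.h>
#include <string.h>

/* For contrast, the pattern that violates strict aliasing would be:
 *     const uint64_t *p = (const uint64_t *)v;   (reads int16_t storage as uint64_t)
 *     return (p[0] | p[1] | p[2] | p[3]) != 0;
 * An optimizing compiler is allowed to miscompile that. */

/* Returns 1 if any of the 16 coefficients is nonzero. */
static int any_nonzero(const int16_t v[16])
{
    uint64_t chunk[4];
    memcpy(chunk, v, sizeof chunk);   /* copies all 32 bytes; no aliasing violation */
    return (chunk[0] | chunk[1] | chunk[2] | chunk[3]) != 0;
}
```

Compilers generally turn a small fixed-size memcpy like this into plain loads, so the well-defined version need not be slower than the cast.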

!x264r1089
{{bq
git-id : 355c445f222eae4f953f1be9f8e1d44a2a07d639
Date: Mon Jan 26 06:28:23 2009 -0800

add AltiVec implementation of x264_mc_copy_w16_aligned
}}

{{bq
Add an AltiVec implementation of x264_mc_copy_w16_aligned.
}}
Affects PowerPC only.

!x264r1088
{{bq
git-id : 71ac0a34bc0460bf67da68f300e4150bc50d9aae
Date: Fri Jan 23 13:53:06 2009 -0800

add AltiVec implementation of x264_pixel_var_16x16 and x264_pixel_var_8x8
}}

{{bq
Add AltiVec implementations of x264_pixel_var_16x16 and x264_pixel_var_8x8.
}}
Affects PowerPC only.

!x264r1087
{{bq
git-id : 1959672cf6f18da888940261916dbf81248e0598
Date: Fri Jan 23 01:11:20 2009 -0800

add AltiVec 16 <-> 32 bits conversions macros
}}

{{bq
Add AltiVec 16-bit <-> 32-bit conversion macros.
}}
Affects PowerPC only.

!x264r1086
{{bq
git-id : 39a279613d10fa4dbe608f1a2af1eb86686033af
Date: Mon Jan 19 21:29:27 2009 +0100

Replace 16x16=>32 mul + pack + add by a simple 16x16=>16 multiply-add.
Suggested by Loren.
}}

{{bq
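Replace 16x16 => 32 mul + pack + add with a simple 16x16 => 16 multiply-add.
Suggested by Loren.
}}
Affects PowerPC only.

The AltiVec code from r1083 used 32-bit arithmetic; this changes it to the 16-bit arithmetic it should have used. The effect on speed depends on the case.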

!x264r1085
{{bq
git-id : 5fb6417309febda6f73a98494cb935379129e15d
Date: Mon Jan 19 15:17:53 2009 -0800

Eliminate support for direct_8x8_inference=0
The benefit in the most extreme contrived situation was at most 0.001db PSNR, at the cost of slower decoding.
As this option was basically useless, it was a waste of code and prevented some other useful optimizations.
Remove some unused mc code related to sub-8x8 partitions.
Small deblocking speedup when p4x4 is used.
Also remove unused x264_nal_decode prototype from x264.h.
}}

{{bq
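Eliminate support for direct_8x8_inference=0.
Even in the most extreme, contrived situation the benefit was at most 0.001 dB PSNR, at the cost of slower decoding.
The option was basically useless, wasted code, and prevented some other useful optimizations.
Remove some unused mc (motion compensation) code related to sub-8x8 partitions.
Small deblocking speedup when p4x4 is used.
Also remove the unused x264_nal_decode prototype from x264.h.
}}
X264_BUILD is now 66.

--direct-8x8 has been removed from the options. I have not fully understood the commitdiff, but the behavior is probably equivalent to dropping direct 4x4, i.e. roughly to fixing --direct-8x8 at 1.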

!x264r1084
{{bq
git-id : a48d1d0a2ad590d041b79bb152ed47d00451ba8d
Date: Mon Jan 19 05:14:53 2009 -0800

Add AltiVec and CPU numbers detection on OpenBSD.
}}

{{bq
Add detection of AltiVec and of the number of CPUs on OpenBSD.
}}
Affects OpenBSD only.

!x264r1083
{{bq
git-id : 09e76c903d3419619ed326a4dd114369a55bdd6e
Date: Sun Jan 18 22:44:14 2009 +0100

Add AltiVec implementation of predict_8x8c_p. 2.6x faster than scalar C.
}}

{{bq
Add an AltiVec implementation of predict_8x8c_p. 2.6x faster than scalar C.
}}
Affects PowerPC only.

!x264r1082
{{bq
git-id : bde164d50f1936b28a9cd66f8be8d9995cc5a01b
Date: Sat Jan 17 15:16:37 2009 -0500

Warn if direct auto wasn't set on the first pass
And, if it wasn't, run direct auto as if it was the first pass, rather than simply forcing temporal direct mode on all frames.
Also a small tweak to coeff_level_run asm.
}}

{{bq
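Warn if direct auto was not set on the first pass.
In that case, run direct auto as if this were the first pass, rather than simply forcing temporal direct mode (translator's note: direct mode in the temporal direction) on all frames.
Also a small tweak to the coeff_level_run asm.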
}}

!x264r1081
{{bq
git-id : 201f7ad8ad50ff460f79cb44e0bee6aebbf039ca
Date: Sat Jan 17 12:52:28 2009 +0000

Changes the PowerPC ppccommon.h header so it no longer checks for a particular
OS such as Linux but instead looks for HAVE_ALTIVEC_H being set.
Fixes all *BSD/PowerPC builds.
}}

{{bq
Change the PowerPC header ppccommon.h so that it no longer checks for a particular OS such as Linux and instead checks whether HAVE_ALTIVEC_H is set.
This fixes all *BSD/PowerPC builds.
}}
Affects PowerPC only. No functional change.

!x264r1080
{{bq
git-id : 56e91836a12f1b119fb4aae43182f2fc012f1eca
Date: Wed Jan 14 21:56:31 2009 +0100

update x264_hpel_filter_altivec's prototype to match the one of the C version.
It changed in commit 045ae4045a1827555b3eaab4fbf3c9809e98c58f (factorization of mallocs)
(NB: Altivec implementation wasn't allocating and writing to any scratch memory.)
}}

{{bq
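Update the prototype of x264_hpel_filter_altivec to match the C version.
It changed in commit 045ae4045a1827555b3eaab4fbf3c9809e98c58f (factoring out of mallocs).
(NB: the Altivec implementation neither allocated nor wrote to any scratch memory.)
}}
Affects PowerPC only. No functional change.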

!x264r1079
{{bq
git-id : d2b6423db8310e5238fd7c0e517b3344578cc08a
Date: Wed Jan 14 21:49:42 2009 +0100

rename vector+array unions to closer match the vector typedefs names.
}}

{{bq
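Rename the vector+array unions to match the vector typedef names more closely.
}}
Affects PowerPC only. No functional change.

Type names are changed to more C99-like, unambiguous ones, e.g. vect_ushort_u → vec_u16_u.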

!x264r1078
{{bq
git-id : 264e447cc4ce267d7e4d078b080716093a78a2c8
Date: Wed Jan 14 21:13:58 2009 +0100

Add Altivec implementation of all the remaining 16x16 predict routines.
}}

{{bq
Add Altivec implementations of all the remaining 16x16 prediction routines.
}}
Affects PowerPC only.

!x264r1077
{{bq
git-id : e46f64824a718ca146724d6cfa104aaba16eb169
Date: Tue Jan 13 21:11:50 2009 -0500

Cache ref costs and use more accurate MV costs
New MV costs should improve quality slightly by improving the smoothness of the field of MV costs (and they're closer to CABAC's actual costs).
Despite being optimized for CABAC, they still help under CAVLC, albeit less.
MV cost change by Loren Merritt
}}

{{bq
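Cache ref costs and use more accurate MV costs.
The new MV costs should improve quality slightly by making the field of MV costs smoother (and they are closer to CABAC's actual costs).
Although they are optimized for CABAC, they still help under CAVLC, albeit less.
MV cost change by Loren Merritt.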
}}

!x264r1076
{{bq
git-id : 79bfb039de253c986986fbd99935c0d4a95ad503
Date: Tue Jan 13 20:22:36 2009 -0500

Support forced frametypes with scenecut/b-adapt
This allows an input qpfile to be used to force I-frames, for example.
The same can be done through the library interface.
Document the format of the qpfile in --longhelp and the forcing of frametypes in x264.h
Note that forcing B-frames and B-refs may not always have the intended result.
Patch partially by Steven Walters .
}}

{{bq
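Support forced frame types with scenecut/b-adapt.
For example, this allows an input qpfile to be used to force I-frames.
The same can be done through the library interface.
Document the qpfile format in --longhelp and the forcing of frame types in x264.h.
Note that forcing B-frames and B-refs may not always have the intended result.
Patch partially by Steven Walters.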
}}

!x264r1075
{{bq
git-id : 7c4f8297e057a439cf1f1d7cf95d05b9f27063c7
Date: Tue Jan 13 19:58:44 2009 -0500

Remove an IDIV from i8x8 analysis
Only one IDIV is left in macroblock level code (transform_rd)
}}

{{bq
Remove an IDIV (integer division) from i8x8 analysis.
Only one IDIV is left in macroblock-level code (in transform_rd).
}}

!x264r1074
{{bq
git-id : 9d6cc8e28319b3935698d52a6711414435444029
Date: Thu Jan 8 15:07:16 2009 -0500

Fix regression in r1066
With some combinations of video width and other settings, the scratch buffer was slightly too small.
This caused heap corruption on some systems.
Also prevent merange from being raised during encoding with esa/tesa through encoder_reconfig, as this no longer works.
}}

{{bq
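Fix a regression (translator's note: a newly introduced bug) from r1066.
With some combinations of video width and other settings, the scratch buffer (translator's note: a working buffer?) was slightly too small.
This caused heap corruption on some systems.
Also prevent merange from being raised through encoder_reconfig during encoding with esa/tesa, as this no longer works.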
}}

!x264r1073
{{bq
git-id : 6a4a9beae060d69bbeaeb8c1c3056fb6ae6765f6
Date: Tue Jan 6 16:55:44 2009 -0500

Disable B-frames in lossless mode
They hurt compression anyways, and direct auto was bugged with lossless.
}}

{{bq
Disable B-frames in lossless mode.
They hurt compression anyway, and direct auto was bugged with lossless.
}}

!x264r1072
{{bq
git-id : f586ba52b87ebb6b1d7689603680ef7d3219e09a
Date: Mon Jan 5 22:53:11 2009 +0000

Factorize in ppccommon.h the conditional inclusion of altivec.h on Linux systems.
}}

{{bq
Consolidate into ppccommon.h the conditional inclusion of altivec.h on Linux systems.
}}
Affects PowerPC Linux only. No functional change.

!x264r1071
{{bq
git-id : 87b6d55ebda8a11186f7b09b5866b05a4584d13d
Date: Mon Jan 5 15:58:32 2009 -0500

Disable __builtin_clz() intrinsic on gcc versions prior to 3.4.
The function did not exist before that version.
}}

{{bq
Disable the __builtin_clz() intrinsic on gcc versions prior to 3.4.
The function did not exist before that version.
}}
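A version guard of this kind might look like the following C sketch. The name clz32 and the fallback loop are illustrative assumptions, not x264's x264_clz; like __builtin_clz, the function is only defined for nonzero input.

```c
#include <stdint.h>

/* Count leading zero bits of a nonzero 32-bit value. */
static int clz32(uint32_t x)
{
#if defined(__GNUC__) && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
    return __builtin_clz(x);            /* available from gcc 3.4 onwards */
#else
    int n = 0;
    while (!(x & 0x80000000u)) {        /* portable fallback: shift until the top bit is set */
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

Both branches return the same result, so callers do not need to know which one was compiled in.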

!x264r1071→r1070
Because r1070 was withdrawn, this commit moved down to r1070.
{{bq
git-id : 6f7c9be698848e8d9fd116b728af7d718ea43a2f
Date: Thu Jan 1 21:44:00 2009 -0500

Small tweaks to coeff asm
Factor out a few redundant pxors
Related cosmetics
}}

{{bq
Small tweaks to the coeff (coefficient) asm.
Factor out a few redundant pxors.
Related cosmetics.
}}

Before it moved down, the git-id was "16c855394a0068792456aada724f3d8305608fa6".

!x264r1070→withdrawn
''This fix was withdrawn.'' It is not in the current repository.
{{bq
git-id : 1e2f6d258df09874e2e0c85ef611f27d397555bf
Date: Thu Jan 1 21:38:33 2009 -0500

Fix C99ism in r1066
}}

{{bq
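Fix a C99ism (translator's note: a C99-style way of writing code) in r1066.
}}

The background to the withdrawal is on the [x264-devel mailing list|http://mailman.videolan.org/pipermail/x264-devel/2009-January/005340.html]; in short, it seems the developers did not want to keep accommodating MSVC's poor C99 support every time. Whether or not this fix is applied makes no functional difference.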

!x264r1069
{{bq
git-id : ed32ad20d6914d1781c7711574f10a8df49e3e20
Date: Tue Dec 30 22:20:37 2008 -0500

Use the correct strtok under MSVC
Also change one malloc -> x264_malloc
}}

{{bq
Use the correct strtok under MSVC.
Also change one malloc to x264_malloc.
}}

!x264r1068
{{bq
git-id : 390d26ad2ca72b420448c36a747d3ee49b79e75b
Date: Tue Dec 30 22:14:45 2008 -0500

Add stack alignment for lookahead functions
Should allow libx264 to be called from non-gcc-compiled applications without adding force_align_arg_pointer.
}}

{{bq
Add stack alignment for the lookahead functions.
This should allow libx264 to be called from applications compiled with something other than gcc without adding force_align_arg_pointer.
}}

!x264r1067
{{bq
git-id : 00cef64dd3fff5d4b5b9b0e63314c11bfb7d33e0
Date: Tue Dec 30 20:47:45 2008 -0500

Add support for SSE4a (Phenom) LZCNT instruction
Significantly speeds up coeff_last and coeff_level_run on Phenom CPUs for faster CAVLC and CABAC.
Also a small tweak to coeff_level_run asm.
}}

{{bq
Add support for the SSE4a (Phenom) LZCNT instruction.
Significantly speeds up coeff_last and coeff_level_run on Phenom CPUs, for faster CAVLC and CABAC.
Also a small tweak to the coeff_level_run asm.
}}

!x264r1066
{{bq
git-id : 045ae4045a1827555b3eaab4fbf3c9809e98c58f
Date: Mon Dec 29 05:14:26 2008 +0000

factor mallocs out of hpel, ssim, and esa.
there should now be no memory allocation outside of init-time.
}}

{{bq
Factor mallocs out of hpel, ssim, and esa.
There should now be no memory allocation outside of init time.
}}

!x264r1065
{{bq
git-id : 0e1a92c56f8c89735e2d5044f89832bdc22b6e50
Date: Mon Dec 29 22:00:02 2008 -0500

Much faster CAVLC RDO and bitstream writing
Pure asm version of level/run coding. Over 2x faster than C.
Up to 40% faster CAVLC RDO. Overall benefit up to ~7.5% with RDO or ~5% with fast encoding settings.
}}

{{bq
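Much faster CAVLC RDO and bitstream writing.
Pure asm version of level/run coding. Over 2x faster than C.
Up to 40% faster CAVLC RDO. Overall benefit is up to about 7.5% with RDO, or about 5% with fast encoding settings (translator's note: presumably roughly equivalent to RDO being disabled).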
}}

!x264r1064
{{bq
git-id : 648e132f7135c7e18625198e3ffe2c6c7d824df6
Date: Mon Dec 29 21:52:25 2008 -0500

Cosmetics: cleaner syntax for defining temporary registers in asm
Globally define t#[qdwb], so that only t# needs to be locally defined when reorganizing registers
}}

{{bq
Cosmetics: cleaner syntax for defining temporary registers in asm.
t#[qdwb] is defined globally, so only t# needs to be defined locally when reorganizing registers.
}}

!x264r1063
{{bq
git-id : 84a1ca6ce70fe7bad4922ddc5a72c2e9cd73703b
Date: Sat Dec 27 21:36:14 2008 -0500

Much faster CABAC RDO
Since RDO doesn't care about what order bit costs are calculated, merge sigmap and level coding into the same loop in RDO.
This is bit-exact for 4x4dct but slightly incorrect for 8x8dct due to the sigmap containing duplicated contexts.
However, the PSNR penalty of this is extremely small (~0.001db).
Speed benefit is about 15% in 4x4dct and 30% in 8x8dct residual bit cost calculation at QP20.
Overall encoding speed benefit is up to 5%, depending on encoding settings.
Also remove an old unnecessary CABAC table that hasn't been used for years.
}}

{{bq
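Much faster CABAC RDO.
Since RDO does not care in what order bit costs are calculated, sigmap and level coding are merged into the same loop in RDO.
This is bit-exact for 4x4dct but slightly incorrect for 8x8dct, because the sigmap contains duplicated contexts.
However, the resulting PSNR penalty is extremely small (about 0.001 dB).
The speed benefit in residual bit cost calculation at QP20 is about 15% for 4x4dct and 30% for 8x8dct.
Overall encoding speed benefit is up to 5%, depending on encoding settings.
Also remove an old, unnecessary CABAC table that had not been used for years.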
}}

!x264r1062
{{bq
git-id : 40ad1b6d411d1e7d0788d29627804a19977bb6ee
Date: Fri Dec 26 07:35:49 2008 -0500

VLC table optimizations
Slightly reorganize VLC tables for ~2% faster block_residual_write_cavlc.
Also a small optimization in p8x8 CAVLC.
}}

{{bq
VLC table optimizations.
Slightly reorganize the VLC tables for about 2% faster block_residual_write_cavlc.
Also a small optimization in p8x8 CAVLC.
}}

!x264r1061
{{bq
git-id : 839cd8cf33492213f7878bf14c26b387b6599abd
Date: Wed Dec 24 22:58:17 2008 -0500

Fix crash in --me esa/tesa introduced in r1058
Also suppress the last mingw warning message
}}

{{bq
Fix a crash in --me esa/tesa introduced in r1058.
Also suppress the last mingw warning message.
}}

!x264r1060
{{bq
git-id : 42070dff1bc3019a6f56773fce3dd6e328e3a61b
Date: Tue Dec 23 22:33:28 2008 -0500

Optimize variance asm + minor changes
Remove SAD argument from var, not needed anymore.
Speed up var asm a bit by eliminating psadbw and instead HADDWing at end.
Eliminate all remaining warnings on gcc 3.4 on cygwin
Port another minor optimization from lavc (pskip)
}}

{{bq
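Optimize the variance (translator's note: statistical variance; presumably the part of rate control that manages convergence) asm, plus minor changes.
Remove the SAD argument from var, which is no longer needed.
Speed up the var asm a little by eliminating psadbw and instead using HADDW at the end.
Eliminate all remaining warnings with gcc 3.4 on cygwin.
Port another minor optimization (pskip) from lavc (translator's note: presumably libavcodec).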
}}

!x264r1059
{{bq
git-id : 265d61bcdc5e6ce6688e377af9f6c0136724ed59
Date: Tue Dec 23 18:31:48 2008 -0500

Minor CABAC cleanups and related optimizations
Merge the two list tables to allow cleaner MC/CABAC/CAVLC code
Remove lots of unnecessary {s
Port some very minor opts from lavc
}}

{{bq
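Minor CABAC cleanups and related optimizations.
Merge the two list tables to allow cleaner MC/CABAC/CAVLC code.
Remove many unnecessary "{" characters.
Port some very minor optimizations from lavc (translator's note: presumably libavcodec).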
}}

!x264r1058
{{bq
git-id : a4ec1020efb1a2a6757f8f891d78c2dd9344bb91
Date: Thu Dec 11 19:47:17 2008 +0000

faster ESA init
reduce memory if using ESA and not p4x4
}}

{{bq
Faster ESA initialization.
Reduce memory usage when ESA is used and p4x4 is not.
}}

!x264r1057
{{bq
git-id : 5f8a1490eb0bc2a934c34bc8307bfdc1ade6a92d
Date: Mon Dec 15 23:02:49 2008 -0800

More macroblock_cache optimizations
Patch partially by Loren Merritt
}}

{{bq
Further macroblock_cache optimizations.
Patch partially by Loren Merritt.
}}

!x264r1056
{{bq
git-id : e59ee249829049de338bebc3a2a00f9e471b40f3
Date: Mon Dec 15 13:15:29 2008 -0800

Faster macroblock_cache_rect
Explicit loop unrolling
}}

{{bq
Faster macroblock_cache_rect.
Explicit loop unrolling.
}}

!x264r1055
{{bq
git-id : 2b8d6a6f957be623186ea2a20bcb13c3637440b8
Date: Sun Dec 14 18:30:51 2008 -0800

Optimizations in predict_mv_direct
Add some early terminations and minor optimizations
This change may also fix the extremely rare direct+threading MV bug.
}}

{{bq
Optimizations in predict_mv_direct.
Add some early terminations and minor optimizations.
This change may also fix the extremely rare direct+threading MV bug.
}}

!x264r1054
{{bq
git-id : 918ff3c6a33e170655b61af85f8955ec5590fff8
Date: Sun Dec 14 10:47:28 2008 +0000

Fix visual corruption when picture width was not mod 32.
The previous Altivec implemention of mc_chroma assumed that i_src_stride was always mod 16.
}}

{{bq
Fix visual corruption when the picture width was not a multiple of 32.
The previous Altivec implementation of mc_chroma assumed that i_src_stride was always a multiple of 16.
}}

!x264r1053
{{bq
git-id : 9089d217078450fad075b5eb61f372572d094a5f
Date: Mon Dec 8 21:11:45 2008 +0100

Add support for FSF GCC version >= 4.3 on OSX.
So far, only Apple GCC version was supported.
}}

{{bq
Add support for FSF GCC 4.3 and later on OSX.
Until now, only Apple's GCC was supported.
}}
Affects Mac only. No functional change.

!x264r1052
{{bq
git-id : ad2c84f76f9fbb4f360caeb87df824beab023bbf
Date: Thu Dec 11 17:31:52 2008 -0800

More accurate refcost for p8x8 CAVLC
Slightly better quality, especially in non-RD mode, with CAVLC.
}}

{{bq
More accurate refcost for p8x8 CAVLC.
Slightly better quality with CAVLC, especially in non-RD mode.
}}

!x264r1051
{{bq
git-id : 549cc55b50df76d5167c0ace75c62595feb753ca
Date: Wed Dec 10 20:54:17 2008 -0800

use lookup tables instead of actual exp/pow for AQ
Significant speed boost, especially on CPUs with atrociously slow floating point units (e.g. Pentium 4 saves 800 clocks per MB with this change).
Add x264_clz function as part of the LUT system: this may be useful later.
Note this changes output somewhat as the numbers from the lookup table are not exact.
}}

{{bq
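Use lookup tables instead of actual exp/pow for AQ.
Significant speed boost, especially on CPUs with atrociously slow floating-point units (for example, a Pentium 4 saves 800 clocks per MB with this change).
Add the x264_clz function as part of the LUT system; it may be useful later.
Note that output changes somewhat, because the values from the lookup table are not exact.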
}}

!x264r1050
{{bq
git-id : 877d22e071f1b73fb33c628cc273e2819d10c3de
Date: Wed Dec 10 20:53:13 2008 -0800

Suppress saveptr warnings on Windows GCC
}}

{{bq
Suppress saveptr warnings with GCC on Windows.
}}

!x264r1049
{{bq
git-id : 0fdd0403cc9cc95637e287ddd1b257d6b65b7ddb
Date: Wed Dec 10 20:52:06 2008 -0800

More small speed tweaks to macroblock.c
}}

{{bq
More small speed tweaks to macroblock.c.
}}

!x264r1048
{{bq
git-id : 77028cd3671de855affb02ffefe6bbd99ac7816e
Date: Mon Dec 8 13:44:23 2008 -0800

Much faster CAVLC residual coding
Use a VLC table for common levelcodes instead of constructing them on-the-spot
Branchless version of i_trailing calculation (2x faster on Nehalem)
Completely remove array_non_zero_count and instead use the count calculated in level/run coding. Note: this slightly changes output with subme > 7 due to different nonzero counts being stored during qpel RD.
}}

{{bq
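Much faster CAVLC residual coding.
Use a VLC table for common levelcodes instead of constructing them on the spot.
Branchless version of the i_trailing calculation (2x faster on Nehalem).
Completely remove array_non_zero_count and instead use the count calculated during level/run coding.
Note: this slightly changes output with subme > 7, because different nonzero counts are stored during qpel RD.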
}}

!x264r1047
{{bq
git-id : f773bf06256a467f6b18418d97ce2c7ddbe5728c
Date: Fri Dec 5 22:26:55 2008 +0100

fix compilation with GCC-4.3+
}}

{{bq
Fix compilation with GCC 4.3 and later.
}}

!x264r1046
{{bq
git-id : 71d34b4eb454027cd742e6d96e2a70cce8cd163c
Date: Sat Nov 29 23:13:58 2008 -0800

High Profile allows 25% higher maxbitrate/cpb
Correct level detection to take this into account.
}}

{{bq
High Profile allows a 25% higher maxbitrate/cpb.
Correct the level detection to take this into account.
}}
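The 25% allowance amounts to scaling a level's base limit by 5/4. A minimal C sketch, with a hypothetical function name; the 20000 kbit/s figure in the usage note is the Level 4 Baseline/Main maximum bitrate quoted from memory and should be checked against the H.264 level table.

```c
/* Maximum bitrate in kbit/s for a level, given that level's Baseline/Main
 * limit. High Profile is allowed 25% more, i.e. the base limit times 5/4. */
static int level_max_bitrate_kbps(int base_max_kbps, int is_high_profile)
{
    return is_high_profile ? base_max_kbps * 5 / 4 : base_max_kbps;
}
```

Under that assumption, level_max_bitrate_kbps(20000, 1) gives 25000, so a stream that exceeds the Main limit can still fit the same level in High Profile. The same factor applies to the CPB size.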

!x264r1045
{{bq
git-id : cc40e308c919010ef9ae6ff376cda56a58a79f3a
Date: Sat Nov 29 14:04:29 2008 -0800

s/nasm/yasm in VS project file
}}

{{bq
Replace nasm with yasm in the VS project file.
}}

!x264r1044
{{bq
git-id : 6c3e0258776c1117929aec7c12d51b88c214467c
Date: Sat Nov 29 04:49:18 2008 -0800

Cosmetic: update various file headers.
}}

{{bq
Cosmetics: update various file headers.
}}

!x264r1043
{{bq
git-id : 2f031b0e31b799072383792358eee376baeb2ba7
Date: Sat Nov 29 11:54:02 2008 +0000

add date and compiler to `x264 --version`
}}

{{bq
Add the date and compiler to "x264 --version".
}}

!x264r1042
{{bq
git-id : 5df2a7162aecda66b3da8dde501971389b1bbd44
Date: Fri Nov 28 14:32:11 2008 -0800

10L in r1041
}}

{{bq
A 10L in r1041.
}}
Fixes a crashing bug caused by a silly careless mistake in r1041.

Added 2009/02/04: akupenguin gave the following [half-joking explanation|http://forum.doom9.org/showthread.php?p=1218259#post1218259] of what 10L means.
{{bq
10 liters of cola. This means:
(a) "I must have been asleep when I wrote that, need more caffeine."
(b) One of the MPlayer developers didn't like cola, and we joked about assigning him some as punishment after one time he committed a nasty bug that should have been easily spotted.
}}
{{bq
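10 liters of cola. It means:
(a) "I must have been asleep when I wrote that; I need more caffeine."
(b) One of the MPlayer developers disliked cola, and after he once committed a nasty bug that should have been spotted easily, everyone joked about assigning him some as punishment.
}}
The right answer is... probably both?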

!x264r1041
{{bq
git-id : 12724e638e8d28ce97bcb9c77d2bb7336b087af3
Date: Thu Nov 27 19:37:56 2008 -0800

Significantly faster CABAC and CAVLC residual coding and bit cost calculation
Early-terminate in residual writing using stored nnz counts
To allow the above, store nnz counts for luma and chroma DC
Add assembly functions to find the last nonzero coefficient in a block
Overall ~1.9% faster at subme9+8x8dct+qp25 with CAVLC, ~0.7% faster with CABAC
Note this changes output slightly with CABAC RDO because it requires always storing correct nnz values during RDO, which wasn't done before in cases it wasn't useful.
CAVLC output should be equivalent.
}}

{{bq
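Significantly faster CABAC and CAVLC residual coding and bit cost calculation.
Early-terminate in residual writing using the stored nnz counts.
To allow this, store nnz counts for luma and chroma DC.
Add assembly functions that find the last nonzero coefficient in a block.
Overall about 1.9% faster at subme9+8x8dct+qp25 with CAVLC, about 0.7% faster with CABAC.
Note that this slightly changes output with CABAC RDO: it requires always storing correct nnz values during RDO, which was not done before in cases where it was not useful.
CAVLC output should be equivalent.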
}}

!x264r1040
{{bq
git-id : 1591275a92faa3d63186e6de1e9022956113bc1d
Date: Wed Nov 26 23:42:55 2008 -0800

dequant_4x4_dc assembly
About 3.5x faster DC dequant on Conroe
}}

{{bq
dequant_4x4_dc in assembly.
About 3.5x faster DC dequantization on Conroe.
}}

!x264r1039
{{bq
git-id : 2338e1301aded556be8b85c6c3b4050e562ed862
Date: Thu Nov 27 02:37:46 2008 +0000

fix an overflow in dct4x4dc_mmx
(unlikely to have occurred in any real video)
}}

{{bq
Fix an overflow in dct4x4dc_mmx.
(It is unlikely to have occurred in any real video.)
}}

!x264r1038
{{bq
git-id : a7fd9f5da062de323ae89f9a71ede03bfd6ddb6a
Date: Tue Nov 25 16:30:39 2008 -0800

Remove nasm support
Nasm won't correctly parse the SSE4 code introduced a few revisions ago, so we're removing support.
Users should upgrade to yasm 0.6.1 or later.
}}

{{bq
Remove nasm support.
nasm does not correctly parse the SSE4 code introduced a few revisions ago, so support is being dropped.
Users should upgrade to yasm 0.6.1 or later.
}}

!x264r1037
{{bq
git-id : 0fe9b85f393e00b79c37d8c81aeae2a2f3d41290
Date: Tue Nov 25 15:11:24 2008 -0800

Fix rare warning messages in ratecontrol due to r1020
}}

{{bq
Fix rare warning messages in ratecontrol caused by r1020.
}}

!x264r1036
{{bq
git-id : adffb7fa3d3db6d2fa2ad97e0c950afec8889ea5
Date: Tue Nov 25 15:10:43 2008 -0800

Fix MSVC compilation and clean up MSVC build file
Remove Release64 which never worked anyways.
}}

{{bq
Fix MSVC compilation and clean up the MSVC build file.
Remove Release64, which never worked anyway.
}}

!x264r1035
{{bq
git-id : e1013e8152254614696bbc9d92959bc9705d98b1
Date: Tue Nov 25 01:04:26 2008 -0800

Faster width4 SSD+SATD, SSE4 optimizations
Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
Use pinsrd (SSE4) for faster width4 SSD
Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe)
Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
These optimizations help on Nehalem, Phenom, and Penryn CPUs.
}}

{{bq
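Faster width4 SSD+SATD, SSE4 optimizations.
Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
Use pinsrd (SSE4) for faster width4 SSD.
Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe).
Move the mask_misalign declaration to cpu.h to avoid a warning in encoder.c.
These optimizations help on Nehalem, Phenom, and Penryn CPUs.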
}}

!x264r1034
{{bq
git-id : cab23dd7e6bf4ed501f36e1e6f64d4902d0489c9
Date: Tue Nov 25 17:27:27 2008 +0100

fix indentation, whitespace cleanup, more consistent indentation of macro backslashes
}}

{{bq
Fix indentation, clean up whitespace, and indent macro backslashes more consistently.
}}

!x264r1033
{{bq
git-id : 4bf4109aee5b602f8a124b434e18f93ef539bbe6
Date: Sat Nov 22 17:54:38 2008 +0100

Change some macros to be more sensitive to memory alignment, thus avoiding
useless loads/stores and calculations of permutation vectors.
Affected functions are all of mc_luma, mc_chroma, 'get_ref', SATD, SA8D and deblock.
Gains globally vary from ~5% - 15% on a depending on settings running on a 1.42 ghz G4.
}}

{{bq
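Make some macros more sensitive to memory alignment, thereby avoiding useless loads/stores and calculations of permutation vectors.
Affected functions are all of mc_luma, mc_chroma, 'get_ref', SATD, SA8D, and deblock.
Gains vary from about 5% to 15% depending on settings, on a 1.42 GHz G4.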
}}

!x264r1032
{{bq
git-id : 8215f79c7e19747b3206bcd6245be6cc6e668145
Date: Fri Nov 7 05:31:24 2008 +0000

refactor satd. 20KB smaller binary.
refactor sa8d. slightly faster.
more checkasm for hadamard.
}}

{{bq
Refactor satd. Binary is 20KB smaller.
Refactor sa8d. Slightly faster.
More checkasm coverage for hadamard.
}}

!x264r1031
{{bq
git-id : 3a028c8e50238b7799175bd5a172e5517b4baf8d
Date: Mon Nov 24 21:56:24 2008 -0800

Fix crash with threads and SSEMisalign on Phenom
Misalign mask needed to be set separately for each encoding thread.
}}

{{bq
Fix a crash with threads and SSEMisalign on Phenom.
The misalign mask needed to be set separately for each encoding thread.
}}

!x264r1030
{{bq
git-id : f9dba8bb274dffb19394db20912823464efcb8e1
Date: Fri Nov 21 03:39:11 2008 -0800

Phenom CPU optimizations
Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values is not worth it).
Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
Merge cpu-32.asm and cpu-64.asm
Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
}}

{{bq
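Phenom CPU optimizations.
Faster hpel_filter by using unaligned loads instead of emulated PALIGNR.
Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values are not worth it).
Add support for misaligned_mask on Phenom: about 2% faster hpel_filter, about 4% faster width16 multisad, 7% faster width20 get_ref.
Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
Merge cpu-32.asm and cpu-64.asm.
Thanks to Easy123 for contributing a Phenom box for a weekend, which made it possible to write these optimizations.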
}}

!x264r1029
{{bq
git-id : cb3c213850320fb0c1b17ae8bbbbf5d687e43961
Date: Thu Nov 20 20:11:14 2008 -0800

A few tweaks to decimate asm
A little bit faster on both 32-bit and 64-bit
}}

{{bq
A few tweaks to the decimate asm.
Slightly faster on both 32-bit and 64-bit.
}}

!x264r1028
{{bq
git-id : 83baa7fdd2edf3e2f9522fc8b79e0826bcf190fc
Date: Wed Nov 12 16:50:31 2008 -0800

Nehalem optimization part 2: SSE2 width-8 SAD
Helps a bit on Phenom as well
~25% faster width8 multiSAD on Nehalem
}}

{{bq
Nehalem optimization part 2: SSE2 width-8 SAD.
Helps a bit on Phenom as well.
About 25% faster width8 multiSAD on Nehalem.
}}

!x264r1027
{{bq
git-id : aa14719bf2b78f8fd3da7bbabb0faf142313dae1
Date: Mon Nov 10 23:34:02 2008 -0800

Add subme=0 (fullpel motion estimation only)
Only for experimental purposes and ultra-fast encoding. Probably not a good idea for firstpass.
}}

{{bq
Add subme=0 (fullpel motion estimation only).
Only for experimental purposes and ultra-fast encoding. Probably not a good idea for the first pass.
}}

!x264r1026
{{bq
git-id : 745a48beddc58e2ef121326e9156d3d42590a4b5
Date: Mon Nov 10 15:34:48 2008 -0800

Fix minor memory leak in r1022
}}

{{bq
Fix a minor memory leak in r1022.
}}

!x264r1025
{{bq
git-id : fdb6114d1b456e1438374671ec42d1d77cdd05f8
Date: Mon Nov 10 15:32:06 2008 -0800

r1024 borked checkasm
Remove idct/dct2x2 from checkasm as they are no longer in dctf
}}

{{bq
r1024 broke checkasm.
Remove idct/dct2x2 from checkasm, as they are no longer in dctf.
}}

!x264r1024
{{bq
git-id : 2652abeae5445ffefded5ee7d0853300d0973b37
Date: Sun Nov 9 17:39:21 2008 -0800

Faster chroma encoding
9-12% faster chroma encode.
Move all functions for handling chroma DC that don't have assembly versions to macroblock.c and inline them, along with a few other tweaks.
}}

{{bq
Faster chroma encoding.
Chroma encode is 9-12% faster.
All chroma DC functions that have no assembly version were moved to macroblock.c and inlined, along with a few other tweaks.
}}

!x264r1023
{{bq
git-id : f13d4637fe9b2f10b8c103500ac9293bfca3ad1f
Date: Sun Nov 9 17:34:31 2008 -0800

Various cosmetics and minor fixes
Disable hadamard_ac sse2/ssse3 under stack_mod4
Fix one MSVC compilation warning
Fix compilation in debug mode in certain cases on x64
Remove eval.c from MSVC project
Fix crash when VBV is used in CQP mode
Patches by MasterNobody
}}

{{bq
Various cosmetics and minor fixes.
Disabled hadamard_ac sse2/ssse3 under stack_mod4.
Fixed one MSVC compilation warning.
Fixed compilation in debug mode in certain cases on x64.
Removed eval.c from the MSVC project.
Fixed a crash when VBV is used in CQP mode.
Patches by MasterNobody.
}}
stack_mod4 presumably refers to the case where the stack is only 4-byte aligned. With gcc, for example, the default is 16-byte alignment, which can be changed with -mpreferred-stack-boundary.

!x264r1022
{{bq
git-id : 7cdaf638c3777f2b38fb60181dde7ed4de614cc1
Date: Sat Nov 8 20:16:17 2008 -0800

Faster b-adapt + adaptive quantization
Factor out pow to be only called once per macroblock. Speeds up b-adapt, especially b-adapt 2, considerably.
Speed boost is as high as 24% with b-adapt 2 + b-frames 16.
}}

{{bq
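Faster b-adapt + adaptive quantization (translator's note: presumably any mode other than CQP).
pow was factored out so that it is called only once per macroblock. This speeds up b-adapt considerably, especially b-adapt 2.
The speed gain is as high as 24% with b-adapt 2 + b-frames 16.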
}}
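The idea of factoring an expensive call out so that it runs once per macroblock can be sketched as follows. This is only an illustration of the caching pattern, not the actual x264 change: the expensive computation (pow in the commit message) is hidden behind a hypothetical `scale_fn` callback, and `demo_scale` is a counting stub that exists purely so the number of calls can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the expensive per-macroblock computation
 * (pow() in the commit message). */
typedef double (*scale_fn)(float qp_offset);

/* Counting stub for demonstration only; a real encoder would call pow(). */
static int scale_calls = 0;
static double demo_scale(float qp_offset)
{
    scale_calls++;
    return 1.0 + qp_offset * 0.5;
}

/* Before: the expensive function runs for every macroblock on every pass. */
static double total_cost_naive(const int *cost, const float *qp_offset,
                               size_t mb_count, int passes, scale_fn f)
{
    assert(cost && qp_offset && f);
    double total = 0.0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < mb_count; i++)
            total += cost[i] * f(qp_offset[i]);
    return total;
}

/* After, step 1: evaluate the expensive function once per macroblock. */
static void cache_scales(const float *qp_offset, double *scale,
                         size_t mb_count, scale_fn f)
{
    assert(qp_offset && scale && f);
    for (size_t i = 0; i < mb_count; i++)
        scale[i] = f(qp_offset[i]);
}

/* After, step 2: every pass reuses the cached factors. */
static double total_cost_cached(const int *cost, const double *scale,
                                size_t mb_count, int passes)
{
    assert(cost && scale);
    double total = 0.0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < mb_count; i++)
            total += cost[i] * scale[i];
    return total;
}
```

With 3 macroblocks and 4 passes, the naive version calls the expensive function 12 times and the cached version 3 times. That matches the commit's remark that b-adapt 2, which evaluates many candidate frame-type paths, benefits the most.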

!x264r1021
{{bq
git-id : 852579be365549db3ccc1c2906c9a1d2f4a92ac9
Date: Fri Nov 7 11:39:43 2008 -0800

Faster CABAC residual encoding
6% faster block_residual_write_cabac in RD mode.
}}

{{bq
Faster CABAC residual encoding.
block_residual_write_cabac is 6% faster in RD mode.
}}

!x264r1020
{{bq
git-id : 418cace8646a6f546a9026da47f79fad7285f577
Date: Wed Nov 5 19:51:59 2008 -0800

Fix potential crash in the case that the input statsfile is too short
Also resolve various other potential weirdness (such as multiple copies of the same error message in threaded mode).
}}

{{bq
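Fixed a potential crash when the input stats file is too short.
Also resolved various other potential oddities (such as multiple copies of the same error message in threaded mode).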
}}

!x264r1019
{{bq
git-id : a5ac6a5b8688915553fe6fccee09f1272f3788ac
Date: Wed Nov 5 03:11:45 2008 -0800

Initial Nehalem CPU optimizations
movaps/movups are no longer equivalent to their integer equivalents on the Nehalem, so that substitution is removed.
Nehalem has a much lower cacheline split penalty than previous Intel CPUs, so cacheline workarounds are no longer necessary.
Thanks to Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations.
Overall speed improvement with Nehalem vs Penryn at the same clock speed is around 40%.
}}

{{bq
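Initial Nehalem CPU optimizations.
movaps/movups are no longer equivalent to their integer counterparts on Nehalem, so that substitution was removed.
Nehalem has a much lower cacheline-split penalty than previous Intel CPUs, so the cacheline workarounds are no longer necessary.
Thanks to Intel for providing Avail Media with the pre-release Nehalem CPU needed to prepare these (and other not-yet-committed) optimizations.
The overall speed improvement of Nehalem over Penryn at the same clock speed is around 40%.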
}}

!x264r1018
{{bq
git-id : 41b8069cb74fa3bc905618225be07ee8d35bbc79
Date: Tue Nov 4 09:56:03 2008 -0800

Fix potential infinite loop in VBV under GCC 4.2
}}

{{bq
Fixed a potential infinite loop in VBV under GCC 4.2.
}}

!x264r1017
{{bq
git-id : e4c4568d4f0f234e942b4855391aea7224c41eb6
Date: Mon Nov 3 22:59:49 2008 -0800

Encoder_reconfig: esa/tesa can only be enabled if they were on to begin with
Bug report by kemuri-_9.
}}

{{bq
Encoder_reconfig: esa/tesa can only be enabled if they were already on at the start.
Bug report by kemuri-_9.
}}

!x264r1016
{{bq
git-id : dbc5ef040b1f8a83e7491dc8a2fc8943b1e20c07
Date: Thu Oct 30 00:47:09 2008 -0700

Fix bug in hadamard_ac SSE assembly
Some extreme inputs could cause overflows.
}}

{{bq
Fixed a bug in the hadamard_ac SSE assembly.
Some extreme inputs could cause overflows.
}}

!x264r1015
{{bq
git-id : a0a1bfac7f4a09159f6ef2bf13fb69548b6c5a02
Date: Tue Oct 28 20:35:15 2008 -0700

Full sub8x8 RD mode decision
Small speed penalty with p4x4 enabled, but significant quality gain at subme >= 6
As before, gain is proportional to the amount of p4x4 actually useful in a given input at the given bitrate.
}}

{{bq
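Full sub8x8 RD mode decision.
Small speed penalty when p4x4 is enabled, but a significant quality gain at subme >= 6.
As before, the gain is proportional to how much p4x4 is actually useful for a given input at a given bitrate.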
}}

!x264r1014
{{bq
git-id : aa40e41abae051191117ae670cadd9cd50f66b6f
Date: Sat Oct 25 01:50:08 2008 -0700

Optimize CABAC bit cost calculation
Speed up cabac mvd and add new precalculated transition/entropy table.
Add "noup" function for cabac operations to not update the state table when it isn't necessary.
1-3% faster macroblock_size_cabac.
Cosmetics
}}

{{bq
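Optimized the CABAC bit cost calculation.
Sped up CABAC mvd and added a new precalculated transition/entropy table.
Added a "noup" function for CABAC operations that does not update the state table when the update is unnecessary.
macroblock_size_cabac is 1-3% faster.
Other cosmetics.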
}}
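The difference between a cost lookup that adapts the context and a "noup" lookup that leaves it untouched can be sketched with a toy model. Everything here is an assumption made for illustration: the 4-state context, the `toy_entropy` values, and the function names are not the H.264 or x264 tables and APIs. The costs are fixed-point, in units of 1/256 bit.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STATES 4

/* Toy precalculated bit costs (1/256 bit), indexed by
 * [state][bin != mps]. Illustrative values, not the real tables. */
static const uint16_t toy_entropy[TOY_STATES][2] = {
    { 256,  256 },   /* p(MPS) ~ 0.50 */
    { 132,  445 },   /* p(MPS) ~ 0.70 */
    {  60,  701 },   /* p(MPS) ~ 0.85 */
    {  19, 1106 },   /* p(MPS) ~ 0.95 */
};

/* Toy context: probability state plus most probable symbol. */
typedef struct { uint8_t state; uint8_t mps; } toy_ctx;

/* Returns the cost of coding `bin` and adapts the context afterwards. */
static int toy_size_decision(toy_ctx *c, int bin)
{
    assert(c->state < TOY_STATES);
    int lps = (bin != c->mps);
    int bits = toy_entropy[c->state][lps];
    if (!lps) {
        if (c->state < TOY_STATES - 1)
            c->state++;          /* MPS seen: grow more confident */
    } else if (c->state == 0) {
        c->mps ^= 1;             /* LPS at the weakest state: flip MPS */
    } else {
        c->state--;              /* LPS seen: grow less confident */
    }
    return bits;
}

/* "noup" variant: same table lookup, context left unchanged. */
static int toy_size_decision_noup(const toy_ctx *c, int bin)
{
    assert(c->state < TOY_STATES);
    return toy_entropy[c->state][bin != c->mps];
}
```

The noup form is enough when the context is not consulted again before it is discarded or restored, which is the situation the commit describes as "when it isn't necessary".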

!x264r1013
{{bq
git-id : d7df1a477b5e0e851d206e8c25da0b275ae0b7cc
Date: Thu Oct 23 22:36:11 2008 -0700

Replace "git-command" with "git command" in version.sh for git 1.6 support
}}

{{bq
Replaced "git-command" with "git command" in version.sh to support git 1.6.
}}

!x264r1012
{{bq
git-id : 990274cd5fd276bb26ac0fa13fc9bc1cbcf7acbc
Date: Thu Oct 23 13:45:04 2008 -0700

Add assembly version of CAVLC 8x8dct interleave
Faster CAVLC encoding and RDO with 8x8dct
}}

{{bq
Added an assembly version of CAVLC 8x8dct interleave.
Faster CAVLC encoding and RDO with 8x8dct.
}}

!x264r1011
{{bq
git-id : 86a0fe50c6e369d6dacac5b992febb4bd09de85d
Date: Wed Oct 22 15:55:30 2008 -0700

Add support for psy-rd/trellis to encoder_reconfig
}}

{{bq
Added psy-rd/trellis support to encoder_reconfig.
}}

!x264r1010
{{bq
git-id : 4c78f091e625e87b8f82c567af81969c2fd3e671
Date: Wed Oct 22 15:00:43 2008 -0700

Fix Darwin speed regression
}}

{{bq
Fixed a speed regression on Darwin.
}}

!x264r1009
{{bq
git-id : 8ca555ef691e20d4eb429b45e178f4c0108b607d
Date: Wed Oct 22 14:48:47 2008 -0700

Further improve prediction of bitrate and VBV in threaded mode
}}

{{bq
Further improved the prediction of bitrate and VBV in threaded mode.
}}

!x264r1008
{{bq
git-id : 8fa0bf3d56fa1b02a44f8ee0f673ea998294bd7e
Date: Wed Oct 22 13:37:09 2008 -0700

Sub-8x8 Qpel-RD in P-frames
Improves quality when using p8x4/p4x8/p4x4 subpartitions
Benefit is proportional to how many sub-8x8 partitions are used; helps most at high bitrates and low resolutions.
}}

{{bq
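Sub-8x8 Qpel-RD in P-frames.
Improves quality when p8x4/p4x8/p4x4 subpartitions are used.
The benefit is proportional to how many sub-8x8 partitions are used; it helps most at high bitrates and low resolutions.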
}}

!x264r1007
{{bq
git-id : d0add77f5f084253202747266f85daa65f7fc9cc
Date: Wed Oct 22 02:20:06 2008 -0700

Faster qpel-RD
3-4% faster qpel-RD; avoid re-checking bmv/pmv during the hex search.
}}

{{bq
Faster qpel-RD.
qpel-RD is 3-4% faster: bmv/pmv are no longer re-checked during the hex search.
}}

!x264r1006
{{bq
git-id : f451563f93fad8972c0e9b788b30799e777e913a
Date: Wed Oct 22 00:37:00 2008 -0700

Some minor optimizations in RD refinement
Don't write b subpartition in CABAC RDO
Calculate nonzero count in i4x4 CAVLC RDO
}}

{{bq
Some minor optimizations in RD refinement.
Do not write the B subpartition in CABAC RDO.
Calculate the nonzero count in i4x4 CAVLC RDO.
}}

!x264r1005
{{bq
git-id : 84ede33bec64332cc4bc5da1106c53f3cffa919b
Date: Tue Oct 21 20:17:18 2008 -0700

Faster deblocking when p4x4 isn't used
Most of the MV checks can be skipped, resulting in faster strength calculation
}}

{{bq
Faster deblocking when p4x4 is not used.
Most of the MV checks can be skipped, which makes the strength calculation faster.
}}

!x264r1004
{{bq
git-id : ce0b11099e5fa920b8d1bc39389ae9373f921358
Date: Tue Oct 21 19:38:21 2008 -0700

Print profile and level information upon starting encode
Previously level was only printed as part of autodetect, and only in verbose mode.
}}

{{bq
Print profile and level information when encoding starts.
Previously the level was printed only as part of autodetection, and only in verbose mode.
}}

!x264r1003
{{bq
git-id : 296b39dd863ea90a12d8a52848d1135e387a28f3
Date: Tue Oct 21 17:10:46 2008 -0700

Fix possible crash in trellis at very low QPs
}}

{{bq
Fixed a possible crash in trellis at very low QPs.
}}

!x264r1002
{{bq
git-id : f5da8110606c1bdb8f3a194f11574db28855415e
Date: Tue Oct 21 14:59:07 2008 -0700

Add assembly versions of decimate_score
3-7x faster decimation, 1-3% faster overall
}}

{{bq
Added assembly versions of decimate_score.
Decimation is 3-7x faster; 1-3% faster overall.
}}
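For readers unfamiliar with what a decimate score is, the following plain-C sketch shows the general shape of such a function: it scans the quantized coefficients from the last one backwards, gives up with a large sentinel as soon as any coefficient exceeds +/-1, and otherwise adds a small score per coefficient based on the run of zeros before it. A low total suggests the block holds only a few isolated small coefficients and may be cheaper to zero out entirely. The table values, the sentinel 9, and the names `toy_run_score` and `toy_decimate_score` are assumptions for illustration, not a copy of x264's code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative score per zero-run length (assumed values): isolated
 * coefficients preceded by long zero runs contribute less. */
static const uint8_t toy_run_score[16] = {
    3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Scores up to 16 coefficients given in scan order. Returns 9 as a
 * "do not decimate" sentinel when any |coefficient| is larger than 1. */
static int toy_decimate_score(const int16_t *coef, int count)
{
    assert(count > 0 && count <= 16);
    int idx = count - 1;
    int score = 0;

    /* Skip trailing zeros. */
    while (idx >= 0 && coef[idx] == 0)
        idx--;

    while (idx >= 0) {
        if (coef[idx] > 1 || coef[idx] < -1)
            return 9;
        idx--;
        /* Count the zeros that precede this coefficient. */
        int run = 0;
        while (idx >= 0 && coef[idx] == 0) {
            idx--;
            run++;
        }
        score += toy_run_score[run];
    }
    return score;
}
```

The loop is branch-heavy and data-dependent, which is the kind of scalar code that an assembly version can speed up considerably.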

!x264r1001
{{bq
git-id : 6d9ef8ad39b0bfa5df0c1305e91ae932aad4997e
Date: Sat Oct 18 03:40:59 2008 -0700

Fix typo in subme8/9 lossless qpel-RD
Slightly improves compression.
}}

{{bq
Fixed a typo in subme8/9 lossless qpel-RD.
Slightly improves compression.
}}

!x264r1000
{{bq
git-id : 79194caffdc216e338674d88e50adca2f4ea8fa2
Date: Thu Oct 16 03:17:53 2008 -0700

Extend trellis to support luma/chroma DC and chroma AC
Small speed loss in trellis 1, slightly larger in trellis 2, but significant quality improvement.
}}

{{bq
Extended trellis to support luma/chroma DC and chroma AC.
Small speed loss with trellis 1, slightly larger with trellis 2, but a significant quality improvement.
}}