Perl 5 - Pugs · 2008-10-05 · 28 The Great Regex Engine Rewrite ... The Great Regex Engine...

Page 1:

Perl 5.10 ☯

Yves Orton / Paul Fenwick

唐鳳

Page 2:

⌛

Pages 3-11:

perlhist

⌛ 5.0: 1994

⌛ 5.1: 1995

⌛ 5.2: 1996

⌛ 5.3: 1996

⌛ 5.4: 1997

⌛ 5.5: 1998

⌛ 5.6: 2000

⌛ 5.8: 2002

Pages 12-20:

5.8.0: 2002

⌛ 5.8.1: 2003

⌛ 5.8.2: 2003

⌛ 5.8.3: 2004

⌛ 5.8.4: 2004

⌛ 5.8.5: 2004

⌛ 5.8.6: 2004

⌛ 5.8.7: 2005

⌛ 5.8.8: 2006

Pages 21-24:

2007

5.8.9 ☮

6.0.0 ☭

Perl 5.10

Pages 25-28:

5.10: Carrying on the past, opening up the future

⌛ Inherits the stability of the 5.8 series

⌛ Adds new features from Perl 6

⌛ Greatly improves runtime performance

Pages 29-33:

The 6-on-5 Project

⌛ Extend the capabilities of the P5 VM

⌛ Compile Perl 6 to Perl 5

⌛ Object model: Moose.pm

Layers shown in the diagram:

Augmentation (XS): PadWalker, Devel::Caller, autobox, re::override ...

Semantics: Data::Bind, Class::MOP, Pugs::Runtime, Pugs::Compiler::Rule ...

Perl 5 Sugar: Moose, Moose::Autobox ...

Syntax: v6.pm, Pugs::Compiler::Perl6 ...

Core: perl

Tool Support: CPAN, PAUSE, Perldoc, Perl::Tidy ...

Infrastructure: Parse::Yapp, Module::Compile ...

Page 34:

✯

Pages 35-38:

use feature;

✯ Before: nobody dared to add new reserved words

✯ In eight years, only our and CHECK were added

✯ Now: pick the new features you want and import them

Page 39:

• {

•     use feature 'say';    # in effect within this block only

•     say "Hello, World!";

• }

• use feature ':5.10';      # enables all the new 5.10 keywords

•                           # (all lifted from Perl 6, of course)

Pages 40-42:

Enabling the new keywords automatically

✯ On the command line: perl -E "..."

✯ In a program: use v5.10; ➥ use feature ':5.10';

Pages 43-46:

use feature 'say';

✯ print, plus a trailing newline

✯ sub say { print @_, "\n" }

✯ Saves effort and avoids mistakes

Page 47:

• use v5.10;

• print "Hello World\n";

• say "Hello World";              # saves 4 keystrokes

• print some_function(), "\n";

• say some_function();            # saves 8 keystrokes

• say foo();                      # list context

• print foo(), "\n";              # list context

• print foo() . "\n";             # oops, scalar context!

• print 'Hello World\n';          # wrong quotes

• print "Hello World/n";          # wrong slash

• say 'Hello World';              # can't go wrong ♡
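The context difference on this slide can be checked with a small script. This is only a sketch: `foo` is a hypothetical function that returns different values in list and scalar context, and the output goes to an in-memory filehandle so the result can be inspected.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical function: returns a list or a single string depending on context.
sub foo { return wantarray ? ('a', 'b', 'c') : 'scalar!' }

# Collect output in a string via an in-memory filehandle.
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";

say   {$fh} foo();           # list context
print {$fh} foo(), "\n";     # list context
print {$fh} foo() . "\n";    # scalar context, forced by the concatenation

close $fh;
print $out;
```

The first two lines of output are identical; the third differs because the `.` operator imposes scalar context on `foo()`.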

Pages 48-51:

use feature 'state';

✯ state $x;   # declares a static variable

✯ The value is not reset the next time the block is entered

✯ Equivalent to static in C

Pages 52-54:

• # The old way

• {

•     my $i = 0;

•     sub increment {

•         return ++$i;

•     }

• }

• # The new way

• use v5.10;

• sub increment {

•     state $i = 0;

•     return ++$i;

• }

• for my $x (...) {

•     for my $y (...) {

•         state %seen;    # no need to hoist this above the outermost loop!

•         next if $seen{$x}{$y}++;

•         ...;

•     }

• }
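A runnable sketch of the `state` idiom follows. `increment` is taken from the slide; `first_time` is a hypothetical helper added here to show a `state` hash that persists across calls.

```perl
use strict;
use warnings;
use feature 'state';

sub increment {
    state $i = 0;     # initialised once, on the first call only
    return ++$i;
}

# Hypothetical helper: true only the first time a given pair is seen.
sub first_time {
    my ($x, $y) = @_;
    state %seen;      # persists across calls, invisible outside the sub
    return !$seen{$x}{$y}++;
}

my @counts = map  { increment() } 1 .. 3;
my @fresh  = grep { first_time(@$_) } ([1, 2], [1, 2], [2, 1]);

print join(',', @counts), "\n";
print scalar(@fresh), " distinct pairs\n";
```

Unlike the old closure-in-a-bare-block form, the variable is declared inside the sub that uses it.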

Pages 55-59:

use feature 'switch';

✯ Enables the given/when keywords

✯ Equivalent to switch/case in C

✯ Supports break/continue/default

✯ Powerful "smart matching"

Page 60:

• # Number-guessing game

• use v5.10;

• my $num = int(rand 100);    # the answer

• my @guessed;                # numbers already guessed

• while (my $guess = <STDIN>) {

•     chomp $guess;

•     given ($guess) {

•         when (/\D/)      { say "Please enter a positive integer" }

•         when (@guessed)  { say "You have already guessed that number" }

•         when ($num)      { say "You got it!"; last }

•         when ($_ < $num) { say "A bit higher"; continue }

•         when ($_ > $num) { say "A bit lower"; continue }

•         push @guessed, $_;

•     }

• }

Pages 61-64:

Smart matching

✯ Perl 5.10 has a built-in ~~ operator

✯ Follows the definition used in Perl 6

✯ Available without use feature

Page 65:

• use v5.10;

• if ($x ~~ @array)   { say "$x is in the array" }

• if ($x ~~ /match/)  { say "the string matches the pattern" }

• if (@x ~~ /match/)  { say "the array matches the pattern" }

• if ($key ~~ %hash)  { say "$key is a hash key" }

• if (\&func ~~ $arg) { say 'func($arg) is true' }

Page 66:

• use v5.10;

• if (@array ~~ $x)   { say "$x is in the array" }

• if (/match/ ~~ $x)  { say "the string matches the pattern" }

• if (/match/ ~~ @x)  { say "the array matches the pattern" }

• if (%hash ~~ $key)  { say "$key is a hash key" }

• if ($arg ~~ \&func) { say 'func($arg) is true' }

Pages 67-70:

use feature 'err';

✯ $file = param('file') or die "Please enter a file name";

✯ What if the file name is "0"?

✯ $file = param('file') err die "Please enter a file name";

Pages 71-74:

The || trap

✯ ($x || $y) ➥ ($x ? $x : $y)

✯ The empty string, 0, and undef are all false

✯ It is easy to slip up

Pages 75-78:

defined-or

✯ Perl 5.10 has a built-in // operator

✯ Available without use feature

✯ ($x // $y) ➥ defined($x) ? $x : $y

Page 79:

• # The old way (kids, don't try this at home)

• my %bugs_in;

• while (my $module = <>) {

•     $bugs_in{$module} ||= count_bugs($module);

• }

Page 80:

• # The new way (just use //=)

• my %bugs_in;

• while (my $module = <>) {

•     $bugs_in{$module} //= count_bugs($module);

• }
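The difference between the two slides only shows up when the cached value is false but defined. A minimal sketch, assuming a hypothetical `count_bugs` that always returns 0 and counts its own calls:

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for an expensive function; this module has no bugs.
sub count_bugs { $calls++; return 0 }

my (%with_or, %with_dor);

# ||= treats the cached 0 as "not set" and recomputes every time.
$with_or{Foo} ||= count_bugs('Foo') for 1 .. 3;
my $or_calls = $calls;

# //= only assigns when the slot is undef, so the 0 stays cached.
$calls = 0;
$with_dor{Foo} //= count_bugs('Foo') for 1 .. 3;
my $dor_calls = $calls;

print "||= called count_bugs $or_calls times, //= called it $dor_calls time(s)\n";
```

With `||=` the function is called on every iteration; with `//=` it is called once.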

Page 81:

㊣ (There was no time to translate the following slides into Chinese)

Page 82:

Source: xkcd.com - A webcomic of romance, sarcasm, math, and language. (http://xkcd.com/comics/regular_expressions.png) Licensed under the Creative Commons Attribution-NonCommercial 2.5 (http://xkcd.com/license.html)

Pages 83-85:

The Great Regex Engine Rewrite

− The regular expression engine has been re-engineered and many new features have been added!

− This should excite you. If it doesn't then you should drink more coffee!

Pages 86-91:

Engine Restructured

• Recursion Eliminated

− The engine is no longer recursive. The iterative engine does not suffer from stack overflow errors in the C code for perl.

− Patterns that previously caused the engine to crash will run to completion in Perl 5.10, albeit very... very... slowly...

− It is much easier to override backtracking behavior in an iterative engine.

− This may carry a modest penalty for normal patterns, but the boys in the lab are working on it!

Pages 92-98:

Engine Restructured

• Pluggable interface

− The engine is now abstracted behind a usable interface. We can now plug in other engines...

use re::engine::PCRE;

− use re 'debug'; is now lexically scoped, as is the use of any other engine.

− The default engine can be extended or instrumented post-release without requiring a full build.

− All your regex engines are belong to us!

Page 99:

Quantifier Combinatorial Explosion!

• This pattern matches quoted strings and supports Perl / C style escapes

• Except... it's evil...

• The combination of * and + leads to combinatorial explosion

qr/ " (?: [^"\\]+ | (?:\\.)+ )* " /x

Pages 100-105:

Call the bomb squad!

− How to deal with combinatorial explosion?

− One solution is the (?>.....) construct

− This matches its contents, and then refuses to give any back should what follows not match

− So we can rewrite the pattern to use this construct

− Except it's pretty nasty to read...
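The "refuses to give any back" behaviour is easiest to see on a tiny input. A minimal sketch, using a hypothetical test string rather than the quoted-string pattern from the slides:

```perl
use strict;
use warnings;

my $text = 'aaab';

# Plain greedy quantifier: a+ first takes "aaa", then gives one "a" back
# so that the trailing "ab" can match.
my $greedy = ($text =~ /^a+ab$/) ? 1 : 0;

# Atomic group: (?>a+) takes "aaa" and refuses to give any back,
# so the trailing "ab" can never match.
my $atomic = ($text =~ /^(?>a+)ab$/) ? 1 : 0;

print "greedy: $greedy, atomic: $atomic\n";
```

The atomic version fails here on purpose: once backtracking into the group is forbidden, the engine has far fewer combinations to try, which is what defuses the explosion.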

Page 106:

The Good, the Bad, and the Ugly!

• (?>....) prevents combinatorial explosion

• But there must be a nicer way to do this

• After all, not everything in perl needs to be incomprehensible.

• Enter possessive quantifiers...

qr/ " (?> (?: (?> [^"\\]+) | (?> (?:\\.)+ ) )* ) " /x

Possessive Quantifiers

− Now we can write common forms of this in an easier way by using possessive quantifiers.
− A possessive quantifier matches as much as it can and never gives any back.
− The notation is to put a plus immediately after the main quantifier.
− Thus /(?>X+)/ can be rewritten as /X++/

    qr/ " (?: [^"\\]++ | (?:\\.)++ )*+ " /x
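A minimal sketch of that equivalence, assuming Perl 5.10 or later. The two patterns are the ones from the slides; the sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# The two spellings from the slides: atomic groups vs. possessive quantifiers.
my $atomic     = qr/ " (?> (?: (?> [^"\\]+) | (?> (?:\\.)+ ) )* ) " /x;
my $possessive = qr/ " (?: [^"\\]++ | (?:\\.)++ )*+ " /x;

# Hypothetical inputs, invented for this example.
my @samples = (
    q{say "hello world"},
    q{say "an \"escaped\" quote"},
    q{say "unterminated},
);

my @results;
for my $text (@samples) {
    my ($got_atomic)     = $text =~ /($atomic)/;
    my ($got_possessive) = $text =~ /($possessive)/;
    push @results, [ $got_atomic, $got_possessive ];
    printf "%-28s => %s\n", $text, $got_possessive // '(no match)';
}
```

Both patterns should agree on every input, including the unterminated string, where neither finds a match.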

    'aaaa' =~ m/ a+ a /x; # Will match by backtracking

    Compiling REx " a+ a "
    Final program:
       1: PLUS (4)
       2:   EXACT <a> (0)
       4: EXACT <a> (6)
       6: END (0)
    anchored "a" at 0 floating "aa" at 0..2147483647 (checking floating) plus minlen 2
    Guessing start of match in sv for REx " a+ a " against "aaaa"
    Found floating substr "aa" at offset 0...
    Found anchored substr "a" at offset 0...
    Guessed: match at offset 0
    Matching REx " a+ a " against "aaaa"
       0 <> <aaaa>    |  1:PLUS(4)
                           EXACT <a> can match 4 times out of 2147483647...
       3 <aaa> <a>    |  4:  EXACT <a>(6)
       4 <aaaa> <>    |  6:  END(0)
    Match successful!
    Freeing REx: " a+ a "

    # Will not match because a++ won't ever give anything back!
    'aaaa' =~ m/ a++ a /x;

    Compiling REx " a++ a "
    Final program:
       1: SUSPEND (8)
       3:   PLUS (6)
       4:     EXACT <a> (0)
       6:   SUCCEED (0)
       7: TAIL (8)
       8: EXACT <a> (10)
      10: END (0)
    [ ........... ]
    Matching REx " a++ a " against "aaaa"
       0 <> <aaaa>    |  1:SUSPEND(8)
       0 <> <aaaa>    |  3:  PLUS(6)
                             EXACT <a> can match 4 times out of 2147483647...
       4 <aaaa> <>    |  6:  SUCCEED(0)
                             subpattern success...
       4 <aaaa> <>    |  8:EXACT <a>(10)
                           failed...
    [ ........... ]
       3 <aaa> <a>    |  1:SUSPEND(8)
       3 <aaa> <a>    |  3:  PLUS(6)
                             EXACT <a> can match 1 times out of 2147483647...
       4 <aaaa> <>    |  6:  SUCCEED(0)
                             subpattern success...
       4 <aaaa> <>    |  8:EXACT <a>(10)
                           failed...
    Match failed
    Freeing REx: " a++ a "
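The two traces can be reduced to a quick check. This is only a sketch; the variable names are illustrative. Traces like the ones above can be produced by adding `use re 'debug';` to a script.

```perl
use strict;
use warnings;

# Greedy a+ grabs all four letters, then backs off one so the final "a" can match.
my $greedy = 'aaaa' =~ m/ a+ a /x ? 1 : 0;

# Possessive a++ also grabs all four, but never backs off,
# so the final "a" has nothing left to match.
my $possessive = 'aaaa' =~ m/ a++ a /x ? 1 : 0;

# The atomic-group spelling behaves the same way as a++.
my $atomic = 'aaaa' =~ m/ (?>a+) a /x ? 1 : 0;

print "greedy=$greedy possessive=$possessive atomic=$atomic\n";
```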

Capture Buffers

− Before Perl 5.10 capture buffers were numbered only.
− Adding a new buffer means that the numbering of the buffers following it changes, often requiring changes to the code using the pattern.
− How do we know what number the last buffer in the below pattern will have?

    / (foo) $user_qr (what-number-am-i) /x
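To make the numbering problem concrete, here is a small sketch. The two `qr//` fragments stand in for `$user_qr` and are hypothetical; the point is only that the same outer pattern ends up with a different number for its last buffer.

```perl
use strict;
use warnings;

# Two hypothetical user-supplied fragments: one without a capture, one with.
my $plain_qr   = qr/\s+\w+\s+/;
my $capture_qr = qr/\s+(\w+)\s+/;

my $text = 'foo bar baz';

my @with_plain   = $text =~ / (foo) $plain_qr   (baz) /x;
my @with_capture = $text =~ / (foo) $capture_qr (baz) /x;

# The last group is $2 in the first match but $3 in the second.
print scalar(@with_plain),   " groups: @with_plain\n";
print scalar(@with_capture), " groups: @with_capture\n";
```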

Named Capture Buffers

− So in Perl 5.10 we added named capture buffers.
− We used the .Net syntax as most people think it's nicer than Python's.
− We didn't use their numbering scheme though.

(Note to .Net developers: Crack Kills.)

Named Capture

− Declaration:

    (?<name>pat) or (?'name'pat)

− Backreference:

    \k<name> or \k'name'

Getting results from named captures

− %+ hash contains the contents of the leftmost capture of a given name that was involved in the match.

    For example: $+{foo}

− %- hash contains an array with the contents of all the buffers of a given name.

    For example: $-{foo}[0]

− exists() can be used to check if a buffer has content just like with any other hash.
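A short sketch that puts declaration, backreference and result access together, assuming Perl 5.10 or later. The group names (year, month, day, word, val) and the sample strings are invented for the example.

```perl
use strict;
use warnings;

my ($year, $month, $has_day, $has_hour);
if ('2008-10-05' =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    ($year, $month) = ($+{year}, $+{month});
    $has_day  = exists $+{day}  ? 1 : 0;   # group took part in the match
    $has_hour = exists $+{hour} ? 1 : 0;   # no such group in the pattern
}

# \k<name> refers back to a named group inside the pattern itself.
my $doubled = 'hello hello' =~ /\b(?<word>\w+)\s+\k<word>\b/ ? $+{word} : undef;

# When a name is used more than once, %+ gives the leftmost defined capture
# and %- gives all of them, in order.
my ($first, @all);
if ('key=alpha key=beta' =~ /key=(?<val>\w+) \s+ key=(?<val>\w+)/x) {
    $first = $+{val};
    @all   = @{ $-{val} };
}

print "$year-$month day=$has_day hour=$has_hour doubled=$doubled first=$first all=@all\n";
```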

Original Back-reference Syntax

− It is tricky to use back-references in embeddable qr// constructs
− Original back-reference syntax is open to ambiguity. (Is it octal or not?)
− What does \11 in the below pattern mean?
− Is that octal or a back-reference?

    /(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)\11/
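One way to answer the question is to try it. The sketch below relies on the rule documented in perlre that \10 and above are back-references only when that many groups have been opened, and octal escapes otherwise. The variable names and test strings are illustrative.

```perl
use strict;
use warnings;

my $ten    = '(.)' x 10;   # ten capture groups
my $eleven = '(.)' x 11;   # eleven capture groups

# With only ten groups open, \11 cannot be a back-reference,
# so it is read as octal 011 (a tab character).
my $ten_tab = "abcdefghij\t" =~ /\A$ten\11\z/ ? 1 : 0;

# With eleven groups, the same \11 is a back-reference to group 11.
my $eleven_repeat = "abcdefghijkk"  =~ /\A$eleven\11\z/ ? 1 : 0;
my $eleven_tab    = "abcdefghijk\t" =~ /\A$eleven\11\z/ ? 1 : 0;

print "ten+tab=$ten_tab eleven+repeat=$eleven_repeat eleven+tab=$eleven_tab\n";
```

So in the slide's pattern, which has exactly ten groups, \11 is the octal reading.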

More Problems with Numeric Backreferences

− Concatenation is unsafe (think of “\1” and “1”)
− This could cause problems for code generators
− Traditional styles of escaping don't help. About the best solution is to use (?:\1) or /x and spaces to separate the components.
− It would be nice to have a syntax that would let us avoid these problems
− So we invented one....
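A sketch of the concatenation trap and the (?:\1) workaround. The "generator" here is hypothetical: just three string fragments glued together.

```perl
use strict;
use warnings;

# A hypothetical generator that glues fragments together as strings.
my $group   = '(a)';
my $backref = '\1';
my $literal = '1';

my $naive = $group . $backref . $literal;        # (a)\11   <- now reads as octal 011
my $safe  = $group . "(?:$backref)" . $literal;  # (a)(?:\1)1

my $naive_hits_text = 'aa1' =~ /\A$naive\z/ ? 1 : 0;
my $naive_hits_tab  = "a\t" =~ /\A$naive\z/ ? 1 : 0;
my $safe_hits_text  = 'aa1' =~ /\A$safe\z/  ? 1 : 0;

print "naive: text=$naive_hits_text tab=$naive_hits_tab; safe: text=$safe_hits_text\n";
```

The naive version silently turns into "a followed by a tab", which is the kind of bug that is hard to spot in generated patterns.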

New Back-Reference Syntax

− New syntax for \1

    \g{1} or \g1

− Relative back-references

    \g{-1} or \g-1

  \g{-N} refers to the Nth previous capture buffer
− For instance a generic embeddable dupe word matcher:

    my $dupew = qr/(\w+)\s+\g{-1}/;
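To see why the relative form matters for embedding, here is a sketch that drops both an absolute and a relative version of the dupe word matcher into a larger pattern. The surrounding pattern and the sample text are made up for the example.

```perl
use strict;
use warnings;

my $dupew_rel = qr/(\w+)\s+\g{-1}/;   # relative: "the group just before me"
my $dupew_abs = qr/(\w+)\s+\1/;       # absolute: always group 1 of the final pattern

my $text = 'id=42 the the cat';

# Embed each fragment after another capture group.
my @rel = $text =~ /id=(\d+)\s+$dupew_rel/;
my @abs = $text =~ /id=(\d+)\s+$dupew_abs/;

# \g{1} is also safe to follow with a literal digit.
my $digit_after = 'aa1' =~ /\A(a)\g{1}1\z/ ? 1 : 0;

print "relative: @rel\n";
print "absolute matched: ", scalar(@abs) ? 'yes' : 'no', "\n";
print "digit after \\g{1}: $digit_after\n";
```

Once embedded, \1 points at the id rather than the word, so the absolute version stops finding the duplicate; \g{-1} keeps working wherever the fragment lands.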

Matching balanced constructs....

− Is not a problem that traditional regular expression engines can solve
− They can only handle a fixed nesting depth, not an arbitrary one
− Except Perl doesn't use a traditional regular expression engine
− So we can write patterns that will match any level of nesting if we wish
− But it's not exactly easy.....

Recursive Patterns Using Eval

− Old way – dynamic patterns

    our $pat;
    $pat = qr/\((?>(?>[^()]+)|(??{$pat}))*\)/x;
    if ('(x(x)y(x)x)' =~ /^($pat)$/){ ... }

− Inherent problems
  1. Slow – requires the interpreter to resolve what $pat holds
  2. Fat – requires two patterns (inner and outer)
  3. Clumsy – requires use of global vars
  4. Ugly – pattern is not self contained. Pattern does not in and of itself explain what it does.

Old way in more detail

    qr/
        \(                  # Open paren
        (?>                 # Possessive subgroup
            (?> [^()]+ )    # Grab all the non parens we can
            |               # or
            (??{$pat})      # Recurse, grab a balanced paren
        )*                  # Zero or more times
        \)                  # Close paren
    /x;

Recursive Patterns

− With the (?1) notation things are a little easier.

    if ('(x(x)y(x)x)'=~m/^(\((?>[^()]++|(?1))*\))$/x)
    { ... }

− Advantages:
  1. Faster – Perl interpreter is not involved. Since the pattern cannot change at run time, the engine can optimize it.
  2. Smaller – pattern need not be duplicated
  3. Self contained – no global vars
  4. Self describing – pattern is self contained. Context is not required to see what the pattern matches.

Pattern Recursion in more detail

− Pattern recursion allows us to treat the contents of a particular pattern buffer as an independent subexpression
− (?1) recurses into the first capture buffer in the pattern
− (?R) and its alias (?0) allow us to treat the entire pattern as an independent subexpression
− If we have named a buffer we can also recurse into it by name by using (?&NAME)
− This is useful for writing grammars...
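A sketch of the two other recursion forms, (?&NAME) and (?R), applied to balanced parentheses. The group name "parens" and the sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Recurse by name: the group "parens" calls itself for each nested pair.
my $by_name = qr/\A (?<parens> \( (?: [^()]++ | (?&parens) )* \) ) \z/x;

# Recurse into the whole pattern with (?R); no capture group is needed.
my $whole = qr/ \( (?: [^()]++ | (?R) )* \) /x;

my $balanced   = '(x(x)y(x)x)' =~ $by_name ? 1 : 0;
my $unbalanced = '(x(x)y(x)x'  =~ $by_name ? 1 : 0;

# $& holds the text matched by the last successful match.
my $found = 'f(a(b)c) + g(d)' =~ $whole ? $& : undef;

print "balanced=$balanced unbalanced=$unbalanced found=$found\n";
```

Note that the (?R) pattern is left unanchored on purpose: anchors would be part of what gets recursed into.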

New way in more detail

    qr/
        ^                   # Start of string
        (                   # Start capture group 1
            \(              # Open paren
            (?>             # Possessive subgroup
                [^()]++     # Grab all the non parens we can
                |           # or
                (?1)        # Recurse into group 1
            )*              # Zero or more times
            \)              # Close paren
        )                   # End capture group 1
        $                   # End of string
    /x;
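As a sanity check, the old (??{...}) pattern and the new (?1) pattern can be run side by side. This is a sketch assuming Perl 5.10 or later; the extra test strings and the %agree hash are made up for the example.

```perl
use strict;
use warnings;

# Old style: the pattern refers to itself through a package variable.
our $pat;
$pat = qr/\((?>(?>[^()]+)|(??{$pat}))*\)/x;

# New style: group 1 recurses into itself with (?1).
my $new = qr/^(\((?>[^()]++|(?1))*\))$/x;

my %agree;
for my $text ('(x(x)y(x)x)', '()', '(x(x)y(x)x', 'x)(') {
    my $old_ok = $text =~ /^($pat)$/ ? 1 : 0;
    my $new_ok = $text =~ $new       ? 1 : 0;
    $agree{$text} = [ $old_ok, $new_ok ];
    print "$text: old=$old_ok new=$new_ok\n";
}
```

Both versions should accept the two balanced strings and reject the two unbalanced ones; only the new one does so without a global variable or an outer wrapper pattern.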

Recursive Patterns

− Need not be recursive:

    if ("AABB"=~/(A)(?1)(?2)(B)/) {...}

− Think “regex subroutine” or Perl6 Rule (sorta)
− Relative referencing is possible:

    /(A)(?-1)(?+1)(B)/

− Relative referencing facilitates easy embedding and reuse of recursive patterns
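A small sketch of the "regex subroutine" idea. The anchors and the last two checks are additions for illustration; they show that (?1) re-runs the sub-pattern of group 1, whereas \1 demands the same text again.

```perl
use strict;
use warnings;

# (?1) and (?2) re-run the sub-patterns of groups 1 and 2; (?2) is a forward call.
my @abs = 'AABB' =~ /\A(A)(?1)(?2)(B)\z/;

# The same thing, addressed relative to where the call appears.
my @rel = 'AABB' =~ /\A(A)(?-1)(?+1)(B)\z/;

# A call reuses the pattern, a back-reference reuses the matched text.
my $call_ok    = 'ab' =~ /\A(\w)(?1)\z/ ? 1 : 0;
my $backref_ok = 'ab' =~ /\A(\w)\1\z/   ? 1 : 0;

print "abs=@abs rel=@rel call=$call_ok backref=$backref_ok\n";
```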

    qr {
      (?(DEFINE)
        (?<address>         (?&mailbox) | (?&group))
        (?<mailbox>         (?&name_addr) | (?&addr_spec))
        (?<name_addr>       (?&display_name)? (?&angle_addr))
        (?<angle_addr>      (?&CFWS)? < (?&addr_spec) > (?&CFWS)?)
        (?<group>           (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ; (?&CFWS)?)
        (?<display_name>    (?&phrase))
        (?<mailbox_list>    (?&mailbox) (?: , (?&mailbox))*)
        (?<address_list>    (?&address) (?: , (?&address))*)

        (?<addr_spec>       (?&local_part) \@ (?&domain))
        (?<local_part>      (?&dot_atom) | (?&quoted_string))
        (?<domain>          (?&dot_atom) | (?&domain_literal))
        (?<domain_literal>  (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)? \] (?&CFWS)?)
        (?<dcontent>        (?&dtext) | (?&quoted_pair))
        (?<dtext>           (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])

        (?<atext>           (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+\-/=?^_`{|}~])
        (?<atom>            (?&CFWS)? (?&atext)+ (?&CFWS)?)
        (?<dot_atom>        (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
        (?<dot_atom_text>   (?&atext)+ (?: \. (?&atext)+)*)

        (?<text>            [\x01-\x09\x0b\x0c\x0e-\x7f])
        (?<quoted_pair>     \\ (?&text))

        (?<qtext>           (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
        (?<qcontent>        (?&qtext) | (?&quoted_pair))
        (?<quoted_string>   (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))* (?&FWS)? (?&DQUOTE) (?&CFWS)?)

        (?<word>            (?&atom) | (?&quoted_string))
        (?<phrase>          (?&word)+)

        # Folding white space
        (?<FWS>             (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
        (?<ctext>           (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
        (?<ccontent>        (?&ctext) | (?&quoted_pair) | (?&comment))
        (?<comment>         \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) )
        (?<CFWS>            (?: (?&FWS)? (?&comment))* (?: (?:(?&FWS)? (?&comment)) | (?&FWS)))

        # No whitespace control
        (?<NO_WS_CTL>       [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])

        (?<ALPHA>           [A-Za-z])
        (?<DIGIT>           [0-9])
        (?<CRLF>            \x0d \x0a)
        (?<DQUOTE>          ")
        (?<WSP>             [\x20\x09])
      )

      (?&address)
    }x;

Page 185: Perl 5 - Pugs · 2008-10-05 · 28 The Great Regex Engine Rewrite ... The Great Regex Engine Rewrite − The regular expression engine has been re-engineered and many new features

52

What's this (?(DEFINE)....) thing?

52

Page 186: Perl 5 - Pugs · 2008-10-05 · 28 The Great Regex Engine Rewrite ... The Great Regex Engine Rewrite − The regular expression engine has been re-engineered and many new features

52

What's this (?(DEFINE)....) thing?

− Recursing to a named capture buffer is pretty much like matching a rule

52

Page 187: Perl 5 - Pugs · 2008-10-05 · 28 The Great Regex Engine Rewrite ... The Great Regex Engine Rewrite − The regular expression engine has been re-engineered and many new features

52

What's this (?(DEFINE)....) thing?

− Recursing to a named capture buffer is pretty much like matching a rule

− So it would be nice to be able to define a rule out of place from where it is first used

52

Page 188: Perl 5 - Pugs · 2008-10-05 · 28 The Great Regex Engine Rewrite ... The Great Regex Engine Rewrite − The regular expression engine has been re-engineered and many new features

52

What's this (?(DEFINE)....) thing?

− Recursing to a named capture buffer is pretty much like matching a rule

− So it would be nice to be able to define a rule out of place from where it is first used

− So you can use the (?(DEFINE)....) construct

52

Page 189: Perl 5 - Pugs · 2008-10-05 · 28 The Great Regex Engine Rewrite ... The Great Regex Engine Rewrite − The regular expression engine has been re-engineered and many new features

52

What's this (?(DEFINE)....) thing?

− Recursing to a named capture buffer is pretty much like matching a rule.

− So it would be nice to be able to define a rule away from the place where it is first used.

− So you can use the (?(DEFINE)....) construct:
  − Whatever is in the .... will never be used as part of the match,
  − unless it is recursed into.
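A minimal sketch of the idea on a much smaller grammar than the e-mail address pattern. The rule names (sign, digits, number) and the test strings are made up for illustration; only (?(DEFINE)...), (?<name>...) and (?&name) come from the slides.

```perl
use strict;
use warnings;

# Hypothetical mini-grammar: an optionally signed decimal number.
my $number = qr{
    ^ (?&number) $              # the only part that actually matches
    (?(DEFINE)                  # rules below are used only via (?&name)
        (?<sign>   [+-] )
        (?<digits> [0-9]+ )
        (?<number> (?&sign)? (?&digits) (?: \. (?&digits) )? )
    )
}x;

# 1 for strings the grammar accepts, 0 otherwise.
my @results = map { $_ =~ $number ? 1 : 0 } qw(42 -3.14 +7 abc 1.);
```

The DEFINE block can sit before or after the place where the rules are used; nothing inside it is matched unless a (?&name) recursion reaches it.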

The “Keep” pattern \K

− Originally implemented as Regexp::Keep by Jeff Pinyan (japhy).

− Says that everything before the \K should not be included in the match. Same thing as <( in Perl 6.

− This is effectively a form of variable-length positive look-behind, but much more efficient.

− Especially useful in substitutions, as it means you can often avoid capturing.

− Thanks japhy!

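    use Benchmark qw(cmpthese);

    # $test holds the sample input string (its definition is not shown here)
    cmpthese -1, {
        keep   => sub { my $str = $test; $str =~ s/fo+\Kbar/baz/; },
        nokeep => sub { my $str = $test; $str =~ s/(fo+)bar/${1}baz/; },
    };

               Rate nokeep   keep
    nokeep  74490/s     --   -88%
    keep   595923/s   700%     --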
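To make the equivalence behind the benchmark concrete, here is a small sketch showing that the \K form and the capturing form leave the string in the same state. The sample string is an assumption chosen for illustration.

```perl
use strict;
use warnings;

my $with_keep    = 'fooooobar';
my $with_capture = 'fooooobar';

# \K drops "fooooo" from the matched text, so only "bar" is replaced.
$with_keep =~ s/fo+\Kbar/baz/;

# Pre-5.10 style: capture the prefix and put it back by hand.
$with_capture =~ s/(fo+)bar/${1}baz/;
```

Both variables end up holding 'fooooobaz'; the \K version simply avoids the capture and the copy of the prefix into the replacement.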

The /p modifier!

− Due to the dynamic nature of Perl, the use of $`, $& and $' anywhere in a script has a global performance penalty.

− With the /p modifier, the variables ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} may be used instead.

− No penalty! Or at least only the same penalty as from using capturing: only the regex with the /p modifier is affected.
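A short sketch of the /p variables in use. The input string is made up; the three variables are the ones named above.

```perl
use strict;
use warnings;

my $line = 'name=value';
my ($pre, $match, $post);

# /p asks the engine to preserve the pre-, post- and matched text
# for this one match only.
if ($line =~ /=/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
```

After the match, $pre is 'name', $match is '=' and $post is 'value', without $`, $& or $' appearing anywhere in the program.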

Branch Reset Pattern

− Sometimes you want to match one of several possibilities, but only capture part of what you match, for instance in a tokenizer:

    while (/\G\s* (?: value=(\w+) | "(\w+)" ) /gx) { my $text = defined $1 ? $1 : $2 }

− But that can be a real pain. So H. Merijn Brand suggested a better way:

− (?|.....)

Branch Reset Pattern

− Capture buffers in each branch share the same numbers; in other words, the buffer number is reset to the same value at the start of each branch.

− Buffers following the construct are numbered as though there were only one branch, the one with the most buffers in it.

    while (/\G\s* (?| value=(\w+) | "(\w+)" ) /gx) { my $text = $1 }
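A runnable sketch of the tokenizer loop, assuming a made-up input string that mixes both token forms. Every token lands in $1 regardless of which branch matched.

```perl
use strict;
use warnings;

# Hypothetical input mixing the two token forms.
my $input = 'value=alpha "beta" value=gamma';
my @tokens;

# Both branches capture into $1, so no "defined $1 ? $1 : $2" is needed.
while ($input =~ / \G \s* (?| value=(\w+) | "(\w+)" ) /gx) {
    push @tokens, $1;
}
```

With this input, @tokens ends up as ('alpha', 'beta', 'gamma').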

New Optimizations

− Trie and Aho-Corasick matching
  − Alternations starting with literal text will be merged into a single TRIE construct.
  − If a TRIE is the first matching regop in a regular expression, the engine will create an Aho-Corasick matcher and use it for start point determination.

− Much more efficient: match time does not depend on the number of sub-patterns in the TRIE, unlike normal alternation.

New Optimizations

− One cool thing about tries is that they allow us to find common prefixes...

− which we can extract from the alternation if they exist:

    /foam|foal|foad/

  will be optimized to perform the same as

    /foa[mld]/
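In practice this means a plain alternation built from a word list is a reasonable thing to write. The sketch below is only an illustration: the keyword list and test words are made up, and whether a TRIE node is really produced depends on the perl version and the pattern.

```perl
use strict;
use warnings;

# Hypothetical keyword list; any set of literal strings would do.
my @keywords = qw(foam foal foad);

# quotemeta keeps each keyword literal, so the branches stay plain text.
my $alternation = join '|', map { quotemeta($_) } @keywords;
my $keyword_re  = qr/\b(?:$alternation)\b/;

my @hits = grep { $_ =~ $keyword_re } qw(foal foam food foa);

# Putting "use re 'debug';" before the qr// dumps the compiled program,
# which is one way to check whether a TRIE node was generated.
```

Here @hits contains 'foal' and 'foam'; 'food' and 'foa' are not in the list and do not match.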

Backtracking

A bit about backtracking

− Perl's engine is a backtracking engine.
  − Don't confuse NFA with 'backtracking': just because an engine is NFA doesn't mean it uses backtracking.
  − It has to use backtracking to provide some of the advanced features, especially backreferences, which strictly speaking aren't regular.

− Leftmost matching, where the first alternative that succeeds wins (rather than POSIX leftmost-longest), is also a side effect of backtracking.

Backtracking Control Verbs

− Regexes are normally declarative, meaning that their behavior, in theory, should be independent of their implementation.

− In real life, the implementation has a significant effect on behavior and performance.

− Controlling how the engine backtracks can result in significant speedups.

− Thus we now have verbs to control this behavior.

New Verbs: (*VERB)

• (*FAIL)
• (*ACCEPT)
• (*PRUNE)
• (*MARK)
• (*SKIP)
• (*THEN)
• (*COMMIT)

• (*PRUNE:NAME)
• (*MARK:NAME)
• (*:NAME)
• (*SKIP:NAME)
• (*THEN:NAME)
• (*COMMIT:NAME)

Simple Backtracking Control Verbs

• (*FAIL)
  − The simplest verb. Syntactic sugar for (?!).
  − Can be used to force the engine to backtrack and therefore find all matches in a pattern, similar to how “exhaustive matching” would work in Perl 6:

    'aaab' =~ /a+b?(?{print $&})(*FAIL)/

  − would print out every possible substring that matches the pattern /a+b?/.

Exhaustive matching with (*FAIL)

    'aaab' =~ /(?{ print "\n" }) a+ b? (?{ print "$& " }) (*FAIL)/x;
    print "\n";

    aaab aaa aa a
    aab aa a
    ab a

Simple Backtracking Control Verbs

• (*ACCEPT)
  − Causes a pattern to be accepted at the current match point, even if there is more pattern to be matched.
  − Think of this as being something like a “return” statement for regexes.

    'AC' =~ /A (?: B | C (*ACCEPT) | D ) E/x

  will match.

Using (*ACCEPT)

    'AC' =~ /A (?: B | C (*ACCEPT) | D ) E/x and print $&, "\n";

    Guessing start of match in sv for REx "A (?: B | C (*ACCEPT) | D ) E" against "AC"
    Found anchored substr "A" at offset 0...
    Guessed: match at offset 0
    Matching REx "A (?: B | C (*ACCEPT) | D ) E" against "AC"
       0 <> <AC>    |  1:EXACT <A>(3)
       1 <A> <C>    |  3:TRIE-EXACT[BCD](15)
       1 <A> <C>    |    State: 1 Accepted: 0 Charid: 2 CP: 43 After State: 3
       2 <AC> <>    |    State: 3 Accepted: 1 Charid: 2 CP: 0 After State: 0
                         got 1 possible matches
                         only one match left: #2 <C>
       2 <AC> <>    |  9:ACCEPT0(15)
    Match successful!
    AC

Backtracking Control Verbs

• The rest of the verbs accept an argument and take the form (*VERB:arg).

• When such verbs are used, the regex engine sets the package variables $REGERROR and $REGMARK.

• On success, $REGERROR is set to false and $REGMARK is set to the 'arg' of the last verb involved in the match.

• On failure it is the reverse: $REGMARK is set to false and $REGERROR carries the 'arg'.

Backtracking Control Verbs

• (*PRUNE)
  − When backtracked into, the match fails at the current starting position.
  − Similar to (?>...), except that (*PRUNE) is unary and not a bracketing construct. Thus you can write:

    /A (?: B | C (*PRUNE) ) D/

  − Can be used with (*FAIL) to find the longest match possible for every start position in the string:

    /a+b?(?{print $&})(*PRUNE)(*FAIL)/

Using (*PRUNE)

    'aaab' =~ / a+ b? (?{print "PRUNE: $&" }) (*PRUNE) (*FAIL) /x;

    PRUNE: aaabPRUNE: aabPRUNE: ab

Backtracking Control Verbs

• (*MARK:name)
  − When executed, “marks” the current position in the string and gives it a name.
  − Provides a way to see what “path” the engine has taken through the pattern:

    /foo(*:A)|bar(*:B)|baz(*:C)/ and print $REGMARK;

  − (*:name) is a short form of (*MARK:name).
  − Invented for SpamAssassin.

Using (*MARK)

    'baz' =~ /foo(*:A)|bar(*:B)|baz(*:C)/ and print "REGMARK: $REGMARK";

    REGMARK: C
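A sketch of how the mark can drive a dispatch decision, which is roughly the "which branch matched" use the slides describe. The word list and the %seen counter are made up for illustration, and the code assumes it runs in package main, where $REGMARK is declared with our.

```perl
use strict;
use warnings;

our $REGMARK;   # set by the regex engine when a (*MARK) is involved

# Count how often each branch of the alternation was the one that matched.
my %seen;
for my $word (qw(foo baz bar baz)) {
    $seen{$REGMARK}++ if $word =~ /foo(*:A)|bar(*:B)|baz(*:C)/;
}
```

After the loop, %seen records one hit each for A and B and two for C, without any capture groups in the pattern.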

Backtracking Control Verbs

• (*SKIP:name)
  − When backtracked into, the engine will reject all matches up to the point where the cursor was when the (*SKIP) was entered.
  − Can be coupled with (*MARK:name) to skip the text up to the point where the (*MARK) was executed.
  − Can be used with (*FAIL) to find all of the non-overlapping matches in a string:

    'aaabaaab' =~ /a+b?(*SKIP)(?{print $&})(*FAIL)/

Using (*SKIP)

    'aaabaaab' =~ / a+ b? (*SKIP) (?{print "SKIP: $&"}) (*FAIL) /x;

    SKIP: aaabSKIP: aaab
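Putting (*FAIL), (*PRUNE) and (*SKIP) side by side may help show how they differ. This sketch collects the matches into arrays instead of printing them; the array names are assumptions, and the patterns are the ones used on the slides.

```perl
use strict;
use warnings;

my (@all, @longest, @non_overlapping);

# (*FAIL) alone: every way the pattern can match, at every start position.
'aaab' =~ / a+ b? (?{ push @all, $& }) (*FAIL) /x;

# (*PRUNE)(*FAIL): only the first (longest) match at each start position.
'aaab' =~ / a+ b? (?{ push @longest, $& }) (*PRUNE) (*FAIL) /x;

# (*SKIP)(*FAIL): after each match, the next attempt starts where it ended.
'aaabaaab' =~ / a+ b? (*SKIP) (?{ push @non_overlapping, $& }) (*FAIL) /x;
```

@all holds nine substrings, @longest holds 'aaab', 'aab' and 'ab', and @non_overlapping holds 'aaab' twice.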

Backtracking Control Verbs

• (*THEN)
  − When backtracked into, causes the pattern to backtrack directly to the next alternation in the pattern.
  − Can be thought of as an if/then statement for regular expressions:

    /COND1 (*THEN) REST1 | COND2 (*THEN) REST2/

  − When not used in an alternation, acts just like (*PRUNE) would.
  − In Perl 6 this is called the “cut group” operator.

Backtracking Control Verbs

• (*COMMIT)
  − When backtracked into, causes the pattern to fail outright. The engine will not try to match the pattern at any other start point.
  − In Perl 6 this is called <commit>.

    'foofoofoobar' =~ / (fo+)* (*COMMIT) [BC]ar /x;

  will fail, and will do so quickly.

    'foofoofoobar' =~ / (fo+)* [BC]ar /x;

  will also fail, but not as quickly, although it could be worse: the super-linear cache kicks in. (See commit.pl.)
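The "no other start point" behavior can be observed directly by recording what the engine tried. This is a sketch using the same (?{ ... })(*FAIL) trick as the earlier slides; the input string and the @tried array are assumptions for illustration.

```perl
use strict;
use warnings;

my @tried;

# The first attempt matches "aaab", records it, then (*FAIL) backtracks
# into (*COMMIT), which makes the whole match fail immediately.
'aaabaaab' =~ / a+ b? (*COMMIT) (?{ push @tried, $& }) (*FAIL) /x;
```

@tried ends up with the single entry 'aaab': no shorter match and no later start position is attempted.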

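Assertion functions

❦ sub assert : assertion { ... }

❦ perl -A enables all assertions

❦ perl -A=Foo enables specific assertions

❦ Calls to assertions that are not enabled are ignored automatically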

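User-defined pragmas

❦ use strict; # takes effect within the enclosing block

❦ At compile time, information is written into the %^H variable

❦ At run time, caller retrieves the compile-time information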

    package yes_means_no;
    sub import   { $^H{yes_means_no} = 1 }
    sub unimport { $^H{yes_means_no} = 0 }
    1;

    package main;
    sub yes {
        my $hints = (caller(0))[10];
        if ($hints->{yes_means_no}) {
            return "No";
        }
        else {
            return "Yes";
        }
    }

    { use yes_means_no; print yes(); }   # "No"
    print yes();                         # "Yes"
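The pragma above normally lives in its own yes_means_no.pm. For trying it out in a single file, the following sketch marks the module as already loaded through %INC; that trick, the `no yes_means_no` case and the result variables are additions for illustration, not part of the original slide.

```perl
use strict;
use warnings;

# Assumption for a one-file demo: pretend yes_means_no.pm is already loaded.
BEGIN { $INC{'yes_means_no.pm'} = __FILE__ }

{
    package yes_means_no;
    sub import   { $^H{yes_means_no} = 1 }   # runs while the caller compiles
    sub unimport { $^H{yes_means_no} = 0 }
}

sub yes {
    my $hints = (caller(0))[10];             # hints hash at the call site
    return $hints && $hints->{yes_means_no} ? "No" : "Yes";
}

my ($inside, $negated, $outside);
{
    use yes_means_no;
    $inside = yes();                         # pragma in effect
    {
        no yes_means_no;
        $negated = yes();                    # switched off for this block
    }
}
$outside = yes();                            # pragma never enabled here
```

$inside is "No", while $negated and $outside are both "Yes", showing that the setting follows lexical scope rather than the order of execution.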

UNIVERSAL::isa

❦ $obj->isa('Logger');

❦ $obj must inherit from the Logger class

❦ Does not cover relationships such as aggregation, delegation, or mocking

UNIVERSAL::DOES

❦ $obj->DOES('Logger');

❦ $obj must perform the Logger role

❦ A class can define its own DOES method
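A sketch contrasting isa and DOES for a class that delegates instead of inheriting. The FileLogger and Proxy classes are hypothetical; only the Logger name and the isa/DOES calls come from the slides.

```perl
use strict;
use warnings;

{
    package Logger;                      # the role/interface being asked about
    sub new { bless {}, shift }
}
{
    package FileLogger;                  # hypothetical: inherits from Logger
    our @ISA = ('Logger');
}
{
    package Proxy;                       # hypothetical: delegates, no inheritance
    sub new { bless { target => FileLogger->new }, shift }
    sub DOES {
        my ($self, $role) = @_;
        return 1 if $role eq 'Logger';   # claims the role explicitly
        return $self->SUPER::DOES($role);
    }
}

my $proxy   = Proxy->new;
my @answers = (
    FileLogger->new->isa('Logger')  ? 1 : 0,
    FileLogger->new->DOES('Logger') ? 1 : 0,
    $proxy->isa('Logger')           ? 1 : 0,
    $proxy->DOES('Logger')          ? 1 : 0,
);
```

@answers comes out as (1, 1, 0, 1): the proxy fails the isa check but can still declare through DOES that it performs the Logger role.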

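Hash::Util::FieldHash

❦ Use objects as hash keys: $hash{$obj}

❦ The hash key is deleted automatically when the object is destroyed

❦ Greatly improves the run-time performance of object-management modules such as Class::InsideOut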
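A small sketch of the automatic cleanup, using the module's fieldhash function. The Person class name, the %name_of hash and the stored value are made up for illustration.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %name_of;              # inside-out storage keyed by object

my $count_alive;
{
    my $obj = bless {}, 'Person';   # 'Person' is a hypothetical class
    $name_of{$obj} = 'alice';
    $count_alive = keys %name_of;   # one entry while $obj exists
}
my $count_after = keys %name_of;    # entry is gone once $obj is destroyed
```

$count_alive is 1 and $count_after is 0, with no DESTROY method needed to clean up the hash.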

More core modules

❦ encoding::warnings

❦ Math::BigInt::FastCalc

❦ Time::Piece

❦ Win32API::File

❦ CPANPLUS!

CPANPLUS.pm

❦ The successor to CPAN.pm

❦ A convenient, reliable API and interactive interface

❦ After five years, finally a core module

❦ And it brought a long list of modules in with it!

Archive::Extract, Archive::Tar

Compress::Zlib, Digest::SHA

ExtUtils::CBuilder, ExtUtils::XSBuilder

File::Fetch, IO::Zlib, IPC::Cmd

Locale::Maketext::Simple, Log::Message

Module::Build, Module::CoreList, Module::Load

Module::Load::Conditional, Module::Loaded, Module::Pluggable, Object::Accessor

Package::Constants, Params::Check

Term::UI ... and too many more to list

Performance improvements

❦ Faster Unicode string handling

❦ Lower memory use for constant subroutines

❦ More efficient thread management

And there's more...

❦ More thorough taint checking

❦ Stable .pmc support (Module::Compile)

❦ New core documentation (perlunitut and others)

❦ ...for everything else, see perldelta ☺

fin.