B ^-@sddlZddlmmZddlZddlmZddl Z ddl m Z ddl m Z ddlmZddlmZddlmZddlmZdd lmZdd lmZdd lmZmZmZmZmZddl Z e j!j"Gd d d Z#dS)N)Path) Sst2Processor)get_from_cache) BertTokenizer)DistilBertTokenizer) CTRLTokenizer) GPT2Tokenizer)RobertaTokenizer)OpenAIGPTTokenizer)PyBertTokenizerPyCtrlTokenizerPyGpt2TokenizerPyRobertaTokenizerPyOpenAiGptTokenizerc@sDeZdZddZddZddZddZd d Zd d Zd dZ dS)TestTokenizationSST2cCs.t|_|jtjd|_tt|_ dS)NZ SST2_PATH) r processorget_train_examplesosenvironexamplesrtempfilemkdtemptest_dir)selfrGE:\Coding\backup-rust\rust-transformers\tests\test_tokenization_sst2.py setup_classsz TestTokenizationSST2.setup_classc Cstjdd|jd|_tt|jjdd|_g}x,|jD]"}| |jj |j dddddq:W|jj dd|jDdd d d }xtt ||D]d\}}|j}|d }||k}|dkrd d lm} d dlm} | | ddddd|sTtd|fd||fdtkst|r t|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd d lm} d dlm} | | ddddd|std|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrvd d lm} d dlm} | | ddddd|std|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}qWdS)Nzbert-base-uncasedT) do_lower_case cache_dir vocab_file)add_special_tokensreturn_overflowing_tokensreturn_special_tokens_mask max_lengthcSsg|] }|jqSr)text_a).0examplerrr 5sz?TestTokenizationSST2.test_tokenization_bert.. longest_firstr)max_lentruncation_strategystride input_ids)PytestAssertRewriteWarning) warn_explicitz5asserting the value None, please use "assert is None"zGE:\Coding\backup-rust\rust-transformers\tests\test_tokenization_sst2.py<)categoryfilenamelineno)==)z1%(py2)s {%(py2)s = %(py0)s.token_ids } == %(py5)srust)py0py2py5zassert %(py7)spy7token_type_ids=)z3%(py2)s {%(py2)s = %(py0)s.segment_ids } == %(py5)sspecial_tokens_mask>)z;%(py2)s {%(py2)s = %(py0)s.special_tokens_mask } == %(py5)s)rfrom_pretrainedrbase_tokenizerr rpretrained_vocab_files_maprust_tokenizerrappend encode_plusr% encode_listzip token_ids_pytest.warning_typesr.warningsr/ @pytest_ar_call_reprcompare @py_builtinslocals_should_repr_global_name _safereprAssertionError_format_explanation segment_idsr<) routput_baseliner' output_rustr5baseline @py_assert1 @py_assert4 @py_assert3r.r/ @py_format6 @py_format8rrrtest_tokenization_bert&sh   R   R   R z+TestTokenizationSST2.test_tokenization_bertc Cstjdd|jd|_tt|jjdd|_g}x,|jD]"}| |jj |j dddddq:W|jj dd|jDdd d d }xtt ||D]d\}}|j}|d }||k}|dkrd d lm} d dlm} | | ddddd|sTtd|fd||fdtkst|r t|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd d lm} d dlm} | | ddddd|std|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrvd d lm} d dlm} | | ddddd|std|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}qWdS)Nzdistilbert-base-uncasedT)rrrr )r!r"r#r$cSsg|] }|jqSr)r%)r&r'rrrr(OszETestTokenizationSST2.test_tokenization_distilbert..r)r)r*r+r,r-)r.)r/z5asserting the value None, please use "assert is None"zGE:\Coding\backup-rust\rust-transformers\tests\test_tokenization_sst2.pyV)r1r2r3)r4)z1%(py2)s {%(py2)s = %(py0)s.token_ids } == %(py5)sr5)r6r7r8zassert %(py7)sr9r:W)z3%(py2)s {%(py2)s = %(py0)s.segment_ids } == %(py5)sr<X)z;%(py2)s {%(py2)s = %(py0)s.special_tokens_mask } == %(py5)s)rr>rr?r rr@rArrBrCr%rDrErFrGr.rHr/rIrJrKrLrMrNrOrPrQr<) rrRr'rSr5rTrUrVrWr.r/rXrYrrrtest_tokenization_distilbert@sh   R   R   R z1TestTokenizationSST2.test_tokenization_distilbertc Cstjdd|jd|_tt|jjddt|jjdd|_g}x,|jD]"}| |jj |j dddddqLW|jj dd |jDdd d d }xtt ||D]d\}}|j}|d }||k}|dkrd dlm} d dlm} | | ddddd|sftd|fd||fdtks(t|r2t|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd dlm} d dlm} | | ddddd|s0td|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd dlm} d dlm} | | ddddd|std|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}qWdS) NctrlT)rrr merges_filer )r!r"r#r$cSsg|] }|jqSr)r%)r&r'rrrr(ksz?TestTokenizationSST2.test_tokenization_ctrl..r)r)r*r+r,r-)r.)r/z5asserting the value None, please use "assert is None"zGE:\Coding\backup-rust\rust-transformers\tests\test_tokenization_sst2.pyr)r1r2r3)r4)z1%(py2)s {%(py2)s = %(py0)s.token_ids } == %(py5)sr5)r6r7r8zassert %(py7)sr9r:s)z3%(py2)s {%(py2)s = %(py0)s.segment_ids } == %(py5)sr<t)z;%(py2)s {%(py2)s = %(py0)s.special_tokens_mask } == %(py5)s)rr>rr?r rr@rArrBrCr%rDrErFrGr.rHr/rIrJrKrLrMrNrOrPrQr<) rrRr'rSr5rTrUrVrWr.r/rXrYrrrtest_tokenization_ctrlZsj   R   R   R z+TestTokenizationSST2.test_tokenization_ctrlc Cstjdd|jd|_tt|jjddt|jjdd|_g}x,|jD]"}| |jj |j dddddqLW|jj dd |jDdd d d }xtt ||D]d\}}|j}|d }||k}|dkrd dlm} d dlm} | | ddddd|sftd|fd||fdtks(t|r2t|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd dlm} d dlm} | | ddddd|s0td|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd dlm} d dlm} | | ddddd|std|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}qWdS) Ngpt2T)rrrr`r )r!r"r#r$cSsg|] }|jqSr)r%)r&r'rrrr(sz?TestTokenizationSST2.test_tokenization_gpt2..r)r)r*r+r,r-)r.)r/z5asserting the value None, please use "assert is None"zGE:\Coding\backup-rust\rust-transformers\tests\test_tokenization_sst2.py)r1r2r3)r4)z1%(py2)s {%(py2)s = %(py0)s.token_ids } == %(py5)sr5)r6r7r8zassert %(py7)sr9r:)z3%(py2)s {%(py2)s = %(py0)s.segment_ids } == %(py5)sr<)z;%(py2)s {%(py2)s = %(py0)s.special_tokens_mask } == %(py5)s)rr>rr?r rr@rArrBrCr%rDrErFrGr.rHr/rIrJrKrLrMrNrOrPrQr<) rrRr'rSr5rTrUrVrWr.r/rXrYrrrtest_tokenization_gpt2vsj   R   R   R z+TestTokenizationSST2.test_tokenization_gpt2c Cstjdd|jd|_tt|jjddt|jjdd|_g}x,|jD]"}| |jj |j dddddqLW|jj dd |jDdd d d }xtt ||D]d\}}|j}|d }||k}|dkrd dlm} d dlm} | | ddddd|sftd|fd||fdtks(t|r2t|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd dlm} d dlm} | | ddddd|s0td|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd dlm} d dlm} | | ddddd|std|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}qWdS) Nz roberta-baseT)rrrr`r )r!r"r#r$cSsg|] }|jqSr)r%)r&r'rrrr(szBTestTokenizationSST2.test_tokenization_roberta..r)r)r*r+r,r-)r.)r/z5asserting the value None, please use "assert is None"zGE:\Coding\backup-rust\rust-transformers\tests\test_tokenization_sst2.py)r1r2r3)r4)z1%(py2)s {%(py2)s = %(py0)s.token_ids } == %(py5)sr5)r6r7r8zassert %(py7)sr9r:)z3%(py2)s {%(py2)s = %(py0)s.segment_ids } == %(py5)sr<)z;%(py2)s {%(py2)s = %(py0)s.special_tokens_mask } == %(py5)s)r r>rr?rrr@rArrBrCr%rDrErFrGr.rHr/rIrJrKrLrMrNrOrPrQr<) rrRr'rSr5rTrUrVrWr.r/rXrYrrrtest_tokenization_robertasj   R   R   R z.TestTokenizationSST2.test_tokenization_robertac Cstjdd|jd|_tt|jjddt|jjdd|_g}x,|jD]"}| |jj |j dddddqLW|jj dd |jDdd d d }xtt ||D]d\}}|j}|d }||k}|dkrd dlm} d dlm} | | ddddd|sftd|fd||fdtks(t|r2t|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd dlm} d dlm} | | ddddd|s0td|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}|j}|d}||k}|dkrd dlm} d dlm} | | ddddd|std|fd||fdtkst|rt|ndt|t|d} dd| i} tt| d}}}qWdS) Nz openai-gptT)rrrr`r )r!r"r#r$cSsg|] }|jqSr)r%)r&r'rrrr(szETestTokenizationSST2.test_tokenization_openai_gpt..r)r)r*r+r,r-)r.)r/z5asserting the value None, please use "assert is None"zGE:\Coding\backup-rust\rust-transformers\tests\test_tokenization_sst2.py)r1r2r3)r4)z1%(py2)s {%(py2)s = %(py0)s.token_ids } == %(py5)sr5)r6r7r8zassert %(py7)sr9r:)z3%(py2)s {%(py2)s = %(py0)s.segment_ids } == %(py5)sr<)z;%(py2)s {%(py2)s = %(py0)s.special_tokens_mask } == %(py5)s)r r>rr?rrr@rArrBrCr%rDrErFrGr.rHr/rIrJrKrLrMrNrOrPrQr<) rrRr'rSr5rTrUrVrWr.r/rXrYrrrtest_tokenization_openai_gptsj   R   R   R z1TestTokenizationSST2.test_tokenization_openai_gptN) __name__ __module__ __qualname__rrZr^rdrirmrqrrrrrsr)$builtinsrK_pytest.assertion.rewrite assertionrewriterIrpathlibrpytestZ!transformers.data.processors.gluerZtransformers.file_utilsrZtransformers.tokenization_bertrZ$transformers.tokenization_distilbertrZtransformers.tokenization_ctrlrZtransformers.tokenization_gpt2rZ!transformers.tokenization_robertar Z transformers.tokenization_openair rust_transformersr r r rrrmarkslowrrrrr s