Georgi Gerganov
92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
* tests : add test-tokenizer-0.sh
* unicode : add all unicode number ranges
* starcoder : fix pre-tokenizer
* tests : add test that fails with DeepSeek tokenizers
* falcon : fix regex
* unicode : regenerate unicode tables
* refact : add tokenizer model
* lint : fix
* tests : disable failing tests
ggml-ci
* refact : add tests files
ggml-ci
* convert : print -> logging
ggml-ci
* lint : fix
* unicode : digit -> number
* phi-3 : update
2024-05-04 08:32:32 +03:00