A lightweight command-line utility, written in Python, that encodes text files into a 15-bit binary format and decodes them back using a custom word-to-code dictionary.
- 15-bit Encoding: Efficient binary encoding using fixed 15-bit codes
- Text Preprocessing: Automatic lowercase conversion and special symbol removal
- Bidirectional: Encode text to binary and decode back to text
- Dictionary-based: Customizable word-to-binary mappings
- CLI: Simple command-line interface with built-in help (-h)
- UTF-8 Support: Reads UTF-8 text, including Russian (Cyrillic) words
- Error Handling: Clear warnings for missing dictionary entries
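Because every code is exactly 15 bits wide, a dictionary can hold at most 2^15 = 32,768 distinct words. The sketch below shows one way such a mapping could be built from a word list, assuming codes are assigned by position and stored as 15-character bit strings. `build_dictionary` is a hypothetical helper; the dictionary format rbcoder.py actually uses may differ.

```python
CODE_BITS = 15
MAX_WORDS = 2 ** CODE_BITS  # 32768 distinct fixed-width codes


def build_dictionary(words):
    """Hypothetical helper: assign each unique word a fixed-width 15-bit code string."""
    unique = list(dict.fromkeys(words))  # drop duplicates, keep first-seen order
    if len(unique) > MAX_WORDS:
        raise ValueError(f"{len(unique)} words exceed the {MAX_WORDS}-code limit")
    # format(i, "015b") renders i as a zero-padded 15-digit binary string
    return {word: format(index, f"0{CODE_BITS}b") for index, word in enumerate(unique)}


mapping = build_dictionary(["привет", "мир", "как", "дела"])
print(mapping["мир"])  # 000000000000001
```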
- Clone this repository:
git clone <repository-url>
cd rbcoder-
- Ensure you have Python 3.6+ installed.
- No additional dependencies are required; RBCoder uses only the Python standard library.
python rbcoder.py encode -i input.txt -o encoded.bin
python rbcoder.py decode -i encoded.bin -o decoded.txt
python rbcoder.py encode -i INPUT -o OUTPUT
- -i, --input: Input text file to encode (required)
- -o, --output: Output binary file (required)
python rbcoder.py decode -i INPUT -o OUTPUT
- -i, --input: Input binary file to decode (required)
- -o, --output: Output text file (required)
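The encode/decode commands above map naturally onto argparse subcommands. This is a minimal standard-library sketch assuming the layout documented here; the real rbcoder.py parser may be organized differently, and `build_parser` is a hypothetical name. Setting `subparsers.required` as an attribute keeps it working on Python 3.6, whose `add_subparsers()` does not accept a `required=` keyword.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented encode/decode usage."""
    parser = argparse.ArgumentParser(prog="rbcoder.py", description="15-bit dictionary text coder")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True  # attribute form is accepted on Python 3.6 and later

    encode = subparsers.add_parser("encode", help="encode a text file to binary")
    encode.add_argument("-i", "--input", required=True, help="input text file to encode")
    encode.add_argument("-o", "--output", required=True, help="output binary file")

    decode = subparsers.add_parser("decode", help="decode a binary file to text")
    decode.add_argument("-i", "--input", required=True, help="input binary file to decode")
    decode.add_argument("-o", "--output", required=True, help="output text file")
    return parser


args = build_parser().parse_args(["encode", "-i", "input.txt", "-o", "encoded.bin"])
print(args.command, args.input, args.output)  # encode input.txt encoded.bin
```

With this layout, `-h` works at both levels (`rbcoder.py -h` and `rbcoder.py encode -h`) because argparse generates help for the main parser and each subparser.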
# General help
python rbcoder.py -h
# Encode-specific help
python rbcoder.py encode -h
# Decode-specific help
python rbcoder.py decode -h
During encoding, RBCoder automatically preprocesses text:
- Lowercasing: All text converted to lowercase
- Special Symbol Removal: Punctuation and special characters removed
- Whitespace Normalization: Runs of whitespace collapsed into a single space
Example Transformation:
Input: "Hello, World! This is a TEST - with symbols."
Output: "hello world this is a test with symbols"
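The preprocessing steps can be approximated with a regular expression. This is a sketch, not rbcoder.py's code: it assumes punctuation is replaced by spaces and that Python's Unicode-aware `\w` decides what counts as a word character. Because `\w` also matches digits and underscores, this version would turn `LoL#1.0` into `lol 1 0`, whereas the Russian example output in this README drops that token, so the tool's exact character rules evidently differ.

```python
import re


def preprocess(text):
    """Lowercase, replace special symbols with spaces, and collapse whitespace."""
    text = text.lower()
    # Anything that is neither a word character nor whitespace becomes a space.
    # In Python 3, \w matches Unicode letters, so Cyrillic text is kept.
    text = re.sub(r"[^\w\s]", " ", text)
    # str.split() with no argument splits on any run of whitespace.
    return " ".join(text.split())


print(preprocess("Hello, World! This is a TEST - with symbols."))
# hello world this is a test with symbols
```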
# Create sample text
echo "Привет, мир! Как дела?" > test.txt
# Encode to binary
python rbcoder.py encode -i test.txt -o test.bin
# Decode back to text
python rbcoder.py decode -i test.bin -o decoded.txt
# Text with mixed case and punctuation
echo "Сколько раз ты играл в LoL#1.0?" > complex.txt
python rbcoder.py encode -i complex.txt -o complex.bin
Output after preprocessing:
сколько раз ты играл в
$ python rbcoder.py encode -i input.txt -o output.bin
Warning: The following words are not in the dictionary and will be skipped:
- unknownword
- anothermissing
Encoded 45 words to output.bin
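The warning above implies a lookup step that keeps known words and collects the rest. A minimal sketch, assuming the dictionary is a plain `dict` from word to 15-bit code string and that warnings go to stderr; `lookup_codes` is a hypothetical name, and printing the final "Encoded N words" summary is left to the caller.

```python
import sys


def lookup_codes(words, dictionary):
    """Hypothetical helper: return (codes, missing) for a list of preprocessed words."""
    codes, missing = [], []
    for word in words:
        code = dictionary.get(word)
        if code is None:
            missing.append(word)  # unknown words are skipped, not encoded
        else:
            codes.append(code)
    if missing:
        print("Warning: The following words are not in the dictionary and will be skipped:",
              file=sys.stderr)
        for word in dict.fromkeys(missing):  # list each missing word once, in order
            print(f"- {word}", file=sys.stderr)
    return codes, missing


dictionary = {"hello": "000000000000000", "world": "000000000000001"}
codes, missing = lookup_codes("hello unknownword world".split(), dictionary)
print(len(codes), missing)  # 2 ['unknownword']
```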
- Input format: Plain text, UTF-8 encoded
- Content: Can contain mixed case, punctuation, special symbols
- Word Separation: Whitespace (spaces, tabs, newlines)
- Output format: Raw binary data
- Encoding: 15 bits per word, padded to a whole number of bytes
- Size: Fixed 15 bits per encoded word, independent of word length
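The format is described only as "15 bits per word, padded to a whole number of bytes". A sketch of that layout, assuming codes are written most-significant bit first, with no header, and the final byte is padded with zero bits; `pack_codes` and `unpack_codes` are hypothetical names. Because the padding is always shorter than one 15-bit code, the decoder can simply ignore any leftover bits.

```python
CODE_BITS = 15


def pack_codes(codes):
    """Concatenate 15-bit code strings and zero-pad to a whole number of bytes."""
    bits = "".join(codes)
    bits += "0" * (-len(bits) % 8)  # append 0-7 zero bits of padding
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def unpack_codes(data):
    """Split raw bytes back into 15-bit code strings, ignoring trailing padding."""
    bits = "".join(format(byte, "08b") for byte in data)
    # Padding is under 8 bits, so it can never form a complete 15-bit chunk.
    usable = len(bits) - len(bits) % CODE_BITS
    return [bits[i:i + CODE_BITS] for i in range(0, usable, CODE_BITS)]


codes = ["000000000000001", "111111111111111", "000000000000010"]
data = pack_codes(codes)
print(len(data))  # 6  (45 bits padded to 48)
```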
RBCoder provides clear feedback for:
- Missing input/output files
- Words not found in dictionary (with warnings)
- File permission issues
- Invalid binary codes during decoding
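Two of the error cases listed above can be sketched as follows: codes with no dictionary entry are collected instead of aborting the decode, and a missing or unreadable input file produces a short message on stderr and exit status 1 via `sys.exit`. The function names and message wording are assumptions, not rbcoder.py's actual output.

```python
import sys


def read_binary(path):
    """Hypothetical helper: read an encoded file, turning I/O failures into clear messages."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except FileNotFoundError:
        sys.exit(f"Error: input file not found: {path}")
    except PermissionError:
        sys.exit(f"Error: permission denied: {path}")


def decode_codes(codes, dictionary):
    """Map 15-bit codes back to words; collect codes that have no dictionary entry."""
    reverse = {code: word for word, code in dictionary.items()}
    words, invalid = [], []
    for code in codes:
        word = reverse.get(code)
        if word is None:
            invalid.append(code)
        else:
            words.append(word)
    return " ".join(words), invalid


dictionary = {"привет": "000000000000000", "мир": "000000000000001"}
text, invalid = decode_codes(["000000000000000", "000000000000001", "111111111111111"], dictionary)
print(text)     # привет мир
print(invalid)  # ['111111111111111']
```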
- Encoding: O(n) time in the number of words
- Decoding: O(n) time in the number of 15-bit chunks
- Storage: 15 bits per word regardless of word length; fixed 15-bit codes can address up to 32,768 (2^15) dictionary entries
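Assuming the file contains nothing but the packed codes (no header), the output size for n encoded words follows directly from the 15-bit width. For the 45-word run shown earlier:

```latex
\text{bytes}(n) = \left\lceil \frac{15n}{8} \right\rceil,
\qquad
\text{bytes}(45) = \left\lceil \frac{675}{8} \right\rceil = \lceil 84.375 \rceil = 85
```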
- Text Compression: Compact storage of text whose words are in the dictionary (lossy: case, punctuation, and unknown words are not preserved)
- Language Processing: Russian text encoding
- Data Obfuscation: Basic text obfuscation through binary conversion
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature/improvement)
- Create a Pull Request