Create a symbolic link on Windows:

Open Command Prompt as administrator, navigate to the directory where the link should be created, and run the command below. It creates a directory link named htdocs that points to E:\htdocs:

mklink /D htdocs E:\htdocs
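
For scripted setups, the same kind of link can be created from Python. This is a minimal sketch rather than a replacement for mklink: the function name create_dir_link is made up for illustration, and on Windows os.symlink generally needs the same elevated privileges (or Developer Mode).

```python
import os
import tempfile


def create_dir_link(link_path, target_path):
    """Create a directory symlink at link_path that points to target_path."""
    # target_is_directory matters on Windows only; other platforms ignore it.
    os.symlink(target_path, link_path, target_is_directory=True)


if __name__ == "__main__":
    # Demonstrate inside a throwaway directory instead of touching E:\htdocs.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "real_htdocs")
        os.mkdir(target)
        link = os.path.join(tmp, "htdocs")
        create_dir_link(link, target)
        print(os.path.islink(link))
```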

Move a large number of files (.txt files in the example below) to a new folder (Ubuntu):

find originalFolderName -name '*.txt' -exec mv {} targetFolderName/ \;
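
A rough Python equivalent of the find command, useful where find is not available. The function name move_matching is hypothetical, and name clashes between files from different subfolders are not handled here.

```python
import shutil
from pathlib import Path


def move_matching(src_dir, dst_dir, pattern="*.txt"):
    """Move every file under src_dir matching pattern into dst_dir (flat)."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    # Collect the matches first so the tree is not modified while walking it.
    for path in list(Path(src_dir).rglob(pattern)):
        if path.is_file():
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return sorted(moved)
```

Usage: move_matching("originalFolderName", "targetFolderName") returns the sorted names of the files it moved.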

 

Shell script for batch OCR using Tesseract, where the input files are named "hindi-xxx.png" and each output file is written with "text-" prefixed to the input name (Tesseract appends .txt to the output base name itself):

_________

#!/bin/sh

for i in hindi-*.png; do tesseract "$i" "text-$i" -l hin; done;

__________
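
The same loop can be sketched in Python, which helps when the batch has to run on Windows too. The helper names build_ocr_command and ocr_directory are invented for this sketch; it assumes the tesseract executable is on the PATH and uses only the command-line form shown in the script above.

```python
import subprocess
from pathlib import Path


def build_ocr_command(image_path, lang="hin", prefix="text-"):
    """Return the tesseract command line for one image."""
    image = Path(image_path)
    # Output base keeps the image name and gains the prefix, as in the script.
    output_base = image.with_name(prefix + image.name)
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_directory(directory=".", pattern="hindi-*.png"):
    """Run tesseract on every matching image; stop on the first failure."""
    for image in sorted(Path(directory).glob(pattern)):
        subprocess.run(build_ocr_command(image), check=True)
```

Calling ocr_directory() in the folder that holds the images mirrors the for loop; build_ocr_command is kept separate so the command can be inspected without running Tesseract.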

 

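Combine multiple text files, with each file name as a header above its section (Ubuntu). If combined.txt already exists it is matched by *.txt as well, so delete it first or write the output to another folder: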
tail -n +1 *.txt > combined.txt
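
A small Python sketch that produces the same "==> name <==" headers, for cases where the file list has to be chosen explicitly. The function name combine_with_headers is an assumption of this example, and the files are assumed to be UTF-8.

```python
from pathlib import Path


def combine_with_headers(paths, out_path):
    """Concatenate files into out_path, each preceded by a tail-style header."""
    with open(out_path, "w", encoding="utf-8") as out:
        for index, path in enumerate(paths):
            if index > 0:
                out.write("\n")  # blank line between sections, as tail prints
            out.write(f"==> {path} <==\n")
            out.write(Path(path).read_text(encoding="utf-8"))
```

Passing an explicit list of paths avoids the output file being picked up as one of its own inputs.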

 

Randomize the order of the lines in a text file (works with both Python 2 and Python 3):
python -c "import random, sys; lines = open(sys.argv[1]).readlines(); random.shuffle(lines); sys.stdout.write(''.join(lines))" D:\folder\test.txt > D:\Folder2\test_rand.txt


________
Count how often each unique line occurs in a large file (works on Ubuntu):
sort input.txt | uniq -c > output.txt


Same count, sorted so that the most frequent lines appear at the top:
sort input.txt | uniq -c | sort -bgr > output.txt
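
Where sort and uniq are not available, the frequency count can be approximated with collections.Counter. This is a sketch that keeps all distinct lines in memory, so it suits files that fit in RAM; count_lines is a name chosen for this example.

```python
from collections import Counter


def count_lines(lines):
    """Return (line, count) pairs, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return counts.most_common()
```

Usage: with open("input.txt", encoding="utf-8") as f: pairs = count_lines(f). The order of lines with equal counts can differ from the sort-based pipeline.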
_______________
Compare text files:
Find the lines in the second file that are not present in the first file.
For example, with one word per line, this lists the words from the second file that do not appear in the first.
The command prints every line of second-file.txt that does not exactly match any line of first-file.txt. It might be slow if the files are large.



grep -Fxv -f first-file.txt second-file.txt
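
For large files, a set lookup in Python is one alternative. The sketch below assumes the first file fits in memory; lines_only_in_second is a made-up name. Like the grep command, it compares whole lines exactly and keeps the order of the second file.

```python
def lines_only_in_second(first_lines, second_lines):
    """Return lines of the second input that match no line of the first."""
    seen = {line.rstrip("\n") for line in first_lines}
    result = []
    for line in second_lines:
        stripped = line.rstrip("\n")
        if stripped not in seen:
            result.append(stripped)
    return result
```

Both arguments can be open file objects, for example open("first-file.txt", encoding="utf-8").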

_________

Notepad++

Regex replace with backreferences

Add a tab after a matched string (e.g. after the leading group of digits on each line):

Find: (^\d+)

Replace: \1\t
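
The same find/replace pair works in Python's re module, shown here as a sketch for batch processing outside Notepad++. The function name is illustrative; note that in Python 3, \d also matches non-ASCII digits such as Devanagari ones.

```python
import re


def add_tab_after_leading_digits(text):
    """Insert a tab after the run of digits that starts each line."""
    # re.MULTILINE makes ^ match at the start of every line, not just the text.
    return re.sub(r"^(\d+)", r"\1\t", text, flags=re.MULTILINE)
```

Lines that do not start with a digit are left unchanged.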

___

All Devanagari (Hindi/Konkani) characters, written as an alternation for use in a regular expression:

ँ|ं|ः|अ|आ|इ|ई|उ|ऊ|ए|ऐ|ओ|औ|क्ष|क|ख|ग|घ|ङ|च|छ|ज्ञ|ज|झ|ञ|ट|ठ|ड|ढ|ण|त|थ|द|ध|न|प|फ|ब|भ|म|य|र|ल|ळ|व|श|ष|स|ह|़|ा|ि|ी|ु|ू|ृ|े|ै|ॉ|ो|ौ|्|ॐ

____

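Find invalid words that contain two consecutive diacritics in Hindi (for Notepad++). The alternatives are wrapped in groups rather than square brackets, because inside [...] the | is treated as a literal character to match:

(?:ा|ि|ी|ु|ू|ृ|े|ै|ॉ|ो|ौ|्)(?:ा|ि|ी|ु|ू|ृ|े|ै|ॉ|ो|ौ|्)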
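
A Python sketch of the same check, listing the suspicious words instead of highlighting them. The set of signs mirrors the one in the pattern above (vowel signs plus the halant), written as escapes so the combining marks stay readable; find_invalid_words is a hypothetical helper name.

```python
import re

# Vowel signs aa, i, ii, u, uu, vocalic r, e, ai, candra o, o, au, and halant.
MATRAS = (
    "\u093e\u093f\u0940\u0941\u0942\u0943"
    "\u0947\u0948\u0949\u094b\u094c\u094d"
)
DOUBLE_MATRA = re.compile("[" + MATRAS + "]{2}")


def find_invalid_words(text):
    """Return whitespace-separated words containing two signs in a row."""
    return [word for word in text.split() if DOUBLE_MATRA.search(word)]
```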

 

Devanagari Character Groups

Independent Vowels:
अ|आ|इ|ई|उ|ऊ|ए|ऐ|ओ|औ|ॐ

Independent Consonants:
क्ष|क|ख|ग|घ|ङ|च|छ|ज्ञ|ज|झ|ञ|ट|ठ|ड|ढ|ण|त|थ|द|ध|न|प|फ|ब|भ|म|य|र|ल|ळ|व|श|ष|स|ह

Vowel Diacritics:
ा|ि|ी|ु|ू|ृ|े|ै|ॉ|ो|ौ

Other Diacritics:
ँ|ं|ः|़

Halanta:
्

Special Character:

 

Find non-ASCII characters (roughly, any character outside the basic ASCII range, which includes all Devanagari text):

[^\x00-\x7F]+
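
The pattern carries over unchanged to Python, sketched here to extract the non-ASCII runs from a string. The function name find_non_ascii_runs is an assumption of this example.

```python
import re

NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def find_non_ascii_runs(text):
    """Return every maximal run of characters outside the ASCII range."""
    return NON_ASCII.findall(text)
```

Spaces are ASCII, so a phrase of several Devanagari words comes back as separate runs.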

 

Replace all Arabic digits with the corresponding Devanagari (Hindi) digits in Notepad++:
Find String:    (1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(0)
Replace with: (?1१)(?2२)(?3३)(?4४)(?5५)(?6६)(?7७)(?8८)(?9९)(?10०)
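
Outside Notepad++, the same digit substitution can be done with a translation table. A minimal sketch, with to_devanagari_digits as an illustrative name:

```python
# Map each ASCII digit to the Devanagari digit at the same position.
DIGIT_MAP = str.maketrans("0123456789", "०१२३४५६७८९")


def to_devanagari_digits(text):
    """Replace ASCII digits 0-9 with Devanagari digits; leave the rest as-is."""
    return text.translate(DIGIT_MAP)
```

Unlike the conditional replacement above, the table needs no capture groups, and all other characters pass through untouched.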