Conf. Dr. Ing . Costin-Anton Boiangiu < [email protected] >

download Conf. Dr.  Ing . Costin-Anton  Boiangiu < Costin.Boiangiu@CS.PUB.RO >

If you can't read please download the document

description

UNIVERSITY POLITEHNICA of BUCHAREST DEPARTMENT OF COMPUTER SCIENCE. Transmisia datelor multimedia in retele de calculatoare : Compresia datelor Codificarea Huffman. Conf. Dr. Ing . Costin-Anton Boiangiu < [email protected] >. Codarea Shannon- Fano. Are avantajul simplitatii - PowerPoint PPT Presentation

Transcript of Conf. Dr. Ing . Costin-Anton Boiangiu < [email protected] >

Transmisia datelor multimedia in retele de calculatoare

Transmisia datelor multimedia in retele de calculatoare: Compresia datelorCodificarea HuffmanConf. Dr. Ing. Costin-Anton Boiangiu

UNIVERSITY POLITEHNICA of BUCHARESTDEPARTMENT OF COMPUTER SCIENCECodarea Shannon-FanoAre avantajul simplitatiiSuboptimalSe bazeaza pe teoria ShannonCodul este construit astfel: mesajele sursei s(i) si probabilitatile asociate p(i) sunt listate in ordine descrescatoare a probabilitatilorLista este divizata pentru a forma doua grupuri de probabilitati egaleFiecare mesaj din primul grup receptioneaza (primeste) 0 ca prim simbol al cuvantului de cod, iar mesajele din lista a doua vor avea cuvintele de cod incepand cu 1Fiecare din sub-listele obtinute sunt divizate dupa acelasi criteriu si se aloca (asigneaza) simboluri suplimentareProcesul se continua pana cand se obtin sub-liste cu un singur mesaj

2Codarea Shannon-FanoLungimea cuvantului de cod este log(p(x)) daca este posibil sa se divizeze in subgrupuri de probabilitate egalaCand acest lucru nu este posibil, unele din cuvintele de cod vor avea lungimi de (-log(p(x)) +1)Codul Shannon-fano furnizeaza o lungime medie a cuvintelor de cod ce satisface relatia :3

4Codarea Shannon-Fanoabcdef986542ab98cdef6542010011115Codarea Shannon-Fanoabcdef986542ab98cdef654201010011116Codarea Shannon-Fanoabcdef986542ab98cdef65420101010011117Codarea Shannon-Fanoabcdef986542ab98001cdef6542101010011118Codarea Shannon-Fanoabcdef986542ab98cdef65420101010100101011119Codarea Shannon-Fanoabcdef986542ab98ef4201011010010101111cd6500110Codarea Shannon-Fanoabcdef986542ab98ef420101101001011001111cd6500111Codarea Shannon-Fanoabcdef986542ab98010101001011001111cd650011ef420112Codarea Shannon-Fanoabcdef986542ab9801010100101100111110cd650011ef4201Codari Prefix OptimaleCodari optimaleSimbolurile care apar mai frecvent vor avea coduri mai scurteUltimele 2 cele mai putin frecvente simboluri vor avea coduri de lungimi egaleDemonstratie:Vrem sa demonstram faptul ca in acest caz codul este clar suboptimalPresupunem opusulFie X, Y cele mai putin frecvente simboluri si|cod(X)| = k, |cod(Y)| = k+1AtunciDatorita decodificarii unice (UD), cod(X) nu poate fi prefix pt cod(Y)Deasemenea toate celelalte coduri sunt mai scurte Eliminand ultim bit al lui|cod(Y)| ar genera un nou cod scurt unic decodificabil, ceea ce contrazice presupunerea initiala de optimalitate13Codificare HuffmanAlgoritmul Huffman ia ca intrare o lista de ponderi ne-negative {w(1), ... ,w(n) } si construieste un arbore binar complet ale carui frunze sunt numerotate cu ponderiun arbore binar este complet daca fiecare nod are zero sau 2 ramificatiiPonderile reprezinta probabilitatile asociate simbolurilor surseiInitial arborele are numai doua noduri, cele corespunzatoare ponderilor celor mai miciLa fiecare pas in algoritm, cele mai mici ponderi definesc un nou nod cu ponderea w(i)+w(j) si a carui radacina (root) are doi sub-arbori, reprezentati de w(i) si w(j)Ponderile w(i) si w(j) sunt indepartate din lista si locul lor este preluat de w(i)+w(j)Procesul continua pana cand se obtine o lista cu o singura valoare. 14Codificare HuffmanAlgoritmul Huffman (1952) constituie un algoritm optimal, n sensul c nici un alt algoritm nu asigur o mai mic lungime medie a cuvintelorSunt situaii n care i ali algoritmi pot da o lungime medie egal cu cea dat de algoritmul Huffman, dar niciodat mai mic

Exemplu:15LiteraCodProbabilitateSetProb Seta0.2b0.4c0.2d0.1e0.1Codificare Huffman16Initializare: Creeaza o multime din fiecare literaLetterCodeProbabilitySetSet Proba0.2b0.4c0.2d0.1e0.1LiteraCodProbabilitateSetProb Seta0.2a0.2b0.4b0.4c0.2c0.2d0.1d0.1e0.1e0.1Codificare Huffman17Sorteaza multimile dupa probabilitateLetterCodeProbabilitySetSet Proba0.2a0.2b0.4b0.4c0.2c0.2d0.1d0.1e0.1e0.1LiteraCodProbabilitateSetProb Seta0.2d0.1b0.4e0.1c0.2a0.2d0.1c0.2e0.1b0.4Codificare Huffman18Insereaza prefixul 1 in codurile literelor din prima multimeLetterCodeProbabilitySetSet Proba0.2d0.1b0.4e0.1c0.2a0.2d0.1c0.2e0.1b0.4LiteraCodProbabilitateSetProb Seta0.2d0.1b0.4e0.1c0.2a0.2d10.1c0.2e0.1b0.4Codificare Huffman19Insereaza prefixul 0 in codurile literelor din a doua multimeLetterCodeProbabilitySetSet Proba0.2d0.1b0.4e0.1c0.2a0.2d10.1c0.2e0.1b0.4LiteraCodProbabilitateSetProb Seta0.2d0.1b0.4e0.1c0.2a0.2d10.1c0.2e00.1b0.4Codificare Huffman20Uneste primele 2 multimiLetterCodeProbabilitySetSet Proba0.2d0.1b0.4e0.1c0.2a0.2d10.1c0.2e00.1b0.4LiteraCodProbabilitateSetProb Seta0.2de0.2b0.4a0.2c0.2c0.2d10.1d0.4e00.1 Codificare Huffman21Sorteaza crescator multimile dupa probabilitateLiteraCodProbabilitateSetProb Seta0.2de0.2b0.4a0.2c0.2c0.2d10.1b0.4e00.1Codificare Huffman22LetterCodeProbabilitySetSet Proba0.2de0.2b0.4a0.2c0.2c0.2d10.1b0.4e00.1LiteraCodProbabilitateSetProb Seta0.2de0.2b0.4a0.2c0.2c0.2d110.1b0.4e100.1Insereaza prefixul 1 in codurile literelor din prima multime Codificare Huffman23LetterCodeProbabilitySetSet Proba0.2de0.2b0.4a0.2c0.2c0.2d110.1b0.4e100.1LiteraCodProbabilitateSetProb Seta00.2de0.2b0.4a0.2c0.2c0.2d110.1b0.4e100.1Insereaza prefixul 0 in codurile literelor din a doua multimeCodificare Huffman24Uneste primele 2 multimiLetterCodeProbabilitySetSet Proba00.2de0.2b0.4a0.2c0.2c0.2d110.1b0.4e100.1LiteraCodProbabilitateSetProb Seta00.2dea0.4b0.4c0.2c0.2b0.4d110.1e100.1Codificare Huffman25LetterCodeProbabilitySetSet Proba00.2dea0.4b0.4c0.2c0.2b0.4d110.1e100.1LiteraCodProbabilitateSetProb Seta00.2c0.2b0.4dea0.4c0.2b0.4d110.1e100.1Sorteaza crescator multimile dupa probabilitateCodificare Huffman26LetterCodeProbabilitySetSet Proba00.2c0.2b0.4dea0.4c0.2b0.4d110.1e100.1LiteraCodProbabilitateSetProb Seta00.2c0.2b0.4dea0.4c10.2b0.4d110.1e100.1Insereaza prefixul 1 in codurile literelor din prima multimeCodificare Huffman27LetterCodeProbabilitySetSet Proba00.2c0.2b0.4dea0.4c10.2b0.4d110.1e100.1LiteraCodProbabilitateSetProb Seta000.2c0.2b0.4dea0.4c10.2b0.4d0110.1e0010.1Insereaza prefixul 0 in codurile literelor din a doua multimeCodificare Huffman28LetterCodeProbabilitySetSet Proba000.2c0.2b0.4dea0.4c10.2b0.4d0110.1e0100.1Uneste primele 2 multimiLiteraCodProbabilitateSetProb Seta000.2cdea0.6b0.4b0.4c10.2d0110.1e0100.1Codificare Huffman29LetterCodeProbabilitySetSet Proba000.2cdea0.6b0.4b0.4c10.2d0110.1e0100.1Sorteaza crescator multimile dupa probabilitateLiteraCodProbabilitateSetProb Seta000.2b0.4b0.4cdea0.6c10.2d0110.1e0100.1Codificare Huffman30LetterCodeProbabilitySetSet Proba000.2b0.4b0.4cdea0.6c10.2d0110.1e0100.1LiteraCodProbabilitateSetProb Seta000.2b0.4b10.4cdea0.6c10.2d0110.1e0100.1Insereaza prefixul 1 in codurile literelor din prima multimeCodificare Huffman31LetterCodeProbabilitySetSet Proba000.2b0.4b10.4cdea0.6c10.2d0110.1e0100.1LiteraCodProbabilitateSetProb Seta0000.2b0.4b10.4cdea0.6c010.2d00110.1e00100.1Insereaza prefixul 0 in codurile literelor din a doua multimeCodificare Huffman32LetterCodeProbabilitySetSet Proba0000.2b0.4b10.4cdea0.6c010.2d00110.1e00100.1Uneste primele 2 multimiLiteraCodProbabilitateSetProb Seta0000.2abcde1.0b10.4c010.2d00110.1e00100.1SfarsitCodificare HuffmanStatistici exemplu:

Lungime medie codl = 0.4x1 + 0.2x2 + 0.2x3 + 0.1x4 + 0.1x4 = 2.2 bits/symbolEntropieH = s=a..eP(s) log2P(s) = 2.122 bits/symbolRedundantal - H = 0.078 bits/symbol

Peformantele sunt identice cu cele obtinute folosind codarea Shannon-FanoDiferenta consta in faptul ca prin codarea Huffman se garanteaza obtinerea unui cod optimal

33Arbore Huffman34abe01cd000111LiteraCoda000b1c01d0011e00100.10.10.20.20.4Arbore Huffman35abecdLetterCodeabcde0.10.10.20.20.4010.2LiteraCodabcd1e0Arbore Huffman36abecd01LetterCodeabcd1e00.10.10.20.20.40.2010.4LiteraCoda0bcd11e10Arbore Huffman37abecd00110.10.10.20.20.4LetterCodea0bcd11e100.20.4010.6LiteraCoda00bc1d011e010Arbore Huffman38abecd0001110.10.10.20.20.4LetterCodea00bc1d011e010LiteraCoda000b1c01d0011e00100.40.20.6011.0Arbore Huffman alternativ (1)39bedLetterCodeabcde0.10.10.4010.2LiteraCodabcd1e0ac0.20.2Arbore Huffman alternativ (1)40bed0.10.10.4010.2ac0.20.2010.4LetterCodeabcd1e0LiteraCoda0bc1d1e0Arbore Huffman alternativ (1)41bed0.10.10.4010.2ac0.20.2010.4LetterCodea0bc1d1e00.601LiteraCoda00bc01d11e10Arbore Huffman alternativ (1)42bed0.10.10.4010.2ac0.20.2010.40.6LetterCodea00bc01d11e10LiteraCoda000b1c001d011e0100101Lungime medie codl = 0.4x1 + (0.2 + 0.2 + 0.1 + 0.1)x3= 2.2 bits/symbolArbore Huffman alternativ (2)43bed0.10.10.4010.2ac0.20.2010.40.6LiteraCoda00b11c01d101e1000101Lungime medie codl = 0.4x2+ (0.2 + 0.2)x2 + (0.1 + 0.1)x3= 2.2 bits/symbolArbori Huffman MinimaliCodurile Huffman nu sunt uniceToate versiunile produc acceasi lungime mediePe care sa il folosim?Pe acela cu varianta minima in lungimile codurilorAdica cel cu arborele cu cea mai mica inaltimeDe ce?Va asigura cel mai mic grad de variabilitate in fluxul codificatCum se poate realiza?In timpul sortarii, solutioneaza egalitatile prin plasarea seturilor mici mai sus in arboreAlternativ, plaseaza seturile noi cat de jos posibil44Coduri Huffman ExtinseSe considera sursa:

A = {a, b, c}, P(a) = 0.8, P(b) = 0.02, P(c) = 0.18H = 0.816 bits/symbolCodul Huffman:a0b11c10l = 1.2 bits/symbolRedundanta = 0.384 bits/symbol (47%!)

Se poate si mai bine?45Coduri Huffman Extinse Incercam codificarea secventelor de 2 litere in loc de codificarea individuala a literelor46LiteraProbabilitateCodaa0.64000ab0.016010101ac0.144011ba0.0160101000bb0.000410100101bc0.00361010011ca0.1440100cb0.003610100100cc0.03241011l = 1.7228/2 = 0.8614Red. = 0.0045 bits/symbolCoduri Huffman Extinse Ideea se poate extinde mai departeSe considera toate secventele nm posibile (am facut 32)In teorie, considerand mai multe secvente putem imbunatati compresia

In realitate, cresterea exponentiala a alfabetului face acest lucru imposibilEx., pentru lungime 3 avem 2563 = 224 = 16M secvente posibileMajoritatea secventelor vor avea frecventa zero47Codificare Huffman DinamicaCompresia dinamica (sau adaptiva) Huffman foloseste un arbore de codare ce este actualizat de fiecare data cand un simbol este citit din fisierul de intrareArborele creste (se dezvolta, evolueaza) la fiecare simbol citit din fisierul de intrareModul de evolutie al arborelui este identic la compresie si la decompresieEficienta metodei se bazeaza pe o proprietate a arborilor Huffman, cunoscuta sub numele de proprietatea fiilor (sibling property)48Codificare Huffman DinamicaProprietate (sibling): Fie T un arbore huffman cu n frunze. Atunci nodurile lui T pot fi aranjate intr-o secventa (x0, x1, ..., x2n-2) astfel incat:Secventa ponderilor (weight(x0), weight(x1), ..., weight(x2n-2)) sa fie in ordine descrescatoarePentru orice i (0 i n-1), nodurile consecutive x2i+1 si x2i+2 sunt siblings (fii ai aceluiasi parinte)

Compresia si decompresia utilizeaza arborele dinamic Huffman pornind de la un caracter virtual, notat de exemplu prin VIRT, sau NTIAcesta este intotdeauna nod terminal (frunza)49Codificare Huffman DinamicaDe fiecare data cind un simbol este citit din fisierul de intrare se efectueaza urmatoarele operatii:

Se scrie in fisierul destinatie codul simbolului: Daca simbolul citit este nou, atunci se scrie codul caracterului virtual (NTI) (determinat din arborele de codare) urmat de codificarea in binar (ASCII) pe 9 biti a simbolului respectivIn caz contrar, se scrie cuvantul de cod al simbolului, determinat din arborele de codareNumai la primul simbol se scrie direct codul ASCII al acestuia pe 9 biti.50Codificare Huffman DinamicaSe introduce simbolul citit in arborele de codare: Daca simbolul curent citit este nou, atunci din nodul virtual vor pleca doi fii: unul pentru simbolul citit si unul pentru simbolul virtualNodul care a fost folosit ca nod virtual devine nod intermediarIn caz contrar, se incrementeaza frecventa de aparitie a acestuia, prin marirea ponderii nodului corespunzator simbolului citit51Codificare Huffman DinamicaSe ajusteaza arborele pentru indeplinirea proprieatatii sibling Trebuie considerate urmatoarele doua aspecte:la cresterea ponderii unui nod, frunza sau intermediar, trebuie modificate si ponderile ancestorilor saipentru a pastra ordinea descrescatoare a ponderilor se judeca astfel: fie nodul frunza n ce are ponderea cu unu mai mare decat valoarea anteriora, deci el este cel care s-a modificat la pasul curentPentru indeplinirea primei conditii a proprietatii sibling, nodul n este schimbat cu cel mai apropiat nod m (m < n) din lista, astfel incat weight(m) < weight(n)Apoi, aceeasi operatie este efectuata asupra parintelui lui n, pana cand se obtine nodul radacina sau este indeplinita primaconditia.

Vezi COMENTARII pentru mai multe detalii!52Procedura general de compresie pentru codarea Huffman dinamic are urmtorul algoritm: 1. iniializez arborele de codare cu un nod rdcin; 2. transmit primul simbol n clar (de exemplu codul ASCII al simbolului); 3. construim nc dou noduri (frunze) care pornesc din nodul rdcin, unul la stnga, care nu conine nici un simbol, cruia i atam ponderea nul (frunz goal) i unul la dreapta care conine simbolul aprut, ponderea acestuia devenind egal cu 1; 4. citim urmtoarea liter din mesaj; 5. dac litera este deja n arbore transmit codul ei din arbore. Codul literii din arbore se formeaz citind simbolurile de '0' respectiv '1' ataate ramurilor, pornind de la nodul rdcin pn la nodul n care se afl litera. Simbolurile de '0' respectiv '1' se ataeaz ramurilor astfel: toate ramurile din dreapta vor avea ataate simbolul '1', iar toate ramurile din stnga vor avea ataate simbolul '0'. Apoi se trece direct la pasul 7; 6. dac litera nu este n arbore atunci transmit codul nodului terminal care nu conine nici un simbol (frunz goal) urmat de codul n clar(ASCII) al literei. Codul nodului fr nici o liter se formeaz ca i n cazul nodului care conine o liter; 7. reactualizez arborele; dac mesajul s-a terminat, atunci codarea este terminat, iar n caz contrar relum procedeul ncepnd cu pasul 4.

52Decodificarea Huffman DinamicaLa decompresie, textul comprimat este verificat cu arborele de codare.Tehnica se numeste parsingLa inceputul decompresiei, nodul curent este initializat cu radacina, ca la algoritmul de compresie, si apoi arborele evolueaza simetricLa fiecare citire a unui simbol 0 se parcurge arborele in jos spre stanga; se parcurge arborele spre dreapta daca simbolul citit este 1Cand se ajunge la un nod terminal (frunza) simbolul asociat este scris in fisierul de iesire si se ajusteaza arborele, la fel ca in faza de compresieVezi COMENTARII pentru mai multe detalii!

53Procedura general de decompresie pentru codarea Huffman dinamic are urmtorul algoritm: 1. iniializez arborele de codare cu un nod rdcin; 2. citesc primul simbol transmis n clar (codul ASCII al simbolului); 3. transmit litera mai departe; 4. construim nc dou noduri care pornesc din nodul rdcin, unul la stnga, care nu conine nici un simbol, cruia i atam ponderea nul i unul la dreapta care conine simbolul aprut, ponderea acestuia devenind egal cu 1; 5. citesc codul transmis (bit cu bit) pn cnd codul se afl n arbore, adic pn cnd am ajuns la un nod frunz; nod frunz = nod care se afl n arbore pe ultimul nivel de ierarhie; 6. dac nodul ataat codului citit este un nod care nu conine nici un simbol citesc litera transmis n clar i o transmit mai departe, apoi trecem direct la pasul 8; 7. dac nodul ataat codului citit conine un simbol atunci decodez codul ataat lui prin litera pe care o conine i transmit litera mai departe (la utilizator); 8. reactualizm arborele; dac mesajul s-a terminat, atunci decodarea este terminat, iar n caz contrar relum procedeul ncepnd cu pasul 5.

53Exemplu54

Input: aardvarkOutput: 051NTICoduri implicite putin mai eficiente sunt posibile(combinatie 4-/5-bit)Exemplu55

Input: aardvarkOutput: 00000049NTI151150aExemplu56

Input: aardvarkOutput: 000001049NTI251250aExemplu57

Input: aardvarkOutput: 000001010001049NTI351250a147148rExemplu58

Input: aardvarkOutput: 000001010001000001149451250a247148r0NTI45146d1Exemplu59

Input: aardvarkOutput: 000001010001000001100049451250a247148r146d2450NTI43144v1Exemplu60

Input: aardvarkOutput: 000001010001000001100049451250a247148r146d2450NTI43144v1Exemplu61

Input: aardvarkOutput: 000001010001000001100049451250a347148r146d2450NTI43144v1Exemplu62

Input: aardvarkOutput: 000001010001000001100049451250a347148r146d2450NTI43144v1Exemplu63

Input: aardvarkOutput: 00000101000100000110001010149551250a347148r146d2450NTI43144v1Exemplu64

Input: aardvarkOutput: 000001010001000001100010101049651350a347148r146d2450NTI43144v1Codificare Huffman Adaptiva65

Input: aardvarkOutput: 00000101000100000110001010101049751350a447248r146d2450NTI43144v1Exemplu66

Input: aardvark49751350a447248r146d245144v2430NTI411421kOutput: 000001010001000001100010101010 1100Exemplu67

Input: aardvark49751350a447248r146d245144v2430NTI411421kOutput: 000001010001000001100010101010 1100Exemplu68

Input: aardvarkOutput: 000001010001000001100010101010 11000101049851350a547248r146d345144v2430NTI411421k

Exemplu Decodificare69

Output: 051NTIInput: 000001010001000001100010101010 110001010Exemplu Decodificare70

Output: aInput: 000001010001000001100010101010 110001010

049NTI151150aExemplu Decodificare71

Output: aaInput: -----1010001000001100010101010 110001010

049NTI251250aExemplu Decodificare72

Output: aaInput: ------010001000001100010101010 110001010

049NTI251250aExemplu Decodificare73

Output: aarInput: -------10001000001100010101010 110001010

049NTI351250a147148rExemplu Decodificare74

Output: aarInput: ------------000001100010101010 110001010

049NTI351250a147148rExemplu Decodificare75

Output: aardInput: --------------0001100010101010 110001010

49451250a247148r0NTI45146d1Exemplu Decodificare76

Output: aardInput: -------------------00010101010 110001010

49451250a247148r0NTI45146d1Exemplu Decodificare77

Output: aardvInput: ----------------------10101010 110001010

49451250a247148r146d2450NTI43144v1Exemplu Decodificare78

Output: aardvInput: ---------------------------010 11000101049551250a347148r146d2450NTI43144v1

Exemplu Decodificare79

Output: aardvaInput: ---------------------------010 11000101049651350a347148r146d2450NTI43144v1

Decodificare Huffman Adaptiva80

Output: aardvarInput: ---------------------------10 11000101049751350a447248r146d2450NTI43144v1

Exemplu Decodificare81

Output: aardvarInput: ----------------------------- 11000101049751350a447248r146d2450NTI43144v1

Exemplu Decodificare82Output: aardvarkInput: ------------------------ ----01010

49751350a447248r146d245144v2430NTI411421kExemplu Decodificare83

49851350a547248r146d345144v2430NTI411421kSenzitivitatea la erori ale canalului de comunicatieModelul de canal ales, fara erori, nu este un model foarte realist pentru un sistem de comunicatieDe aceea, un criteriu de comparatie al diferitelor coduri este cel al sensibilitatii la eroriSe pot considera doua tipuri de erori: erori de faza, in care un simbol este pierdut sau introdus in pluserori de amplitudine, in care un simbol al codului este schimbatGradul in care aceste tipuri de erori degradeaza transmisia este un criteriu important in alegerea metodei de compresie

84Senzitivitatea la erori ale canalului de comunicatieIn cazul codurilor statice bloc-bloc o eroare de amplitudine duce la o decodare incorecta iar o eroare de faza duce la desincronizare si ca efect - la decodari incorecteTotusi, codurile Huffman sunt autocorectoareLa aparitia unei erori, se va decoda gresit cuvantul de cod unde a aparut eroarea si un numar de cuvinte dupa cel eronat, insa, la o distanta mare de eroare, decodarea nu mai introduce eroriSolutia utilizata de codurile Huffman consta in adaugarea unei secvente de sincronizare la fiecare cuvant de codDe exemplu, o succesiune de 1 sau doi de 11, adaugate la sfarsitul fiecarui cuvant de cod permite stoparea propagarii erorii, deci obtinerea unor coduri auto-sincronizatoare, asa cum se intampla in cazul codurilor Fibonacci85Senzitivitatea la erori ale canalului de comunicatieCodurile adaptive sunt afectate mai mult de erorile de transmisie decat codurile statice, intrucat la aparitia unei erori se schimba radical tabele locala de codare

Una din solutiile utilizate este impachetarea codurilor de compresie cu coduri detectoare si/sau corectoare de erori. 86Compresia Huffman a ImaginilorExemple Imagini: 256x256 pixels, 8 bits/pixel, 65,536 bytesCodificare Huffman a valorilor pixelilor87

SenaSensinEarthOmahaImagineBits/pixelMarime(bytes)Rata CompresieSena7.0157,5041.14Sensin7.4961,4301.07Earth4.9440,5341.62Omaha7.1258,3741.12Compresia Huffman a ImaginilorObservatiiAlgoritmul Huffman standard produce rezultate modeste, cu exceptie pentru cazul EarthMulti pixeli negri produc o distributie convenabilaNu luam in considerare corelatii evidente ale valorilor pixelilorCodificare Huffman a diferentelor pixelilor88ImagineBits/pixelMarime (bytes)Rata CompresieSena4.0232,9681.99Sensin4.7038,5411.70Earth4.1333,8801.93Omaha6.4252,6431.24Huffman (2 iteratii) vs Huffman Adaptiv89ImagineBits/pixelMarime (bytes)Rata CompresieSena4.0232,9681.99Sensin4.7038,5411.70Earth4.1333,8801.93Omaha6.4252,6431.24ImagineBits/pixelMarime (bytes)Rata CompresieSena3.9332,2612.03Sensin4.6337,8961.73Earth4.8239,5041.66Omaha6.3952,3211.252 iteratiiAdaptivCompresia Huffman a textelor90

Compresia Audio Huffman91FisierMarime Initiala (bytes)Entropie(bits)Est. Marime Fisier Compresat (bytes)Rata CompresieMozart939,86212.8725,4201.16Cohn402,44213.8349,3001.15Mir884,02013.7759,5401.30Codificare Huffman: 16-bit CD audio (44,100 Hz) x 2 canaleFisierMarime Initiala (bytes)Entropia Dif.(bits)Est. Marime Fisier Compresat (bytes)Rata CompresieMozart939,8629.7569.7921.65Cohn402,44210.4261,5901.54Mir884,02010.9602,2401.47Codificare Huffman a DiferenteiCoduri GolombFolosite pentru codificarea secventelor de numere intregi, unde probabilitatea de aparitie a numerelor este invers proportionala cu marimea acestoraCu cat numarul este mai mare, cu atat probabilitatea sa de aparitie este mai micaEx.:cod unarReprezentarea unara a unui numar urmata de 00 01 102 1103 1110Identic cu codul Huffman pentru {1, 2, 3, } si P(k) = 1/2kOptimal pentru modelul probabilistic92Coduri Golomb Familie de coduri bazata pe un parametru mPentru a reprezenta un numar n, calculam:q = n/m (catul)r = n - qm (restul)Reprezentam q in cod unar, urmat de r in log2m bitsDaca m nu e putere a lui 2 atunci folosim log2m bitsSi mai bine, putem folosi:Reprezentarea in log2m-bits pentru 0 r 2 log2m-m-1, siReprezentarea in log2m-bits a numarului r + 2 log2m-m pentru restul intervalului93Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+294nqrCod000 000Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+295nqrCod000 000101 001Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+296nqrCod000 000101 001202 0100Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+297nqrCod000 000101 001202 0100303 0101Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+298nqrCod000 000101 001202 0100303 0101404 0110Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+299nqrCod000 000101 001202 0100303 0101404 0110505 0111Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+2100nqrCod000 000101 001202 0100303 0101404 0110505 0111nqrCod610 1000Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+2101nqrCod000 000101 001202 0100303 0101404 0110505 0111nqrCod610 1000711 1001Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+2102nqrCod000 000101 001202 0100303 0101404 0110505 0111nqrCod610 1000711 1001812 10100Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+2103nqrCod000 000101 001202 0100303 0101404 0110505 0111nqrCod610 1000711 1001812 10100913 10101Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+2104nqrCod000 000101 001202 0100303 0101404 0110505 0111nqrCod610 1000711 1001812 10100913 101011014 10110Exemplum = 6log2m = 2log2m = 3Coduri 2-bit pentru 0 r 2 log26-6-10 r 1Coduri 3-bit de r+2 log26-6 pentru restulr+2105nqrCod000 000101 001202 0100303 0101404 0110505 0111nqrCod610 1000711 1001812 10100913 101011014 101101115 10111Coduri Golomb: Alegerea lui mSe considera o secventa binaraSe poate codifica numarand subsecventele de cifre identice (de 0 sau 1)A.k.a. run-length encoding (RLE)Ex.00001001100010010000000001001110000100001000100---4 -2 0--3 -2 --------9 -2 00---4 ---4 --3 -24,2,0,3,2,9,2,0,0,4,4,3,235 zerouri, 12 unu; P(0) = 35/(35+12) = 0.745P(0) = 35/(35+12) = 0.745

106

ConcluziiTabel comparativ Huffman static-dinamic, in aplicatiile de compresie a textelor:107Criteriul de comparatieHuffman staticHuffman dinamicCodarea arboreluistaticaDinamicaSimbolul ENDSe adauga la sfarsitul mesajului si se codifica in binar.Se adauga la sfarsitul mesajului si se codifica in binar.Memorarea arboreluiDa. Se salveaza la inceputul mesajului.Se transmite odata cu mesajulCodurile simbolurilor din mesajConstanteVariabileComplexitate+Raport de compresie+Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTIardvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI0a1rdvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI0a1rdvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI00a1r01dvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI000a1r01d001vk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI000a1r01d001vk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI000a1r01d001vk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI000a1r01d001vk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI000a1r01d001vk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI11100a0r10d110v1111k11101

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTIardvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI0a1rdvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI0a1rdvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI0a1rdvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI00a1r01dvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI00a1r01dvk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI000a1r01d001vk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI000a1r01d001vk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI000a1r01d001vk

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNTI1100a0r10d111v1101k

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2

Sheet3

Sheet1a00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

Sheet2SimbolCodNYT11100a0r10d110v1111k11101

Sheet3

default codesa00000f00101k01010p01111u10100b00001g00110l01011q10000v10101c00010h00111m01100r10001w10110d00011i01000n01101s10010x10111e00100j01001o01110t10011y11000

codewordsSymbolCodeNYTardvk

letter distributionLetterP(Consitution)P(Chapter)A0.0573050.049855B0.0148760.016100C0.0257750.025835D0.0268110.030232E0.1125780.097434F0.0228750.019754G0.0095230.012053H0.0429150.035723I0.0534750.048783J0.0020310.000394K0.0010160.002450L0.0314030.025835M0.0158920.016494N0.0560350.048039O0.0582150.050642P0.0210340.015007Q0.0009730.001509R0.0488190.040492S0.0602890.042657T0.0780850.061142U0.0184740.015794V0.0098820.004988W0.0075760.012207X0.0022640.003413Y0.0117020.008466Z0.0015020.001050

pdf chart0.0573050.0498550.0148760.01610.0257750.0258350.0268110.0302320.1125780.0974340.0228750.0197540.0095230.0120530.0429150.0357230.0534750.0487830.0020310.0003940.0010160.002450.0314030.0258350.0158920.0164940.0560350.0480390.0582150.0506420.0210340.0150070.0009730.0015090.0488190.0404920.0602890.0426570.0780850.0611420.0184740.0157940.0098820.0049880.0075760.0122070.0022640.0034130.0117020.0084660.0015020.00105

P(Consitution)P(Chapter)LetterProbabilityPDF(letters): US Constitution vs. Chapter 3