ICE-4005 - Meeting 15 Understanding the prototype
Work achieved
The beginning of the week was spent trying to better understand the LSTM architecture and the shapes of its inputs and outputs. I started using the spylls library to extract lemmas from a Hunspell dictionary, although I now think that Hunspell is better used as a spellchecker (to remove real words from the generated pseudowords) than as a source of a clean lemma list for a given language. I also found that padding was inflating the previously obtained accuracy rate: when testing on shorter sequences, the training accuracy dropped from around 0.60 to 0.07. I then added a preprocessing step that treats word packing as a bin packing problem, in order to limit the number of training sequences by fitting as many words as possible into the smallest set of 26-character sequences. The accuracy rate then rose to around 0.30 in the best cases. Here is an example of pseudoword generation output after such a training run:
484/484 ━━━━━━━━━━━━━━━━━━━━ 38s 78ms/step - accuracy: 0.2298 - loss: 2.3963
meriness vander storester paSonting conter anter sollarBartion corstid puinter sensjarter shorbing sudan forateQurdaricion farting serponeSarestic suntore onterale paJicgin sterlent antertion raontisaral sirerent surdace sstollent shillan carger contXeralise corelate preenter sRalate curelatian corere corortice sarping contist corinintintilist corster puntianuntinger dalling galler contVoment ancant stardent panti2nerperar sercord stardic paminerint boreral farine coneUpinger collist cowerer sontrestarich sharan sorectice c8nore bunter selper prorete4nerinicing spinter granticeMilper sarding corder sunterjorder bereble porder sernenPoresting sherlind sunsine p7nematin olinice decerent suNisural canerester sonter suextrerter ranister coreter pharsher collent corster shorMarding coreter serelen coriColarian shister parter sink2narsting pinting serericicepartorici marcarin serelinishonstal cartorist shelant toFonester hinter contist mestronding braness coraness anmNilitan perder cording peariyantion conter sherention shtendarist sararing conitionlalaring fiker curdless contherderter sercine shalalin ccordind colling coroness talAnteran pinisher secter sentoversition sartice scinter dhaleration baresting pernoriBaller barding corler follesZanter sonthing shercer cantantiner serelin sontist alleIntrester corderate parele b
As we can see, the generated sequences still show some inconsistencies, especially at both ends of the generated strings.
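The packing step described above can be sketched as follows. This is only an illustration under assumptions, not the code used in the prototype: it uses the first-fit decreasing heuristic (a common greedy approximation, not an exact bin packing solver), it assumes words are joined by a single space, and the function name pack_words and the sample words are hypothetical.

```python
def pack_words(words, capacity=26, sep=" "):
    """Greedy first-fit decreasing packing of words into fixed-length sequences.

    Assumptions (not from the prototype): words are joined by `sep`,
    and a heuristic result is acceptable instead of an optimal packing.
    """
    bins = []  # each bin is the list of words sharing one sequence
    used = []  # characters already used in each bin, separators included

    # Longest words first: they are the hardest to place.
    for word in sorted(words, key=len, reverse=True):
        if len(word) > capacity:
            raise ValueError(f"{word!r} is longer than {capacity} characters")
        for i in range(len(bins)):
            cost = len(sep) + len(word)  # a non-empty bin needs a separator
            if used[i] + cost <= capacity:
                bins[i].append(word)
                used[i] += cost
                break
        else:
            # No existing sequence has room: open a new one.
            bins.append([word])
            used.append(len(word))

    return [sep.join(b) for b in bins]


words = ["contist", "sherlind", "pinting", "corder", "sudan", "anter"]
packed = pack_words(words)
for seq in packed:
    print(len(seq), repr(seq))
```

Each resulting sequence can still be shorter than 26 characters, so some padding remains; the point of the packing is only to reduce how much of each training sequence is padding.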
Discussions
I presented my improved understanding of the generated prototype and discussed my experiments with different parameter settings. The rest of the discussion focused on exploring relevant strategies for testing the results.
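One possible check, related to the padding issue noted under Work achieved, is to compute accuracy only over non-padding positions. The sketch below is an assumption-laden illustration rather than a strategy agreed in the meeting: it assumes that padded positions are encoded with a dedicated id (here 0) and that the inflation came from padded positions being counted as correct predictions. The function name token_accuracy and the toy sequences are hypothetical.

```python
def token_accuracy(y_true, y_pred, pad_id=None):
    """Fraction of positions where the prediction matches the target.

    If pad_id is given, positions whose target is pad_id are skipped,
    so padding cannot contribute to the score.
    """
    correct = 0
    total = 0
    for true_seq, pred_seq in zip(y_true, y_pred):
        for t, p in zip(true_seq, pred_seq):
            if pad_id is not None and t == pad_id:
                continue  # ignore padded positions
            total += 1
            correct += int(t == p)
    return correct / total if total else 0.0


# Toy example: 0 is the (assumed) padding id.
y_true = [[3, 5, 0, 0, 0], [4, 0, 0, 0, 0]]
y_pred = [[3, 2, 0, 0, 0], [1, 0, 0, 0, 0]]

plain = token_accuracy(y_true, y_pred)             # padding counted
masked = token_accuracy(y_true, y_pred, pad_id=0)  # padding ignored
print(plain, masked)
```

In this toy case the score over all positions is much higher than the score over real characters only, which is the kind of gap that would explain a drop like the one observed on shorter sequences.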