Text and Unicode characters

Moderator: jsachs

Dieter Mayr
Posts: 453
Joined: April 24th, 2009, 11:47 am
What is the make/model of your primary camera?: Nikon D700
Location: Salzburg / Austria

Re: Text and Unicode characters

Post by Dieter Mayr »

And here is the same with the language set to Polish, characters copied from Charmap:
[Attachment: Text_P.PNG (9.99 KiB)]
Dieter Mayr
tomczak
Posts: 1431
Joined: April 25th, 2009, 12:56 am
What is the make/model of your primary camera?: Fuji X-E2

Re: Text and Unicode characters

Post by tomczak »

Dieter, thanks a lot. After changing the non-Unicode language and rebooting, cutting and pasting from Charmap works now.

I'm curious why changing this setting suddenly makes the characters encode properly when they are pasted into PWP, even though they seem to be within the first 256 positions of the character set anyway. Computer typography was always black magic to me, and it seems my misunderstanding of it has not changed much since DOS...
Maciej Tomczak
Phototramp.com

Re: Text and Unicode characters

Post by tomczak »

I have a suggestion for the Calendar Transformation: would it be feasible to have an option to read a TXT file which would list:

Year,Month,Day,Colour,Text

and would be used to either overwrite the colour of the digit(s) for each date in the text file (such as official holidays, or family birthdays) or write custom text (e.g. Jonathan's Birthday), or both.

That could significantly shorten calendar production while staying generic (those customizing touches are the crux of making custom calendars, I think).
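
To make the suggested file format concrete, here is a minimal Python sketch of how such a list could be read. It is only an illustration: the sample entries, the function name `read_calendar_overrides` and the choice to allow an empty Colour or Text field are assumptions, not anything PWP does today.

```python
import csv
import io
from datetime import date

# Hypothetical sample of the proposed file. The column order follows the
# suggestion (Year,Month,Day,Colour,Text); the values are made up.
SAMPLE = """\
2012,1,1,red,New Year's Day
2012,3,14,,Jonathan's Birthday
2012,12,25,red,
"""

def read_calendar_overrides(stream):
    """Map each listed date to a (colour, text) pair; empty fields become None."""
    overrides = {}
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        year, month, day, colour, text = (field.strip() for field in row)
        key = date(int(year), int(month), int(day))
        overrides[key] = (colour or None, text or None)
    return overrides

overrides = read_calendar_overrides(io.StringIO(SAMPLE))
for day, (colour, text) in sorted(overrides.items()):
    print(day.isoformat(), colour, text)
```

An entry with only a colour recolours the digit, one with only text adds a caption, and one with both does both. Text that itself contains a comma would have to be put in double quotes.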
Maciej Tomczak
Phototramp.com

Re: Text and Unicode characters

Post by Dieter Mayr »

Maciej

When you set the non-Unicode language to Polish, Windows uses code page 1250 instead of 1252, which is used for English and German.
In code page 1250 the special characters of the Central and Eastern European languages are mapped into the code range 128-255, so they can be represented with 8 bits.
Here you find the full Codepage:
http://msdn.microsoft.com/en-us/goglobal/cc305143
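
As a small illustration of the difference between the two code pages, the following Python sketch decodes the same three bytes from the 128-255 range once as code page 1250 and once as 1252. The byte values were picked for this example; Python's `cp1250` and `cp1252` codecs stand in here for what Windows does internally.

```python
# Three bytes from the upper half (128-255) of an 8-bit code page.
raw = bytes([0xA3, 0xB3, 0xF1])

as_polish = raw.decode("cp1250")   # Central/Eastern European code page
as_western = raw.decode("cp1252")  # Western European code page

for byte, east, west in zip(raw, as_polish, as_western):
    print(f"0x{byte:02X}: cp1250 {east} (U+{ord(east):04X}), "
          f"cp1252 {west} (U+{ord(west):04X})")

# The lower half (0-127) is plain ASCII in both code pages.
ascii_same = b"Calendar".decode("cp1250") == b"Calendar".decode("cp1252")
print(ascii_same)
```

The same byte values mean different characters depending on which code page is active, which is why the non-Unicode language setting matters to a program that only handles 8-bit text.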

And I like your suggestion about improving the Calendar transformation.
Dieter Mayr

Re: Text and Unicode characters

Post by tomczak »

Thanks Dieter,

I've read about the code pages (thanks), and I think I understand what changing the Regional Settings did, but I'm missing this: how do the Font and especially the Character Set fit into the picture? What does choosing the character set mean if the code page is already picked in Regional Settings? Cheers.
Maciej Tomczak
Phototramp.com

Re: Text and Unicode characters

Post by tomczak »

After reading these three:

https://secure.wikimedia.org/wikipedia/ ... /Code_page
https://secure.wikimedia.org/wikipedia/en/wiki/Mojibake
https://secure.wikimedia.org/wikipedia/ ... _code_page

it seems that the issue of 8-bit code pages is pretty messy. I still don't get the Character Set implications, or what happens between the Font, CharSet and non-Unicode Windows language settings. Cheers.
Maciej Tomczak
Phototramp.com

Re: Text and Unicode characters

Post by Dieter Mayr »

Unicode has been Windows' native internal way of storing characters since Windows 2000.
So everything that comes out of Charmap is Unicode, and when it is copied to a Unicode-aware program it is copied as is.
But when it is copied to an application that cannot handle Unicode, it has to be translated with the help of the code page.
Let's say you want to copy "Ł", which has the Unicode code point U+0141: via code page 1250 it is converted to the 8-bit code 0xA3.
So a code page is nothing more than a subset of characters from the whole Unicode range, chosen so that they can be addressed with an 8-bit code.
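
The translation step can be tried out with a short Python sketch. It uses Python's own `cp1250` and `cp1252` codecs as a stand-in for the conversion Windows performs; the `errors="replace"` fallback is Python's behaviour, and Windows itself may substitute a different character.

```python
text = "Ł"
code_point = ord(text)
print(f"U+{code_point:04X}")

# Code page 1250 contains the character, so it fits into a single byte.
encoded_1250 = text.encode("cp1250")
print(encoded_1250.hex())

# Code page 1252 does not contain it, so a strict conversion fails.
try:
    text.encode("cp1252")
    fits_in_1252 = True
except UnicodeEncodeError:
    fits_in_1252 = False
print(fits_in_1252)

# With a lossy fallback the character is replaced by a placeholder.
fallback_1252 = text.encode("cp1252", errors="replace")
print(fallback_1252)
```

With the non-Unicode language set to English or German, the character has no slot in the active code page, which matches the wrong characters that showed up before the setting was changed.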
Dieter Mayr