Detect encoding and make everything UTF-8

I'm reading out lots of texts from various RSS feeds and inserting them into my database.

Of course, there are several different character encodings used in the feeds, e.g. UTF-8 and ISO-8859-1.

Unfortunately, there are sometimes problems with the encodings of the texts. Example:

1) The "ß" in "Fußball" should look like this in my database: "Ÿ". If it is a "Ÿ", it is displayed correctly.

2) Sometimes, the "ß" in "Fußball" looks like this in my database: "ÃŸ". Then it is displayed wrongly, of course.

3) In other cases, the "ß" is saved as a "ß" - so without any change. Then it is also displayed wrongly.
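
A double-encoded value of the kind described in case 2 typically arises when UTF-8 bytes are mistaken for a single-byte encoding and then converted to UTF-8 a second time. The following is only a minimal sketch in Python 3 (not the PHP used for the feeds); it assumes the single-byte encoding involved is Windows-1252, and all variable names are illustrative.

```python
# Hypothetical illustration of double encoding, assuming Windows-1252
# is the encoding the UTF-8 bytes were mistaken for.
original = "Fußball"

utf8_bytes = original.encode("utf-8")      # correct UTF-8 bytes
misread = utf8_bytes.decode("cp1252")      # bytes wrongly read as Windows-1252
double_encoded = misread.encode("utf-8")   # converted to UTF-8 a second time

# Reversing the two wrong steps recovers the original text.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")

print(utf8_bytes)       # b'Fu\xc3\x9fball'
print(double_encoded)   # b'Fu\xc3\x83\xc5\xb8ball'
print(repaired == original)
```

The repair step only works when the text really was double-encoded; applying it to correctly encoded text would raise an error or corrupt it.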

What can I do to avoid the cases 2 and 3?

How can I make everything the same encoding, preferably UTF-8? When must I use utf8_encode(), when must I use utf8_decode() (it's clear what the effect is but when must I use the functions?) and when must I do nothing with the input?

Can you help me and tell me how to make everything the same encoding? Perhaps with the function mb_detect_encoding()? Can I write a function for this? So my problems are:

1) How to find out what encoding the text uses
2) How to convert it to UTF-8, whatever the old encoding is

EDIT: Would a function like this work?

function correct_encoding($text) {
    $current_encoding = mb_detect_encoding($text, 'auto');
    $text = iconv($current_encoding, 'UTF-8', $text);
    return $text;
}

I've tested it but it doesn't work. What's wrong with it?

How to configure encoding in maven

When I run maven install on my multi module maven project I always get the following output:

[WARNING] File encoding has not been set, using platform encoding UTF-8, i.e. build is platform dependent!

So, I googled around a bit, but all I can find is that I have to add

    <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

to my pom.xml. But it's already there (in the parent pom.xml).

Configuring <encoding> for the maven-resources-plugin or the maven-compiler-plugin also doesn't fix it. So what's the problem?

Why does Python print unicode characters when the default encoding is ASCII?

From the Python 2.6 shell:

>>> import sys
>>> print sys.getdefaultencoding()
ascii
>>> print u'\xe9'
é

I expected to have either some gibberish or an Error after the print statement, since the "é" character isn't part of ASCII and I haven't specified an encoding. I guess I don't understand what ASCII being the default encoding means.


I moved the edit to the Answers section and accepted it as suggested.

Write text files without Byte Order Mark (BOM)?

I am trying to create a text file using VB.Net with UTF-8 encoding, without a BOM. Can anybody help me with how to do this?
I can write a file with UTF-8 encoding, but how do I remove the Byte Order Mark from it?
Thanks in advance.

edit1: I have tried code like this:

    Dim utf8 As New UTF8Encoding()
    Dim utf8EmitBOM As New UTF8Encoding(True)
    Dim strW As New StreamWriter("c:\temp\bom\1.html", True, utf8EmitBOM)
    strW.WriteLine("hi there")

    Dim strw2 As New StreamWriter("c:\temp\bom\2.html", True, utf8)
    strw2.WriteLine("hi there")

1.html gets created with UTF-8 encoding only and 2.html gets created in ANSI encoding format.

Simplified approach - http://whatilearnttuday.blogspot.com/2011/10/write-text-files-without-byte-order.html

How to convert Strings to and from UTF8 byte arrays in Java

In Java, I have a String and I want to encode it as a byte array (in UTF8, or some other encoding). Alternately, I have a byte array (in some known encoding) and I want to convert it into a Java String. How do I do these conversions?

How can I convert a hex string to a byte array? [duplicate]

Can we convert a hex string to a byte array using a built-in function in C# or do I have to make a custom method for this?

What is base 64 encoding used for?

I've heard people talking about "base 64 encoding" here and there. What is it used for?

What is Unicode, UTF-8, UTF-16?

What's the basis for Unicode and why the need for UTF-8 or UTF-16? I have researched this on Google and searched here as well but it's not clear to me.

In VSS when doing a file comparison, sometimes there is a message saying the two files have differing UTF's. Why would this be the case?

Please explain in simple terms.

How do I see the current encoding of a file in Sublime Text 2?

How do I see the current encoding of a file in Sublime Text 2?

This seems like a pretty simple thing to do but searching has not yielded much. Any pointers would be appreciated!

What encoding/code page is cmd.exe using

When I open cmd.exe in Windows, what encoding is it using? How can I check which encoding it is currently using? Does it depend on my regional setting or are there any environment variables to check?

What happens when you type a file with a certain encoding? Sometimes I get garbled characters (incorrect encoding used) and sometimes it kind-of works. However I don't trust anything as long as I don't know what's going on. Can anyone explain?

How to store custom objects in NSUserDefaults

Alright, so I've been doing some poking around, and I realize my problem, but I don't know how to fix it. I have made a custom class to hold some data. I make objects of this class, and I need them to last between sessions. Before, I was putting all my information in NSUserDefaults, but this isn't working.

-[NSUserDefaults setObject:forKey:]: Attempt to insert non-property value '<Player: 0x3b0cc90>' of class 'Player'.

That is the error message I get when I put my custom class, "Player", in the NSUserDefaults. Now, I've read that apparently NSUserDefaults only stores some types of information. So how can I get my objects into NSUserDefaults?

I read that there should be a way to "encode" my custom object and then put it in, but I'm not sure how to implement it. Help would be appreciated! Thank you!


Alright, so I worked with the code given below (Thank you!), but I'm still having some issues. Basically, the code crashes now and I'm not sure why, because it doesn't give any errors. Perhaps I'm missing something basic and I'm just too tired, but we'll see. Here is the implementation of my Custom class, "Player":

@interface Player : NSObject {
    NSString *name;
    NSNumber *life;
    //Log of player's life
}
//Getting functions, return the info
- (NSString *)name;
- (int)life;

- (id)init;

//These are the setters
- (void)setName:(NSString *)input; //string
- (void)setLife:(NSNumber *)input; //number

@end


Implementation File:

#import "Player.h"
@implementation Player
- (id)init {
    if (self = [super init]) {
        [self setName:@"Player Name"];
        [self setLife:[NSNumber numberWithInt:20]];
        [self setPsnCounters:[NSNumber numberWithInt:0]];
    }
    return self;
}

- (NSString *)name {return name;}
- (int)life {return [life intValue];}
- (void)setName:(NSString *)input {
    [input retain];
    if (name != nil) {
        [name release];
    }
    name = input;
}
- (void)setLife:(NSNumber *)input {
    [input retain];
    if (life != nil) {
        [life release];
    }
    life = input;
}
/* This code has been added to support encoding and decoding my objects */

-(void)encodeWithCoder:(NSCoder *)encoder
{
    //Encode the properties of the object
    [encoder encodeObject:self.name forKey:@"name"];
    [encoder encodeObject:self.life forKey:@"life"];
}

-(id)initWithCoder:(NSCoder *)decoder
{
    self = [super init];
    if ( self != nil )
    {
        //decode the properties
        self.name = [decoder decodeObjectForKey:@"name"];
        self.life = [decoder decodeObjectForKey:@"life"];
    }
    return self;
}
-(void)dealloc {
    [name release];
    [life release];
    [super dealloc];
}
@end

So that's my class, pretty straightforward; I know it works for making my objects. Here are the relevant parts of the AppDelegate file (where I call the encryption and decryption functions):

@class MainViewController;

@interface MagicApp201AppDelegate : NSObject <UIApplicationDelegate> {
    UIWindow *window;
    MainViewController *mainViewController;
}

@property (nonatomic, retain) IBOutlet UIWindow *window;
@property (nonatomic, retain) MainViewController *mainViewController;

-(void)saveCustomObject:(Player *)obj;
-(Player *)loadCustomObjectWithKey:(NSString*)key;

@end


And then the important parts of the implementation file:

    #import "MagicApp201AppDelegate.h"
    #import "MainViewController.h"
    #import "Player.h"

    @implementation MagicApp201AppDelegate

    @synthesize window;
    @synthesize mainViewController;

    - (void)applicationDidFinishLaunching:(UIApplication *)application {
        NSUserDefaults *prefs = [NSUserDefaults standardUserDefaults];
        //First check to see if some things exist
        int startup = [prefs integerForKey:@"appHasLaunched"];
        if (startup == nil) {
            //Make the single player
            Player *singlePlayer = [[Player alloc] init];
            NSLog([[NSString alloc] initWithFormat:@"%@\n%d\n%d",[singlePlayer name], [singlePlayer life], [singlePlayer psnCounters]]); //  test
            //Encode the single player so it can be stored in UserDefaults
            id test = [MagicApp201AppDelegate new];
            [test saveCustomObject:singlePlayer];
            [test release];
        }
        [prefs synchronize];
    }

    -(void)saveCustomObject:(Player *)object
    {
        NSUserDefaults *prefs = [NSUserDefaults standardUserDefaults];
        NSData *myEncodedObject = [NSKeyedArchiver archivedDataWithRootObject:object];
        [prefs setObject:myEncodedObject forKey:@"testing"];
    }

    -(Player *)loadCustomObjectWithKey:(NSString*)key
    {
        NSUserDefaults *prefs = [NSUserDefaults standardUserDefaults];
        NSData *myEncodedObject = [prefs objectForKey:key ];
        Player *obj = (Player *)[NSKeyedUnarchiver unarchiveObjectWithData: myEncodedObject];
        return obj;
    }

Eeee, sorry about all the code. Just trying to help. Basically, the app will launch and then crash immediately. I've narrowed it down to the encryption part of the app; that's where it crashes, so I'm doing something wrong but I'm not sure what. Help would be appreciated again, thank you!

(I haven't gotten around to decrypting yet, as I haven't gotten encrypting working yet.)

Working with utf-8 encoding in Python source [duplicate]

$ cat bla.py 
u = unicode('d…')
s = u.encode('utf-8')
print s
$ python bla.py 
  File "bla.py", line 1
SyntaxError: Non-ASCII character '\xe2' in file bla.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

How can I declare utf-8 strings in source code?

In HTML I can make a checkmark with ✓ . Is there a corresponding X-mark?

Is there a corresponding X mark to ✓ (&#x2713;)? What is it?

Setting the correct encoding when piping stdout in Python

When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:

# -*- coding: utf-8 -*-
print u"åäö"

will work fine when run normally, but fail with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 0: ordinal not in range(128)

when used in a pipe sequence.
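
The failure can be reproduced without an actual pipe. The following is only a sketch in Python 3 (the program above is Python 2), and it assumes the piped stdout behaves like a text stream whose encoding fell back to ASCII; the stream objects and variable names are illustrative stand-ins, not what the interpreter creates internally.

```python
import io

text = "åäö"

# Stand-in for a stdout whose encoding fell back to ASCII.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_stream.write(text)
    ascii_stream.flush()
    failed = False
except UnicodeEncodeError:
    failed = True

# The same text written through a stream with an explicit UTF-8 encoding.
utf8_buffer = io.BytesIO()
utf8_stream = io.TextIOWrapper(utf8_buffer, encoding="utf-8")
utf8_stream.write(text)
utf8_stream.flush()

print(failed)                   # True
print(utf8_buffer.getvalue())   # b'\xc3\xa5\xc3\xa4\xc3\xb6'
```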

What is the best way to make this work when piping? Can I just tell it to use whatever encoding the shell/filesystem/whatever is using?

The suggestions I have seen thus far are to modify your site.py directly, or to hardcode the default encoding using this hack:

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"

Is there a better way to make piping work?

How can I detect the encoding/codepage of a text file

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the files were created in a different/unknown codepage.

Is there a way to (automatically) detect the codepage of a text file?

The detectEncodingFromByteOrderMarks, on the StreamReader constructor, works for UTF8 and other unicode marked files, but I'm looking for a way to detect code pages, like ibm850, windows1252.

Thanks for your answers, this is what I've done.

The files we receive are from end-users who do not have a clue about codepages. The receivers are also end-users; by now this is what they know about codepages: codepages exist, and are annoying.


  • Open the received file in Notepad and look at a garbled piece of text. If somebody is called François or something, with your human intelligence you can guess what it should say.
  • I've created a small app that the user can open the file with, and enter some text that the user knows will appear in the file when the correct codepage is used.
  • Loop through all codepages, and display the ones that give a solution with the user-provided text.
  • If more than one codepage pops up, ask the user to specify more text.
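
The loop described above could look roughly like the following sketch. It is written in Python 3 rather than .NET, and the function name `candidate_codepages`, the short `CODEPAGES` list, and the sample texts are all hypothetical; the real app would loop through every codepage the platform offers.

```python
def candidate_codepages(raw, known_text, codepages):
    """Return the codepages under which `raw` decodes to text containing `known_text`."""
    matches = []
    for codepage in codepages:
        try:
            decoded = raw.decode(codepage)
        except (UnicodeDecodeError, LookupError):
            # Bytes are invalid in this codepage, or the codepage is unknown.
            continue
        if known_text in decoded:
            matches.append(codepage)
    return matches


# Illustrative subset only; a real tool would try all available codepages.
CODEPAGES = ["cp850", "cp1252", "latin-1", "utf-8", "cp437"]

# Simulated received file, assumed here to have been written in cp850.
received = "François Øre".encode("cp850")

first_guess = candidate_codepages(received, "François", CODEPAGES)
narrowed = candidate_codepages(received, "Øre", CODEPAGES)

print(first_guess)  # ['cp850', 'cp437']
print(narrowed)     # ['cp850']
```

In this sample the first text matches two codepages, which is the situation where the user is asked for more text; the second text differs between the two and leaves a single candidate.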

