
mongodb

How do you rename a MongoDB database?

There's a typo in my MongoDB database name and I'm looking to rename the database.

I can copy and delete like so...

db.copyDatabase('old_name', 'new_name');
use old_name
db.dropDatabase();

Is there a command to rename a database?
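
There is no rename command, so copy-then-drop is the usual route. A minimal pymongo sketch of the same approach, assuming a local mongod on the default port; note that it copies documents but not indexes:

    from pymongo import MongoClient

    client = MongoClient()                      # assumes localhost:27017
    src, dst = client["old_name"], client["new_name"]
    for name in src.list_collection_names():
        docs = list(src[name].find())
        if docs:                                # insert_many rejects empty lists
            dst[name].insert_many(docs)
    client.drop_database("old_name")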


Source: (StackOverflow)

How much faster is Redis than MongoDB?

It's widely mentioned that Redis is "Blazing Fast" and MongoDB is fast too. But I'm having trouble finding actual numbers comparing the two. Given similar configurations, features and operations (and maybe showing how the factor changes with different configurations and operations), is Redis 10x faster? 2x faster? 5x faster?

I'm ONLY speaking of performance. I understand that MongoDB is a different tool and has a richer feature set. This is not the "Is MongoDB better than Redis" debate. I'm asking, by what margin does Redis outperform MongoDB?

At this point, even cheap benchmarks are better than no benchmarks.
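
In that spirit, a deliberately cheap benchmark sketch using redis-py and pymongo. Assumptions: both servers run locally with default settings, and the writes are unbatched single operations, so this measures client round-trips at least as much as engine speed:

    import time
    import redis
    from pymongo import MongoClient

    N = 10000

    r = redis.Redis()                       # assumes redis-server on localhost:6379
    start = time.perf_counter()
    for i in range(N):
        r.set(f"key:{i}", "value")
    print("redis SETs/s:", N / (time.perf_counter() - start))

    coll = MongoClient().bench.kv           # assumes mongod on localhost:27017
    coll.drop()
    start = time.perf_counter()
    for i in range(N):
        coll.insert_one({"_id": i, "v": "value"})
    print("mongo inserts/s:", N / (time.perf_counter() - start))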


Source: (StackOverflow)

How to query MongoDB with "like"?

I want to query something like SQL's like:

select * from users where name like '%m%'

How do I do the same in MongoDB? I can't find an operator for like in the documentation.
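
MongoDB's closest equivalent to like is a regular expression query. A short pymongo sketch, assuming a users collection on a local server:

    from pymongo import MongoClient

    users = MongoClient().test.users        # assumes a "users" collection

    # equivalent of: select * from users where name like '%m%'
    for doc in users.find({"name": {"$regex": "m"}}):
        print(doc)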


Source: (StackOverflow)

When to use CouchDB over MongoDB and vice versa

I am stuck between these two NoSQL databases. In my project I will be creating a database within a database: for example, I need a solution for creating dynamic tables, so users can create tables with columns and rows. I think either MongoDB or CouchDB would be good for this, but I am not sure which one. I will also need efficient paging.
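
Either store can model user-defined "tables" as schemaless documents. A hedged pymongo sketch of one such layout plus skip/limit paging, using MongoDB for concreteness; the collection and field names are invented for illustration:

    from pymongo import MongoClient

    rows = MongoClient().app.rows           # hypothetical collection of user "rows"

    # a user-defined "table" is just a tag plus arbitrary per-row fields
    rows.insert_one({"table": "contacts", "name": "Ann", "phone": "555-0100"})

    # simple paging: page 3, 10 rows per page (skip grows costly on deep pages)
    page = rows.find({"table": "contacts"}).skip(20).limit(10)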


Source: (StackOverflow)

How do I drop a MongoDB database from the command line?

What's the easiest way to do this from my bash prompt?
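
One option, hedged, is a pymongo one-liner invoked from bash, assuming pymongo is installed and with "mydb" standing in for the real database name:

    # from the bash prompt:
    #   python -c "import pymongo; pymongo.MongoClient().drop_database('mydb')"
    from pymongo import MongoClient

    MongoClient().drop_database("mydb")     # "mydb" is a placeholder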


Source: (StackOverflow)

MongoDB vs. Cassandra [closed]

I am evaluating what might be the best migration option.

Currently, I am on a sharded MySQL setup (horizontal partitioning), with most of my data stored in JSON blobs. I do not have any complex SQL queries (I already migrated away from them once I partitioned my db).

Right now, it seems like both MongoDB and Cassandra would be likely options. My situation:

  • Lots of reads in every query, less frequent writes
  • Not worried about "massive" scalability
  • More concerned about simple setup, maintenance and code
  • Minimize hardware/server cost

Source: (StackOverflow)

How to list all collections in the mongo shell?

In the MongoDB shell, how do I list all collections for the current database that I'm using?
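
The same information is also available programmatically from a driver. A pymongo sketch, assuming a local server:

    from pymongo import MongoClient

    db = MongoClient().test                 # pick the database you're "using"
    print(db.list_collection_names())       # analogous to `show collections`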


Source: (StackOverflow)

"Large data" work flows using pandas

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work, and it is great for its out-of-core support. However, SAS is horrible as a piece of software for numerous other reasons.

One day I hope to replace my use of SAS with Python and pandas, but I currently lack an out-of-core workflow for large datasets. I'm not talking about "big data" that requires a distributed network, but rather files too large to fit in memory but small enough to fit on a hard drive.

My first thought is to use HDFStore to hold large datasets on disk and pull only the pieces I need into dataframes for analysis. Others have mentioned MongoDB as an easier-to-use alternative. My question is this:

What are some best-practice workflows for accomplishing the following:

  1. Loading flat files into a permanent, on-disk database structure
  2. Querying that database to retrieve data to feed into a pandas data structure
  3. Updating the database after manipulating pieces in pandas

Real-world examples would be much appreciated, especially from anyone who uses pandas on "large data".

Edit -- an example of how I would like this to work:

  1. Iteratively import a large flat-file and store it in a permanent, on-disk database structure. These files are typically too large to fit in memory.
  2. In order to use Pandas, I would like to read subsets of this data (usually just a few columns at a time) that can fit in memory.
  3. I would create new columns by performing various operations on the selected columns.
  4. I would then have to append these new columns into the database structure.

I am trying to find a best-practice way of performing these steps. Reading links about pandas and PyTables, it seems that appending a new column could be a problem.
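
A hedged sketch of such a workflow with HDFStore, under assumptions: the file and column names are invented, and since appending a column to an existing table is indeed awkward, the derived column is stored as a sibling table instead:

    import pandas as pd

    store = pd.HDFStore("consumer.h5")      # hypothetical file name

    # 1. load a large flat file in chunks into an on-disk table
    for chunk in pd.read_csv("big_file.csv", chunksize=50000):
        store.append("df", chunk, data_columns=["line_of_business"])

    # 2. read back only the columns needed (all rows, a few columns)
    subset = store.select("df", columns=["var1", "var2"])
    # rows can also be filtered on indexed data_columns, e.g.:
    # store.select("df", where="line_of_business == 'retail'")

    # 3. derive a new column in memory
    subset["newvar"] = (subset["var1"] > 2).map({True: "A", False: "B"})

    # 4. append the derived column as its own table alongside "df"
    store.append("derived", subset[["newvar"]])
    store.close()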

Edit -- Responding to Jeff's questions specifically:

  1. I am building consumer credit risk models. The kinds of data include phone, SSN and address characteristics; property values; derogatory information like criminal records, bankruptcies, etc. The datasets I use every day have 1,000 to 2,000 fields on average, of mixed data types: continuous, nominal and ordinal variables of both numeric and character data. I rarely append rows, but I do perform many operations that create new columns.
  2. Typical operations involve combining several columns using conditional logic into a new, compound column. For example, if var1 > 2 then newvar = 'A' elif var2 = 4 then newvar = 'B' (see the sketch below). The result of these operations is a new column for every record in my dataset.
  3. Finally, I would like to append these new columns into the on-disk data structure. I would repeat step 2, exploring the data with crosstabs and descriptive statistics trying to find interesting, intuitive relationships to model.
  4. A typical project file is usually about 1GB. Files are organized in such a manner that a row consists of a record of consumer data. Each row has the same number of columns for every record. This will always be the case.
  5. It's pretty rare that I would subset by rows when creating a new column. However, it's pretty common for me to subset on rows when creating reports or generating descriptive statistics. For example, I might want to create a simple frequency for a specific line of business, say Retail credit cards. To do this, I would select only those records where the line of business = retail in addition to whichever columns I want to report on. When creating new columns, however, I would pull all rows of data and only the columns I need for the operations.
  6. The modeling process requires that I analyze every column, look for interesting relationships with some outcome variable, and create new compound columns that describe those relationships. I usually explore the columns in small sets. For example, I will focus on a set of, say, 20 columns just dealing with property values and observe how they relate to defaulting on a loan. Once those are explored and new columns are created, I then move on to another group of columns, say college education, and repeat the process. What I'm doing is creating candidate variables that explain the relationship between my data and some outcome. At the very end of this process, I apply some learning techniques that create an equation out of those compound columns.

It is rare that I would ever add rows to the dataset. I will nearly always be creating new columns (variables or features in statistics/machine learning parlance).
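
For item 2 above, the usual vectorized form of that conditional logic in pandas is numpy.select. A toy sketch with invented values:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"var1": [3, 1, 0], "var2": [0, 4, 4]})   # toy data

    # if var1 > 2 then newvar = 'A' elif var2 == 4 then newvar = 'B'
    df["newvar"] = np.select(
        [df["var1"] > 2, df["var2"] == 4],   # conditions, checked in order
        ["A", "B"],
        default="",
    )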


Source: (StackOverflow)

MongoDB relationships: embed or reference?

I'm new to MongoDB, coming from a relational database background. I want to design a question structure with some comments, but I don't know which relationship to use for comments: embed or reference?

A question with some comments, like Stack Overflow, would have a structure like this:

Question
    title = 'aaa'
    content = 'bbb'
    comments = ???

At first, I wanted to use embedded comments (I think embedding is recommended in MongoDB), like this:

Question
    title = 'aaa'
    content = 'bbb'
    comments = [ { content = 'xxx', createdAt = 'yyy'}, 
                 { content = 'xxx', createdAt = 'yyy'}, 
                 { content = 'xxx', createdAt = 'yyy'} ]

It's clear, but I'm worried about this case: if I want to edit a specified comment, how do I get its content and its question? There is no _id to let me find one, nor a question_ref to let me find its question. (I'm such a newbie that I don't know if there's any way to do this without _id and question_ref.)

Do I have to use references instead of embedding? Do I then have to create a new collection for comments?
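
One pattern worth noting: embedded comments can carry their own _id values, which makes a single comment addressable via the positional $ operator. A hedged pymongo sketch; the collection name is invented:

    from bson import ObjectId
    from pymongo import MongoClient

    questions = MongoClient().forum.questions    # hypothetical collection

    # embed comments, but give each its own _id so it stays addressable
    qid = questions.insert_one({
        "title": "aaa",
        "content": "bbb",
        "comments": [{"_id": ObjectId(), "content": "xxx", "createdAt": "yyy"}],
    }).inserted_id

    # edit one embedded comment in place via the positional $ operator
    cid = questions.find_one({"_id": qid})["comments"][0]["_id"]
    questions.update_one(
        {"_id": qid, "comments._id": cid},
        {"$set": {"comments.$.content": "edited"}},
    )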


Source: (StackOverflow)

How do I perform the SQL Join equivalent in MongoDB?

How do I perform the SQL Join equivalent in MongoDB?

For example, say you have two collections (users and comments), and I want to pull all the comments with pid=444 along with the user info for each.

comments
  { uid: 12345, pid: 444, comment: "blah" }
  { uid: 12345, pid: 888, comment: "asdf" }
  { uid: 99999, pid: 444, comment: "qwer" }

users
  { uid: 12345, name: "john" }
  { uid: 99999, name: "mia"  }

Is there a way to pull all the comments with a certain field (e.g. ...find({pid:444})) and the user information associated with each comment in one go?

At the moment, I am first getting the comments which match my criteria, then figuring out all the uids in that result set, getting the user objects, and merging them with the comment results. It seems like I am doing it wrong.
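
Newer servers can do this join-like step server-side with the $lookup aggregation stage (added in MongoDB 3.2, so after this question was asked). A pymongo sketch against the collections above:

    from pymongo import MongoClient

    db = MongoClient().test                 # assumes the two collections above

    results = db.comments.aggregate([
        {"$match": {"pid": 444}},
        {"$lookup": {                       # left outer join into an array field
            "from": "users",
            "localField": "uid",
            "foreignField": "uid",
            "as": "user",
        }},
    ])
    for doc in results:
        print(doc["comment"], doc["user"][0]["name"])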


Source: (StackOverflow)

MongoDB: Is it possible to make a case-insensitive query?

Example:

> db.stuff.save({"foo":"bar"});

> db.stuff.find({"foo":"bar"}).count();
1
> db.stuff.find({"foo":"BAR"}).count();
0
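
A case-insensitive match is possible with a regex query. A pymongo sketch mirroring the shell session above:

    import re
    from pymongo import MongoClient

    stuff = MongoClient().test.stuff
    stuff.insert_one({"foo": "bar"})

    # an anchored regex with the "i" option matches regardless of case
    print(stuff.count_documents({"foo": {"$regex": "^bar$", "$options": "i"}}))   # 1
    print(stuff.count_documents({"foo": re.compile("^bar$", re.IGNORECASE)}))     # 1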

Source: (StackOverflow)

NoSQL - MongoDB vs CouchDB [closed]

I am a complete noob when it comes to the NoSQL movement. I have heard lots about MongoDB and CouchDB. I know there are differences between the two. Which do you recommend learning as a first step into the NoSQL world?


Source: (StackOverflow)

Use cases for NoSQL

NoSQL has been getting a lot of attention in our industry recently. I'm really interested in what people's thoughts are on the best use cases for it over relational database storage. What should trigger a developer into thinking that a particular dataset is more suited to a NoSQL solution? I'm particularly interested in MongoDB and CouchDB, as they seem to be getting the most coverage with regard to PHP development, and that is my focus.


Source: (StackOverflow)

How to put username & password in MongoDB?

I want to set up a username & password for my MongoDB so that any remote access will ask for them. Is there a way to do it? I tried the tutorial from the MongoDB site and did the following:

use admin
db.addUser('theadmin', '12345');
db.auth('theadmin','12345');

After that, I exited and ran mongo again, and I didn't need a password to access it. Even when I connect to MongoDB remotely, I am not prompted for a username & password.


Thanks, I got it now. Here is what I documented:

1) In the mongo command line (let's say, to set the administrator):
  > use admin
  > db.addUser('admin','123456');
2) Shut down the server and exit
  > db.shutdownServer();
  > exit
3) Restart mongod with --auth
  $ sudo ./mongodb/bin/mongod --auth --dbpath /mnt/db/
4) Run mongo again, in either of 2 ways:
  i)  run mongo first, then log in:
   $ ./mongodb/bin/mongo localhost:27017
   > use admin
   > db.auth('admin','123456');
  ii) run & log in to mongo in one command line:
   $ ./mongodb/bin/mongo localhost:27017/admin -u admin -p 123456

* The username & password will work the same for mongodump and mongoexport.

Source: (StackOverflow)

Why does the MongoDB Java driver use a random number generator in a conditional?

I saw the following code in this commit for MongoDB's Java Connection driver, and it appears at first to be a joke of some sort. What does the following code do?

if (!((_ok) ? true : (Math.random() > 0.1))) {
    return res;
}

(EDIT: the code has been updated since posting this question)
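
For what it's worth, De Morgan's laws collapse that condition. A hedged Python sketch of the equivalent logic, with invented names:

    import random

    def bail_early(ok: bool) -> bool:
        # !((ok) ? true : (random() > 0.1)) simplifies to:
        # never bail when ok; bail ~10% of the time when not ok
        return (not ok) and (random.random() <= 0.1)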


Source: (StackOverflow)