
hardware-failure interview questions

Top hardware-failure frequently asked interview questions

Strange laptop LCD failure

Last week, my laptop display unexpectedly went haywire while the machine was switching graphics cards (from the ATI Radeon to the Intel GMA). The image is now displayed "right-to-left" instead of the normal "left-to-right". Some of you might already be thinking that it's just the screen inversion/rotation setting of the display driver, but no: it's definitely a hardware problem, as the behavior appears as soon as the computer boots, starting with the BIOS screen, and I have tried every setting I could think of.

However, the display is fine on an external screen, over both the VGA and HDMI outputs.

I have included two photos so that you can understand what the problem is about:

Laptop display: image is inverted + every other vertical line is swapped


External screen: image is normal


So now, since the laptop is out of warranty (though it's only about 1.4 years old and has always been handled with care) and I don't have the money to buy a new one (and I think replacing it would be a waste for just a display problem), I was thinking of simply getting a replacement screen (installing it is no problem).

However, since I have not figured out the exact root cause of the failure, I can't rule out that the problem lies somewhere else. So I would like your view on the matter:

  • Has anyone ever had the same problem?
  • Do you think it's safe to assume that it's only a display problem since the VGA and HDMI outputs work fine?

Information about the computer

More information about the computer, in case it helps:

  • Brand: Acer
  • Model: TimelineX 4820TG (4820TG-334G32Mnks to be exact)
  • Graphics card: ATI Radeon Mobility HD5470 / Intel GMA (switchable graphics)
  • Year of purchase: October 2010 (labels indicate it was manufactured in May 2010)

Link to the Service Manual.

New information about screen (+datasheet)

I tried disassembling the screen (hoping it would be something easily fixed, like a loose cable connection). That didn't solve the problem, but it did let me note down the exact screen model:

  • Manufacturer: AU Optronics (AUO)
  • Model: B140XW03 V.0
  • Version: H/W:0A F/W:1

Google helped me find the relevant datasheet. However, my limited knowledge of the subject didn't let me get much out of the document toward solving the issue; I hope others find it more helpful...


Source: (StackOverflow)

Laptop powers down during startup

When I press the power button, the computer starts just fine and Windows begins loading, then the fan spins up (REALLY loudly) for about two seconds and the computer shuts down. This happens a few times in a row, but when the laptop survives the boot, it won't crash once it's in Windows. The same thing happens in Ubuntu.

Thinking it was a heat-related problem, I cleaned the fan and changed the thermal paste, but the problem persists. I checked the round capacitors on the motherboard and they seem fine.

It happens both on battery and with the power adapter plugged in. When I measure the temperature (Open Hardware Monitor), it is at most 60 °C the whole time.

Edit: I cannot get into Windows anymore. I can boot into Ubuntu, but it seems the fan is not turning most of the time: it spins only for those two loud seconds and is silent otherwise. I also think the fan used to start for a brief moment right after the power button was pressed, but it no longer does.

Edit 2: OK, it DOES start sometimes in Ubuntu, but it is not running right before it enters the berserker mode.
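For reference, here is a minimal sketch of how the fan and temperatures can be watched from Ubuntu while this happens (it assumes the laptop exposes its sensors through the standard hwmon sysfs interface, which not every embedded controller does; the "sensors" command from lm-sensors reads the same data):

# poll fan speed (RPM) and temperatures (millidegrees C) every two seconds
while true; do
    for f in /sys/class/hwmon/hwmon*/fan*_input /sys/class/hwmon/hwmon*/temp*_input; do
        [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
    done
    echo '---'
    sleep 2
done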


Source: (StackOverflow)


Power Options: Turn off harddrive. Effective or dangerous?

I keep a PC running 24x7 as it handles several light (processor-wise and disk-wise) tasks and acts as a jumping point into my network from outside. I'd like to keep it as energy efficient as possible.

I've always wondered about the "Turn off hard drive after XX minutes of inactivity" type of setting. I can imagine that on Windows 3.1 this might have worked well for unattended PCs, but I figured that in more modern OSes there are so many background tasks periodically touching the disk that the setting is pretty ineffective.

However, I'm more concerned with, and interested in, whether it is dangerous. I can imagine that spinning down the hard drive every 20 minutes just to have it spin right back up again (or worse, being spun up while it is still spinning down) might be bad for the drive. I've definitely seen TCO figures from way back for lab machines that were only used during the day but were cheaper to leave running, since drive failures increase with spin-up/spin-down cycles (and they seem to occur at power-on).
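One way to gauge how much start/stop wear a drive has already accumulated is to read its own SMART counters; here is a sketch using smartmontools (attribute names vary by vendor, and the device path will differ on Windows):

# how many spin-up and head-load cycles has the drive been through so far?
sudo smartctl -A /dev/sda | grep -E 'Start_Stop_Count|Power_Cycle_Count|Load_Cycle_Count'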

Any tips?


Source: (StackOverflow)

USB flash drive not working or appearing as an empty disk drive; Disk Management reports "No Media" with a size of 0 bytes

I have a USB flash drive which is no longer recognized by my computer. Windows Disk Management and DiskPart report No Media with no storage space (0 bytes) on the drive and I cannot partition or format the drive:

Console window with DiskPart showing "No Media" on the flash drive

If the drive appears in Windows Explorer, trying to access it returns an error message indicating that there is no disk inserted, such as the following:

Please insert a disk into drive X:.

Various disk-partitioning and data-recovery utilities either don't recognize the drive or only show a generic name for it, and cannot access its contents.
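The same symptom can be cross-checked from a Linux machine, if one is handy; a quick sketch (assumes the usual util-linux tools are available):

# does the kernel see any capacity behind the USB bridge at all?
lsblk -o NAME,SIZE,TYPE,TRAN
# the kernel log shows the size the controller reported when the stick was attached
sudo dmesg | tail -n 30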

What can I do? How do I recover data from the drive?

This question comes up often and the answers are usually the same. This post is meant to provide a definitive, canonical answer for this problem. Feel free to edit the answer to add additional details.


Source: (StackOverflow)

How is SSD lifetime affected by usage pattern and amount of free space?

It is known that SSDs can take only a limited number of writes before they start to slow down or otherwise deteriorate.

Suppose I have a 250GB SSD with 225GB of files which are seldom modified (like system files) and 25GB containing short-lived data (like a downloaded movie that gets deleted soon after watching it) and some free space.

Does all of the new data get written to the same physical cells backing those 25GB, i.e. "wearing out" that fraction of the SSD very rapidly while the other 90% of the drive hardly wears at all? Or are newer SSDs (or perhaps certain operating systems) smart enough to recognize long-lived data and move it around, so that writes of short-lived data are spread more evenly across the entire drive?
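Most drives expose their own wear counters through SMART, which gives a rough empirical answer for a specific disk; a sketch (the attribute names are vendor-specific, e.g. Wear_Leveling_Count on Samsung or Media_Wearout_Indicator on Intel, so the grep pattern is only a guess):

# read the drive's self-reported wear/lifetime attributes
sudo smartctl -A /dev/sda | grep -Ei 'wear|lifetime|lbas_written'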


Source: (StackOverflow)

Is moving an "all-in-one" PC when turned on/off dangerous?

I have an "all-in-one" computer. I know that moving a computer when it is switched on is harmful to the hardware components. What I would like to know is if the same applies to "all-in-one" computers and if the same applies to regularly moving it from one side of the room to the other when it is turned off!

The reason for the question is that I work at one desk during the day, and in the evening I move the computer to the couch so I can do other stuff while watching TV or something. I always turn it off before moving it, but somebody told me I could be damaging the machine by doing so.

Can anybody shed some light on this?


Source: (StackOverflow)

Are SSDs really reliable enough for power users?

TL;DR (though please read my full experience before jumping in with a quick answer.)

A power user generates a certain number of write cycles. Does an MLC SSD support enough write cycles to last a power user around 5 years of usage? In my first experience with an SSD, it became cranky after just nine months. Based on my usage patterns, is this normal for an SSD, or was it just a dud?


About nine months ago I bought an SSD (a 60GB OCZ Vertex Turbo). Up until about two weeks ago, I really loved the drive. It was extremely reliable and it really did make my system much more responsive. But two weeks ago, the drive started failing. It took me about 1½ weeks just to pinpoint the exact problem, and in the last few days it just got progressively worse. The drive has been taken back to the shop.

During these last two weeks I've done considerable research on MLC based SSDs and to be frank, I have huge doubts about the technology. What I would like to know is whether my concerns are warranted, or did I just get a dud drive?

You can reply per point if you like:

  1. Getting bad sectors on an SSD is just a matter of time, and bad sectors develop quickly. It seems the drive's controller is responsible for keeping a log of these bad sectors and avoiding their use.
  2. Within 9 months of usage I developed enough bad sectors to make the controller really work to find sectors it could still use.
  3. The controller isn't perfect, and once you've got a certain number of bad sectors, you'll have an extremely unstable and unreliable computing experience.
  4. It’s not easy to pinpoint the exact cause of your system crashes.
  5. I was using my SSD as a boot drive. I had the vital apps and other development tools installed; I also installed SharePoint 2010 and SQL Server 2008 R2 Express. Besides this I had Visual Studio and Outlook. At no time did I copy huge movie or ISO images or games to the SSD. Any non-vital apps were kept on a regular HDD.
  6. I did apply tweaks such as turning off System Restore, and I NEVER defragged the SSD.
  7. I never turn my system off, unless I need to restart. Having said this, my system does enter standby mode when not in use.
  8. I was running Windows 7 64-bit with TRIM enabled.
  9. I ran an anti-virus app.

Do you think that, as a demanding power user, you simply go through too many write cycles for an SSD to last more than around 9 months?
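As a rough sanity check, the drive's own write counter can be compared against a back-of-envelope endurance figure; a sketch (attribute 241 Total_LBAs_Written is vendor-specific and not every drive reports it, and the ~3000 P/E-cycle figure for MLC is only a ballpark assumption):

# total host writes so far, assuming 512-byte LBAs
lbas=$(sudo smartctl -A /dev/sda | awk '/Total_LBAs_Written/ {print $10}')
: "${lbas:=0}"   # 0 if the drive doesn't report the attribute
echo "host writes so far: $(( lbas * 512 / 1024 / 1024 / 1024 )) GiB"
# very rough ceiling: 60 GB of MLC NAND at ~3000 P/E cycles is on the order of
# 180 TB of NAND writes, before accounting for write amplification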


Source: (StackOverflow)

What is the safe temperature limit for a consumer SATA hard drive?

Background

My personal desktop system at home has 5 SATA drives racked up inside. Recently the system started failing in odd ways, like random kernel panics, and I eventually traced it to the RAID array randomly degrading. Sometimes I could boot, other times I couldn't, and so on. After chasing software issues for a while I finally went to pull the drives and discovered the real reason they were failing: they were hotter than a barbecue on the 4th of July! The front case fan had seized up, and the PSU fan had a loose power connector caught in its grate, so the inside of the case had been cooking.

As a holdover, I found a house fan and got that sucker cooled off. It ran great with everything nice and chill. About this time I learned how to get drive temperature readings from S.M.A.R.T.

# print each drive's current temperature as reported by SMART
for i in a b c d e; do
    sudo smartctl --all /dev/sd$i | grep Temperature_Celsius
done

Now I know that with the case open and a house fan permanently blowing the cobwebs out, the drives run at 31-32 °C. A quick test with no ventilation, to replicate the failed state, showed the drives running up into the high 40s pretty quickly. I don't know how bad it was during the actual failure, or how long it had been like that.

With this in mind, I replaced the failing fans, added a couple more, upgraded the front fan blowing across the drives from 80 mm to 120 mm, and closed the case back up. With it standing upright again, the temperature range now generally sits at 32 °C for the drive at the bottom of the stack and 37 °C at the top.
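For comparison against whatever the drives themselves consider acceptable, smartctl can ask for the SCT temperature status, which includes the drive's own recommended and limit temperatures; a sketch (SCT reporting is optional and many consumer drives don't implement it):

# the drive's self-reported current, recommended and limit temperatures
sudo smartctl -l scttempsts /dev/sda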

The Question

What is a general safe operating temperature range for SATA drives? Should 37 °C be a concern, or is drive damage not an issue until some higher point?

Although the drives seem to test out fine now, how likely is it that the past exposure to heat has made them prone to failure?


Source: (StackOverflow)

How do hard drive diagnostic tools know if a sector is bad?

When I run into a hard drive that may be failing, I scan it using ViVARD, which reliably lets me know if the drive needs replacing.

How do these sorts of tools work? How can they tell a bad sector from a good sector?
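In essence, such tools read (or write and re-read) every sector and record the ones the drive cannot return without error; a minimal read-only sketch of the same idea (replace sdX with the real device, and note that write-mode tests are destructive):

# read-only surface scan: report every sector the drive fails to read
sudo badblocks -sv /dev/sdX
# or, more crudely, stream the whole disk and let dd log read errors
sudo dd if=/dev/sdX of=/dev/null bs=1M conv=noerror status=progress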


Source: (StackOverflow)

How to make smartd shut up about one particular offline uncorrectable sector?

I have a disk in a two-disk software RAID-1 for which recently an "offline uncorrectable sector" appeared in the SMART status.

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1

This is apparently only a sign of imminent disk failure if it happens increasingly often (and since the drive is mirrored, there is no great risk of real data loss either). At the time, a self-test also failed at some point, and smartd sent me an email to notify me of this, as it of course should.

However, writing to the damaged sector usually causes the disk to remap it to one of its spare sectors instead, which it apparently did, because since I dd'ed over the disk, all self-tests have run through just fine. badblocks also found no reason to complain.

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
[...]
# 5  Extended offline    Completed without error       00%      5559         -
# 6  Short offline       Completed without error       00%      5540         -
# 7  Short offline       Completed: read failure       90%      5524         63273368

The number of faulty sectors didn't decrease, though, which it shouldn't, really, since the broken sector is still there, albeit unused. However, smartd continues to send me emails every night:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 1 Offline uncorrectable sectors

This is obviously extremely annoying and numbs my healthy panic reflex to smartd mails.

The disk is a Western Digital WD20EARS and the smartd version is 5.41 2011-06-09 r3365.
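For reference, smartd.conf supports a '+' modifier on the -C and -U directives which, per the smartd.conf man page, restricts the warning to increases in the count rather than any non-zero value; a sketch of such a line (device and mail address are placeholders to adapt):

# warn only when the pending (-C 197) or offline-uncorrectable (-U 198)
# counts actually increase, instead of on every non-zero reading
/dev/sda -a -C 197+ -U 198+ -m root@localhost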


Source: (StackOverflow)

Is it bad to defragment a USB Flash Drive?

If I defragment my 8 GB Kingston flash drive, will it decrease its lifetime?


Source: (StackOverflow)

Different CPU info on Windows and Linux?

I have an AMD FM2 A8-Series A8-5600K CPU (3.6 GHz according to the factory specs), and there is up to a 0.2 GHz difference between the hardware reports in Windows and Linux.

I tested on:

  • Win7 Ultimate x64 & x86 (both report 3.4 GHz)
  • Win8.1 Pro x64 & x86 (both report 3.5 GHz)
  • Ubuntu 14.10 & 14.10.1 x86 & x64 (both report the correct 3.6 GHz)
  • Linux Mint 17
    • x86 & x64 MATE report 3.55 GHz
    • x86 & x64 Cinnamon report the correct 3.6 GHz

I know that the CPU and my ASRock motherboard are capable of overclocking (OC), but it's not enabled, so I don't think it affects the hardware reports.
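For cross-checking on the Linux side, the kernel's cpufreq interface shows both the rated maximum and the momentary clock, which helps separate a reporting difference from real frequency scaling; a quick sketch (the paths are the standard cpufreq sysfs locations):

# rated maximum vs. the clock the core is running at right now (values in kHz)
lscpu | grep -i 'mhz'
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq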

Does anyone have an idea whether this is a sign of broken hardware, or is it just some OS quirk?


Source: (StackOverflow)

How can I determine which RAM module is failing Memtest86?

One or more of the memory modules, out of a total of 4 GB (4x1 GB), is failing. Below is a Memtest progress screenshot. It always points to memory at 19xx MB and 50xx MB, and the error bits are always 04000000. How can I determine which of the 4 modules is failing? The strange thing is that if I test them individually, they do not fail. I have already changed the motherboard.

[Memtest86+ screenshots]
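As a side note, it helps to record how the BIOS labels the slots, so that a failing address range can later be tied back to a physical stick; a sketch using dmidecode on Linux (assumes root access and a board whose DMI tables are populated correctly):

# list each DIMM slot with its label, size and part number
sudo dmidecode -t memory | grep -E 'Locator|Size|Part Number'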

Thanks


Source: (StackOverflow)

RAM tests inconsistently - what is the most likely culprit? (i.e. what should I spend money on replacing)

  • Motherboard: GA-B85M-DS3H-A
  • CPU: Core i5 4430
  • RAM: PNY XLR8 DDR3 32GB (4x8GB) 1600MHz (MD32768K4D3-1600-X9)
  • PSU: EVGA 500 W1 80+

The Problem

With all 32GB of RAM installed, the system fails MemTest86+ 6.2 consistently. The failure always occurs during the first pass, and the error count quickly rises into the millions. Attempting to run Windows results in random reboots and Stop errors (as would be expected with RAM errors).

What I've Tried

  • Test a single 8GB PNY module in socket DIMM1. Successfully completes 4 passes of MemTest.
  • Test a single 8GB PNY module in socket DIMM2. Successfully completes 4 passes of MemTest.
  • Test a single 8GB PNY module in socket DIMM3. Successfully completes 4 passes of MemTest.
  • Test a single 8GB PNY module in socket DIMM4. Successfully completes 4 passes of MemTest.
  • Test all four 8GB PNY DIMMs one at a time in socket DIMM1. All modules successfully complete 4 passes of MemTest.
  • Test two 8GB PNY modules in sockets DIMM1 and DIMM2. Successfully completes 4 passes of MemTest.
  • Test two 8GB PNY modules in sockets DIMM3 and DIMM4. Successfully completes 4 passes of MemTest.
  • Test the motherboard with four 2GB known-good DIMMs in all sockets. Successfully completes 4 passes of MemTest.
  • Swap the ordering of the PNY DIMMs in the sockets. No change - MemTest errors still occur.
  • Raise the motherboard RAM voltage from 1.5v to 1.65V. No change - MemTest errors still occur.
  • Play with various combinations of manual RAM settings in the setup utility (enabling/disabling the XMP profile, the "increased stability" preset, etc.). No change - MemTest errors still occur.

I think I can safely rule out bad RAM and bad RAM sockets. The only time the MemTest tests fail is if all four 8GB modules are installed simultaneously.

I've measured voltages coming off the PSU and everything there appears stable even with all four sticks installed.

As I write this, I have tried a last resort option of manually reducing the RAM speed to 1066MHz in the BIOS. So far, MemTest has completed one pass and is on its second with no errors. (All the above tests were performed at the native 1600MHz RAM speed.) This may allow me to use the system, albeit with slightly slower RAM speeds, but this does not seem to be a permanent fix.

Whenever MemTest errors occur, they always occur in the same exact position on the 64-bit address bus:

Bit Error Mask: 00000000FF000000

Additionally, errors NEVER occur below the 4GB barrier. In other words, all errors occur in the address space between 4GB and 32GB.
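For what it's worth, that bit error mask can be decoded to see exactly which data lines are implicated; a quick sketch in bash arithmetic:

# list the data bits flagged by the mask
mask=0x00000000FF000000
for b in $(seq 0 63); do
    [ $(( (mask >> b) & 1 )) -eq 1 ] && echo "data bit $b"
done
# prints bits 24-31: the errors are confined to a single byte lane of the 64-bit bus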

I'm deducing this to be some sort of strange interaction or timing problem with the CPU and the RAM and the motherboard, since the errors are very consistent, only occur in one specific configuration, appear to be mitigated by slowing down the RAM, and only occur above the 4GB barrier. My question is: Is it more likely that my CPU or my motherboard is the culprit?

I have been intending to upgrade this machine to a Core i7-4790K, so if the CPU is the likely culprit (I know the memory controller is on the CPU in these newer models), then it works out well, because I am planning to upgrade anyway. But I'm wondering whether there's a chance the motherboard itself might also be part of the problem; i.e. I would not want to spend the money on the i7 only to experience the exact same problem and find out I also have to replace the motherboard...

Advice?


EDIT: The slower RAM speed still produced errors, but only once the test reached the third pass. I restarted the test with only one CPU active just to test for an interaction on the CPU itself.


Source: (StackOverflow)

Are all basic SATA drives equally reliable?

I'm about to purchase a SATA hard drive. Aside from storage capacity, are there any other factors I should keep in mind? I care above all about reliability. Is a more expensive drive less error-prone than a low-end one?


Source: (StackOverflow)