SGI Origin2000 suffering autism!
ng906
2007-11-03 20:59:14 UTC
Hi.

I have a very unpleasant problem.
Our Origin2000 cannot complete its boot, or at least cannot load the
Ethernet drivers, so it can't be pinged or reached by telnet from outside.

These machines have no VGA monitor or keyboard, so the only way to access
the system when the network is down is through a serial terminal:

1) Serial port #1 on the back is DAMAGED: it broke down years ago.
Probably the signal ground is not working, since the terminal receives
the signal plus a lot of noise (I can read "Orgn200 ogin:", "ZZssword"
and so on).
2) Serial port #2 is not used as a terminal by the system.
3) The DB9 serial port on the back, which is shunted with the diagnostic
port on the front, gives no sign of life.

I think points 1 and 2 can't be solved: I simply cannot get in through
the serial ports.
But I really CAN'T understand WHY the diagnostic port is not working (I
checked both the DB9 port on the back and the mini-DIN on the front
separately, using a proper cable).

Maybe there is some trick...?
I turned the ignition key to the diagnostic position: nothing.
I still see "noise" from serial port #1. The Origin is definitely
waiting on the SERIAL port.
Maybe there is a way to switch the console from the (not-working)
serial port to the (hopefully working) diagnostic port...?

Any ideas....?

If someone has an Origin, could you check whether the diagnostic port is
connected to the terminal...


Thanks
Joerg Behrens
2007-11-04 18:44:47 UTC
Post by ng906
Hi.
I've this very unpleasant problem.
Our Origin2000 is not able to complete the boot, or anyway can't load
ethernet drivers. So can't be pinged/telnetted from outside.
These machines have no VGA Monitor/keyboard so the only way to access
1) The serial port #1 in the back is DAMAGED: broke down years ago.
Probably the Signal Ground is not working, since the terminal receives
signal and a lot of noise; I can read "Orgn200 ogin:" "ZZssword" and so on)
2) The serial port #2 is not used as terminal by the system
On some machines port #2 can also be used. But don't ask me whether this
is supported by the o2k, and even if it is, it would be hard to
reconfigure without a working connection.
Post by ng906
3) The DB9 serial port in the back, shunted with the diagnostic port in
the front gives no sign of life.
I think that point 1 & 2 can't be solved: Simply, I cannot enter using
serial ports.
But really I CAN'T understand WHY the Diagnostic port (I checked,
separately, both the DB9 port in the back and the MINI-DIN, using a
proper cable, in the front) is not working.
Maybe there is some trick...?
What happens when you press "ctrl+t"?
Post by ng906
I turned the ignition-key to diagnostic-position: nothing.
I always see "noise" from serial port #1. The Origin is definitively
waiting from the SERIAL port.
Maybe there is something to do to switch the console from the
(not-working) serial port to the (hopefully working) diagnostic port...?
Any ideas....?
Buy a replacement IO6 for around 20 bucks on eBay. If you have more than
one module, you can swap parts. On a larger configuration (one rack or
more) you can specify which module holds the console, so you can avoid
using the one with the broken IO6.


http://cgi.ebay.de/SGI-Origin-2000-PCA-IO6-Server-I-O-Board-030-1124-002_W0QQitemZ5833712995QQihZ001QQcategoryZ11223QQrdZ1QQssPageNameZWD1VQQcmdZViewItem
http://cgi.ebay.de/SGI-Origin-2000-IO6-Server-I-O-Board-030-1124-002_W0QQitemZ130163853579QQihZ003QQcategoryZ11223QQrdZ1QQssPageNameZWD1VQQcmdZViewItem

It looks like you're from the UK, so maybe you can call Ian Mapleson:
http://www.futuretech.blinkenlights.nl/sgidepot/contact.html
Post by ng906
If someone has a Origin can check if the Diagnostic port is connected to
the terminal...
IIRC I saw some diagnostic output during the first few seconds. The last
message is always something like "entering loop", and after that nothing
more comes.

If I have some time left, I can reboot my o2400 tomorrow and see what
happens on one of the MSCs.

regards
Joerg
--
TakeNet GmbH, Geschaeftsfuehrer Wolfgang Meier
97080 Wuerzburg Tel: +49 931 903-2243
Alfred-Nobel-Straße 20 Fax: +49 931 903-3025
HRB Wuerzburg 6940 http://www.takenet.de
ng906
2007-11-04 23:29:05 UTC
On Sun, 4 Nov 2007, Joerg Behrens wrote:

...
Post by Joerg Behrens
If i have some time left i can reboot my o2400 tomorrow and see whats happend
on one of the MSCs.
One last question:
If you connect a terminal to your Origin's diagnostic port WITHOUT
rebooting the system, are you able to get a login?
Joerg Behrens
2007-11-06 12:34:42 UTC
Post by ng906
...
Post by Joerg Behrens
If i have some time left i can reboot my o2400 tomorrow and see whats
happend on one of the MSCs.
On a running system you can send commands to the MSC by sending "ctrl+T"
followed by one or more commands.

ok auto
ok VER 3.1
ok h

ctrl+t ver returns the version information.
ctrl+t fan returns the status of the fans, which are currently running at
a "h"igh level. More commands can be found in the Origin*200* user manual.
For the rest, the "Origin Quick Hardware Reference Guide" is needed.

With "pwr u" or "pwr d" you can power your machine up and down without
turning the keys.

When you have a working console you can reach the MSC by pressing ctrl+t
too. You then see an "MSC" message.
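
As a rough sketch of what goes over the wire for these commands: Ctrl+T is the ASCII control code 0x14, so a command is that byte followed by the command text. The helper name msc_command and the carriage-return terminator are assumptions for illustration, not something taken from the SGI manuals, and actually writing the bytes to the serial device is left out.

```python
# Ctrl+<letter> is the letter's ASCII code with the top bits masked off,
# so Ctrl+T is 0x54 & 0x1F = 0x14.
CTRL_T = bytes([ord("T") & 0x1F])


def msc_command(command: str, terminator: bytes = b"\r") -> bytes:
    """Build the raw bytes for one MSC command.

    The terminator (carriage return) is an assumption; adjust it to
    whatever your terminal actually sends for Enter.
    """
    return CTRL_T + command.encode("ascii") + terminator


if __name__ == "__main__":
    # Only prints the byte strings; nothing is sent to any port.
    for cmd in ("ver", "fan", "pwr u", "pwr d"):
        print(f"{cmd!r:10} -> {msc_command(cmd)!r}")
```

Seeing the byte sequence makes it easy to check with a terminal program's hex view whether the Ctrl+T really leaves your terminal.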
Post by ng906
If you connect a terminal to your origin's Diagnostic Port WITHOUT
rebooting the system, are you able to obtain a login ?
Not that I know of.

During a reboot (it was a warm start) the MSC mirrors the first PROM
messages.

dsp 4
dsp 4
P 0 M 1 4
1 4
dsp P 0 M 1 4
3A 000: Starting PROM Boot process
4A 000: Starting PROM Boot process
dsp 4
dsp 4
4
0 M 1 4
1 4
dsp P 0 M 1 4
1A 000: Starting PROM Boot process
2A 000: Starting PROM Boot process
3A 000:
3A 000:
3A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
3A 000: *** Skipping diags as requested by kernel
3A 000: *** Diag level set to None (2)
2A 000:
2A 000:
2A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
2A 000: *** Skipping diags as requested by kernel
2A 000: *** Diag level set to None (2)
4A 000:
4A 000:
4A 000: IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
4A 000: *** Skipping diags as requested by kernel
4A 000: *** Diag level set to None (2)
1A 000: Using console at /hw/module/1/slot/io1
3A 000: Testing/Initializing memory ............... DONE
3B 000: Testing/Initializing memory ............... DONE
2A 000: Testing/Initializing memory ............... DONE
2B 000: Testing/Initializing memory ............... DONE
4A 000: Testing/Initializing memory ............... DONE
4B 000: Testing/Initializing memory ............... DONE
3A 000: Copying PROM code to memory ............... DONE
2A 000: Copying PROM code to memory ............... DONE
4A 000: Copying PROM code to memory ............... DONE
1B 000: Testing/Initializing memory ............... DONE
3A 000: Discovering local IO ...................... DONE
3A 000: Discovering NUMAlink connectivity ......... DONE
3A 000: Found 23 objects (15 hubs, 8 routers) in 93928 usec
4A 000: Discovering local IO ...................... DONE
4A 000: Discovering NUMAlink connectivity ......... DONE
4A 000: Found 23 objects (15 hubs, 8 routers) in 29369 usec
3A 000: Waiting for peers to complete discovery.... 4A 000:
WaitingE
2A 000: Discovering NUMAlink connectivity ......... DONE
2A 000: Found 23 objects (15 hubs, 8 routers) in 29370 usec
2A 000: Waiting for peers to complete discovery.... DONE
2A 000: Recognized 390 MHz midplane
2A 000: Global master is /hw/module/1/slot/n1
DONE
4A 000: Recognized 390 MHz midplane
4A 000: Global master is /hw/module/1/slot/n1
DONE
3A 000: Recognized 390 MHz midplane
3A 000: Global master is /hw/module/1/slot/n1
4A 0003A 0002A 0002A 001:Testing/Initializing all memory ...........
E
4A 003:Testing/Initializing all memory ........... DONE
3A 002:Testing/Initializing all memory ........... DONE
4A 003: waiting for node with nic 1e4008 at module 1 slot 1 at global
barrier...
2A 001: waiting for node with nic 1e4008 at module 1 slot 1 at global
barrier...
3A 002: waiting for node with nic 1e4008 at module 1 slot 1 at global
barrier...
4A 003:Checking partitioning information ......... DONE
2A 001:Checking partitioning information ......... DONE
3A 002:Checking partitioning information ......... DONE
4B 003: Local slave entering slave loop
2B 001: Local slave entering slave loop
3B 002: Local slave entering slave loop
4A 003:Local master entering slave loop
1B 000: Local slave entering slave loop
2A 001:Local master entering slave loop
3A 002:Local master entering slave loop
dsp InitCach4
dsp InitSaio4
dsp Chk Inv 4
dsp P 0 M 1C4
dsp P 0
1 4
0 M 1 4
1 4
dsp P 0
1 4
dsp P 0 M 1C4

Then nothing more comes. The "P0M1C" is also shown on the display of your
MSC. It means *P*artition *0* *M*odule *1* *C*onsole.
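
To pick that display string out of a captured log, a small parser along these lines might help. This is only a sketch: the function name decode_msc_display is made up, and treating a missing "C" as "this module does not hold the console" is an assumption. The trailing "4" seen on the "dsp" lines is not explained above, so it is simply ignored.

```python
import re
from typing import Optional

# P <partition> M <module>, optionally followed by C for "console".
_DISPLAY_RE = re.compile(r"P\s*(\d+)\s*M\s*(\d+)\s*(C?)")


def decode_msc_display(text: str) -> Optional[dict]:
    """Decode strings such as 'P0M1C' or 'dsp P 0 M 1C4'.

    Returns None when the line carries no partition/module pattern.
    """
    match = _DISPLAY_RE.search(text)
    if match is None:
        return None
    partition, module, console = match.groups()
    return {
        "partition": int(partition),
        "module": int(module),
        "console": console == "C",  # assumption: no C means no console here
    }


if __name__ == "__main__":
    for line in ("P0M1C", "dsp P 0 M 1C4", "dsp P 0 M 1 4", "dsp InitCach4"):
        print(f"{line!r:18} -> {decode_msc_display(line)}")
```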

If you buy a replacement part, don't forget to swap the NIC (number in a
can), which holds your serial ID. Otherwise your node-locked software
won't run.


regards
Joerg
ng906
2007-11-11 20:35:47 UTC
Hi :)

This week I made some tests:

- I am able to see diagnostic messages from the MSC when the system is
booting (Starting PROM boot process...), then nothing more.
- Also, if I press CTRL-T and type a command, the command is ignored.

IMHO, the o2000 is OLDER than the systems you're talking about (also, I
read the o2k manual, and it documents none of the commands you can send
through the diag port: no CTRL-T, nothing).

- Probably there's a way (although undocumented) to send power-on,
power-off and fan-test commands. But there is no way to get a terminal, so
I can't use this solution to log INTO the system (the only thing I'm
interested in).


I think that the only way to log in again is to buy a new serial-port
controller.


THANKS for your help, REALLY.


bye
-ng
Joerg Behrens
2007-11-12 16:36:04 UTC
Post by ng906
Hi :)
- I am able to see diagnostic messages from the MSC when the system is
booting (Starting PROM boot process...), then nothing more.
- Also, if I press CTRL-T and type a command, the command is ignored.
IMHO, the o2000 is OLDER than systems you're talking about (also, I read
the o2k manual, and there's no documentation about the command you can
actually send by Diag-port, no CTRL-T, nothing).
Believe me... I know my o2000, and yes, she is an old lady :) I already
gave you the names of the manuals you have to look into. The problem is
that the "Origin Quick Hardware Reference Guide" isn't available to the
public.

This information
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Admin/books/Origin200_OG/sgi_html/ch04.html#id84614
(from the o200 manual) is also valid for o2000-based systems.
Post by ng906
- Probably there's a way (although undocumented) to send power-on,
power-off, fan-test commands. But no way to get a terminal. So I can't
use this solution to log INTO the system (the only thing I'm interested
to).
Right.
Post by ng906
I think that the only way to log in again is to buy a new serial-port
controller.
That's what I told you... and $20 isn't too much. But maybe it's your MSC
that is broken and not the IO6.

regards
Joerg
ng906
2007-11-04 20:20:19 UTC
On Sun, 4 Nov 2007, Joerg Behrens wrote:

...
Post by Joerg Behrens
On some machines also the port #2 can be used. But dont ask if this is
supportet by o2k and if so it would be hard to re-configure this without
having a working connection.
Yeah, I know... :(
Post by Joerg Behrens
3) The DB9 serial port in the back, shunted with the diagnostic port in the
front gives no sign of life.
I think that point 1 & 2 can't be solved: Simply, I cannot enter using
serial ports.
But really I CAN'T understand WHY the Diagnostic port (I checked,
separately, both the DB9 port in the back and the MINI-DIN, using a proper
cable, in the front) is not working.
Maybe there is some trick...?
Whats happends when you press "ctrl+t" ?
???
I'll check asap and let you know... :-o
Post by Joerg Behrens
Buy a new IO6 for around 20 bucks from ebay. If you have more than one module
you can swap parts. On a larger configuration (one rack or more) you can
specify which module holds the console so you can avoid using this one with a
br0ken IO6.
http://cgi.ebay.de/SGI-Origin-2000-PCA-IO6-Server-I-O-Board-030-1124-002_W0QQitemZ5833712995QQihZ001QQcategoryZ11223QQrdZ1QQssPageNameZWD1VQQcmdZViewItem
http://cgi.ebay.de/SGI-Origin-2000-IO6-Server-I-O-Board-030-1124-002_W0QQitemZ130163853579QQihZ003QQcategoryZ11223QQrdZ1QQssPageNameZWD1VQQcmdZViewItem
Looks like your from the UK maybe you can call Ian Mapleson from
http://www.futuretech.blinkenlights.nl/sgidepot/contact.html
If someone has a Origin can check if the Diagnostic port is connected to the
terminal...
IIRC i saw the first seconds some dignostic stuff. The last one is always
something "entering loop" and after that nothing comes more.
If i have some time left i can reboot my o2400 tomorrow and see whats happend
on one of the MSCs.
Man:
THANK YOU VERY MUCH FOR NOW.

-ng


P.S. Tomorrow I'll check with another terminal (to be sure...) and will press
CTRL-T...