Murtak 02:17:43 Fri Jan 18 2008 Offline 374 posts Knight Reply |
Ok, then I will take a look.
|
Zherog 12:35:16 Fri Jan 18 2008 Offline 357 posts Knight Reply |
You're assuming two things: First, that they use a database rather than text files. I think that's a safe assumption, but it's still an assumption. Second, you're assuming they have a normalized database. I think that's a less safe assumption.
|
Murtak 16:00:07 Fri Jan 18 2008 Offline 374 posts Knight Reply |
True. But whatever form the data is in, it's still just text. The only trouble I am likely to run into is undocumented fields with cryptic values (say, a field named 'xplus' which denotes user status with 'b' being admin and '8' being a normal user). I am much more worried about encodings. Are there any other boards you are interested in?
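A minimal sketch of how the encoding worry could be handled, assuming the export is a byte stream in one of a few common encodings. The candidate list, the function names, and the status table are illustrative assumptions (the 'xplus' codes are only the hypothetical from the post above), not anything known about the actual backup format.

```python
# Candidate encodings are an assumption; extend the tuple for other boards.
CANDIDATES = ("utf-8", "cp1252")


def decode_with_fallback(raw, encodings=CANDIDATES):
    """Return (text, encoding_used) for the first encoding that decodes cleanly."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this last resort never raises.
    return raw.decode("latin-1"), "latin-1"


# Hypothetical lookup for a cryptic status field such as the 'xplus' example.
STATUS_CODES = {"b": "admin", "8": "user"}


def decode_status(code):
    """Translate a cryptic code, flagging anything undocumented."""
    return STATUS_CODES.get(code, f"unknown({code})")
```

Recording which encoding was actually used makes it easier to spot files that fell through to the last-resort decoder and may need a manual look.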
|
Zherog 18:45:16 Fri Jan 18 2008 Offline 357 posts Knight Reply |
phpBB is, in my opinion, the best looking free software out there.
|
Murtak 17:42:26 Sat Jan 19 2008 Offline 374 posts Knight Reply |
I downloaded and installed the phpbb stack and am trying to decipher its table structure. So far it does not look too bad. Does anyone have a bbb backup for me to chew on?
|
fbmf 22:32:20 Sat Jan 19 2008 Unavailable 975 posts Administrator The Great Fence Builder Reply |
Wish I did. My computer guru could not make it to Ft. Worth this weekend. Next weekend perhaps. Shit.
Game On, fbmf
|
Surgo 15:39:43 Sun Jan 20 2008 Offline 162 posts Journeyman Reply |
Murtak: go to the bbboy sample board; you can download a backup from it.
|
Murtak 18:26:14 Sun Jan 20 2008 Offline 374 posts Knight Reply |
Just downloaded the sample backup. I'm going to try again later, but either my backup is corrupted or they are using encryption or some weird compression format.
|
Murtak 16:51:44 Sun Jan 27 2008 Offline 374 posts Knight Reply |
So I had another go over the weekend and I still can't get anything even loosely resembling readable text from the sample board backup.
Does anyone have a readable file for me to work on? I don't care whether it's bogus data, but what I get from the sample board won't work.
|
Aycarus 04:54:10 Sun Feb 3 2008 Offline 110 posts Journeyman Reply |
Is it possible to parse the forums / messages directly from the HTML (that is, create a spider that reads every thread/page and saves the data into some sort of flatfile)? This would result in whispers and PMs being lost, but it is an option.
|
Maj 06:17:20 Sun Feb 3 2008 Offline 659 posts Knight-Baron Reply |
I've had a programmer friend also take a look at the file, and he said it's undecipherable.
I think what we're going to have to do is upgrade to version 2 of the boards (in theory, you get a month free), and then take a look at the databases. What a pain in the ass.
|
Murtak 08:36:14 Sun Feb 3 2008 Offline 374 posts Knight Reply |
Possible, yes. However, with the unbelievably bad markup on these boards (HTML Tidy gives me 190 warnings on this single page), parsing it for contents is going to be a nightmare.
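One way around markup this bad is an event-based parser that does not require well-formed input. The following is only a sketch using Python's standard `html.parser`; the class and function names are made up for illustration, and it pulls out visible text only, without knowing anything about the board's real page layout.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text chunks and ignore how (badly) the tags nest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # greater than zero while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the non-empty text chunks of a page, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Unclosed cells and missing end tags do not raise here, which is the point: validity warnings of the kind HTML Tidy reports do not have to block extraction.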
|
Aycarus 15:40:47 Sun Feb 3 2008 Offline 110 posts Journeyman Reply |
It's doable. If you look at "view source" it's actually quite structured. I'm nearly certain I can write a parser that will take the boards and convert them to a flatfile DB - if you can take the flatfile DB and convert it to whatever phpbb (or whatever) needs. Let me know.
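The regularity is visible even in the rendered text of this thread: every post starts with a header of the form "author time date status N posts rank Reply |". A rough sketch of turning such a header into a record follows; the field names and the `parse_header` function are assumptions for illustration, and this is not the parser Aycarus wrote.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches lines such as:
#   Maj 06:17:20 Sun Feb 3 2008 Offline 659 posts Knight-Baron Reply |
HEADER = re.compile(
    r"^(?P<author>\S+) (?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) "
    r"\w{3} (?P<mon>[A-Z][a-z]{2}) (?P<day>\d{1,2}) (?P<year>\d{4}) "
    r"(?P<status>\w+) (?P<posts>\d+) posts (?P<rank>.+?) Reply \|$"
)


def parse_header(line):
    """Return a dict for a post header line, or None for any other line."""
    match = HEADER.match(line.strip())
    if match is None or match["mon"] not in MONTHS:
        return None
    posted = datetime(
        int(match["year"]), MONTHS[match["mon"]], int(match["day"]),
        int(match["h"]), int(match["m"]), int(match["s"]),
    )
    return {
        "author": match["author"],
        "posted": posted,
        "status": match["status"],
        "post_count": int(match["posts"]),
        "rank": match["rank"],
    }
```

Lines that are not headers return `None`, so a caller can treat everything up to the next header as the body of the current post.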
|
Murtak 17:20:43 Sun Feb 3 2008 Offline 374 posts Knight Reply |
Well, if you can give me some sort of structured data I can give it a try. However I haven't written any spiders yet. Have you? Oh, and if possible it would be great if you could put your text files in YAML format (like so
|
Aycarus 17:30:48 Sun Feb 3 2008 Offline 110 posts Journeyman Reply |
I'd prefer XML myself since then you don't have collisions with quotation marks... or some sort of hybrid. Is the following format okay?
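The format Aycarus proposed is not preserved in this copy of the thread, and the sketch below does not try to reconstruct it. It only illustrates the point about quotation marks: an XML writer escapes markup characters in the post text, so quotes, ampersands, and angle brackets in a message cannot collide with the file's own syntax. The element and attribute names (`post`, `author`, `body`) are hypothetical.

```python
import xml.etree.ElementTree as ET


def post_to_xml(author, body):
    """Serialize one post; ElementTree handles the escaping."""
    post = ET.Element("post", {"author": author})
    ET.SubElement(post, "body").text = body
    return ET.tostring(post, encoding="unicode")


def xml_to_post(xml_text):
    """Read the post back as an (author, body) pair."""
    post = ET.fromstring(xml_text)
    return post.get("author"), post.find("body").text
```

A message that itself contains quotes or tag-like text survives the round trip unchanged, which is the property needed before handing the file to a converter.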
|
Murtak 18:28:47 Sun Feb 3 2008 Offline 374 posts Knight Reply |
That should be fine.
|
Aycarus 18:34:48 Sun Feb 3 2008 Offline 110 posts Journeyman Reply |
Proof of concept:
BBBoyParser.cpp
Compile this program using the g++ command line:
g++ -o BBBoyParser BBBoyParser.cpp
The parser takes a BBBoy .html file as input and outputs a "parsed" file in the aforementioned format. Not thoroughly tested, but it worked on at least one test page.
|
Aycarus 19:38:15 Sun Feb 3 2008 Offline 110 posts Journeyman Reply |
Does anybody know if one can configure their user_cp to display all pages of a thread or all threads of a forum on a single page? i.e. without having to click through multiple pages of the thread or forum?
|
Jacob_Orlove 23:53:13 Sun Feb 3 2008 Offline 276 posts Master Reply |
I couldn't find anything to allow that, but it should be possible for an admin to set the # posts/page to a much higher number, which would do the trick for all but a few threads.
|
Maj 03:28:11 Mon Feb 4 2008 Offline 659 posts Knight-Baron Reply |
There's a limit to the number that an admin can set (it's 50, I think, but I'm not positive of that). The more posts/page, though, the more likely you are to encounter the high load errors, according to support.
|
Crissa 13:17:09 Mon Feb 4 2008 Offline 1890 posts Duke Reply |
I've learned three things from this thread...
...We still don't know why we use 'more' cpu time...
...tzor's mother did unspeakable things to him as a child...
...And bbboy must've lost their coders in the web crunch.
-Crissa
|
Aycarus 13:35:57 Mon Feb 4 2008 Offline 110 posts Journeyman Reply |
Seems I had to do the extra work and write the spider to take into account multiple pages on threads. So... this is what we can do:
- Spider all the HTML on nifty [done-ish]
- Run HTML => Flatfile parser [done - need a script to automate this]
- Run Flatfile DB => PHPbb DB [in progress]
I think it's totally manageable, and best of all, free. Tho... anyone else feel kinda treasonous for discussing the idea here?
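The multi-page part of the spider step can be sketched as follows. This is not Aycarus's spider: the `&page=N` URL pattern is purely an assumption, and `fetch` is a caller-supplied function (URL in, page text out) so the logic can be tried without touching the live board. The page list is read from a "Pages: [ 1 2 3 4 ]" line like the one these boards print on each thread page.

```python
import re

PAGES = re.compile(r"Pages: \[ ([\d ]+) \]")


def page_numbers(page_text):
    """Return the page numbers listed on a thread page, or [1] if none are shown."""
    match = PAGES.search(page_text)
    if match is None:
        return [1]
    return [int(number) for number in match.group(1).split()]


def crawl_thread(first_page_url, fetch):
    """Fetch every page of one thread, starting from its first page.

    The '&page=N' suffix is a hypothetical URL scheme, not the board's real one.
    """
    first = fetch(first_page_url)
    pages = [first]
    for number in page_numbers(first)[1:]:
        pages.append(fetch(f"{first_page_url}&page={number}"))
    return pages
```

Each returned page can then go through the HTML => flatfile step unchanged, which is the piece the list above says still needs a script to automate.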
|
tzor 13:44:58 Mon Feb 4 2008 Offline 794 posts Knight-Baron Reply |
...We still don't know if the cpu thing is true or just a vanilla lie
...well let's not speak about them, OK?
...they were outsourced to India during the outsourcing rush
|
Zherog 15:41:24 Mon Feb 4 2008 Offline 357 posts Knight Reply |
No, not in the least. Their software sucks ass; their support sucks more ass; I don't mind telling them to their face (I did, but they opted to delete it and give me a warning), so I sure as hell don't mind saying it here.
As for spiders and such... I'll fully admit I know jack shit about html, xml, and so on. Wanna know about Oracle databases or Oracle Applications? Good chance I can help you. Wanna know about alphabet soup mark-up languages? No clue here.
Our current working theory over on Nifty is to convert to BbSuckass v2; that version uses an actual MySQL database, unlike the current BbSuckass version. Once we have the forums in a MySQL database, in theory it shouldn't be difficult to extract the data and insert it into phpBB (or another free forum package). The downside, as Maj said, is we'd have to write, test, and implement the conversion scripts in a month.
Maj's programmer friend she mentioned is helping out with the conversion. I'll be sure he gets a look at your crawler.
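The extract-and-insert step could look roughly like the sketch below. It uses SQLite from the Python standard library as a stand-in for MySQL so it runs anywhere, and every table and column name (`old_posts`, `new_posts`, and their fields) is hypothetical. Neither the v2 board's schema nor phpBB's real schema is known from this thread, so both would have to be checked before writing the actual conversion scripts.

```python
import sqlite3
from datetime import datetime, timezone


def migrate(src, dst):
    """Copy posts from a source connection into a target connection.

    Assumes (hypothetically) that the source stores timestamps as
    'YYYY-MM-DD HH:MM:SS' text in UTC and the target wants Unix epoch seconds.
    """
    dst.execute(
        "CREATE TABLE IF NOT EXISTS new_posts "
        "(poster TEXT, post_time INTEGER, post_text TEXT)"
    )
    rows = src.execute(
        "SELECT author, posted, body FROM old_posts ORDER BY posted"
    ).fetchall()
    converted = []
    for author, posted, body in rows:
        moment = datetime.fromisoformat(posted).replace(tzinfo=timezone.utc)
        converted.append((author, int(moment.timestamp()), body))
    with dst:  # commit all inserts together, or roll back on error
        dst.executemany("INSERT INTO new_posts VALUES (?, ?, ?)", converted)
    return len(converted)
```

Doing the whole copy in one transaction means a failed run leaves the target empty rather than half-filled, which matters when the scripts have to be rerun several times inside a one-month window.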
|
Aycarus 19:59:33 Mon Feb 4 2008 Offline 110 posts Journeyman Reply |
Inevitably their database formats will be different, which will probably be a pain in the ass when it comes to converting between the two. You'll also still have to go through the trouble of modifying the BBcode itself due to inconsistencies between the formatting. As a whole, it should be fun! How much are you hoping to salvage, anyway? Converting the messages themselves is not too big of a problem... whispers will be essentially impossible... PMs are doable, but will require some thought.
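For the BBcode differences, a table-driven rename is probably the simplest starting point. The sketch below is only that: the tag names in `TAG_MAP` are invented for illustration, since the thread does not say which tags actually differ between the two packages.

```python
import re

# Hypothetical source-tag -> target-tag renames; the real list would come
# from comparing the two boards' BBcode documentation.
TAG_MAP = {"bold": "b", "italic": "i"}

# Matches [tag], [/tag] and [tag=argument].
TAG = re.compile(r"\[(/?)([a-zA-Z]+)((?:=[^\]]*)?)\]")


def translate_bbcode(text, tag_map=TAG_MAP):
    """Rename known tags and leave everything else untouched."""
    def swap(match):
        slash, name, argument = match.groups()
        new_name = tag_map.get(name.lower(), name)
        return f"[{slash}{new_name}{argument}]"
    return TAG.sub(swap, text)
```

Tags that are not in the table pass through unchanged, so the table can grow one inconsistency at a time as they turn up in converted posts.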